2603.25158v1 Mar 26, 2026 cs.AI

Trace2Skill: Distill Trajectory-Local Lessons into Transferable Agent Skills

Jingwei Ni (ETH Zürich), Mengyu Zhou, Xiaoxi Jiang, Guanjun Jiang, Xinpeng Liu, Yutao Sun, Yihao Liu, Pengyu Cheng, Dexin Wang

Abstract

Equipping Large Language Model (LLM) agents with domain-specific skills is critical for tackling complex tasks. Yet manual authoring creates a severe scalability bottleneck. Conversely, automated skill generation often yields fragile or fragmented results because it either relies on shallow parametric knowledge or sequentially overfits to non-generalizable trajectory-local lessons. To overcome this, we introduce Trace2Skill, a framework that mirrors how human experts author skills: by holistically analyzing broad execution experience before distilling it into a single, comprehensive guide. Instead of reacting sequentially to individual trajectories, Trace2Skill dispatches a parallel fleet of sub-agents to analyze a diverse pool of executions. It extracts trajectory-specific lessons and hierarchically consolidates them into a unified, conflict-free skill directory via inductive reasoning. Trace2Skill supports both deepening existing human-written skills and creating new ones from scratch. Experiments in challenging domains, such as spreadsheets, VisionQA, and math reasoning, show that Trace2Skill significantly improves upon strong baselines, including Anthropic's official xlsx skills. Crucially, this trajectory-grounded evolution does not merely memorize task instances or model-specific quirks: evolved skills transfer across LLM scales and generalize to out-of-distribution (OOD) settings. For example, skills evolved by Qwen3.5-35B on its own trajectories improved a Qwen3.5-122B agent by up to 57.65 absolute percentage points on WikiTableQuestions. Ultimately, our results demonstrate that complex agent experience can be packaged into highly transferable, declarative skills, requiring no parameter updates and no external retrieval modules, and using open-source models as small as 35B parameters.
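The pipeline described in the abstract (parallel per-trajectory analysis, then hierarchical consolidation into one skill) might be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `Trajectory`, `analyze_trajectory`, `merge_lessons`, `consolidate`, and `build_skill` are hypothetical names, and the two functions that would be LLM calls in practice are replaced by toy heuristics (a `LESSON:` line prefix and case-insensitive deduplication).

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Trajectory:
    """One recorded agent execution (hypothetical structure)."""
    task_id: str
    log: str


def analyze_trajectory(traj: Trajectory) -> list[str]:
    """Stand-in for one sub-agent. ASSUMPTION: a real system would call an LLM;
    here every log line starting with 'LESSON:' counts as a lesson."""
    prefix = "LESSON:"
    return [line[len(prefix):].strip()
            for line in traj.log.splitlines()
            if line.startswith(prefix)]


def merge_lessons(groups: list[list[str]]) -> list[str]:
    """Stand-in for one consolidation step. ASSUMPTION: a real system would
    reconcile conflicts with an LLM; here we only drop case-insensitive
    duplicates, keeping first-seen order."""
    seen: set[str] = set()
    merged: list[str] = []
    for group in groups:
        for lesson in group:
            key = lesson.lower()
            if key not in seen:
                seen.add(key)
                merged.append(lesson)
    return merged


def consolidate(lesson_sets: list[list[str]], fan_in: int = 2) -> list[str]:
    """Merge lesson sets level by level (a tree reduction) until one remains."""
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    if not lesson_sets:
        return []
    level = list(lesson_sets)
    while len(level) > 1:
        level = [merge_lessons(level[i:i + fan_in])
                 for i in range(0, len(level), fan_in)]
    # Final pass also deduplicates when only one set was supplied.
    return merge_lessons(level)


def build_skill(trajectories: list[Trajectory], max_workers: int = 4) -> str:
    """Analyze all trajectories in parallel, then consolidate into one guide."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so the result is deterministic.
        lesson_sets = list(pool.map(analyze_trajectory, trajectories))
    return "\n".join(f"- {lesson}" for lesson in consolidate(lesson_sets))


if __name__ == "__main__":
    demo = [
        Trajectory("t1", "open file\nLESSON: Check header rows"),
        Trajectory("t2", "LESSON: check header rows\n"
                         "LESSON: Use formulas, not hard-coded values"),
        Trajectory("t3", "no lessons recorded"),
    ]
    print(build_skill(demo))
```

The design point this sketch tries to capture is that no single trajectory updates the skill directly: lessons are first collected from the whole pool and only then merged, which is the contrast the abstract draws with sequential, per-trajectory updates.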
