2604.24594v1 Apr 27, 2026 cs.CL

Skill Retrieval Augmentation for Agentic AI

Yiqun Liu

Qingyao Ai

Yiteng Tu

Weihang Su

Changyue Wang

Jianming Long

Yichen Tang

Abstract

As large language models (LLMs) evolve into agentic problem solvers, they increasingly rely on external, reusable skills to handle tasks beyond their native parametric capabilities. In existing agent systems, the dominant strategy for incorporating skills is to explicitly enumerate available skills within the context window. However, this strategy fails to scale: as skill corpora expand, context budgets are consumed rapidly, and the agent becomes markedly less accurate in identifying the right skill. To this end, this paper formulates Skill Retrieval Augmentation (SRA), a new paradigm in which agents dynamically retrieve, incorporate, and apply relevant skills from large external skill corpora on demand. To make this problem measurable, we construct a large-scale skill corpus and introduce SRA-Bench, the first benchmark for decomposed evaluation of the full SRA pipeline, covering skill retrieval, skill incorporation, and end-task execution. SRA-Bench contains 5,400 capability-intensive test instances and 636 manually constructed gold skills, which are mixed with web-collected distractor skills to form a large-scale corpus of 26,262 skills. Extensive experiments show that retrieval-based skill augmentation can substantially improve agent performance, validating the promise of the paradigm. At the same time, we uncover a fundamental gap in skill incorporation: current LLM agents tend to load skills at similar rates, regardless of whether a gold skill is retrieved or whether the task actually requires external capabilities. This shows that the bottleneck in skill augmentation lies not only in retrieval but also in the base model's ability to determine which skill to load and when external loading is actually needed. These findings position SRA as a distinct research problem and establish a foundation for the scalable augmentation of capabilities in future agent systems.
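The abstract contrasts two ways of exposing skills to an agent: enumerating every skill in the context window, or retrieving a few relevant ones on demand. The sketch below illustrates only the retrieval side of that contrast, under stated assumptions. The `Skill` schema and the functions `retrieve_skills`, `build_context`, and `recall_at_k` are hypothetical names introduced for illustration, and the bag-of-words cosine scorer is a simple stand-in, not the retriever used in SRA-Bench.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    """Hypothetical skill record: a name plus a natural-language description."""
    name: str
    description: str


def tokenize(text: str) -> list[str]:
    # Lowercase and keep alphanumeric runs (underscores act as separators).
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two bag-of-words count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve_skills(task: str, corpus: list[Skill], k: int = 3) -> list[Skill]:
    # Score every skill against the task and keep only the top-k,
    # instead of listing the whole corpus in the context window.
    query = Counter(tokenize(task))
    scored = [(cosine(query, Counter(tokenize(s.name + " " + s.description))), s) for s in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [s for _, s in scored[:k]]


def build_context(task: str, skills: list[Skill]) -> str:
    # Only the retrieved candidates are shown; the agent still decides
    # which one (if any) to actually load.
    lines = [f"- {s.name}: {s.description}" for s in skills]
    return "Available skills:\n" + "\n".join(lines) + f"\n\nTask: {task}"


def recall_at_k(retrieved: list[Skill], gold: Skill) -> float:
    # Retrieval-stage metric: 1.0 if the gold skill is among the candidates.
    return 1.0 if gold in retrieved else 0.0


if __name__ == "__main__":
    corpus = [
        Skill("pdf_table_extract", "extract tables from pdf files into csv"),
        Skill("unit_convert", "convert between metric and imperial units"),
        Skill("git_bisect", "find the commit that introduced a bug with git bisect"),
    ]
    task = "Extract the tables in report.pdf as csv"
    top = retrieve_skills(task, corpus, k=1)
    print(top[0].name)  # pdf_table_extract
    print(build_context(task, top))
```

This covers only the retrieval stage. The abstract's central finding is that the next step, deciding which retrieved skill to load and whether loading is needed at all, is a separate bottleneck; a metric such as `recall_at_k` measures retrieval alone and says nothing about that incorporation decision.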
