2601.19697v1 Jan 27, 2026 cs.SE

AlignCoder: Aligning Retrieval with Target Intent for Repository-Level Code Completion

Jiachi Chen (citations: 1,898; h-index: 22)

Ensheng Shi (citations: 1,039; h-index: 16)

Yuchi Ma (citations: 599; h-index: 13)

Tianyue Jiang (citations: 46; h-index: 3)

Yanlin Wang (citations: 651; h-index: 13)

Daya Guo (citations: 140; h-index: 4)

Zibin Zheng (citations: 1,144; h-index: 20)

Original Abstract

Repository-level code completion remains a challenging task for existing code large language models (code LLMs) due to their limited understanding of repository-specific context and domain knowledge. While retrieval-augmented generation (RAG) approaches have shown promise by retrieving relevant code snippets as cross-file context, they suffer from two fundamental problems: misalignment between the query and the target code in the retrieval process, and the inability of existing retrieval methods to effectively utilize inference information. To address these challenges, we propose AlignCoder, a repository-level code completion framework that introduces a query enhancement mechanism and a reinforcement-learning-based retriever training method. Our approach generates multiple candidate completions to construct an enhanced query that bridges the semantic gap between the initial query and the target code. Additionally, we employ reinforcement learning to train an AlignRetriever that learns to leverage inference information in the enhanced query for more accurate retrieval. We evaluate AlignCoder on two widely used benchmarks (CrossCodeEval and RepoEval) across five backbone code LLMs, demonstrating an 18.1% improvement in EM score compared to baselines on the CrossCodeEval benchmark. The results show that our framework achieves superior performance and exhibits high generalizability across various code LLMs and programming languages.
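
The query enhancement idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names (`build_enhanced_query`, `retrieve`, `sample_candidates`), the example snippets, and the use of Jaccard token overlap as the similarity measure are all assumptions made for illustration. In particular, the token-overlap ranker is a simple stand-in for the learned AlignRetriever, and `sample_candidates` is a hypothetical placeholder for sampling completions from a code LLM.

```python
import re
from typing import List, Set


def tokenize(text: str) -> Set[str]:
    """Split code into a set of identifier-like tokens."""
    return set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", text))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity between two token sets (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def build_enhanced_query(unfinished_code: str, candidates: List[str]) -> str:
    """Append candidate completions to the unfinished code (assumed format)."""
    return "\n".join([unfinished_code, *candidates])


def retrieve(query: str, snippets: List[str], top_k: int = 1) -> List[str]:
    """Rank repository snippets by token overlap with the query.

    Stand-in for a learned retriever; not the method from the paper.
    """
    query_tokens = tokenize(query)
    ranked = sorted(
        snippets,
        key=lambda s: jaccard(query_tokens, tokenize(s)),
        reverse=True,
    )
    return ranked[:top_k]


def sample_candidates(unfinished_code: str, n: int = 2) -> List[str]:
    """Hypothetical placeholder for sampling n completions from a code LLM."""
    canned = [
        "total = cart.compute_total(tax_rate)",
        "total = cart.sum_prices()",
    ]
    return canned[:n]


# Toy repository snippets (invented for this example).
snippets = [
    "def compute_total(self, tax_rate):\n    return self.subtotal * (1 + tax_rate)",
    "def render_cart(cart):\n    print(cart.items)",
]
unfinished = "cart = load_cart(user)\ntotal ="

baseline_hit = retrieve(unfinished, snippets)[0]
enhanced_query = build_enhanced_query(unfinished, sample_candidates(unfinished))
enhanced_hit = retrieve(enhanced_query, snippets)[0]
```

In this toy setup the unfinished code shares only the token `cart` with the repository, so the baseline query ranks `render_cart` first. The candidate completions add tokens such as `compute_total` and `tax_rate`, which moves the enhanced query closer to the snippet the target code actually needs. Even an imperfect candidate (`sum_prices` does not exist in the toy repository) does not prevent the correct snippet from ranking first here.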
