2602.18137v1 Feb 20, 2026 cs.CL

Agentic Adversarial QA for Improving Domain-Specific LLMs

Vincent Grari

Citations: 264

h-index: 8

Ciprian Tomoiagă

Citations: 11

h-index: 2

Sylvain Lamprier

Citations: 25

h-index: 4

Tatsunori Hashimoto

Citations: 285

h-index: 5

Marcin Detyniecki

Citations: 31

h-index: 4

Abstract

Large Language Models (LLMs), despite extensive pretraining on broad internet corpora, often struggle to adapt effectively to specialized domains. There is growing interest in fine-tuning these models for such domains; however, progress is constrained by the scarcity and limited coverage of high-quality, task-relevant data. To address this, synthetic data generation methods such as paraphrasing or knowledge extraction are commonly applied. Although these approaches excel at factual recall and conceptual knowledge, they suffer from two critical shortcomings: (i) they provide minimal support for interpretive reasoning capabilities in these specialized domains, and (ii) they often produce synthetic corpora that are excessively large and redundant, resulting in poor sample efficiency. To overcome these gaps, we propose an adversarial question-generation framework that produces a compact set of semantically challenging questions. These questions are constructed by comparing the outputs of the model to be adapted and a robust expert model grounded in reference documents, using an iterative, feedback-driven process designed to reveal and address comprehension gaps. Evaluation on specialized subsets of the LegalBench corpus demonstrates that our method achieves greater accuracy with substantially fewer synthetic samples.
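The abstract describes the question-construction loop only at a high level. The sketch below is one possible reading of it, not the authors' implementation: every function name, the `rounds` and `threshold` parameters, and the token-overlap gap score are hypothetical stand-ins chosen for illustration. The models are passed in as plain callables so the loop itself stays independent of any particular LLM API.

```python
from typing import Callable, List, Tuple

Feedback = List[Tuple[str, float]]  # (question, observed gap) from earlier rounds


def token_disagreement(a: str, b: str) -> float:
    """Toy gap score (an assumption): 1 - Jaccard overlap of lowercase tokens."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 0.0
    return 1.0 - len(sa & sb) / len(sa | sb)


def adversarial_qa_loop(
    document: str,
    propose_question: Callable[[str, Feedback], str],
    target_answer: Callable[[str], str],
    expert_answer: Callable[[str, str], str],
    disagreement: Callable[[str, str], float] = token_disagreement,
    rounds: int = 5,
    threshold: float = 0.5,
) -> List[Tuple[str, str]]:
    """Collect (question, expert answer) pairs where the target model
    diverges from the document-grounded expert model."""
    feedback: Feedback = []
    kept: List[Tuple[str, str]] = []
    for _ in range(rounds):
        # The question generator sees earlier gaps, so it can steer
        # toward areas where the target model was weakest.
        question = propose_question(document, feedback)
        weak = target_answer(question)               # model to be adapted
        strong = expert_answer(question, document)   # expert with the reference
        gap = disagreement(weak, strong)
        feedback.append((question, gap))
        if gap >= threshold:
            # Keep only questions that expose a comprehension gap.
            kept.append((question, strong))
    return kept
```

Under these assumptions, the retained pairs would form the compact fine-tuning set; a real system would replace the token-overlap score with a semantic comparison, which the abstract does not specify.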
