2602.21857v1 Feb 25, 2026 cs.AI

Distill and Align Decomposition for Enhanced Claim Verification

Jabez Magomere (Citations: 117, h-index: 4)

E. Kochkina (Citations: 1, h-index: 1)

Samuel Mensah, University of Sheffield (Citations: 828, h-index: 12)

Simerjot Kaur (Citations: 277, h-index: 7)

Fernando Acero (Citations: 8, h-index: 1)

Arturo Oncevay (Citations: 26, h-index: 3)

Charese H. Smiley (Citations: 126, h-index: 6)

Manuela Veloso (Citations: 11, h-index: 1)

Xiaomo Liu (Citations: 54, h-index: 5)

Original Abstract

Complex claim verification requires decomposing sentences into verifiable subclaims, yet existing methods struggle to align decomposition quality with verification performance. We propose a reinforcement learning (RL) approach that jointly optimizes decomposition quality and verifier alignment using Group Relative Policy Optimization (GRPO). Our method integrates: (i) structured sequential reasoning; (ii) supervised finetuning on teacher-distilled exemplars; and (iii) a multi-objective reward balancing format compliance, verifier alignment, and decomposition quality. Across six evaluation settings, our trained 8B decomposer improves downstream verification performance to 71.75% macro-F1, outperforming prompt-based approaches (+1.99, +6.24) and existing RL methods (+5.84). Human evaluation confirms the high quality of the generated subclaims. Our framework enables smaller language models to achieve state-of-the-art claim verification by jointly optimising for verification accuracy and decomposition quality.
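
The abstract names a multi-objective reward and GRPO but gives no formulas, so the following is only a minimal sketch of how the two could fit together, not the authors' implementation. It assumes each reward component is already scored in [0, 1], combines them with a weighted sum, and then standardizes rewards within a group of decompositions sampled for the same claim, which is the group-relative advantage that GRPO uses. The names `RewardParts`, `combined_reward`, `group_relative_advantages`, the weights, and the sample scores are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class RewardParts:
    """Per-sample reward components, each assumed to lie in [0, 1]."""
    format_ok: float        # 1.0 if the output follows the required structure
    verifier_align: float   # 1.0 if the verifier's verdict on the subclaims matches the gold label
    decomp_quality: float   # a quality score for the subclaims (scoring method assumed)


def combined_reward(p: RewardParts, w_format=0.2, w_verifier=0.5, w_quality=0.3) -> float:
    """Weighted sum of the three objectives; the weights are illustrative only."""
    return (w_format * p.format_ok
            + w_verifier * p.verifier_align
            + w_quality * p.decomp_quality)


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize each reward against the mean and std of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an implementation choice
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four hypothetical decompositions sampled for the same input claim.
group = [
    RewardParts(1.0, 1.0, 0.9),
    RewardParts(1.0, 0.0, 0.6),
    RewardParts(0.0, 0.0, 0.2),
    RewardParts(1.0, 1.0, 0.5),
]
rewards = [combined_reward(p) for p in group]
advantages = group_relative_advantages(rewards)
```

Because advantages are computed relative to the group, a decomposition is rewarded only for being better than its siblings on the same claim, which removes the need for a separate learned value function. How the paper actually weights or gates the three objectives is not stated in the abstract.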

Citations: 1

Influential: 0

Altmetric: 6

Score: 31.0
