2605.02011v1 May 03, 2026 cs.CL

Enhancing Judgment Document Generation via Agentic Legal Information Collection and Rubric-Guided Optimization

Yiqun Liu

Qingyao Ai

Weihang Su

Yue Wu

Xuanyi Chen

Abstract

Automating the drafting of judgment documents is pivotal to judicial efficiency, yet it remains challenging because it requires both comprehensive retrieval of legal information and rigorous logical reasoning. Existing approaches, which typically rely on standard Retrieval-Augmented Generation and Supervised Fine-Tuning, often suffer from insufficient evidence recall, hallucinated statutory references, and logically flawed legal reasoning. To bridge this gap, we propose Judge-R1, a unified framework that enhances LLM-based judgment document generation by jointly improving legal information collection and judgment document generation. First, we introduce Agentic Legal Information Collection, which employs a dynamic planning agent to retrieve precise statutes and precedents from multiple sources. Second, we implement Rubric-Guided Optimization, a reinforcement learning phase that uses Group Relative Policy Optimization (GRPO) with a comprehensive legal reward function to enforce adherence to judicial standards and reasoning logic. Extensive experiments on the JuDGE benchmark demonstrate that Judge-R1 significantly outperforms state-of-the-art baselines in both legal accuracy and generation quality.
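
The abstract does not specify the reward function or the training setup, so the following is only a minimal sketch of the group-relative advantage step that GRPO is generally built around: several candidate outputs are sampled for one case, each is scored, and each score is standardized against its own group. The rubric criteria, their weights, and the function names here are hypothetical illustrations, not taken from the paper; using the population standard deviation is likewise an assumption.

```python
from statistics import mean, pstdev

# Hypothetical rubric: criteria and weights are illustrative only,
# not the reward function described in the paper.
RUBRIC_WEIGHTS = {"statute_accuracy": 0.5, "reasoning": 0.3, "format": 0.2}


def rubric_reward(scores):
    """Weighted sum of per-criterion scores, each expected in [0, 1]."""
    return sum(RUBRIC_WEIGHTS[name] * scores[name] for name in RUBRIC_WEIGHTS)


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize each reward against the other samples in its group.

    Population standard deviation is an assumption made for this sketch;
    eps avoids division by zero when all rewards in the group are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Three hypothetical candidate judgments generated for the same case.
    candidates = [
        {"statute_accuracy": 1.0, "reasoning": 0.8, "format": 1.0},
        {"statute_accuracy": 0.5, "reasoning": 0.6, "format": 1.0},
        {"statute_accuracy": 0.0, "reasoning": 0.4, "format": 0.5},
    ]
    rewards = [rubric_reward(c) for c in candidates]
    for reward, adv in zip(rewards, group_relative_advantages(rewards)):
        print(f"reward={reward:.2f} advantage={adv:+.2f}")
```

Candidates that score above their group's mean receive a positive advantage and those below receive a negative one, which is what lets this style of optimization work without a separately trained value model.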
