2603.09723v1 Mar 10, 2026 cs.CL

RbtAct: Generating Actionable Review Feedback Using Rebuttals

RbtAct: Rebuttal as Supervision for Actionable Review Feedback Generation

Manasi S. Patwardhan

Citations: 285

h-index: 9

Arman Cohan

Citations: 1,791

h-index: 23

Sihong Wu

Citations: 13

h-index: 2

Yilun Zhao

Citations: 5

h-index: 1

Tiansheng Hu

Citations: 14

h-index: 2

Owen Jiang

Citations: 0

h-index: 0

Yi Ma

Citations: 145

h-index: 4

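Large language models (LLMs) are now used throughout the scientific research process, including for drafting peer-review reports. However, many AI-generated reviews are superficial and fail to give concrete, actionable guidance, which is the problem this work sets out to address. We propose RbtAct, which focuses on generating actionable review feedback and places existing peer-review rebuttals at the core of learning. Rebuttals show which review comments led to concrete revisions or plans and which were merely defended against. Building on this insight, we use rebuttals as implicit supervision to directly optimize a feedback generation model for actionability. To support this, we propose a new task, perspective-conditioned segment-level review feedback generation, in which the model must produce a single focused comment based on the full paper and a specified perspective such as experiments or writing. We also build a large-scale dataset, RMR-75K, which links review segments to the rebuttal segments that address them and includes perspective labels and impact categories describing how the authors responded. Finally, we train Llama-3.1-8B-Instruct with supervised fine-tuning on review segments and then apply preference optimization using rebuttal-derived pairs. Experiments with human experts and LLM-as-a-judge show consistent gains in actionability and specificity over strong baselines while maintaining grounding and relevance.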

Original Abstract

Large language models (LLMs) are increasingly used across the scientific workflow, including to draft peer-review reports. However, many AI-generated reviews are superficial and insufficiently actionable, leaving authors without concrete, implementable guidance and motivating the gap this work addresses. We propose RbtAct, which targets actionable review feedback generation and places existing peer review rebuttal at the center of learning. Rebuttals show which reviewer comments led to concrete revisions or specific plans, and which were only defended. Building on this insight, we leverage rebuttal as implicit supervision to directly optimize a feedback generator for actionability. To support this objective, we propose a new task called perspective-conditioned segment-level review feedback generation, in which the model is required to produce a single focused comment based on the complete paper and a specified perspective such as experiments and writing. We also build a large dataset named RMR-75K that maps review segments to the rebuttal segments that address them, with perspective labels and impact categories that order author uptake. We then train the Llama-3.1-8B-Instruct model with supervised fine-tuning on review segments followed by preference optimization using rebuttal derived pairs. Experiments with human experts and LLM-as-a-judge show consistent gains in actionability and specificity over strong baselines while maintaining grounding and relevance.
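The abstract says that impact categories order author uptake and that preference optimization uses rebuttal-derived pairs, but it does not spell out how the pairs are built. The following is a minimal sketch of one plausible construction, not the authors' implementation. The category names (`revised`, `planned`, `defended`), their ranking, the `ReviewSegment` fields, and the rule of pairing segments within the same paper and perspective are all assumptions made for illustration.

```python
from dataclasses import dataclass
from itertools import combinations

# Hypothetical impact ordering (an assumption, not from the paper):
# a higher rank means stronger author uptake in the rebuttal.
IMPACT_RANK = {"defended": 0, "planned": 1, "revised": 2}


@dataclass(frozen=True)
class ReviewSegment:
    """One review comment with illustrative, assumed fields."""
    paper_id: str
    perspective: str  # e.g. "experiments" or "writing"
    text: str
    impact: str       # key of IMPACT_RANK, derived from the rebuttal


def build_preference_pairs(segments):
    """Pair segments of the same paper and perspective by impact rank.

    The segment whose rebuttal shows stronger uptake becomes "chosen",
    the other "rejected". Segments with equal rank yield no pair.
    """
    groups = {}
    for seg in segments:
        groups.setdefault((seg.paper_id, seg.perspective), []).append(seg)

    pairs = []
    for (paper_id, perspective), group in groups.items():
        for a, b in combinations(group, 2):
            rank_a, rank_b = IMPACT_RANK[a.impact], IMPACT_RANK[b.impact]
            if rank_a == rank_b:
                continue  # no preference signal between equal ranks
            chosen, rejected = (a, b) if rank_a > rank_b else (b, a)
            pairs.append({
                "paper_id": paper_id,
                "perspective": perspective,
                "chosen": chosen.text,
                "rejected": rejected.text,
            })
    return pairs
```

Under these assumptions, each resulting record could be fed to a preference-optimization trainer with the paper and perspective as the prompt and the two comments as the preferred and dispreferred completions.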
