2604.13398v1 Apr 15, 2026 cs.CL

From Prediction to Justification: Aligning Sentiment Reasoning with Human Rationale via Reinforcement Learning

Jie Zhou (Citations: 18, h-index: 3)
Liang He (Citations: 476, h-index: 13)
Yulan Wu (Citations: 20, h-index: 2)
Liyang Yu (Citations: 10, h-index: 2)
Liang Dou (Citations: 27, h-index: 1)
Shihao Zhang (Citations: 8, h-index: 2)
Ziwei Wang (Citations: 29, h-index: 3)
Qin Chen (Citations: 1, h-index: 1)
Zhikai Lei (Citations: 208, h-index: 4)

Abstract

While Aspect-based Sentiment Analysis (ABSA) systems have achieved high accuracy in identifying sentiment polarities, they often operate as "black boxes," lacking the explicit reasoning capabilities characteristic of human affective cognition. Humans do not merely categorize sentiment; they construct causal explanations for their judgments. To bridge this gap, we propose ABSA-R1, a large language model framework designed to mimic this "reason-before-predict" cognitive process. By leveraging reinforcement learning (RL), ABSA-R1 learns to articulate the why behind the what, generating natural language justifications that ground its sentiment predictions. We introduce a Cognition-Aligned Reward Model (formerly sentiment-aware reward model) that enforces consistency between the generated reasoning path and the final emotional label. Furthermore, inspired by metacognitive monitoring, we implement a performance-driven rejection sampling strategy that selectively targets hard cases where the model's internal reasoning is uncertain or inconsistent. Experimental results on four benchmarks demonstrate that equipping models with this explicit reasoning capability not only enhances interpretability but also yields superior performance in sentiment classification and triplet extraction compared to non-reasoning baselines.
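The abstract names two mechanisms without giving their details: a reward that ties the reasoning path to the final label, and a rejection sampling step that focuses on hard cases. The Python sketch below is only a toy illustration of how such pieces might fit together, not the authors' implementation. Everything in it is an assumption: the `<think>`/`<answer>` output tags, the reward weights (1.0 and 0.5), the keyword-based `implied_polarity` proxy (a stand-in for whatever learned reward model the paper uses), and the `threshold` in `select_hard_cases`.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical label set; the paper's actual label inventory is not given here.
LABELS = ("positive", "negative", "neutral")


def parse_output(text: str) -> Tuple[Optional[str], Optional[str]]:
    """Split a completion into (reasoning, label); None where a part is missing.

    The <think>/<answer> tag format is an assumption for illustration.
    """
    reasoning = re.search(r"<think>(.*?)</think>", text, re.S)
    answer = re.search(r"<answer>\s*(\w+)\s*</answer>", text)
    label = answer.group(1).lower() if answer else None
    if label not in LABELS:
        label = None
    return (reasoning.group(1).strip() if reasoning else None, label)


def implied_polarity(reasoning: str) -> Optional[str]:
    """Crude proxy: the polarity word mentioned last in the reasoning."""
    text = reasoning.lower()
    positions = {label: text.rfind(label) for label in LABELS}
    best = max(positions, key=positions.get)
    return best if positions[best] >= 0 else None


def toy_reward(text: str, gold: str) -> float:
    """Reward = correctness (1.0) + reasoning/label consistency bonus (0.5)."""
    reasoning, label = parse_output(text)
    if reasoning is None or label is None:
        return 0.0  # malformed output earns nothing
    reward = 0.0
    if label == gold:
        reward += 1.0
    if implied_polarity(reasoning) == label:
        reward += 0.5
    return reward


def select_hard_cases(
    correct_by_prompt: Dict[str, List[bool]], threshold: float = 0.75
) -> List[str]:
    """Keep prompts whose sampled accuracy falls below the threshold."""
    hard = []
    for prompt, outcomes in correct_by_prompt.items():
        if not outcomes:
            continue  # no samples, nothing to judge
        accuracy = sum(outcomes) / len(outcomes)
        if accuracy < threshold:
            hard.append(prompt)
    return hard
```

In this sketch, a completion whose reasoning and answer agree and match the gold label scores highest, while a correct answer backed by contradictory reasoning scores lower. Prompts that the model answers correctly in most samples are filtered out, so further training concentrates on the remaining ones.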

