2604.18003v1 Apr 20, 2026 cs.AI

SELF-EMO: Emotional Self-Evolution from Recognition to Consistent Expression

Mengya Gao

Faqiang Qian

Ziliang Wang

Kang An

Shaowei Zhang

Yan Chen

Yong Dai

Yichao Wu

Abstract

Emotion Recognition in Conversation (ERC) has become a fundamental capability for large language models (LLMs) in human-centric interaction. Beyond accurate recognition, coherent emotional expression is also crucial, yet both are limited by the scarcity and static nature of high-quality annotated data. In this work, we propose SELF-EMO, a self-evolution framework grounded in the hypothesis that better emotion prediction leads to more consistent emotional responses. We introduce two auxiliary tasks, emotional understanding and emotional expression, and design a role-based self-play paradigm where the model acts as both an emotion recognizer and a dialogue responder. Through iterative interactions, the model generates diverse conversational trajectories, enabling scalable data generation. To ensure quality, we adopt a data flywheel mechanism that filters candidate predictions and responses using a smoothed IoU-based reward and feeds selected samples back for continuous self-improvement without external supervision. We further develop SELF-GRPO, a reinforcement learning algorithm that stabilizes optimization with multi-label alignment rewards and group-level consistency signals. Experiments on IEMOCAP, MELD, and EmoryNLP show that SELF-EMO achieves state-of-the-art performance, improving accuracy by +6.33% on Qwen3-4B and +8.54% on Qwen3-8B, demonstrating strong effectiveness and generalization.
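
The abstract names a smoothed IoU-based reward for filtering multi-label predictions and a group-level signal in SELF-GRPO, but gives no formulas. The following is a minimal sketch under loudly stated assumptions, not the authors' implementation. `smoothed_iou` assumes additive smoothing with a hypothetical constant `eps`; the paper's actual smoothing may differ. `flywheel_filter` keeps candidates whose reward meets a hypothetical `threshold`, and the source of the reference label set inside the self-play loop is not specified in the abstract. `group_relative_advantages` is the standard GRPO-style normalization of rewards within a group of sampled outputs; SELF-GRPO's multi-label alignment and consistency terms are not reproduced here.

```python
from statistics import mean, pstdev


def smoothed_iou(pred, reference, eps=1e-6):
    """IoU between two emotion-label sets with additive smoothing.

    Assumption: eps is added to numerator and denominator so that two empty
    sets score 1.0 and the ratio never divides by zero.
    """
    pred, reference = set(pred), set(reference)
    inter = len(pred & reference)
    union = len(pred | reference)
    return (inter + eps) / (union + eps)


def flywheel_filter(candidates, reference, threshold=0.5):
    """Keep candidate label sets whose smoothed IoU meets the threshold.

    Hypothetical selection rule; the paper's filtering criterion may differ.
    """
    return [c for c in candidates if smoothed_iou(c, reference) >= threshold]


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: standardize each reward within its group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Hypothetical reference labels and a group of sampled predictions.
    reference = {"joy", "surprise"}
    group = [{"joy"}, {"joy", "surprise"}, {"anger"}, {"joy", "neutral"}]

    rewards = [smoothed_iou(p, reference) for p in group]
    kept = flywheel_filter(group, reference)
    advantages = group_relative_advantages(rewards)

    print("rewards:", [round(r, 3) for r in rewards])
    print("kept for next round:", kept)
    print("advantages:", [round(a, 3) for a in advantages])
```

Standardizing rewards within the group scores each candidate relative to its siblings, which is how GRPO avoids training a separate value model; partial label overlap earns partial reward rather than the all-or-nothing signal of exact-match accuracy.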
