2603.16463v1 Mar 17, 2026 cs.AI

Follow the Clues, Frame the Truth: Hybrid-evidential Deductive Reasoning in Open-Vocabulary Multimodal Emotion Recognition

Yu Liu (Citations: 7, h-index: 2)

Lei Zhang (Citations: 122, h-index: 4)

Haoxun Li (Citations: 7, h-index: 2)

Hanlei Shi (Citations: 1, h-index: 1)

Yuxuan Ding (Citations: 0, h-index: 0)

Leyuan Qu (Citations: 193, h-index: 10)

Taihao Li (Citations: 4, h-index: 1)

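Open-vocabulary multimodal emotion recognition (OV-MER) is inherently difficult because multimodal cues are ambiguous and often arise from different unobserved situational factors. Multimodal large language models (MLLMs) offer broad semantic coverage, but they tend to rely too heavily on dominant data priors, which leads to suboptimal results and causes them to overlook important complementary affective cues across modalities. We argue that effective affective reasoning requires more than simple association: it must reconstruct nuanced emotional states by synthesizing multiple evidence-grounded rationales that reconcile the observations from diverse latent perspectives. We introduce HyDRA, a Hybrid-evidential Deductive Reasoning Architecture that formalizes inference as a Propose-Verify-Decide protocol. To internalize this reasoning process, we use reinforcement learning with hierarchical reward shaping, aligning reasoning trajectories with final task performance so that they best reconcile the observed multimodal cues. Systematic evaluation validates our design choices: HyDRA consistently outperforms strong baselines, especially in ambiguous or conflicting scenarios, and also provides interpretable, diagnostic evidence traces.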

Original Abstract

Open-Vocabulary Multimodal Emotion Recognition (OV-MER) is inherently challenging due to the ambiguity of equivocal multimodal cues, which often stem from distinct unobserved situational dynamics. While Multimodal Large Language Models (MLLMs) offer extensive semantic coverage, their performance is often bottlenecked by premature commitment to dominant data priors, resulting in suboptimal heuristics that overlook crucial, complementary affective cues across modalities. We argue that effective affective reasoning requires more than surface-level association; it necessitates reconstructing nuanced emotional states by synthesizing multiple evidence-grounded rationales that reconcile these observations from diverse latent perspectives. We introduce HyDRA, a Hybrid-evidential Deductive Reasoning Architecture that formalizes inference as a Propose-Verify-Decide protocol. To internalize this abductive process, we employ reinforcement learning with hierarchical reward shaping, aligning the reasoning trajectories with final task performance to ensure they best reconcile the observed multimodal cues. Systematic evaluations validate our design choices, with HyDRA consistently outperforming strong baselines--especially in ambiguous or conflicting scenarios--while providing interpretable, diagnostic evidence traces.
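
The abstract names a Propose-Verify-Decide protocol but does not spell out its steps, so the following is only a toy sketch of what such a loop could look like, not HyDRA itself. Everything in it is an assumption made for illustration: the `Cue` type, the `propose`, `verify`, and `decide` functions, the `min_modalities` threshold, and the example labels. No learned model or reinforcement learning is involved; the point is only to show candidate labels being proposed from the cues, checked against per-modality evidence, and kept together with an evidence trace.

```python
from dataclasses import dataclass


# Hypothetical data model: the abstract does not say how cues are represented.
@dataclass(frozen=True)
class Cue:
    modality: str        # e.g. "audio", "visual", "text"
    supports: frozenset  # free-form emotion labels this cue is taken to support


def propose(cues):
    """Propose: collect every label that at least one cue supports."""
    labels = set()
    for cue in cues:
        labels |= cue.supports
    return sorted(labels)


def verify(label, cues):
    """Verify: list the distinct modalities whose cues back this label."""
    return sorted({cue.modality for cue in cues if label in cue.supports})


def decide(cues, min_modalities=2):
    """Decide: keep labels backed by enough modalities, with their evidence."""
    decision = {}
    for label in propose(cues):
        evidence = verify(label, cues)
        if len(evidence) >= min_modalities:  # assumed acceptance rule
            decision[label] = evidence
    return decision


# Made-up cues for one clip; labels are open-vocabulary strings.
cues = [
    Cue("visual", frozenset({"amused", "nervous"})),
    Cue("audio", frozenset({"nervous"})),
    Cue("text", frozenset({"nervous", "apologetic"})),
    Cue("audio", frozenset({"apologetic"})),
]
decision = decide(cues)
print(decision)
```

In this toy run, "amused" is dropped because only the visual cue supports it, while "nervous" and "apologetic" survive along with the modalities that back them. That returned mapping is a crude stand-in for the kind of evidence trace the abstract describes.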

0 Citations

0 Influential

5 Altmetric

25.0 Score
