2604.10504v1 Apr 12, 2026 cs.AI

CARO: Robust Content Moderation through Chain-of-Analogy Reasoning Optimization

CARO: Chain-of-Analogy Reasoning Optimization for Robust Content Moderation

Yuchen Mou

Bingzhe Wu

Haotian Lu

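Current large language models (LLMs), even those specialized for reasoning, often struggle with ambiguous content moderation cases because of misleading "decision shortcuts" embedded in the context. Building on cognitive psychology findings about how experts moderate content, this paper proposes CARO (Chain-of-Analogy Reasoning Optimization), a new two-stage training framework for inducing robust analogical reasoning in LLMs. First, CARO uses retrieval-augmented generation (RAG) to construct analogical reasoning chains from moderation data and performs supervised fine-tuning (SFT). Second, it proposes a customized direct preference optimization (DPO) scheme to explicitly reinforce analogical reasoning behavior. Unlike static retrieval methods, CARO dynamically generates context-appropriate analogical references during inference, effectively mitigating harmful decision shortcuts. Extensive experiments show that CARO outperforms state-of-the-art reasoning models (DeepSeek R1, QwQ), specialized moderation models (LLaMA Guard), and advanced fine-tuning and retrieval-augmented methods, with an average F1 improvement of 24.9% on challenging ambiguous content moderation benchmarks.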

Original Abstract

Current large language models (LLMs), even those explicitly trained for reasoning, often struggle with ambiguous content moderation cases due to misleading "decision shortcuts" embedded in context. Inspired by cognitive psychology insights into expert moderation, we introduce CARO (Chain-of-Analogy Reasoning Optimization), a novel two-stage training framework to induce robust analogical reasoning in LLMs. First, CARO bootstraps analogical reasoning chains via retrieval-augmented generation (RAG) on moderation data and performs supervised fine-tuning (SFT). Second, we propose a customized direct preference optimization (DPO) approach to reinforce analogical reasoning behaviors explicitly. Unlike static retrieval methods, CARO dynamically generates tailored analogical references during inference, effectively mitigating harmful decision shortcuts. Extensive experiments demonstrate that CARO substantially outperforms state-of-the-art reasoning models (DeepSeek R1, QwQ), specialized moderation models (LLaMA Guard), and advanced fine-tuning and retrieval-augmented methods, achieving an average F1 score improvement of 24.9% on challenging ambiguous moderation benchmarks.
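
The abstract does not describe how the DPO stage is customized, so the sketch below shows only the standard DPO objective for a single preference pair, as a reference point for what the second stage builds on. It is not the authors' method. The function names, the `beta` default, and the reading that the "chosen" response is an analogy-based reasoning chain while the "rejected" one follows a shortcut are all assumptions made for illustration.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed value; the abstract gives no hyperparameters
) -> float:
    """Standard DPO loss for one (chosen, rejected) pair.

    Inputs are summed token log-probabilities of each response under the
    policy being trained and under a frozen reference model.
    """
    # How much more the policy likes each response than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(logits)) equals softplus(-logits).
    return softplus(-logits)


if __name__ == "__main__":
    # Hypothetical log-probabilities, for illustration only.
    print(dpo_loss(-12.0, -15.0, -13.0, -13.5))
```

The loss shrinks as the policy raises the likelihood of the chosen response relative to the rejected one, measured against the reference model. When the policy and the reference agree exactly, the loss equals log 2.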
