2601.07178v1 Jan 12, 2026 cs.CV

DIVER: Dynamic Iterative Visual Evidence Reasoning for Multimodal Fake News Detection

Chunlei Meng

Quanchen Zou

Deyue Zhang

Xiangzheng Zhang

Weilin Zhou

Zonghao Ying

Jiahui Liu

Hengyang Zhou

Dongdong Yang

Abstract

Multimodal fake news detection is crucial for mitigating adversarial misinformation. Existing methods, relying on static fusion or LLMs, face computational redundancy and hallucination risks due to weak visual foundations. To address this, we propose DIVER (Dynamic Iterative Visual Evidence Reasoning), a framework grounded in a progressive, evidence-driven reasoning paradigm. DIVER first establishes a strong text-based baseline through language analysis, leveraging intra-modal consistency to filter unreliable or hallucinated claims. Only when textual evidence is insufficient does the framework introduce visual information, where inter-modal alignment verification adaptively determines whether deeper visual inspection is necessary. For samples exhibiting significant cross-modal semantic discrepancies, DIVER selectively invokes fine-grained visual tools (e.g., OCR and dense captioning) to extract task-relevant evidence, which is iteratively aggregated via uncertainty-aware fusion to refine multimodal reasoning. Experiments on Weibo, Weibo21, and GossipCop demonstrate that DIVER outperforms state-of-the-art baselines by an average of 2.72%, while optimizing inference efficiency with a reduced latency of 4.12 s.
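The progressive, gated control flow described in the abstract (text first, then coarse visual evidence, then fine-grained tools only on cross-modal discrepancy) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `Evidence`, `consistent_text_evidence`, `fuse`, and `diver_sketch`, the thresholds `margin` and `align_threshold`, the majority-vote consistency filter, and the precision-weighted logit fusion are all hypothetical stand-ins. The text analyzer, the alignment scorer, and the visual tools (e.g. OCR, dense captioning) are assumed to already return probabilities, a scalar alignment score in [0, 1], and `Evidence` objects respectively.

```python
# Hypothetical sketch of a staged, early-exit reasoning loop; not the DIVER authors' code.
import math
from dataclasses import dataclass
from statistics import mean, pvariance
from typing import Callable, List, Sequence, Tuple


@dataclass
class Evidence:
    p_fake: float    # estimated probability that the post is fake
    variance: float  # uncertainty of the estimate; smaller means more reliable


def _logit(p: float) -> float:
    p = min(max(p, 1e-6), 1.0 - 1e-6)  # clamp to avoid log(0)
    return math.log(p / (1.0 - p))


def _sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def _confident(e: Evidence, margin: float) -> bool:
    # Assumed sufficiency test: the verdict is far enough from 0.5.
    return abs(e.p_fake - 0.5) >= margin


def consistent_text_evidence(samples: Sequence[float], floor: float = 1e-3) -> Evidence:
    """Stage 1 (assumed form): keep only sampled text judgments whose label
    agrees with the majority label, then summarize them as one Evidence."""
    votes = [p >= 0.5 for p in samples]
    majority = 2 * sum(votes) >= len(votes)  # ties resolve to "fake"
    kept = [p for p, v in zip(samples, votes) if v == majority]
    return Evidence(mean(kept), max(pvariance(kept), floor))


def fuse(evidence: Sequence[Evidence]) -> Evidence:
    """Uncertainty-aware fusion (assumed form): precision-weighted mean in logit space."""
    weights = [1.0 / e.variance for e in evidence]
    z = sum(w * _logit(e.p_fake) for w, e in zip(weights, evidence)) / sum(weights)
    return Evidence(_sigmoid(z), 1.0 / sum(weights))


def diver_sketch(
    text_samples: Sequence[float],
    image_evidence: Evidence,
    alignment: float,
    visual_tools: Sequence[Callable[[], Evidence]],
    margin: float = 0.2,
    align_threshold: float = 0.5,
) -> Tuple[float, str]:
    """Returns (p_fake, name of the stage at which reasoning stopped)."""
    # Stage 1: text-only baseline; exit early if it is already decisive.
    text_ev = consistent_text_evidence(text_samples)
    if _confident(text_ev, margin):
        return text_ev.p_fake, "text"

    # Stage 2: add coarse visual evidence; if image and text align, stop here.
    evidence: List[Evidence] = [text_ev, image_evidence]
    fused = fuse(evidence)
    if alignment >= align_threshold:
        return fused.p_fake, "visual"

    # Stage 3: cross-modal discrepancy, so call fine-grained tools one at a
    # time and re-fuse after each until the verdict becomes decisive.
    for tool in visual_tools:
        evidence.append(tool())
        fused = fuse(evidence)
        if _confident(fused, margin):
            break
    return fused.p_fake, "tools"
```

In this sketch, posts that the text stage settles never reach the image models, and tools are invoked lazily, one per iteration; that is one plausible reading of how a staged design saves inference time, and the actual savings would depend on how often each early exit fires.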
