2604.04174v1 Apr 05, 2026 cs.AI

CoALFake: Collaborative Active Learning with Human-LLM Co-Annotation for Cross-Domain Fake News Detection

D. Sallami

Citations: 150

h-index: 7

E. Aimeur

Citations: 155

h-index: 6

Gilles Brassard

Citations: 635

h-index: 3

Original Abstract

The proliferation of fake news across diverse domains highlights critical limitations in current detection systems, which often exhibit narrow domain specificity and poor generalization. Existing cross-domain approaches face two key challenges: (1) reliance on labelled data, which is frequently unavailable and resource-intensive to acquire, and (2) information loss caused by rigid domain categorization or neglect of domain-specific features. To address these issues, we propose CoALFake, a novel approach for cross-domain fake news detection that integrates Human-Large Language Model (LLM) co-annotation with domain-aware Active Learning (AL). Our method employs LLMs for scalable, low-cost annotation while maintaining human oversight to ensure label reliability. By integrating domain embedding techniques, CoALFake dynamically captures both domain-specific nuances and cross-domain patterns, enabling the training of a domain-agnostic model. Furthermore, a domain-aware sampling strategy optimizes sample acquisition by prioritizing diverse domain coverage. Experimental results across multiple datasets demonstrate that CoALFake consistently outperforms a range of existing baselines, even with minimal human oversight. These results emphasize that human-LLM co-annotation is a highly cost-effective approach that delivers excellent performance.

0 Citations

0 Influential

3.5 Altmetric

17.5 Score
