2601.15301v2 Jan 09, 2026 cs.CL

How Reliable Are LLM Detectors?

Can We Trust LLM Detectors?

Jivnesh Sandhan (Citations: 1, h-index: 1)

Harsh Jaiswal (Citations: 5, h-index: 1)

Fei Cheng, Graduate School of Informatics, Kyoto University (Citations: 909, h-index: 15)

Yugo Murawaki, Kyoto University (Citations: 494, h-index: 11)

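The rapid spread of LLMs has increased the need for reliable AI text detection, but existing detectors often degrade outside controlled benchmark settings. This study systematically evaluates two main paradigms (training-free and supervised) and shows that both are vulnerable to distribution shift, unseen generators, and simple stylistic changes. To overcome these limitations, we propose a supervised contrastive learning (SCL) framework that learns discriminative style embeddings. Experiments show that supervised detectors perform well in-domain but degrade sharply out-of-domain, while training-free methods remain highly sensitive to the choice of proxy. Overall, the results reveal fundamental challenges in building domain-independent detectors. The code is available at: https://github.com/HARSHITJAIS14/DetectAI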

Original Abstract

The rapid adoption of LLMs has increased the need for reliable AI text detection, yet existing detectors often fail outside controlled benchmarks. We systematically evaluate 2 dominant paradigms (training-free and supervised) and show that both are brittle under distribution shift, unseen generators, and simple stylistic perturbations. To address these limitations, we propose a supervised contrastive learning (SCL) framework that learns discriminative style embeddings. Experiments show that while supervised detectors excel in-domain, they degrade sharply out-of-domain, and training-free methods remain highly sensitive to proxy choice. Overall, our results expose fundamental challenges in building domain-agnostic detectors. Our code is available at: https://github.com/HARSHITJAIS14/DetectAI
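
The abstract names a supervised contrastive learning (SCL) framework but does not give its loss function. As a rough illustration only, the sketch below implements the generic supervised contrastive loss in plain Python: embeddings are L2-normalized, and each anchor is pulled toward samples that share its label and pushed away from the rest. The function names, the temperature value, and the label convention (0 = human-written, 1 = LLM-generated) are assumptions made for this example and are not taken from the paper or its repository.

```python
import math


def normalize(vec):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def supcon_loss(embeddings, labels, temperature=0.1):
    """Generic supervised contrastive loss (hypothetical helper).

    Averages over anchors that have at least one same-label positive.
    Returns 0.0 if no anchor has a positive.
    """
    z = [normalize(v) for v in embeddings]
    n = len(z)
    total, counted = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue
        # Temperature-scaled cosine similarity to every other sample.
        sims = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(n)
            if j != i
        }
        # Log-sum-exp over all other samples, shifted for numerical stability.
        max_sim = max(sims.values())
        log_denom = max_sim + math.log(
            sum(math.exp(s - max_sim) for s in sims.values())
        )
        # Mean negative log-probability of the positives for this anchor.
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        counted += 1
    return total / counted if counted else 0.0


# Toy 2-D "style embeddings": two tight clusters.
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
aligned_labels = [0, 0, 1, 1]  # assumed convention: 0 = human, 1 = LLM
print(supcon_loss(toy_embeddings, aligned_labels))
```

When the labels agree with the cluster structure, as in the toy data, the loss is close to zero; labels that cut across the clusters yield a much larger value, which is the signal a style encoder would be trained to reduce.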

0 Citations

0 Influential

27.5 Altmetric

137.5 Score
