2604.23593v1 Apr 26, 2026 cs.AI

When AI reviews science: Can we trust the referee?

Linan Yue

Min-Ling Zhang

Jialiang Wang

Kui Ren

Lei Chen

Yuchen Liu

Hang Xu

Kaichun Hu

Shimin Di

Wangze Ni

Abstract

The volume of scientific submissions continues to climb, outpacing the capacity of qualified human referees and stretching editorial timelines. At the same time, modern large language models (LLMs) offer impressive capabilities in summarization, fact checking, and literature triage, making the integration of AI into peer review increasingly attractive -- and, in practice, unavoidable. Yet early deployments and informal adoption have exposed acute failure modes. Recent incidents have revealed that hidden prompt injections embedded in manuscripts can steer LLM-generated reviews toward unjustifiably positive judgments. Complementary studies have also demonstrated brittleness to adversarial phrasing, authority and length biases, and hallucinated claims. These episodes raise a central question for scholarly communication: when AI reviews science, can we trust the AI referee? This paper provides a security- and reliability-centered analysis of AI peer review. We map attacks across the review lifecycle -- training and data retrieval, desk review, deep review, rebuttal, and system-level. We instantiate this taxonomy with four treatment-control probes on a stratified set of ICLR 2025 submissions, using two advanced LLM-based referees to isolate the causal effects of prestige framing, assertion strength, rebuttal sycophancy, and contextual poisoning on review scores. Together, this taxonomy and experimental audit provide an evidence-based baseline for assessing and tracking the reliability of AI peer review and highlight concrete failure points to guide targeted, testable mitigations.
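The abstract describes treatment-control probes that isolate the effect of one manipulation (for example, prestige framing) on review scores. Below is a minimal sketch of how such a paired comparison could be scored, assuming each submission is reviewed once in its unmodified (control) form and once in a manipulated (treatment) form by the same LLM referee. The function names, the choice of a sign-flip permutation test, and the toy scores are illustrative assumptions, not the authors' protocol or data.

```python
import random
from statistics import mean


def paired_effect(control, treatment):
    """Average within-submission score shift (treatment minus control).

    Hypothetical helper: assumes control[i] and treatment[i] are scores the
    same LLM referee gave to the unmodified and manipulated versions of
    submission i.
    """
    if not control or len(control) != len(treatment):
        raise ValueError("need equal-length, non-empty paired score lists")
    return mean(t - c for c, t in zip(control, treatment))


def sign_flip_p_value(control, treatment, n_resamples=10_000, seed=0):
    """Two-sided paired permutation (sign-flip) test for a nonzero mean shift.

    Under the null hypothesis that the manipulation has no effect, each
    paired difference is equally likely to be positive or negative, so we
    randomly flip signs and count how often the resampled mean shift is at
    least as extreme as the observed one.
    """
    if not control or len(control) != len(treatment):
        raise ValueError("need equal-length, non-empty paired score lists")
    diffs = [t - c for c, t in zip(control, treatment)]
    observed = abs(mean(diffs))
    rng = random.Random(seed)  # fixed seed for a reproducible p-value
    hits = 0
    for _ in range(n_resamples):
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(mean(flipped)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_resamples + 1)


if __name__ == "__main__":
    # Toy scores for illustration only; not data from the paper.
    control = [5, 6, 3, 8, 5]
    treatment = [6, 6, 5, 8, 6]  # e.g. the same papers with prestige framing added
    print(paired_effect(control, treatment))
    print(sign_flip_p_value(control, treatment))
```

Pairing each treated review with its own control cancels out differences in underlying paper quality, so the remaining score shift can be attributed to the manipulation. The sign-flip test assumes the paired differences are symmetric around zero when the manipulation has no effect; a real audit would also need to account for the referee's run-to-run sampling variance.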
