2605.15000v1 May 14, 2026 cs.CL

Quantifying and Mitigating Premature Closure in Frontier LLMs

Nigam H. Shah

Citations: 16

h-index: 3

Suhana Bedi

Citations: 771

h-index: 10

Rebecca Handler

Citations: 17

h-index: 1

Original Abstract

Premature closure, or committing to a conclusion before sufficient information is available, is a recognized contributor to diagnostic error but remains underexamined in large language models (LLMs). We define LLM premature closure as inappropriate commitment under uncertainty: providing an answer, recommendation, or clinical guidance when the safer response would be clarification, abstention, escalation, or refusal. We evaluated five frontier LLMs across structured and open-ended medical tasks. In MedQA (n = 500) and AfriMed-QA (n = 490) questions where the correct choice had been removed, models still selected an answer at high rates, with baseline false-action rates of 55-81% and 53-82%, respectively. In open-ended evaluation, models gave inappropriate answers on an average of 30% of 861 HealthBench questions and 78% of 191 physician-authored adversarial queries. Safety-oriented prompting reduced premature closure across models, but residual failure persisted, highlighting the need to evaluate whether medical LLMs know when not to answer.
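The abstract reports false-action rates but does not spell out the formula. As a minimal sketch, assuming the false-action rate is the fraction of answer-removed questions on which the model still commits to one of the remaining options instead of abstaining, it could be computed as follows. The `Response` type, the `ABSTAIN` label, and the sample data are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass

# Hypothetical label for a response that declines to pick any option.
ABSTAIN = "ABSTAIN"


@dataclass
class Response:
    question_id: str
    choice: str  # an option letter such as "A", or ABSTAIN


def false_action_rate(responses):
    """Fraction of responses that commit to an option.

    Assumes every question had its correct choice removed, so any
    committed option counts as a false action (an assumption, not the
    paper's stated definition).
    """
    if not responses:
        raise ValueError("no responses to score")
    committed = sum(1 for r in responses if r.choice != ABSTAIN)
    return committed / len(responses)


# Illustrative data only: three committed answers, one abstention.
responses = [
    Response("q1", "A"),
    Response("q2", ABSTAIN),
    Response("q3", "C"),
    Response("q4", "B"),
]
print(f"{false_action_rate(responses):.0%}")  # prints "75%"
```

Under this reading, a lower rate means the model abstains more often when no offered option is correct.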

0 Citations

0 Influential

5 Altmetric

25.0 Score
