2602.06176v1 Feb 05, 2026 cs.AI

Large Language Model Reasoning Failures

Pengrui Han

Peiyang Song

Noah D. Goodman

Abstract

Large Language Models (LLMs) have exhibited remarkable reasoning capabilities, achieving impressive results across a wide range of tasks. Despite these advances, significant reasoning failures persist, occurring even in seemingly simple scenarios. To systematically understand and address these shortcomings, we present the first comprehensive survey dedicated to reasoning failures in LLMs. We introduce a novel categorization framework that distinguishes reasoning into embodied and non-embodied types, with the latter further subdivided into informal (intuitive) and formal (logical) reasoning. In parallel, we classify reasoning failures along a complementary axis into three types: fundamental failures intrinsic to LLM architectures that broadly affect downstream tasks; application-specific limitations that manifest in particular domains; and robustness issues characterized by inconsistent performance across minor variations. For each reasoning failure, we provide a clear definition, analyze existing studies, explore root causes, and present mitigation strategies. By unifying fragmented research efforts, our survey provides a structured perspective on systemic weaknesses in LLM reasoning, offering valuable insights and guiding future research towards building stronger, more reliable, and robust reasoning capabilities. We additionally release a comprehensive collection of research works on LLM reasoning failures, as a GitHub repository at https://github.com/Peiyang-Song/Awesome-LLM-Reasoning-Failures, to provide an easy entry point to this area.
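The abstract describes two complementary axes, so every catalogued failure can be read as one cell in a grid: a reasoning type (embodied, or non-embodied and then informal or formal) crossed with a failure type (fundamental, application-specific, or robustness). Below is a minimal, hypothetical Python encoding of that grid. It assumes each failure falls into exactly one cell. The class names, fields, the `grid` helper, and the placeholder entry are illustrative only. They are not taken from the survey or its GitHub repository.

```python
from dataclasses import dataclass
from enum import Enum


class ReasoningType(Enum):
    """Axis 1 (hypothetical encoding): the kind of reasoning involved."""
    EMBODIED = "embodied"
    INFORMAL = "non-embodied / informal (intuitive)"
    FORMAL = "non-embodied / formal (logical)"

    @property
    def is_embodied(self) -> bool:
        # Informal and formal are both subdivisions of non-embodied reasoning.
        return self is ReasoningType.EMBODIED


class FailureType(Enum):
    """Axis 2 (hypothetical encoding): the nature of the failure."""
    FUNDAMENTAL = "intrinsic to LLM architectures; broadly affects downstream tasks"
    APPLICATION_SPECIFIC = "limitation that manifests in a particular domain"
    ROBUSTNESS = "inconsistent performance across minor variations"


@dataclass(frozen=True)
class ReasoningFailure:
    """One observed failure placed on both axes (field names are assumptions)."""
    description: str
    reasoning: ReasoningType
    failure: FailureType


def grid(failures):
    """Group failure descriptions by (reasoning type, failure type) cell."""
    # Pre-create all 3 x 3 cells so empty combinations are visible too.
    cells = {(r, f): [] for r in ReasoningType for f in FailureType}
    for item in failures:
        cells[(item.reasoning, item.failure)].append(item.description)
    return cells


# Placeholder entry for illustration only, not drawn from the survey.
examples = [
    ReasoningFailure(
        "placeholder: answer changes when a question is reworded",
        ReasoningType.FORMAL,
        FailureType.ROBUSTNESS,
    ),
]
cells = grid(examples)
print(cells[(ReasoningType.FORMAL, FailureType.ROBUSTNESS)])
```

Keeping the axes as two independent enums, rather than one flat list of labels, mirrors the abstract's point that the failure classification runs along a separate axis from the reasoning taxonomy, so any reasoning type can pair with any failure type.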
