2602.02416v1 Feb 02, 2026 cs.AI

Structure Enables Effective Self-Localization of Errors in LLMs

Ankur Samanta (Citations: 17, h-index: 3)

Akshayaa Magesh (Citations: 98, h-index: 6)

Kavosh Asadi (Citations: 1,258, h-index: 13)

Boris Vidolov (Citations: 15, h-index: 3)

Kaveh Hassani (Citations: 2,433, h-index: 13)

Paul Sajda (Citations: 29, h-index: 2)

Jalaj Bhandari (Citations: 762, h-index: 8)

Yonathan Efroni (Citations: 1,684, h-index: 21)

Ayush Jain (Citations: 8, h-index: 2)

Youliang Yu (Citations: 8, h-index: 2)

Daniel Jiang (Citations: 25, h-index: 3)

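Self-correction in language models remains a difficult challenge. In this work, as a route toward building AI systems that can effectively correct themselves, we investigate whether language models can explicitly localize errors in incorrect reasoning. We introduce a prompting method that structures reasoning into discrete, semantically coherent 'thought steps', and show that models can reliably localize errors within this structure, whereas they fail to do so in conventional, unstructured chain-of-thought reasoning. Inspired by how the human brain monitors errors at discrete decision points and resamples alternatives, we propose a self-correction framework, Iterative Correction Sampling of Thoughts (Thought-ICS). Thought-ICS iteratively prompts the model to generate one discrete, complete thought at a time; each thought represents a deliberate decision by the model and forms a natural boundary for precise error localization. Upon verification, the model identifies the first erroneous step, and the system backtracks to the last correct point and generates alternative reasoning. When asked to correct reasoning verified as incorrect by an oracle, Thought-ICS achieves a 20-40% self-correction lift. It also outperforms contemporary self-correction baselines in a fully autonomous setting without external verification.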

Original Abstract

Self-correction in language models remains elusive. In this work, we explore whether language models can explicitly localize errors in incorrect reasoning, as a path toward building AI systems that can effectively correct themselves. We introduce a prompting method that structures reasoning as discrete, semantically coherent thought steps, and show that models are able to reliably localize errors within this structure, while failing to do so in conventional, unstructured chain-of-thought reasoning. Motivated by how the human brain monitors errors at discrete decision points and resamples alternatives, we introduce Iterative Correction Sampling of Thoughts (Thought-ICS), a self-correction framework. Thought-ICS iteratively prompts the model to generate reasoning one discrete and complete thought at a time--where each thought represents a deliberate decision by the model--creating natural boundaries for precise error localization. Upon verification, the model localizes the first erroneous step, and the system backtracks to generate alternative reasoning from the last correct point. When asked to correct reasoning verified as incorrect by an oracle, Thought-ICS achieves 20-40% self-correction lift. In a completely autonomous setting without external verification, it outperforms contemporary self-correction baselines.
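
The generate, localize, backtrack, and resample loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the prompts, stopping rules, or sampling settings, so the callables `next_thought`, `is_final`, and `localize_first_error`, as well as the `max_steps` and `max_rounds` limits, are hypothetical stand-ins for model calls and hyperparameters.

```python
from typing import Callable, List, Optional

# Hypothetical interfaces (assumptions, not taken from the paper):
#   next_thought(question, prior_thoughts) -> one discrete, complete thought
#   is_final(thought) -> True when the thought states the final answer
#   localize_first_error(question, thoughts) -> index of the first erroneous
#       thought, or None if the verifier finds no error
NextThought = Callable[[str, List[str]], str]
IsFinal = Callable[[str], bool]
Localizer = Callable[[str, List[str]], Optional[int]]


def generate_chain(
    question: str,
    prefix: List[str],
    next_thought: NextThought,
    is_final: IsFinal,
    max_steps: int = 20,
) -> List[str]:
    """Extend `prefix` one thought at a time until a final thought or the step cap."""
    thoughts = list(prefix)
    while len(thoughts) < max_steps:
        thought = next_thought(question, thoughts)
        thoughts.append(thought)
        if is_final(thought):
            break
    return thoughts


def thought_ics(
    question: str,
    next_thought: NextThought,
    is_final: IsFinal,
    localize_first_error: Localizer,
    max_rounds: int = 3,
    max_steps: int = 20,
) -> List[str]:
    """Sketch of iterative correction: localize the first bad step, backtrack, resample."""
    thoughts = generate_chain(question, [], next_thought, is_final, max_steps)
    for _ in range(max_rounds):
        bad = localize_first_error(question, thoughts)
        if bad is None or not 0 <= bad < len(thoughts):
            break  # no error reported (or an unusable index): keep the current chain
        # Keep everything before the first erroneous thought, then regenerate the rest.
        thoughts = generate_chain(
            question, thoughts[:bad], next_thought, is_final, max_steps
        )
    return thoughts
```

In practice `next_thought` would need to sample stochastically (or be told which step was rejected) so that regeneration from the same prefix can yield a different continuation. In the oracle setting described in the abstract, the correction loop would be entered only for chains an external checker has marked incorrect, while in the autonomous setting the model itself decides whether an error is present.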

0 Citations

0 Influential

10.5 Altmetric

52.5 Score
