2602.02605v1 Feb 02, 2026 cs.NE

Fine-Tuning Language Models to Know What They Know

Risto Miikkulainen

Xin Qiu

Sangjun Park

Elliot Meyerson

Original Abstract

Metacognition is a critical component of intelligence, specifically the awareness of one's own knowledge. While humans rely on shared internal memory both for answering questions and for reporting their knowledge state, this dependency remains underexplored in LLMs. This study proposes a framework that measures metacognitive ability, quantified as $d_{\rm{type2}}'$, using a dual-prompt method, and then introduces Evolution Strategy for Metacognitive Alignment (ESMA) to bind a model's internal knowledge to its explicit behaviors. ESMA demonstrates robust generalization across diverse untrained settings, indicating an enhancement in the model's ability to reference its own knowledge. Furthermore, parameter analysis attributes these improvements to a sparse set of significant modifications.
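
The abstract does not spell out how $d_{\rm{type2}}'$ is computed, so the following is only a minimal sketch of the standard signal-detection definition of type-2 sensitivity, not necessarily the paper's exact procedure. It assumes the dual-prompt setup yields two booleans per question: whether the model's direct answer was correct, and whether the model reported knowing the answer. The function name `type2_dprime`, the input layout, and the log-linear correction for extreme rates are all assumptions made for illustration.

```python
from statistics import NormalDist


def type2_dprime(correct, claims_to_know):
    """Type-2 d' = z(hit rate) - z(false-alarm rate).

    correct[i]        : True if the direct answer to question i was right.
    claims_to_know[i] : True if the model reported that it knows the answer.
    Both inputs and this exact formulation are illustrative assumptions.
    """
    if len(correct) != len(claims_to_know):
        raise ValueError("inputs must have the same length")

    n_correct = sum(1 for c in correct if c)
    n_incorrect = len(correct) - n_correct

    # Hit: the model says "I know" and the answer is correct.
    hits = sum(1 for c, k in zip(correct, claims_to_know) if c and k)
    # False alarm: the model says "I know" but the answer is wrong.
    false_alarms = sum(1 for c, k in zip(correct, claims_to_know) if not c and k)

    # Log-linear correction keeps both rates strictly inside (0, 1),
    # so the inverse normal CDF below is always defined.
    hit_rate = (hits + 0.5) / (n_correct + 1)
    fa_rate = (false_alarms + 0.5) / (n_incorrect + 1)

    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)


if __name__ == "__main__":
    answers_correct = [True, True, False, False]
    reported_known = [True, True, False, False]
    print(type2_dprime(answers_correct, reported_known))
```

Under this definition, a model whose "I know" reports track its actual correctness gets a positive score, a model whose reports are unrelated to correctness scores near zero, and a model whose reports are systematically inverted scores negative.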