2605.07806v1 May 08, 2026 cs.CL

Beyond Confidence: Rethinking Self-Assessments for Performance Prediction in LLMs

L. Chen

Citations: 649

h-index: 3

Samarth Khanna

Citations: 15

h-index: 2

Sree Bhattacharyya

Citations: 29

h-index: 3

Lucas Craig

Citations: 5

h-index: 1

Tharun Dilliraj

Citations: 1

h-index: 1

James Z. Wang

Citations: 25

h-index: 2

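Large language models (LLMs) are increasingly used in settings where reliable self-assessment is important. Assessment of model reliability began with probabilistic correctness estimates and has more recently moved to verbalized confidence estimates. Confidence, however, has turned out to be an inconsistent and overoptimistic predictor of model correctness. Drawing on cognitive appraisal theory from human psychology, this work proposes a multidimensional perspective that decomposes self-assessment into multiple components. The authors elicit six appraisal-based self-assessment dimensions in addition to confidence and evaluate how useful each is for predicting model failure across 12 LLMs and 38 tasks in eight domains. Competence-related appraisal dimensions, particularly effort and ability, match or outperform confidence in most settings. Effort also gives predictions that stay stable across model sizes and helps reduce overoptimistic estimates. Affective dimensions, by contrast, have only marginal predictive power. The most informative dimension also varies systematically with task characteristics: effort is most predictive on reasoning-intensive tasks, while ability and confidence lead on retrieval-oriented tasks. Overall, the results suggest that structured multidimensional self-assessment is a promising approach to improving the reliability and safety of language models across diverse real-world settings.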

Original Abstract

Large Language Models (LLMs) are increasingly used in settings where reliable self-assessment is critical. Assessing model reliability has evolved from using probabilistic correctness estimates to, more recently, eliciting verbalized confidence. Confidence, however, has been shown to be an inconsistent and overoptimistic predictor of model correctness. Drawing on cognitive appraisal theory, a framework from human psychology that decomposes self-evaluation into multiple components, we propose a multidimensional perspective on model self-assessment. We elicit six appraisal-based dimensions of self-assessment, alongside confidence, and evaluate their utility for predicting model failure across 12 LLMs and 38 tasks spanning eight domains. We find that competence-related appraisal dimensions, particularly effort and ability, consistently match or outperform confidence across most settings. Effort additionally yields less overoptimistic estimates that remain stable across model sizes. In contrast, affective dimensions provide marginally predictive signals. Furthermore, the most informative dimension varies systematically with task characteristics: effort is most predictive for reasoning-intensive tasks, while ability and confidence dominate on retrieval-oriented tasks. Broadly, our findings indicate that structured multidimensional self-assessment is a promising approach to improving the reliability and safety of language model deployment across diverse real-world settings.
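To make the evaluation idea in the abstract concrete, the sketch below shows one way a self-assessment dimension could be scored as a predictor of failure. It is only an illustration under stated assumptions: the abstract does not say which metric the authors use, so AUROC is swapped in here as a common choice. The names `auroc` and `rank_dimensions`, the 1-10 rating scale, and the toy ratings are hypothetical and are not results from the paper.

```python
from typing import Dict, List


def auroc(risk: List[float], failed: List[bool]) -> float:
    """Chance that a random failure has a higher risk score than a random success.

    Ties count as half a win. 0.5 means no signal, 1.0 means perfect ranking.
    """
    pos = [r for r, f in zip(risk, failed) if f]      # scores on failed items
    neg = [r for r, f in zip(risk, failed) if not f]  # scores on correct items
    if not pos or not neg:
        raise ValueError("need at least one failure and one success")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def rank_dimensions(
    ratings: Dict[str, List[float]],
    failed: List[bool],
    higher_means_riskier: Dict[str, bool],
) -> Dict[str, float]:
    """Compute AUROC for failure prediction, one value per dimension."""
    result = {}
    for name, values in ratings.items():
        # Flip dimensions where a high rating signals success (e.g. confidence),
        # so that every dimension is oriented as a risk score.
        risk = values if higher_means_riskier[name] else [-v for v in values]
        result[name] = auroc(risk, failed)
    return result


# Made-up toy data (assumption): six items, self-ratings on a 1-10 scale.
failed = [False, False, True, False, True, True]
ratings = {
    "confidence": [9, 8, 9, 7, 8, 6],  # hypothetical verbalized confidence
    "effort": [2, 3, 8, 4, 7, 9],      # hypothetical self-reported effort
}
higher_means_riskier = {"confidence": False, "effort": True}

scores = rank_dimensions(ratings, failed, higher_means_riskier)
for name, value in scores.items():
    print(f"{name}: AUROC = {value:.3f}")
```

The pairwise AUROC is quadratic in the number of items, which is fine for a sketch. A real evaluation over many tasks would more likely use a library implementation and report the metric per model and per task.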
