2604.23027v1 Apr 24, 2026 cs.AI

A Systematic Approach for Large Language Models Debugging

Sungeun An

Citations: 12

h-index: 2

Yuya Jeremy Ong

Citations: 0

h-index: 0

Farhan Ahmed

Citations: 9

h-index: 1

Chad DeLuca

Citations: 34

h-index: 3

Shailja Thakur

Citations: 1,083

h-index: 11

Hima Patel

Citations: 146

h-index: 3

Basel Shbita

Citations: 0

h-index: 0

A. Gentile

Citations: 113

h-index: 5

Bing Zhang

Citations: 32

h-index: 4

Shubhi Asthana

Citations: 208

h-index: 7

Yi Zhou

Citations: 139

h-index: 3

Saptha Surendran

Citations: 144

h-index: 3

Rohan Kulkarni

Citations: 9

h-index: 1

Abstract

Large language models (LLMs) have become central to modern AI workflows, powering applications from open-ended text generation to complex agent-based reasoning. However, debugging these models remains a persistent challenge due to their opaque and probabilistic nature and the difficulty of diagnosing errors across diverse tasks and settings. This paper introduces a systematic approach for LLM debugging that treats models as observable systems, providing structured, model-agnostic methods from issue detection to model refinement. By unifying evaluation, interpretability, and error-analysis practices, our approach enables practitioners to iteratively diagnose model weaknesses, refine prompts and model parameters, and adapt data for fine-tuning or assessment, while remaining effective in contexts where standardized benchmarks and evaluation criteria are lacking. We argue that such a structured methodology not only accelerates troubleshooting but also fosters reproducibility, transparency, and scalability in the deployment of LLM-based systems.
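The abstract gives no implementation details, so the following is only a minimal sketch of what "treating a model as an observable system" might look like in practice, not the authors' method. It assumes the model is any text-in, text-out callable and that, in the absence of a benchmark, the practitioner supplies small check functions that return an error label. All names here (`Observation`, `observe`, `error_summary`, `toy_model`, `expect_digit`) are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical types for illustration; none of these names come from the paper.
Model = Callable[[str], str]            # any text-in, text-out model
Check = Callable[[str], Optional[str]]  # returns an error label, or None on pass


@dataclass
class Observation:
    prompt: str
    output: str
    error: Optional[str]


def observe(model: Model, cases: list[tuple[str, Check]]) -> list[Observation]:
    """Run every case once and record the prompt, output, and error label."""
    observations = []
    for prompt, check in cases:
        output = model(prompt)
        observations.append(Observation(prompt, output, check(output)))
    return observations


def error_summary(observations: list[Observation]) -> Counter:
    """Count failures per error label so the most common weakness surfaces first."""
    return Counter(o.error for o in observations if o.error is not None)


def toy_model(prompt: str) -> str:
    """Stand-in for a real LLM call; deterministic so the sketch is reproducible."""
    return "4" if "2 + 2" in prompt else "I am not sure."


def expect_digit(output: str) -> Optional[str]:
    """Assumed task-specific check: the answer must be a bare number."""
    return None if output.strip().isdigit() else "non_numeric_answer"


cases = [
    ("What is 2 + 2? Answer with a number.", expect_digit),
    ("What is 3 + 5? Answer with a number.", expect_digit),
]
observations = observe(toy_model, cases)
summary = error_summary(observations)
print(summary.most_common())  # [('non_numeric_answer', 1)]
```

Because the model is passed in as a plain callable, the same loop can be rerun after a prompt or parameter change and the two summaries compared, which is one simple way to make the iterative diagnosis the abstract describes repeatable.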

0 Citations

0 Influential

5.5 Altmetric

27.5 Score
