2603.02578v1 Mar 03, 2026 cs.CL

How Controllable Are Large Language Models? A Unified Evaluation across Behavioral Granularities

Huajun Chen, Ning Zhang, Shumin Deng, Ziwen Xu, Haiwen Hong, Longtao Huang, Haoming Xu, Kewei Xu, Hui Xue, Yongliang Shen, Guozhou Zheng

Original Abstract

Large Language Models (LLMs) are increasingly deployed in socially sensitive domains, yet their unpredictable behaviors, ranging from misaligned intent to inconsistent personality, pose significant risks. We introduce SteerEval, a hierarchical benchmark for evaluating LLM controllability across three domains: language features, sentiment, and personality. Each domain is structured into three specification levels: L1 (what to express), L2 (how to express), and L3 (how to instantiate), connecting high-level behavioral intent to concrete textual output. Using SteerEval, we systematically evaluate contemporary steering methods, revealing that control often degrades at finer-grained levels. Our benchmark offers a principled and interpretable framework for safe and controllable LLM behavior, serving as a foundation for future research.
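
The abstract describes SteerEval as a grid of three domains crossed with three specification levels, ordered from coarse (L1) to fine (L3). The following is a minimal Python sketch of that hierarchy, assuming each evaluated example receives a single scalar control score. The names `Domain`, `Level`, `Cell`, `mean_by_level`, and `degrades_with_granularity`, and the scoring scheme itself, are hypothetical illustrations rather than the benchmark's actual code or metrics.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product
from statistics import mean
from typing import Dict, List


class Domain(Enum):
    # The three behavioral domains named in the abstract.
    LANGUAGE_FEATURES = "language features"
    SENTIMENT = "sentiment"
    PERSONALITY = "personality"


class Level(Enum):
    # Specification levels, ordered from coarse to fine.
    L1 = 1  # what to express
    L2 = 2  # how to express
    L3 = 3  # how to instantiate


@dataclass(frozen=True)
class Cell:
    # One (domain, level) slot of the hierarchy; frozen so it can key a dict.
    domain: Domain
    level: Level


def benchmark_grid() -> List[Cell]:
    """Enumerate every (domain, level) cell: 3 domains x 3 levels = 9 cells."""
    return [Cell(d, lvl) for d, lvl in product(Domain, Level)]


def mean_by_level(scores: Dict[Cell, List[float]]) -> Dict[Level, float]:
    """Average hypothetical per-example control scores within each level, pooling domains."""
    by_level: Dict[Level, List[float]] = {lvl: [] for lvl in Level}
    for cell, values in scores.items():
        by_level[cell.level].extend(values)
    # Skip levels with no scores instead of dividing by zero.
    return {lvl: mean(v) for lvl, v in by_level.items() if v}


def degrades_with_granularity(level_means: Dict[Level, float]) -> bool:
    """True if the mean score never increases when moving from L1 toward L3."""
    ordered = [level_means[lvl] for lvl in sorted(level_means, key=lambda x: x.value)]
    return all(a >= b for a, b in zip(ordered, ordered[1:]))
```

Pooling scores by level across domains is one simple way to check the reported trend that control weakens at finer-grained levels; a per-domain breakdown would use the same `Cell` keys without pooling.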
