2603.07837v1 Mar 08, 2026 cs.CL

AI Steerability 360: A Toolkit for Steering Large Language Models

K. Ramamurthy

Citations: 7,161

h-index: 33

Erik Miehling

Citations: 314

h-index: 8

P. Venkateswaran

Citations: 139

h-index: 6

I. Ko

Citations: 0

h-index: 0

Pierre L. Dognin

IBM

Citations: 880

h-index: 16

Moninder Singh

Citations: 67

h-index: 4

Tejaswini Pedapati

Citations: 971

h-index: 13

Avinash Balakrishnan

Citations: 1,054

h-index: 8

Matthew Riemer

Citations: 102

h-index: 5

Dennis Wei

Citations: 142

h-index: 5

Inge Vejsbjerg

Citations: 141

h-index: 7

Elizabeth M. Daly

Citations: 153

h-index: 7

Kush R. Varshney

Citations: 9,967

h-index: 44

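AI Steerability 360 is an extensible, open-source Python library for steering large language models (LLMs). The library is designed around four main model control surfaces: input (modifying the prompt), structural (modifying the model's weights or architecture), state (modifying the model's activations and attention), and output (modifying the decoding or generation process). Steering methods are applied to the model through a common interface called a steering pipeline, which also allows multiple steering methods to be composed. Use case classes (for defining tasks) and a benchmark class (for comparing performance on a given task) enable comprehensive evaluation and comparison of steering methods and pipelines. The toolkit substantially lowers the barrier to developing and comprehensively evaluating steering methods. It is optimized for the Hugging Face ecosystem and is available under the Apache 2.0 license at https://github.com/IBM/AISteer360.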

Original Abstract

The AI Steerability 360 toolkit is an extensible, open-source Python library for steering LLMs. Steering abstractions are designed around four model control surfaces: input (modification of the prompt), structural (modification of the model's weights or architecture), state (modification of the model's activations and attentions), and output (modification of the decoding or generation process). Steering methods exert control on the model through a common interface, termed a steering pipeline, which additionally allows for the composition of multiple steering methods. Comprehensive evaluation and comparison of steering methods/pipelines is facilitated by use case classes (for defining tasks) and a benchmark class (for performance comparison on a given task). The functionality provided by the toolkit significantly lowers the barrier to developing and comprehensively evaluating steering methods. The toolkit is Hugging Face native and is released under an Apache 2.0 license at https://github.com/IBM/AISteer360.
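
The abstract's core ideas (four control surfaces, a pipeline that composes steering methods, and a benchmark that compares pipelines on a task) can be illustrated with a small toy sketch. Everything below is hypothetical: `ToyModel`, `SteeringPipeline`, `benchmark`, and the control functions are invented for illustration and are not the AISteer360 API, and the "model" is a three-token score table rather than an LLM.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Scores = Dict[str, float]

# NOTE: all names here are hypothetical stand-ins, not the AISteer360 API.


@dataclass
class ToyModel:
    """Stand-in for an LLM: maps a prompt to a score per candidate token."""
    weights: Scores = field(
        default_factory=lambda: {"yes": 1.0, "no": 1.5, "maybe": 0.5}
    )

    def activations(self, prompt: str) -> Scores:
        acts = dict(self.weights)
        if "agree" in prompt:  # the prompt has a small effect on the scores
            acts["yes"] += 0.25
        return acts


@dataclass
class SteeringPipeline:
    """Applies controls grouped by the four surfaces named in the abstract."""
    model: ToyModel
    input_controls: List[Callable[[str], str]] = field(default_factory=list)
    structural_controls: List[Callable[[ToyModel], ToyModel]] = field(default_factory=list)
    state_controls: List[Callable[[Scores], Scores]] = field(default_factory=list)
    output_controls: List[Callable[[Scores], Scores]] = field(default_factory=list)

    def generate(self, prompt: str) -> str:
        model = self.model
        for steer in self.structural_controls:  # structural: weights/architecture
            model = steer(model)
        for steer in self.input_controls:       # input: rewrite the prompt
            prompt = steer(prompt)
        acts = model.activations(prompt)
        for steer in self.state_controls:       # state: edit intermediate scores
            acts = steer(acts)
        for steer in self.output_controls:      # output: constrain decoding
            acts = steer(acts)
        return max(acts, key=acts.get)          # greedy decoding over one token


def add_agree_suffix(prompt: str) -> str:
    """Input control: append an instruction to the prompt."""
    return prompt + " Please agree."


def boost_yes(acts: Scores) -> Scores:
    """State control: nudge the score of the 'yes' token."""
    return {**acts, "yes": acts["yes"] + 0.4}


def benchmark(pipelines: Dict[str, SteeringPipeline],
              prompts: List[str], target: str) -> Dict[str, float]:
    """Fraction of prompts for which each pipeline produces the target token."""
    return {
        name: sum(p.generate(q) == target for q in prompts) / len(prompts)
        for name, p in pipelines.items()
    }


# A toy "use case": two prompts and a desired answer.
prompts = ["Is it safe?", "Can we proceed?"]
pipelines = {
    "baseline": SteeringPipeline(ToyModel()),
    "input_only": SteeringPipeline(ToyModel(), input_controls=[add_agree_suffix]),
    "state_only": SteeringPipeline(ToyModel(), state_controls=[boost_yes]),
    "composed": SteeringPipeline(
        ToyModel(),
        input_controls=[add_agree_suffix],
        state_controls=[boost_yes],
    ),
}
results = benchmark(pipelines, prompts, target="yes")
print(results)
```

In this toy setup neither control flips the answer on its own, while the composed pipeline does, which is the kind of comparison a shared pipeline interface and a benchmark class make easy to run. The structural and output lists are left empty here; they would hold functions that return a modified model or filtered scores, respectively.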

0 Citations

0 Influential

63.78 Altmetric

318.9 Score
