2603.19022v1 Mar 19, 2026 cs.AI

Behavioral Fingerprints for LLM Endpoint Stability and Identity

Daniel Kang

Jonah Leshin

Ian Timmis

Manish Shah

Abstract

The consistency of AI-native applications depends on the behavioral consistency of the model endpoints that power them. Traditional reliability metrics such as uptime, latency, and throughput do not capture behavioral change: an endpoint can remain "healthy" while its effective model identity changes due to updates to weights, tokenizers, quantization, inference engines, kernels, caching, routing, or hardware. We introduce Stability Monitor, a black-box stability monitoring system that periodically fingerprints an endpoint by sampling outputs from a fixed prompt set and comparing the resulting output distributions over time. Fingerprints are compared using an energy distance statistic summed across prompts; permutation-test p-values serve as evidence of distribution shift and are aggregated sequentially to detect change events and define stability periods. In controlled validation, Stability Monitor detects changes to model family, version, inference stack, quantization, and behavioral parameters. In real-world monitoring of the same model hosted by multiple providers, we observe substantial stability differences both between providers and within individual providers.
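The comparison step described above (a per-prompt energy distance, summed over the fixed prompt set, with a permutation-test p-value) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each sampled output has already been mapped to a numeric feature vector (how Stability Monitor represents text outputs is not stated in the abstract), uses Euclidean distance, and permutes sample labels independently within each prompt. The names `energy_distance`, `summed_energy`, and `permutation_pvalue` are hypothetical. The abstract's sequential aggregation of p-values across monitoring rounds is not specified there, so it is not modeled here.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

# Assumption: each sampled endpoint output is already embedded as a numeric vector.
Vector = Tuple[float, ...]
Fingerprint = Dict[str, List[Vector]]  # prompt id -> sampled output vectors


def _mean_pairwise(a: Sequence[Vector], b: Sequence[Vector]) -> float:
    """Mean Euclidean distance over all pairs (a_i, b_j); self-pairs count when a is b."""
    return sum(math.dist(x, y) for x in a for y in b) / (len(a) * len(b))


def energy_distance(x: Sequence[Vector], y: Sequence[Vector]) -> float:
    """V-statistic energy distance: 2*E|X-Y| - E|X-X'| - E|Y-Y'| (non-negative in Euclidean space)."""
    return 2 * _mean_pairwise(x, y) - _mean_pairwise(x, x) - _mean_pairwise(y, y)


def summed_energy(ref: Fingerprint, cur: Fingerprint) -> float:
    """Sum the per-prompt energy distances over the shared, fixed prompt set."""
    return sum(energy_distance(ref[p], cur[p]) for p in ref)


def permutation_pvalue(ref: Fingerprint, cur: Fingerprint, n_perm: int = 199, seed: int = 0) -> float:
    """Permutation p-value for H0: every prompt's output distribution is unchanged.

    Under H0 the reference/current labels are exchangeable within each prompt,
    so each permutation reshuffles the pooled samples of every prompt separately.
    """
    rng = random.Random(seed)
    observed = summed_energy(ref, cur)
    exceed = 0
    for _ in range(n_perm):
        total = 0.0
        for p in ref:
            pooled = list(ref[p]) + list(cur[p])
            rng.shuffle(pooled)
            k = len(ref[p])
            total += energy_distance(pooled[:k], pooled[k:])
        if total >= observed:
            exceed += 1
    # The +1 terms count the observed statistic itself, so the p-value is never 0.
    return (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = random.Random(42)

    def sample(mu: float, n: int = 20, dim: int = 2) -> List[Vector]:
        # Synthetic stand-in for embedded endpoint outputs.
        return [tuple(rng.gauss(mu, 1.0) for _ in range(dim)) for _ in range(n)]

    prompts = ["p1", "p2", "p3"]
    ref = {p: sample(0.0) for p in prompts}
    same = {p: sample(0.0) for p in prompts}
    shifted = {p: sample(3.0) for p in prompts}
    print("same distribution p =", permutation_pvalue(ref, same))
    print("shifted distribution p =", permutation_pvalue(ref, shifted))
```

Summing per-prompt statistics and permuting within prompts keeps each prompt's sample sizes fixed, so a shift concentrated in a few prompts still raises the total while prompts are never mixed with one another.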
