2602.06443v1 Feb 06, 2026 cs.CR

TrajAD: Trajectory Anomaly Detection for Trustworthy LLM Agents

Yibing Liu

Citations: 59

h-index: 4

Chong Zhang

Citations: 122

h-index: 5

Zhongyi Han

Citations: 676

h-index: 14

Han Liu

Citations: 278

h-index: 5

Yong Wang

Citations: 44

h-index: 3

Yang Yu

Citations: 3,360

h-index: 6

Xiaoyan Wang

Citations: 32

h-index: 4

Yilong Yin

Citations: 15

h-index: 3

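This work addresses runtime trajectory anomaly detection, a problem essential to developing trustworthy LLM agents. Current safety measures focus mainly on static input/output filtering. We argue, however, that ensuring the reliability of LLM agents requires auditing the intermediate execution process. We define the task of trajectory anomaly detection, aiming not only to detect anomalies but also to pinpoint the exact location of the error. This capability is essential for efficient rollback and retry. To this end, we built TrajBench, a dataset covering diverse procedural anomalies, and used it to examine models' process supervision capabilities. The results show that general-purpose LLMs, even with zero-shot prompting, struggle to identify and localize these anomalies, indicating that general capability does not automatically translate into process reliability. To address this, we propose TrajAD, a specialized verifier trained with fine-grained process supervision. The proposed method outperforms existing methods, demonstrating that specialized supervision is essential for building trustworthy agents.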

Original Abstract

We address the problem of runtime trajectory anomaly detection, a critical capability for enabling trustworthy LLM agents. Current safety measures predominantly focus on static input/output filtering. However, we argue that ensuring the reliability of LLM agents requires auditing the intermediate execution process. In this work, we formulate the task of Trajectory Anomaly Detection. The goal is not merely detection, but precise error localization. This capability is essential for enabling efficient rollback-and-retry. To achieve this, we construct TrajBench, a dataset synthesized via a perturb-and-complete strategy to cover diverse procedural anomalies. Using this benchmark, we investigate the capability of models in process supervision. We observe that general-purpose LLMs, even with zero-shot prompting, struggle to identify and localize these anomalies. This reveals that generalized capabilities do not automatically translate to process reliability. To address this, we propose TrajAD, a specialized verifier trained with fine-grained process supervision. Our approach outperforms baselines, demonstrating that specialized supervision is essential for building trustworthy agents.
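To make the link between error localization and rollback-and-retry concrete, the following is a minimal sketch and not the paper's implementation. It assumes a verifier that returns the index of the first anomalous step in a trajectory (or `None` if the trajectory looks clean). The names `Step`, `run_with_rollback`, `toy_verifier`, and `make_toy_policy` are hypothetical, as is the verifier interface itself; a trained verifier such as TrajAD would take the place of the toy rule used here.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Step:
    """One agent step: the action taken and the observation received."""
    action: str
    observation: str


# Hypothetical interfaces (assumptions, not taken from the paper):
# a policy proposes the next step given the trajectory so far, and a
# verifier returns the index of the first anomalous step, or None.
Policy = Callable[[List[Step]], Step]
Verifier = Callable[[List[Step]], Optional[int]]


def run_with_rollback(policy: Policy, verifier: Verifier,
                      max_steps: int, max_retries: int) -> List[Step]:
    """Execute steps, truncating back to the localized error on each anomaly."""
    trajectory: List[Step] = []
    retries = 0
    while len(trajectory) < max_steps:
        trajectory.append(policy(trajectory))
        bad_index = verifier(trajectory)
        if bad_index is not None:
            if retries >= max_retries:
                raise RuntimeError("retry budget exhausted")
            # Localization lets us keep the valid prefix instead of restarting.
            del trajectory[bad_index:]
            retries += 1
    return trajectory


def toy_verifier(trajectory: List[Step]) -> Optional[int]:
    """Toy stand-in for a learned verifier: flags any step whose action is 'BAD'."""
    for i, step in enumerate(trajectory):
        if step.action == "BAD":
            return i
    return None


def make_toy_policy() -> Policy:
    """Toy policy that emits one bad action on its second call, then behaves."""
    calls = {"n": 0}

    def policy(prefix: List[Step]) -> Step:
        calls["n"] += 1
        action = "BAD" if calls["n"] == 2 else f"ok-{len(prefix)}"
        return Step(action=action, observation="obs")

    return policy


result = run_with_rollback(make_toy_policy(), toy_verifier,
                           max_steps=3, max_retries=2)
```

In this toy run, the second step is flagged, the trajectory is truncated to its valid one-step prefix, and execution resumes from there. A detector that only reports "anomalous or not" would force a restart from scratch, which is the inefficiency that localization avoids.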

5 Citations

1 Influential

7 Altmetric

42.0 Score
