2604.00842v1 Apr 01, 2026 cs.AI

Proactive Agent Research Environment: Simulating Active Users to Evaluate Proactive Assistants

Deepak Nathani

University of California, Santa Barbara

Xin Eric Wang

Yinfei Yang

Chengquan Zhang

C. Huan

Jiaming Shan

Alkesh Patel

Zhe Gan

William Yang Wang

Michael Stephen Saxon

University of California, Santa Barbara

Abstract

Proactive agents that anticipate user needs and autonomously execute tasks hold great promise as digital assistants, yet the lack of realistic user simulation frameworks hinders their development. Existing approaches model apps as flat tool-calling APIs, failing to capture the stateful and sequential nature of user interaction in digital environments and making realistic user simulation infeasible. We introduce Proactive Agent Research Environment (Pare), a framework for building and evaluating proactive agents in digital environments. Pare models applications as finite state machines with stateful navigation and state-dependent action space for the user simulator, enabling active user simulation. Building on this foundation, we present Pare-Bench, a benchmark of 143 diverse tasks spanning communication, productivity, scheduling, and lifestyle apps, designed to test context observation, goal inference, intervention timing, and multi-app orchestration.
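
The abstract describes modeling applications as finite state machines whose available user actions depend on the current state. A minimal sketch of that idea follows. The class name `AppFSM`, the helper `make_messaging_app`, and the screen and action names are all hypothetical illustrations, not Pare's actual API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class AppFSM:
    """Toy app modeled as a finite state machine (hypothetical, not Pare's API)."""

    # transitions[state][action] gives the next state
    transitions: Dict[str, Dict[str, str]]
    state: str
    history: List[Tuple[str, str]] = field(default_factory=list)

    def available_actions(self) -> List[str]:
        # The action space depends on the current state (screen).
        return sorted(self.transitions.get(self.state, {}))

    def step(self, action: str) -> str:
        # Reject actions that are not reachable from the current state.
        if action not in self.transitions.get(self.state, {}):
            raise ValueError(
                f"{action!r} is not available in state {self.state!r}"
            )
        self.history.append((self.state, action))
        self.state = self.transitions[self.state][action]
        return self.state


def make_messaging_app() -> AppFSM:
    # Assumed screens and actions for an illustrative messaging app.
    return AppFSM(
        transitions={
            "inbox": {"open_thread": "thread", "compose": "draft"},
            "thread": {"reply": "draft", "back": "inbox"},
            "draft": {"send": "inbox", "discard": "inbox"},
        },
        state="inbox",
    )
```

In this sketch a simulated user must navigate from `inbox` to `draft` before `send` becomes available, whereas a flat tool-calling API would expose `send` at any time. The recorded `history` is the kind of sequential trace an observing agent could use to infer the user's goal.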
