2603.16777v1 Mar 17, 2026 cs.AI

Anticipatory Planning for Multimodal AI Agents

Authors:

Yongyuan Liang (Citations: 43; h-index: 2)
Franck Dernoncourt (Citations: 1,148; h-index: 15)
Ryan A. Rossi (Citations: 1,178; h-index: 15)
Shijie Zhou (Citations: 15; h-index: 3)
Yuxuan Gu (Citations: 31; h-index: 3)
Jihyung Kil (Citations: 52; h-index: 4)
Ruiyi Zhang (Citations: 266; h-index: 6)
Hao Tan (Citations: 26; h-index: 2)
Gang Wu, Adobe Research (Citations: 529; h-index: 9)

Original Abstract

Recent advances in multimodal agents have improved computer-use interaction and tool-usage, yet most existing systems remain reactive, optimizing actions in isolation without reasoning about future states or long-term goals. This limits planning coherence and prevents agents from reliably solving high-level, multi-step tasks. We introduce TraceR1, a two-stage reinforcement learning framework that explicitly trains anticipatory reasoning by forecasting short-horizon trajectories before execution. The first stage performs trajectory-level reinforcement learning with rewards that enforce global consistency across predicted action sequences. The second stage applies grounded reinforcement fine-tuning, using execution feedback from frozen tool agents to refine step-level accuracy and executability. TraceR1 is evaluated across seven benchmarks, covering online computer-use, offline computer-use benchmarks, and multimodal tool-use reasoning tasks, where it achieves substantial improvements in planning stability, execution robustness, and generalization over reactive and single-stage baselines. These results show that anticipatory trajectory reasoning is a key principle for building multimodal agents that can reason, plan, and act effectively in complex real-world environments.
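
The abstract does not spell out the reward functions, so the following is only a minimal sketch of what the two kinds of signal could look like, not the paper's implementation. Everything in it is an assumption made for illustration: the `Action` type, the partial-credit rule in `step_score`, the discount `gamma`, and the `executor` callable that stands in for a frozen tool agent.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Action:
    """Hypothetical action record; the paper's action format is not given here."""
    kind: str    # e.g. "click" or "type"
    target: str  # e.g. a UI element or tool argument


def step_score(pred: Action, ref: Action) -> float:
    """Assumed scoring rule: 1.0 for an exact match, 0.5 if only the kind matches."""
    if pred.kind != ref.kind:
        return 0.0
    return 1.0 if pred.target == ref.target else 0.5


def trajectory_reward(predicted: List[Action],
                      reference: List[Action],
                      gamma: float = 0.9) -> float:
    """Stage-1 style signal (assumed): discounted agreement over the whole
    forecast trajectory, normalized to [0, 1]. Missing steps score zero."""
    if not reference:
        return 0.0
    total = 0.0
    norm = 0.0
    for t, ref in enumerate(reference):
        weight = gamma ** t
        norm += weight
        if t < len(predicted):
            total += weight * step_score(predicted[t], ref)
    return total / norm


def grounded_step_reward(action: Action,
                         executor: Callable[[Action], bool]) -> float:
    """Stage-2 style signal (assumed): 1.0 if the frozen executor reports
    that the step ran successfully, else 0.0."""
    return 1.0 if executor(action) else 0.0
```

Under these assumptions, a forecast that gets the first step right and the second step only partly right scores below 1.0 but above a forecast that stops after the first step, which is the kind of whole-sequence pressure the first stage is described as applying.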

2 Citations

0 Influential

7.5 Altmetric

39.5 Score
