2602.14229v1 Feb 15, 2026 cs.AI

CORPGEN: Simulating Corporate Environments with Autonomous Digital Employees in Multi-Horizon Task Environments

Abubakarr Jaye (Citations: 3, h-index: 1)

Nigel Boachie Kumankumah (Citations: 0, h-index: 0)

Chidera Biringa (Citations: 10, h-index: 2)

Anjel Patel (Citations: 5, h-index: 1)

Sulaiman Vesal (Stanford.edu, fau.de; Citations: 1,666, h-index: 18)

Dayquan Julienne (Citations: 0, h-index: 0)

Charlotte Siska (Citations: 0, h-index: 0)

Manuel Raúl Meléndez Luján (Citations: 7, h-index: 1)

Anthony Twum-Barimah (Citations: 0, h-index: 0)

Mauricio Velazco (Citations: 7, h-index: 1)

Tianwei Chen (Citations: 1, h-index: 1)

Abstract

Long-horizon reasoning is a key challenge for autonomous agents, yet existing benchmarks evaluate agents on single tasks in isolation. Real organizational work requires managing many concurrent long-horizon tasks with interleaving, dependencies, and reprioritization. We introduce Multi-Horizon Task Environments (MHTEs): a distinct problem class requiring coherent execution across dozens of interleaved tasks (45+, 500-1500+ steps) within persistent execution contexts spanning hours. We identify four failure modes that cause baseline CUAs to degrade from 16.7% to 8.7% completion as load scales 25% to 100%, a pattern consistent across three independent implementations. These failure modes are context saturation (O(N) vs O(1) growth), memory interference, dependency complexity (DAGs vs. chains), and reprioritization overhead. We present CorpGen, an architecture-agnostic framework addressing these failures via hierarchical planning for multi-horizon goal alignment, sub-agent isolation preventing cross-task contamination, tiered memory (working, structured, semantic), and adaptive summarization. CorpGen simulates corporate environments through digital employees with persistent identities and realistic schedules. Across three CUA backends (UFO2, OpenAI CUA, hierarchical) on OSWorld Office, CorpGen achieves up to 3.5x improvement over baselines (15.2% vs 4.3%) with stable performance under increasing load, confirming that gains stem from architectural mechanisms rather than specific CUA implementations. Ablation studies show experiential learning provides the largest gains.
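The two headline numbers in the abstract can be sanity-checked with a little arithmetic. The sketch below uses only the percentages quoted above; the variable and function names are illustrative and do not come from the paper. It computes the relative drop in baseline completion as load scales from 25% to 100%, and the ratio behind the "up to 3.5x" claim.

```python
# Figures quoted in the abstract; all names here are illustrative assumptions.
baseline_low_load = 16.7    # baseline completion (%) at 25% load
baseline_full_load = 8.7    # baseline completion (%) at 100% load
corpgen_best = 15.2         # CorpGen completion (%) in the best reported comparison
baseline_paired = 4.3       # baseline completion (%) in that same comparison


def relative_drop(before: float, after: float) -> float:
    """Fraction of the original completion rate that was lost."""
    return (before - after) / before


def improvement_factor(new: float, old: float) -> float:
    """Ratio of the new completion rate to the old one."""
    return new / old


drop = relative_drop(baseline_low_load, baseline_full_load)
factor = improvement_factor(corpgen_best, baseline_paired)

print(f"relative drop in baseline completion: {drop:.1%}")  # 47.9%
print(f"CorpGen vs. baseline: {factor:.2f}x")               # 3.53x
```

Read this way, the baseline loses roughly half of its completion rate under full load, and 15.2% vs 4.3% works out to about 3.53x, consistent with the abstract's rounded "3.5x".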
