2601.10402v4 Jan 15, 2026 cs.AI

Toward Ultra-Long-Horizon Agentic Science: Cognitive Accumulation for Machine Learning Engineering

Xinyu Zhu, Yuzhu Cai, Zexi Liu, Bingyang Zheng, Rui Ye, Hanrui Wang, Wei-chen Wang, Yuzhi Zhang, Linfeng Zhang, Di Jin, Siheng Chen, AIRA-dojo Neo, Cheng Wang, Jiaao Chen

Abstract

The advancement of artificial intelligence toward agentic science is currently bottlenecked by the challenge of ultra-long-horizon autonomy, the ability to sustain strategic coherence and iterative correction over experimental cycles spanning days or weeks. While Large Language Models (LLMs) have demonstrated prowess in short-horizon reasoning, they are easily overwhelmed by execution details in the high-dimensional, delayed-feedback environments of real-world research, failing to consolidate sparse feedback into coherent long-term guidance. Here, we present ML-Master 2.0, an autonomous agent that masters ultra-long-horizon machine learning engineering (MLE), a representative microcosm of scientific discovery. By reframing context management as a process of cognitive accumulation, our approach introduces Hierarchical Cognitive Caching (HCC), a multi-tiered architecture inspired by computer systems that enables the structural differentiation of experience over time. By dynamically distilling transient execution traces into stable knowledge and cross-task wisdom, HCC allows agents to decouple immediate execution from long-term experimental strategy, effectively overcoming the scaling limits of static context windows. In evaluations on OpenAI's MLE-Bench under 24-hour budgets, ML-Master 2.0 achieves a state-of-the-art medal rate of 56.44%. Our findings demonstrate that ultra-long-horizon autonomy provides a scalable blueprint for AI capable of autonomous exploration beyond human-precedent complexities.
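
The abstract describes HCC only at a high level: transient execution traces are distilled into stable knowledge and then into cross-task wisdom. The following is a minimal sketch of that tiering idea, not the paper's implementation. The class name `TieredContextCache`, the three field names, the capacity-based trigger, and the `naive_summarize` placeholder (which stands in for whatever distillation step the authors actually use, presumably an LLM call) are all assumptions made for illustration.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, List


def naive_summarize(items: List[str]) -> str:
    """Hypothetical placeholder distiller; a real agent would likely call an LLM."""
    return f"summary of {len(items)} items: " + "; ".join(s[:30] for s in items)


@dataclass
class TieredContextCache:
    """Illustrative three-tier cache; tier names and policies are assumptions."""
    trace_capacity: int = 4
    summarize: Callable[[List[str]], str] = naive_summarize
    traces: Deque[str] = field(default_factory=deque)    # transient execution traces
    knowledge: List[str] = field(default_factory=list)   # stable notes for the current task
    wisdom: List[str] = field(default_factory=list)      # lessons kept across tasks

    def record(self, trace: str) -> None:
        """Store a raw trace; distill the tier once it exceeds its capacity."""
        self.traces.append(trace)
        if len(self.traces) > self.trace_capacity:
            self._distill_traces()

    def _distill_traces(self) -> None:
        """Compress all pending traces into a single knowledge entry."""
        batch = list(self.traces)
        self.traces.clear()
        self.knowledge.append(self.summarize(batch))

    def finish_task(self) -> None:
        """At task end, promote the task's knowledge into one cross-task lesson."""
        if self.traces:
            self._distill_traces()
        if self.knowledge:
            self.wisdom.append(self.summarize(self.knowledge))
            self.knowledge.clear()

    def build_context(self) -> str:
        """Assemble a prompt context: long-lived tiers first, raw traces last."""
        return "\n".join(self.wisdom + self.knowledge + list(self.traces))


cache = TieredContextCache(trace_capacity=2)
for step in ["load data", "train baseline", "validation score logged"]:
    cache.record(step)
cache.finish_task()
print(cache.build_context())
```

In this sketch the context passed to the model stays bounded because raw traces are replaced by summaries as they age, which is one simple way to separate immediate execution detail from longer-term strategy.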
