2603.01692v1 Mar 02, 2026 cs.LG

Reasoning as Gradient: Scaling MLE Agents Beyond Tree Search

Jian Wang (Citations: 13, h-index: 2)
Xu Yang (Citations: 75, h-index: 4)
Jiang Bian (Citations: 179, h-index: 6)
Bowen Xian (Citations: 21, h-index: 2)
Yifei Zhang (Citations: 131, h-index: 5)
Qizheng Li (Citations: 38, h-index: 3)
Xiao Yang (Citations: 271, h-index: 6)
Weiqing Liu (Citations: 1,719, h-index: 18)
Shikai Fang (Citations: 64, h-index: 4)
Jingyuan Li (Citations: 21, h-index: 2)
Mingrui Xu (Citations: 13, h-index: 2)

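LLM-based agents for machine learning engineering (MLE) mostly rely on tree search, a gradient-free form of optimization that ranks candidates by scalar validation score. As LLM reasoning improves, exhaustive enumeration becomes less efficient than directed updates, much as accurate gradients make descent more efficient than random search. This paper introduces Gome, an MLE agent that puts gradient-based optimization into practice: structured diagnostic reasoning plays the role of gradient computation, success memory the role of momentum, and multi-trace execution the role of distributed optimization. Under a closed-world protocol that separates architectural effects from external knowledge, Gome reaches a state-of-the-art 35.1% any-medal rate on MLE-Bench within a restricted 12-hour budget on a single V100 GPU. Scaling experiments across 10 models show a crossover: with weaker models, tree search keeps an advantage because exhaustive exploration compensates for unreliable reasoning, but as reasoning capability grows, gradient-based optimization progressively pulls ahead, with the gap widest at frontier-tier models. Given how quickly reasoning-oriented LLMs are advancing, gradient-based optimization is positioned as an increasingly favorable paradigm. The authors release their codebase and GPT-5 traces.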

Original Abstract

LLM-based agents for machine learning engineering (MLE) predominantly rely on tree search, a form of gradient-free optimization that uses scalar validation scores to rank candidates. As LLM reasoning capabilities improve, exhaustive enumeration becomes increasingly inefficient compared to directed updates, analogous to how accurate gradients enable efficient descent over random search. We introduce Gome, an MLE agent that operationalizes gradient-based optimization. Gome maps structured diagnostic reasoning to gradient computation, success memory to momentum, and multi-trace execution to distributed optimization. Under a closed-world protocol that isolates architectural effects from external knowledge, Gome achieves a state-of-the-art 35.1% any-medal rate on MLE-Bench with a restricted 12-hour budget on a single V100 GPU. Scaling experiments across 10 models reveal a critical crossover: with weaker models, tree search retains advantages by compensating for unreliable reasoning through exhaustive exploration; as reasoning capability strengthens, gradient-based optimization progressively outperforms, with the gap widening at frontier-tier models. Given the rapid advancement of reasoning-oriented LLMs, this positions gradient-based optimization as an increasingly favorable paradigm. We release our codebase and GPT-5 traces.
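
The analogy the abstract leans on (directed updates versus score-ranked random proposals) can be illustrated numerically. The sketch below is not Gome's implementation and uses no LLM; it only contrasts a gradient-free random search with momentum gradient descent on a toy quadratic. The function names, step sizes, and the objective itself are all assumptions chosen for illustration.

```python
import random


def loss(x):
    """Toy objective (assumed for illustration): squared distance from the origin."""
    return sum(v * v for v in x)


def grad(x):
    """Exact gradient of the toy objective."""
    return [2.0 * v for v in x]


def random_search(x0, steps, scale=0.5, seed=0):
    """Gradient-free baseline: propose a random perturbation of the current best
    point and keep it only if its scalar score improves."""
    rng = random.Random(seed)
    best, best_loss = list(x0), loss(x0)
    for _ in range(steps):
        cand = [v + rng.gauss(0.0, scale) for v in best]
        cand_loss = loss(cand)
        if cand_loss < best_loss:
            best, best_loss = cand, cand_loss
    return best_loss


def momentum_descent(x0, steps, lr=0.1, beta=0.5):
    """Directed updates: follow the negative gradient, with a velocity term
    that carries over part of the previous step (momentum)."""
    x = list(x0)
    vel = [0.0] * len(x)
    for _ in range(steps):
        g = grad(x)
        vel = [beta * v - lr * gi for v, gi in zip(vel, g)]
        x = [xi + vi for xi, vi in zip(x, vel)]
    return loss(x)


if __name__ == "__main__":
    start = [3.0] * 20  # 20-dimensional starting point, initial loss 180
    print("random search  :", random_search(start, steps=30))
    print("momentum descent:", momentum_descent(start, steps=30))
```

With the same budget of 30 evaluations, the directed method ends far closer to the minimum on this toy problem. That only holds because the gradient here is exact; the abstract's crossover result corresponds to the case where the "gradient" (the model's diagnostic reasoning) is unreliable and broad exploration can win.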

2 Citations

0 Influential

9 Altmetric

47.0 Score

