2604.16804v1 Apr 18, 2026 cs.LG

AutoOR: Scalably Post-training LLMs to Autoformalize Operations Research Problems

Philip Torr (Citations: 34, h-index: 3)

S. Motwani (Citations: 931, h-index: 8)

Chuan Du (Citations: 4, h-index: 1)

A. Petrov (Citations: 74, h-index: 3)

Christopher Davis (Citations: 195, h-index: 5)

Antonio R. Papania-Davis (Citations: 7, h-index: 1)

Wei Yan (Citations: 1, h-index: 1)

Abstract

Optimization problems are central to decision-making in manufacturing, logistics, scheduling, and other industrial settings. Translating complicated descriptions of these problems into solver-ready formulations requires specialized operations research (OR) expertise, making it hard to scale. We present AutoOR, a scalable synthetic data generation and reinforcement learning pipeline that trains LLMs to autoformalize optimization problems specified in natural language across linear, mixed-integer, and non-linear categories. AutoOR generates verified training data from standard optimization forms and uses solver execution feedback as the reward signal for RL post-training. AutoOR applied to an 8B model achieves state-of-the-art or competitive results across six established OR benchmarks, matching significantly larger frontier models. For a non-linear problem class involving physical dynamics, where frontier models score near 0%, we introduce a curriculum RL strategy that bootstraps from limited initial training data to make this class tractable for post-training. We believe that methods such as AutoOR can significantly accelerate industrial decision-making with AI.
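The abstract says solver execution feedback serves as the RL reward signal but does not spell out how that reward is computed. The following is a minimal sketch of one way such a reward could look, not the authors' implementation. It assumes a candidate formulation arrives as a small dictionary of coefficients, and it uses brute-force enumeration over a bounded integer grid in place of a real solver. The names `solve_small_ip`, `execution_reward`, and the dictionary keys `c`, `A`, `b`, `upper` are all hypothetical.

```python
from itertools import product


def solve_small_ip(c, A, b, upper):
    """Maximize c.x subject to A x <= b with integer 0 <= x_i <= upper.

    Brute-force enumeration; a stand-in for a real solver (assumption).
    Returns the optimal objective value, or None if no point is feasible.
    """
    best = None
    for x in product(range(upper + 1), repeat=len(c)):
        feasible = all(
            sum(a_ij * x_j for a_ij, x_j in zip(row, x)) <= b_i
            for row, b_i in zip(A, b)
        )
        if feasible:
            value = sum(c_j * x_j for c_j, x_j in zip(c, x))
            if best is None or value > best:
                best = value
    return best


def execution_reward(candidate, reference_optimum, tol=1e-6):
    """Hypothetical binary reward: 1.0 only if the candidate formulation
    solves and its optimum matches the verified reference optimum."""
    try:
        value = solve_small_ip(
            candidate["c"], candidate["A"], candidate["b"], candidate["upper"]
        )
    except (KeyError, TypeError):
        return 0.0  # malformed formulation: treated like a failed execution
    if value is None:
        return 0.0  # infeasible formulation
    return 1.0 if abs(value - reference_optimum) <= tol else 0.0


# Reference problem: maximize 3x + 2y s.t. x + y <= 4, 2x + y <= 6, x, y in {0..4}.
correct = {"c": [3, 2], "A": [[1, 1], [2, 1]], "b": [4, 6], "upper": 4}
# A faulty formalization that drops the second constraint.
missing_constraint = {"c": [3, 2], "A": [[1, 1]], "b": [4], "upper": 4}

reference = solve_small_ip(**correct)
print(execution_reward(correct, reference))
print(execution_reward(missing_constraint, reference))
```

In this sketch the reward depends only on the solved optimum, so a formalization with a different form but the same optimum would also receive full reward. Whether AutoOR's reward behaves that way is not stated in the abstract.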
