2602.10450v1 Feb 11, 2026 cs.LG

Building an Industrial-Scale Optimization Modeling Benchmark

Constructing Industrial-Scale Optimization Modeling Benchmark

Zaiwen Wen

Citations: 11

h-index: 2

Zhong Li

Citations: 6

h-index: 2

Hongliang Lu

Peking University

Citations: 59

h-index: 3

Tao Wei

Citations: 5

h-index: 2

Wenyue Liu

Citations: 32

h-index: 3

Yuxuan Chen

Citations: 39

h-index: 2

Yuan Lan

Citations: 6

h-index: 2

Fan Zhang

Citations: 218

h-index: 6

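Optimization modeling plays an important role in decision-making across logistics, manufacturing, energy, finance, and other fields, but translating natural-language requirements into correct optimization formulations and solver-executable code remains labor-intensive. Large language models (LLMs) can be applied to this task, but current evaluation relies mostly on small-scale or synthetic benchmarks and therefore fails to reflect the difficulty of real industrial problems with $10^3$ to $10^6$ (or more) variables and constraints. The main problem is a shortage of benchmarks that link natural-language descriptions to reference formulations and solver code grounded in real optimization models. To close this gap, we built MIPLIB-NL using a structure-aware reverse construction method based on the real mixed-integer linear programs (MIPs) in MIPLIB 2017. Our pipeline (i) extracts compact, reusable model structure from flat solver formulations, (ii) reverse-generates natural-language descriptions explicitly tied to the extracted structure, using a unified model-data separation format, and (iii) performs iterative semantic validation through expert review and human-LLM interaction, together with independent reconstruction checks. This yields 223 one-to-one reconstructions that preserve the mathematical content of the original instances while enabling realistic natural-language-to-optimization evaluation. Experiments show that systems that perform well on existing benchmarks degrade substantially on MIPLIB-NL, revealing failure modes that are not visible at small scale.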

Original Abstract

Optimization modeling underpins decision-making in logistics, manufacturing, energy, and finance, yet translating natural-language requirements into correct optimization formulations and solver-executable code remains labor-intensive. Although large language models (LLMs) have been explored for this task, evaluation is still dominated by toy-sized or synthetic benchmarks, masking the difficulty of industrial problems with $10^{3}$ to $10^{6}$ (or more) variables and constraints. A key bottleneck is the lack of benchmarks that align natural-language specifications with reference formulations and solver code grounded in real optimization models. To fill this gap, we introduce MIPLIB-NL, built via a structure-aware reverse construction methodology from real mixed-integer linear programs in MIPLIB 2017. Our pipeline (i) recovers compact, reusable model structure from flat solver formulations, (ii) reverse-generates natural-language specifications explicitly tied to this recovered structure under a unified model-data separation format, and (iii) performs iterative semantic validation through expert review and human-LLM interaction with independent reconstruction checks. This yields 223 one-to-one reconstructions that preserve the mathematical content of the original instances while enabling realistic natural-language-to-optimization evaluation. Experiments show substantial performance degradation on MIPLIB-NL for systems that perform strongly on existing benchmarks, exposing failure modes invisible at toy scale.
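
To make the contrast between a compact model structure and a flat solver formulation concrete, the following is a minimal sketch, not the MIPLIB-NL format itself. It assumes a toy task-to-machine assignment model; the names `AssignmentData` and `flatten`, and the row representation, are hypothetical and chosen only for illustration. The model consists of two constraint families written once, while the data lives in a separate object; flattening expands the families into one row per constraint, which is roughly the form a solver file stores.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# A flat row: (coefficients by variable name, sense, right-hand side).
Row = Tuple[Dict[str, float], str, float]


@dataclass
class AssignmentData:
    """Instance data, kept separate from the model structure (hypothetical)."""
    costs: List[List[float]]  # costs[i][j]: cost of running task i on machine j
    capacity: List[int]       # capacity[j]: maximum number of tasks on machine j


def flatten(data: AssignmentData) -> Tuple[Dict[str, float], List[Row]]:
    """Expand the compact model into a flat objective and constraint rows."""
    n_tasks = len(data.costs)
    n_machines = len(data.capacity)

    def var(i: int, j: int) -> str:
        # Binary variable: 1 if task i is assigned to machine j.
        return f"x_{i}_{j}"

    # Objective: minimize total assignment cost.
    objective = {
        var(i, j): data.costs[i][j]
        for i in range(n_tasks)
        for j in range(n_machines)
    }

    rows: List[Row] = []
    # Family 1: every task is assigned to exactly one machine.
    for i in range(n_tasks):
        rows.append(({var(i, j): 1 for j in range(n_machines)}, "==", 1))
    # Family 2: no machine exceeds its capacity.
    for j in range(n_machines):
        rows.append(
            ({var(i, j): 1 for i in range(n_tasks)}, "<=", data.capacity[j])
        )
    return objective, rows


if __name__ == "__main__":
    data = AssignmentData(costs=[[4, 2], [3, 5], [1, 6]], capacity=[2, 2])
    objective, rows = flatten(data)
    print(len(objective), "variables,", len(rows), "constraint rows")
```

The model text stays the same size as the data grows: with 1,000 tasks and 100 machines, the same two families expand to 100,000 variables and 1,100 rows. Recovering the two families from such a flat listing is the reverse direction of this sketch.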

3 Citations

0 Influential

3 Altmetric

18.0 Score
