2601.01665v1 Jan 04, 2026 cs.LG

Adversarial Instance Generation and Robust Training for Neural Combinatorial Optimization with Multiple Objectives

Yaoxin Wu

Yingqian Zhang

Wei Liu

Thomas Bäck

Yingjie Fan

Original Abstract

Deep reinforcement learning (DRL) has shown great promise in addressing multi-objective combinatorial optimization problems (MOCOPs). Nevertheless, the robustness of these learning-based solvers has remained insufficiently explored, especially across diverse and complex problem distributions. In this paper, we propose a unified robustness-oriented framework for preference-conditioned DRL solvers for MOCOPs. Within this framework, we develop a preference-based adversarial attack to generate hard instances that expose solver weaknesses, and quantify the attack impact by the resulting degradation on Pareto-front quality. We further introduce a defense strategy that integrates hardness-aware preference selection into adversarial training to reduce overfitting to restricted preference regions and improve out-of-distribution performance. The experimental results on multi-objective traveling salesman problem (MOTSP), multi-objective capacitated vehicle routing problem (MOCVRP), and multi-objective knapsack problem (MOKP) verify that our attack method successfully learns hard instances for different solvers. Furthermore, our defense method significantly strengthens the robustness and generalizability of neural solvers, delivering superior performance on hard or out-of-distribution instances.
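The abstract quantifies attack impact as degradation in Pareto-front quality, but it does not name the metric. The following is a minimal sketch that assumes the widely used hypervolume (HV) indicator on a bi-objective minimization problem, such as a bi-objective TSP. It also assumes that an instance's hardness is scored as the relative HV gap between the neural solver's front and a reference front computed on the same instance. The helper names `pareto_filter`, `hypervolume_2d`, and `hv_gap`, along with the reference point, are hypothetical illustrations and not the authors' code.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (objective 1, objective 2), both minimized


def pareto_filter(points: List[Point]) -> List[Point]:
    """Return the non-dominated points, deduplicated and sorted by objective 1."""
    front = []
    for p in points:
        # p is dominated if some other point is no worse in both objectives
        dominated = any(q[0] <= p[0] and q[1] <= p[1] and q != p for q in points)
        if not dominated:
            front.append(p)
    return sorted(set(front))


def hypervolume_2d(points: List[Point], ref: Point) -> float:
    """Area dominated by the front and bounded by the reference point."""
    front = [p for p in pareto_filter(points) if p[0] < ref[0] and p[1] < ref[1]]
    hv, prev_y = 0.0, ref[1]
    # Sweep left to right: x ascends, so y descends along a 2D Pareto front
    for x, y in front:
        hv += (ref[0] - x) * (prev_y - y)
        prev_y = y
    return hv


def hv_gap(solver_front: List[Point], reference_front: List[Point], ref: Point) -> float:
    """Hypothetical hardness score: relative HV shortfall versus a reference front."""
    hv_ref = hypervolume_2d(reference_front, ref)
    if hv_ref == 0.0:
        raise ValueError("reference front has zero hypervolume")
    return (hv_ref - hypervolume_2d(solver_front, ref)) / hv_ref


if __name__ == "__main__":
    reference = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0)]  # e.g. from a strong baseline
    neural = [(1.5, 3.0), (2.5, 2.0), (3.0, 1.5)]      # e.g. from a neural solver
    print(hv_gap(neural, reference, ref=(4.0, 4.0)))   # prints 0.25
```

Under these assumptions, an attack succeeds when it finds instances that enlarge `hv_gap` for the neural solver. Objectives would normally be normalized before HV is computed so that a single reference point suits every instance.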
