2603.01667v1 Mar 02, 2026 cs.AI

Chain-of-Context Learning: Dynamic Constraint Understanding for Multi-Task VRPs

Zhiguang Cao (Citations: 5, h-index: 1)

Shuangchun Gui (Citations: 72, h-index: 3)

Suyu Liu (Citations: 27, h-index: 2)

Xuehe Wang (Citations: 32, h-index: 3)

Abstract

Multi-task Vehicle Routing Problems (VRPs) aim to minimize routing costs while satisfying diverse constraints. Existing solvers typically adopt a unified reinforcement learning (RL) framework to learn generalizable patterns across tasks. However, they often overlook constraint and node dynamics during the decision process, so the model fails to react accurately to the current context. To address this limitation, we propose Chain-of-Context Learning (CCL), a novel framework that progressively captures the evolving context to guide fine-grained node adaptation. Specifically, CCL constructs step-wise contextual information via a Relevance-Guided Context Reformulation (RGCR) module, which adaptively prioritizes salient constraints. This context then guides node updates through a Trajectory-Shared Node Re-embedding (TSNR) module, which aggregates shared node features from the contexts of all trajectories and uses them to update the inputs for the next step. By modeling the evolving preferences of the RL agent, CCL captures step-by-step dependencies in sequential decision-making. We evaluate CCL on 48 diverse VRP variants, comprising 16 in-distribution tasks and 32 out-of-distribution tasks with unseen constraints. Experimental results show that CCL performs favorably against state-of-the-art baselines, achieving the best performance on all in-distribution tasks and the majority of out-of-distribution tasks.
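The abstract gives no architectural details, so the following is only an illustrative sketch of the two-stage idea it describes, not the authors' implementation. It assumes that "prioritizing salient constraints" can be approximated by softmax-weighted pooling of constraint features, and that "aggregating shared node features from all trajectories' contexts" can be approximated by averaging the per-trajectory contexts and adding the result to every node embedding. The function names, the dot-product relevance score, and the `step_size` parameter are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def reformulate_context(constraint_feats, query):
    """Hypothetical stand-in for RGCR: relevance-weighted constraint pooling.

    Relevance is assumed to be a dot product between each constraint
    feature vector and a query vector for the current decision step.
    """
    scores = [sum(q * f for q, f in zip(query, feat)) for feat in constraint_feats]
    weights = softmax(scores)
    dim = len(query)
    context = [
        sum(w * feat[d] for w, feat in zip(weights, constraint_feats))
        for d in range(dim)
    ]
    return context, weights


def reembed_nodes(node_embs, contexts, step_size=0.1):
    """Hypothetical stand-in for TSNR: shared update from all trajectories.

    The contexts of all trajectories are averaged (an assumption) and the
    mean is added to every node embedding, scaled by step_size.
    """
    dim = len(contexts[0])
    shared = [sum(c[d] for c in contexts) / len(contexts) for d in range(dim)]
    return [[x + step_size * s for x, s in zip(node, shared)] for node in node_embs]


# One decoding step with two trajectories and two constraint features.
constraints = [[1.0, 0.0], [0.0, 1.0]]
ctx_a, w_a = reformulate_context(constraints, query=[1.0, 0.0])
ctx_b, w_b = reformulate_context(constraints, query=[0.0, 1.0])
nodes = reembed_nodes([[0.0, 0.0], [1.0, 1.0]], [ctx_a, ctx_b], step_size=1.0)
```

In this toy setup each trajectory weights the constraint aligned with its own query more heavily, and the updated node embeddings would serve as the input to the next decoding step, which is the chaining the abstract refers to.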
