2601.04698v1 Jan 08, 2026 cs.AI

TourPlanner: A Competitive Consensus Framework with Constraint-Gated Reinforcement Learning for Travel Planning

Mining Tan

Citations: 20

h-index: 2

Wenxiang Jiao

Citations: 96

h-index: 4

Xiaoxi Li

Citations: 23

h-index: 2

Yuan Lu

Citations: 84

h-index: 4

Yinuo Wang

Citations: 35

h-index: 4

Hao Wang

Citations: 6

h-index: 2

Xuanyu Zhang

Citations: 152

h-index: 4

Weiming Dong

Citations: 5

h-index: 1

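Travel planning is a sophisticated decision-making process that requires synthesizing multifaceted information to build an itinerary. Existing travel planning approaches, however, face several challenges: (1) pruning candidate points of interest (POIs) while maintaining high recall; (2) limited ability to explore the feasible solution space, because they follow a single reasoning path; and (3) the difficulty of optimizing hard and soft constraints at the same time. To address these problems, the paper proposes TourPlanner, a comprehensive framework built on multi-path reasoning and constraint-gated reinforcement learning. Specifically, it first introduces a Personalized Recall and Spatial Optimization (PReSO) workflow to construct a spatially aware set of candidate POIs. It then proposes Competitive consensus Chain-of-Thought (CCoT), a multi-path reasoning paradigm that improves exploration of the feasible solution space. To refine the plan further, it integrates a sigmoid-based gating mechanism into the reinforcement learning stage, which dynamically prioritizes soft-constraint satisfaction only after the hard constraints are met. Experiments on travel planning benchmarks show that TourPlanner achieves state-of-the-art performance, significantly surpassing existing methods in both feasibility and alignment with user preferences.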

Original Abstract

Travel planning is a sophisticated decision-making process that requires synthesizing multifaceted information to construct itineraries. However, existing travel planning approaches face several challenges: (1) pruning candidate points of interest (POIs) while maintaining a high recall rate; (2) a single reasoning path restricts the exploration capability within the feasible solution space for travel planning; (3) simultaneously optimizing hard constraints and soft constraints remains a significant difficulty. To address these challenges, we propose TourPlanner, a comprehensive framework featuring multi-path reasoning and constraint-gated reinforcement learning. Specifically, we first introduce a Personalized Recall and Spatial Optimization (PReSO) workflow to construct a spatially aware set of candidate POIs. Subsequently, we propose Competitive consensus Chain-of-Thought (CCoT), a multi-path reasoning paradigm that improves the ability to explore the feasible solution space. To further refine the plan, we integrate a sigmoid-based gating mechanism into the reinforcement learning stage, which dynamically prioritizes soft-constraint satisfaction only after hard constraints are met. Experimental results on travel planning benchmarks demonstrate that TourPlanner achieves state-of-the-art performance, significantly surpassing existing methods in both feasibility and user-preference alignment.
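
The abstract does not give the exact form of the sigmoid-based gate, so the following is only a minimal sketch of the general idea under stated assumptions: a hard-constraint score and a soft-constraint score, both in [0, 1], are combined so that the soft term contributes meaningfully only once the hard score is near a feasibility threshold. The function names, the additive combination, and the `threshold` and `sharpness` parameters are all hypothetical and are not taken from the paper.

```python
import math


def sigmoid(x: float) -> float:
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_reward(
    hard_score: float,
    soft_score: float,
    threshold: float = 0.9,   # assumed feasibility level, not from the paper
    sharpness: float = 20.0,  # assumed steepness of the gate, not from the paper
) -> float:
    """Illustrative constraint-gated reward (hypothetical form).

    hard_score: fraction of hard constraints satisfied, in [0, 1].
    soft_score: degree of soft-constraint (preference) satisfaction, in [0, 1].
    The gate is close to 0 when hard_score is well below the threshold and
    rises toward 1 as hard_score reaches and exceeds it.
    """
    if not (0.0 <= hard_score <= 1.0 and 0.0 <= soft_score <= 1.0):
        raise ValueError("scores must lie in [0, 1]")
    gate = sigmoid(sharpness * (hard_score - threshold))
    return hard_score + gate * soft_score


if __name__ == "__main__":
    # Infeasible plan: the preference score barely affects the reward.
    print(gated_reward(0.5, 0.0), gated_reward(0.5, 1.0))
    # Feasible plan: the preference score now makes a large difference.
    print(gated_reward(1.0, 0.0), gated_reward(1.0, 1.0))
```

With a smooth gate like this, a policy would gain little from improving preference alignment while a plan is still infeasible, which matches the abstract's description of prioritizing soft constraints only after hard constraints are met. Whether the paper's reward is additive, multiplicative, or shaped differently cannot be determined from the abstract alone.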

4 Citations

0 Influential

2 Altmetric

14.0 Score
