2601.22662v1 Jan 30, 2026 cs.AI

Task-Aware LLM Council with Adaptive Decision Pathways for Decision Support

Wei Zhu (Citations: 20, h-index: 3)
Lixing Yu (Citations: 6, h-index: 2)
Zhiwen Tang (Citations: 7, h-index: 1)
Kun Yue (Citations: 7, h-index: 1)
Hao-Ren Yao (Citations: 36, h-index: 3)

Original Abstract

Large language models (LLMs) have shown strong capabilities across diverse decision-making tasks. However, existing approaches often overlook the specialization differences among available models, treating all LLMs as uniformly applicable regardless of task characteristics. This limits their ability to adapt to varying reasoning demands and task complexities. In this work, we propose Task-Aware LLM Council (TALC), a task-adaptive decision framework that integrates a council of LLMs with Monte Carlo Tree Search (MCTS) to enable dynamic expert selection and efficient multi-step planning. Each LLM is equipped with a structured success memory profile derived from prior task trajectories, enabling semantic matching between current reasoning context and past successes. At each decision point, TALC routes control to the most contextually appropriate model and estimates node value using a dual-signal mechanism that fuses model-based evaluations with historical utility scores. These signals are adaptively weighted based on intra-node variance and used to guide MCTS selection, allowing the system to balance exploration depth with planning confidence. Experiments on WebShop, HumanEval, and the Game of 24 demonstrate that TALC achieves superior task success rates and improved search efficiency compared to strong baselines, validating the benefits of specialization-aware routing and adaptive planning.
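The abstract names three mechanisms without giving formulas: routing by semantic match against success memories, a dual-signal node value weighted by intra-node variance, and MCTS selection. The sketch below is one possible reading, not the authors' implementation. Everything in it is an assumption made for illustration: the function names `route`, `fuse_value`, and `uct_score`, the use of cosine similarity over plain vectors, the weighting rule `1 / (1 + scale * variance)` (which shifts trust toward historical utility when model evaluations disagree), and the standard UCT formula for selection.

```python
import math
from statistics import pvariance


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def route(context_vec, profiles):
    """Pick the model whose success memory best matches the current context.

    `profiles` maps a model name to a non-empty list of embedding vectors of
    past successes (a hypothetical stand-in for the success memory profile).
    """
    return max(
        profiles,
        key=lambda name: max(cosine(context_vec, mem) for mem in profiles[name]),
    )


def fuse_value(model_scores, historical_utility, scale=1.0):
    """Blend model-based evaluations with a historical utility score.

    Assumed rule: the more the model scores for a node disagree (higher
    variance), the more weight goes to the historical signal.
    `model_scores` must be non-empty.
    """
    mean_score = sum(model_scores) / len(model_scores)
    variance = pvariance(model_scores) if len(model_scores) > 1 else 0.0
    w_model = 1.0 / (1.0 + scale * variance)  # lies in (0, 1]
    return w_model * mean_score + (1.0 - w_model) * historical_utility


def uct_score(value, visits, parent_visits, c=1.4):
    """Standard UCT score; unvisited nodes are always tried first."""
    if visits == 0:
        return math.inf
    return value + c * math.sqrt(math.log(parent_visits) / visits)
```

In this reading, each MCTS step would call `route` to choose the acting model, score the resulting node with `fuse_value`, and rank sibling nodes with `uct_score`. The paper itself should be consulted for the actual weighting and selection rules.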

Citations: 1, Influential: 0, Altmetric: 1.5, Score: 8.5
