2602.23610v1 Feb 27, 2026 cs.CL

LLM-Based Multi-Turn Task-Oriented Dialogue Generation for Realistic Reasoning

LLM-Driven Multi-Turn Task-Oriented Dialogue Synthesis for Realistic Reasoning


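The reasoning ability of large language models (LLMs), meaning their ability to analyze input information, draw inferences from it, and make decisions, is essential for building intelligent task-oriented dialogue systems. However, existing benchmarks do not adequately reflect the complexity of real-world scenarios, which limits how well they can evaluate and improve LLM reasoning in practical settings. Many existing reasoning datasets are overly simple and abstract, and are detached from real task flows, domain constraints, and operational rules, which makes it hard to evaluate LLMs' logical reasoning effectively. In addition, data contamination from pretraining corpora undermines the reliability of evaluation results, and crowdsourcing, the traditional way of building datasets, is labor-intensive and scales poorly. To address these problems, we propose an LLM-based framework that generates multi-turn task-oriented dialogues grounded in realistic reasoning scenarios. The framework uses trilevel optimization to improve dialogue quality. Our method produces dialogues that are grounded in real task scenarios, rich in real-world information, and strongly coherent across context. The reasoning tasks associated with the generated dialogues are carefully designed and iteratively refined, steadily raising their quality and difficulty. The resulting dataset serves as a useful benchmark for evaluating and advancing the realistic logical reasoning ability of LLMs. Experimental results show that the reasoning tasks built on our synthetic data pose non-trivial reasoning challenges and can meaningfully help improve the reasoning ability of LLMs.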

Original Abstract

The reasoning capability of large language models (LLMs), defined as their ability to analyze, infer, and make decisions based on input information, is essential for building intelligent task-oriented dialogue systems. However, existing benchmarks do not sufficiently reflect the complexity of real-world scenarios, which limits their effectiveness in evaluating and enhancing LLM reasoning in practical contexts. Many current reasoning datasets are overly simplistic and abstract, often disconnected from realistic task flows, domain constraints, and operational rules, making it difficult to effectively evaluate LLMs' logical reasoning ability. In addition, data contamination from pretraining corpora undermines the reliability of evaluation results, and traditional crowdsourcing methods for dataset construction are labor-intensive and difficult to scale. To address these challenges, we propose an LLM-driven framework for synthesizing multi-turn, task-oriented dialogues grounded in realistic reasoning scenarios, leveraging trilevel optimization to enhance dialogue quality. Our method generates dialogues grounded in authentic task scenarios, enriched with real-world information, and exhibiting strong contextual coherence. Corresponding reasoning tasks are carefully designed around these dialogues and iteratively refined to continuously improve the tasks' quality and challenge. The resulting dataset serves as a valuable benchmark for assessing and advancing the realistic logical reasoning capabilities of LLMs. Experimental results show that our synthetic data-based reasoning tasks introduce non-trivial reasoning challenges and provide meaningful support for improving the reasoning capabilities of LLMs.

