2603.19097v1 Mar 19, 2026 cs.CL

DaPT: A Dual-Path Framework for Multilingual Multi-hop Question Answering

Jingbo Zhu (Citations: 515, h-index: 11)

Ziming Zhu (Citations: 10, h-index: 2)

Qiaozhi He (Citations: 58, h-index: 5)

Yuchun Fan (Citations: 124, h-index: 6)

Yilin Wang (Citations: 28, h-index: 2)

Jiaoyang Li (Citations: 8, h-index: 1)

Yongyu Mu (Citations: 220, h-index: 9)

Tong Xiao (Citations: 14, h-index: 2)

Original Abstract

Retrieval-augmented generation (RAG) systems have made significant progress in solving complex multi-hop question answering (QA) tasks in the English scenario. However, RAG systems inevitably face the application scenario of retrieving across multilingual corpora and queries, leaving several open challenges. The first one involves the absence of benchmarks that assess RAG systems' capabilities under the multilingual multi-hop (MM-hop) QA setting. The second centers on the overreliance on LLMs' strong semantic understanding in English, which diminishes effectiveness in multilingual scenarios. To address these challenges, we first construct multilingual multi-hop QA benchmarks by translating English-only benchmarks into five languages, and then we propose DaPT, a novel multilingual RAG framework. DaPT generates sub-question graphs in parallel for both the source-language query and its English translation counterpart, then merges them before employing a bilingual retrieval-and-answer strategy to sequentially solve sub-questions. Our experimental results demonstrate that advanced RAG systems suffer from a significant performance imbalance in multilingual scenarios. Furthermore, our proposed method consistently yields more accurate and concise answers compared to the baselines, significantly enhancing RAG performance on this task. For instance, on the most challenging MuSiQue benchmark, DaPT achieves a relative improvement of 18.3% in average EM score over the strongest baseline.
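
The merge-then-solve flow described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say how the two sub-question graphs are aligned or merged, so the sketch assumes that both paths use shared sub-question ids and that merging is a plain union of nodes and dependency edges. The names `merge_graphs`, `solve_sequentially`, and `answer_fn` are hypothetical, and `answer_fn` only stands in for the bilingual retrieval-and-answer step.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Assumption: a sub-question graph maps each sub-question id to the set of
# sub-question ids whose answers it depends on.
Graph = dict[str, set[str]]


def merge_graphs(source: Graph, english: Graph) -> Graph:
    """Union the nodes and dependency edges of the two paths (assumed merge rule)."""
    merged: Graph = {}
    for graph in (source, english):
        for node, deps in graph.items():
            merged.setdefault(node, set()).update(deps)
    return merged


def solve_sequentially(merged: Graph, answer_fn) -> dict[str, str]:
    """Answer sub-questions in dependency order, passing earlier answers along."""
    answers: dict[str, str] = {}
    for node in TopologicalSorter(merged).static_order():
        # Collect the answers of the sub-questions this node depends on.
        context = {dep: answers[dep] for dep in merged.get(node, ())}
        # Hypothetical placeholder for bilingual retrieval and answering.
        answers[node] = answer_fn(node, context)
    return answers


if __name__ == "__main__":
    source_graph = {"q1": set(), "q2": {"q1"}}          # from the source-language query
    english_graph = {"q2": {"q1"}, "q3": {"q2"}}        # from the English translation
    merged = merge_graphs(source_graph, english_graph)
    # Toy answer function: records which earlier answers were available.
    print(solve_sequentially(merged, lambda q, ctx: f"{q} given {sorted(ctx)}"))
```

A topological order guarantees that every sub-question is attempted only after the sub-questions it depends on, which is one straightforward reading of "sequentially solve sub-questions"; the actual ordering and merging rules in DaPT may differ.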
