2601.07528v1 Jan 12, 2026 cs.CL

From RAG to Agentic RAG: An Approach to Faithful Islamic Question Answering

From RAG to Agentic RAG for Faithful Islamic Question Answering

Firoj Alam (Citations: 22; h-index: 2)

Gagan Bhatia (Citations: 221; h-index: 7)

Hamdy Mubarak (Citations: 5,300; h-index: 37)

Mustafa Jarrar (Citations: 317; h-index: 12)

George Mikros (Citations: 58; h-index: 3)

Fadi Zaraket (Citations: 18; h-index: 3)

Mahmoud Alhirthani (Citations: 7; h-index: 2)

Mutaz al-Khatib (Citations: 28; h-index: 3)

Kareem Darwish (Citations: 201; h-index: 4)

Rashid Yahiaoui (Citations: 89; h-index: 7)

Logan Cochrane (Citations: 21; h-index: 3)

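LLMs are increasingly used for Islamic question answering, where ungrounded answers can carry serious religious consequences. Existing multiple-choice and short-answer evaluations, however, fail to capture the main failure modes seen in real-world use, in particular free-form hallucination and whether a model appropriately declines to answer when evidence is insufficient. To better understand these aspects, we introduce ISLAMICFAITHQA, a bilingual (Arabic/English) generative benchmark of 3,810 items with atomic, unambiguous gold answers, which allows hallucination and abstention to be measured directly. We also developed a complete Islamic modelling suite comprising (i) 25,000 Arabic text-grounded SFT reasoning pairs, (ii) 5,000 bilingual preference samples for reward-guided alignment, and (iii) a verse-level Qur'an retrieval corpus of about 6,000 atomic verses (ayat). Building on these resources, we developed an agentic Qur'an-grounded framework (agentic RAG) that uses structured tool calls to perform iterative evidence seeking and answer revision. Experiments on Arabic-centric and multilingual LLMs show that retrieval improves accuracy and that agentic RAG yields larger gains than standard RAG. Notably, state-of-the-art performance and strong Arabic-English robustness were achieved even with a small model (e.g., Qwen3 4B). The resources and datasets used in the experiments will be released to the community.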

Original Abstract

LLMs are increasingly used for Islamic question answering, where ungrounded responses may carry serious religious consequences. Yet standard MCQ/MRC-style evaluations do not capture key real-world failure modes, notably free-form hallucinations and whether models appropriately abstain when evidence is lacking. To shed light on these failure modes, we introduce ISLAMICFAITHQA, a 3,810-item bilingual (Arabic/English) generative benchmark with atomic single-gold answers, which enables direct measurement of hallucination and abstention. We additionally developed an end-to-end grounded Islamic modelling suite consisting of (i) 25K Arabic text-grounded SFT reasoning pairs, (ii) 5K bilingual preference samples for reward-guided alignment, and (iii) a verse-level Qur'an retrieval corpus of ~6K atomic verses (ayat). Building on these resources, we develop an agentic Qur'an-grounding framework (agentic RAG) that uses structured tool calls for iterative evidence seeking and answer revision. Experiments across Arabic-centric and multilingual LLMs show that retrieval improves correctness and that agentic RAG yields the largest gains beyond standard RAG, achieving state-of-the-art performance and stronger Arabic-English robustness even with a small model (i.e., Qwen3 4B). We will make the experimental resources and datasets publicly available for the community.
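The abstract describes agentic RAG only at a high level: structured tool calls, iterative evidence seeking, answer revision, and abstention when evidence is lacking. The following is a minimal Python sketch of what such a loop might look like, not the authors' implementation. Everything in it is an assumption made for illustration: the placeholder corpus and its ids, the word-overlap retriever `search_passages`, the rule-based `stub_policy` standing in for the LLM, and the `action` / `query` / `cited` field names.

```python
from dataclasses import dataclass, field

# Hypothetical placeholder corpus: ids and texts are illustrative only,
# not real verse data from the paper's retrieval corpus.
CORPUS = {
    "p1": "placeholder passage about fasting during ramadan",
    "p2": "placeholder passage about charity and giving",
    "p3": "placeholder passage about prayer times",
}


def search_passages(query: str, top_k: int = 2) -> list[tuple[str, str]]:
    """Toy retrieval tool: rank passages by word overlap with the query."""
    query_words = set(query.lower().split())
    scored = []
    for pid, text in CORPUS.items():
        overlap = len(query_words & set(text.split()))
        if overlap > 0:
            scored.append((overlap, pid, text))
    # Highest overlap first; ties broken by passage id for determinism.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [(pid, text) for _, pid, text in scored[:top_k]]


@dataclass
class AgentState:
    question: str
    evidence: dict[str, str] = field(default_factory=dict)
    queries_tried: list[str] = field(default_factory=list)


def stub_policy(state: AgentState) -> dict:
    """Stand-in for the LLM: returns a structured tool call or a final action."""
    if state.evidence:
        # Evidence found: answer and cite the supporting passage ids.
        return {"action": "answer", "cited": sorted(state.evidence)}
    # No evidence yet: try the full question, then each keyword in turn.
    candidates = [state.question] + state.question.lower().split()
    for candidate in candidates:
        if candidate not in state.queries_tried:
            return {"action": "search", "query": candidate}
    # Every reformulation failed: abstain rather than answer ungrounded.
    return {"action": "abstain"}


def run_agent(question: str, max_steps: int = 4) -> dict:
    """Iterate tool calls until the policy answers, abstains, or runs out of steps."""
    state = AgentState(question)
    for _ in range(max_steps):
        call = stub_policy(state)
        if call["action"] == "search":
            state.queries_tried.append(call["query"])
            state.evidence.update(search_passages(call["query"]))
        else:
            return call
    return {"action": "abstain"}
```

In this sketch, `run_agent("rules of fasting")` retrieves the placeholder passage `p1` and returns an answer action citing it, while `run_agent("inheritance shares")` exhausts its query reformulations without finding evidence and abstains. A real system would replace `stub_policy` with a model that emits the tool calls and replace `search_passages` with a proper retriever over the verse-level corpus.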

5 Citations

0 Influential

18.5 Altmetric

97.5 Score
