2602.07035v1 Feb 03, 2026 cs.AI

DLLM-Searcher: Adapting Diffusion Large Language Model for Search Agents

Jiahao Zhao

Citations: 424

h-index: 8

Shaoxuan Xu

Citations: 39

h-index: 4

ZhongXiang Sun

Renmin University of China

Citations: 1,266

h-index: 15

Feng Zhu

Citations: 72

h-index: 5

Jingyang Ou

Citations: 1,065

h-index: 4

Yuling Shi

Citations: 7

h-index: 2

Chongxuan Li

Citations: 1,814

h-index: 12

Xiao Zhang

Citations: 342

h-index: 8

Jun Xu

Citations: 361

h-index: 9

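Recently, diffusion large language models (dLLMs) have shown distinctive efficiency advantages, thanks to their inherently parallel decoding mechanism and flexible generation paradigm. Meanwhile, despite the rapid progress of search agents, their practical deployment faces two fundamental constraints. The first is the 1) Latency Challenge: under the ReAct agent paradigm, multi-round reasoning, tool calling, and waiting for tool responses run serially and cause severe latency. Intuitively, dLLMs could relieve this latency and improve the operational efficiency of agents. In practice, however, current dLLM backbones are held back by the 2) Agent Ability Challenge: existing dLLMs have markedly weak reasoning and tool-calling capabilities, so these advantages are hard to realize. In this paper, we propose DLLM-Searcher, an optimization framework for dLLM-based search agents. To address the Agent Ability Challenge, we design a two-stage post-training pipeline consisting of Agentic Supervised Fine-Tuning (Agentic SFT) and Agentic Variance-Reduced Preference Optimization (Agentic VRPO), which improves the backbone dLLM's information-seeking and reasoning capabilities. To mitigate the Latency Challenge, we exploit the flexible generation mechanism of dLLMs and propose a new agent paradigm, Parallel-Reasoning and Acting (P-ReAct). P-ReAct guides the model to decode tool-call instructions first, so that it can keep reasoning while waiting for the tool response. Experiments show that DLLM-Searcher achieves performance comparable to existing LLM-based search agents, and that P-ReAct speeds up inference by approximately 15%. Our code is available at https://anonymous.4open.science/r/DLLM-Searcher-553C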

Original Abstract

Recently, Diffusion Large Language Models (dLLMs) have demonstrated unique efficiency advantages, enabled by their inherently parallel decoding mechanism and flexible generation paradigm. Meanwhile, despite the rapid advancement of Search Agents, their practical deployment is constrained by a fundamental limitation, termed the 1) Latency Challenge: the serial execution of multi-round reasoning, tool calling, and tool response waiting under the ReAct agent paradigm induces severe end-to-end latency. Intuitively, dLLMs can leverage their distinctive strengths to optimize the operational efficiency of agents under the ReAct agent paradigm. In practice, however, existing dLLM backbones face the 2) Agent Ability Challenge: they exhibit remarkably weak reasoning and tool-calling capabilities, preventing these advantages from being effectively realized. In this paper, we propose DLLM-Searcher, an optimization framework for dLLM-based Search Agents. To solve the Agent Ability Challenge, we design a two-stage post-training pipeline encompassing Agentic Supervised Fine-Tuning (Agentic SFT) and Agentic Variance-Reduced Preference Optimization (Agentic VRPO), which enhances the backbone dLLM's information-seeking and reasoning capabilities. To mitigate the Latency Challenge, we leverage the flexible generation mechanism of dLLMs and propose a novel agent paradigm termed Parallel-Reasoning and Acting (P-ReAct). P-ReAct guides the model to prioritize decoding tool_call instructions, thereby allowing the model to keep thinking while waiting for the tool's return. Experimental results demonstrate that DLLM-Searcher achieves performance comparable to mainstream LLM-based search agents and that P-ReAct delivers approximately 15% inference acceleration. Our code is available at https://anonymous.4open.science/r/DLLM-Searcher-553C
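The latency idea behind P-ReAct can be illustrated with a toy scheduling sketch. This is not the authors' implementation: it does not involve a dLLM at all, and every name and delay below (`call_search_tool`, `continue_reasoning`, `TOOL_LATENCY`, `THINK_LATENCY`) is a hypothetical stand-in. It only shows why emitting the tool call first lets the remaining reasoning overlap with the tool's wait time, whereas a serial ReAct-style step pays for both in sequence.

```python
import asyncio
import time

# Hypothetical delays (seconds) standing in for real tool and decoding latency.
TOOL_LATENCY = 0.1
THINK_LATENCY = 0.1


async def call_search_tool(query: str) -> str:
    """Stand-in for an external search tool; only simulates waiting."""
    await asyncio.sleep(TOOL_LATENCY)
    return f"results for {query!r}"


async def continue_reasoning(query: str) -> str:
    """Stand-in for the model decoding its remaining reasoning tokens."""
    await asyncio.sleep(THINK_LATENCY)
    return f"plan for using results of {query!r}"


async def serial_step(query: str) -> tuple[str, str]:
    """Serial ReAct-style step: finish reasoning, then call the tool and wait."""
    thought = await continue_reasoning(query)
    observation = await call_search_tool(query)
    return thought, observation


async def overlapped_step(query: str) -> tuple[str, str]:
    """Tool-call-first step: dispatch the tool, then reason while it runs."""
    tool_task = asyncio.create_task(call_search_tool(query))
    thought = await continue_reasoning(query)
    observation = await tool_task
    return thought, observation


def timed(step, query: str):
    """Run one step to completion and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = asyncio.run(step(query))
    return result, time.perf_counter() - start


if __name__ == "__main__":
    for step in (serial_step, overlapped_step):
        _, elapsed = timed(step, "diffusion LLM search agents")
        print(f"{step.__name__}: {elapsed:.2f}s")
```

Both steps return the same thought and observation; only the wall-clock time differs. The size of the saving in this toy depends entirely on the assumed delays and says nothing about the roughly 15% acceleration reported in the abstract.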

4 Citations

1 Influential

7.5 Altmetric

43.5 Score
