2601.14446v1 Jan 20, 2026 cs.CE

Diffusion Large Language Models for Black-Box Optimization

Ye Yuan

Can Chen

Zipeng Sun

Dinghuai Zhang

Mila

C. Pal

Xue Liu

Abstract

Offline black-box optimization (BBO) aims to find optimal designs using only an offline dataset of designs and their labels. Such scenarios frequently arise in domains like DNA sequence design and robotics, where only a few labeled data points are available. Traditional methods typically rely on task-specific proxy or generative models, overlooking the in-context learning capabilities of pre-trained large language models (LLMs). Recent efforts have adapted autoregressive LLMs to BBO by framing task descriptions and offline datasets as natural language prompts, enabling direct design generation. However, these designs often contain bidirectional dependencies, which left-to-right models struggle to capture. In this paper, we explore diffusion LLMs for BBO, leveraging their bidirectional modeling and iterative refinement capabilities. This motivates our in-context denoising module: we condition the diffusion LLM on the task description and the offline dataset, both formatted in natural language, and prompt it to denoise masked designs into improved candidates. To guide the generation toward high-performing designs, we introduce masked diffusion tree search, which casts the denoising process as a step-wise Monte Carlo Tree Search that dynamically balances exploration and exploitation. Each node represents a partially masked design, each denoising step is an action, and candidates are evaluated via expected improvement under a Gaussian Process trained on the offline dataset. Our method, dLLM, achieves state-of-the-art results in few-shot settings on Design-Bench.
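
The abstract names two scoring ingredients without giving formulas: expected improvement under a Gaussian Process, and a tree search that balances exploration and exploitation. The sketch below shows the standard closed-form expected improvement for a maximization objective, together with the common UCT selection rule. It is an illustration only, not the authors' implementation: the function names, the maximization convention, the exploration constant `c`, and the use of UCT itself are all assumptions, since the abstract does not state which selection rule the search uses.

```python
import math


def normal_pdf(z: float) -> float:
    """Density of the standard normal distribution at z."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z: float) -> float:
    """Cumulative distribution of the standard normal at z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mean: float, std: float, best_so_far: float) -> float:
    """Closed-form EI for maximization.

    `mean` and `std` are assumed to be a Gaussian Process posterior mean and
    standard deviation at one candidate design; `best_so_far` is assumed to be
    the best label observed in the offline dataset.
    """
    if std <= 0.0:
        # No predictive uncertainty: improvement is deterministic.
        return max(mean - best_so_far, 0.0)
    z = (mean - best_so_far) / std
    return (mean - best_so_far) * normal_cdf(z) + std * normal_pdf(z)


def uct_score(total_value: float, visits: int, parent_visits: int,
              c: float = 1.4) -> float:
    """Standard UCT score (assumed selection rule, not taken from the paper).

    The first term favors children with a high average value (exploitation);
    the second favors rarely visited children (exploration).
    """
    if visits == 0:
        return math.inf  # Unvisited children are tried first.
    average = total_value / visits
    bonus = c * math.sqrt(math.log(parent_visits) / visits)
    return average + bonus
```

In a search of the kind the abstract describes, a fully denoised candidate could be scored with `expected_improvement`, and that score backed up the tree so that `uct_score` picks which partially masked design to expand next. With equal posterior means, a candidate with larger predictive uncertainty receives a higher expected improvement, which is what makes the criterion exploratory.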
