2601.17687v2 Jan 25, 2026 cs.LG

Agentic reinforcement learning empowers next-generation chemical language models for molecular design and synthesis

Zijing Liu

He Cao

Bing Feng

Hao Li

Shen-Hui Peng

Yu Wang

Zhiyuan Yan

Yonghong Tian

Yu Li

Li Yuan

Abstract

Language models are revolutionizing the biochemistry domain, assisting scientists in drug design and chemical synthesis with high efficiency. Yet current approaches are caught between small language models, which are prone to hallucination and retain limited knowledge, and large cloud-based language models, which are plagued by privacy risks and high inference costs. To bridge this gap, we introduce ChemCRAFT, a novel framework that leverages agentic reinforcement learning to decouple chemical reasoning from knowledge storage. Instead of forcing the model to memorize vast amounts of chemical data, our approach lets the language model interact with a sandbox for precise information retrieval. This externalization of knowledge allows a locally deployable small model to achieve superior performance at minimal inference cost. To give small language models agent-calling ability, we build an agentic trajectory construction pipeline and a comprehensive chemical-agent sandbox. From these sandbox interactions, we construct ChemToolDataset, the first large-scale chemical tool trajectory dataset. In addition, we propose SMILES-GRPO, which builds a dense chemical reward function to strengthen the model's ability to call chemical agents. Evaluations across diverse aspects of drug design show that ChemCRAFT outperforms current cloud-based LLMs in molecular structure analysis, molecular optimization, and synthesis pathway prediction, demonstrating that scientific reasoning is not solely an emergent ability of model scale but a learnable policy of tool orchestration. This work establishes a cost-effective, privacy-preserving paradigm for AI-aided chemistry, opening new avenues for accelerating molecular discovery with locally deployable agents. Code is available at https://github.com/HowardLi1984/ChemCraft.
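The abstract's emphasis on a *dense* reward for SMILES-GRPO can be illustrated with the group-relative advantage used in standard GRPO (Group Relative Policy Optimization): several completions are sampled for one prompt, and each completion's reward is normalized against its group. This is a minimal sketch under stated assumptions, not the authors' SMILES-GRPO code. The function name `group_relative_advantages`, the epsilon value, the choice of population standard deviation, and all reward values are hypothetical, and the paper's actual chemical reward function is not reproduced here.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize each reward against the mean and std of its sampling group.

    Standard GRPO-style advantage: (r_i - mean(r)) / (std(r) + eps).
    No learned value function is needed. Using the population std
    (pstdev) here is an assumption; implementations vary.
    """
    if not rewards:
        return []
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four sampled answers to the same prompt.
# A dense (partial-credit) reward gives distinct values per sample.
dense = [0.2, 0.5, 0.9, 0.4]
# A sparse 0/1 reward where every sample failed.
sparse = [0.0, 0.0, 0.0, 0.0]

print(group_relative_advantages(dense))
print(group_relative_advantages(sparse))  # all zeros: no learning signal
```

With the dense rewards, each sample receives a distinct advantage that ranks it against its siblings; with the all-zero sparse rewards, every advantage is 0, so that prompt contributes no policy-gradient signal. This is one common motivation for partial-credit rewards; whether it matches the authors' exact rationale for SMILES-GRPO is an assumption.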
