2604.12237v1 Apr 14, 2026 cs.LG

MolMem: Efficient Molecular Optimization Using Memory-Based Reinforcement Learning Agents

MolMem: Memory-Augmented Agentic Reinforcement Learning for Sample-Efficient Molecular Optimization

Ziqing Wang

Citations: 35

h-index: 4

Abhishek Pandy

Citations: 2

h-index: 1

Kaize Ding

Citations: 5

h-index: 2

Yibo Wen

Citations: 28

h-index: 3

Han Liu

Citations: 20

h-index: 3

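In drug discovery, molecular optimization aims to iteratively refine a lead compound so that its molecular properties improve while structural similarity to the original molecule is preserved. Because each oracle evaluation is expensive, the sample efficiency of existing methods under a limited oracle budget is a key challenge. Trial-and-error approaches need many oracle calls, while methods that rely on external knowledge tend to reuse familiar templates and perform worse on difficult objectives. The key missing piece is a long-term memory that grounds decisions and supplies reusable insights for future optimization. To address this, we propose MolMem (molecular optimization with memory), a multi-turn agentic reinforcement learning framework equipped with a long-term memory system. Specifically, MolMem uses a Static Exemplar Memory to retrieve relevant exemplars at the cold-start stage and an Evolving Skill Memory to distill successful trajectories into reusable strategies. Building on this memory-based approach, we train the policy with dense step-wise rewards, turning costly rollouts into long-term knowledge that improves future optimization. Extensive experiments show that, using 500 oracle calls, MolMem reaches a 90% success rate on single-property optimization tasks (1.5× the best-performing baseline) and 52% on multi-property optimization tasks. Our code is available at: https://github.com/REAL-Lab-NU/MolMem.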

Original Abstract

In drug discovery, molecular optimization aims to iteratively refine a lead compound to improve molecular properties while preserving structural similarity to the original molecule. However, each oracle evaluation is expensive, making sample efficiency a key challenge for existing methods under a limited oracle budget. Trial-and-error approaches require many oracle calls, while methods that leverage external knowledge tend to reuse familiar templates and struggle on challenging objectives. A key missing piece is long-term memory that can ground decisions and provide reusable insights for future optimizations. To address this, we present MolMem (Molecular optimization with Memory), a multi-turn agentic reinforcement learning (RL) framework with a dual-memory system. Specifically, MolMem uses Static Exemplar Memory to retrieve relevant exemplars for cold-start grounding, and Evolving Skill Memory to distill successful trajectories into reusable strategies. Built on this memory-augmented formulation, we train the policy with dense step-wise rewards, turning costly rollouts into long-term knowledge that improves future optimization. Extensive experiments show that MolMem achieves 90% success on single-property tasks (1.5× over the best baseline) and 52% on multi-property tasks using only 500 oracle calls. Our code is available at https://github.com/REAL-Lab-NU/MolMem.
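
The dual-memory idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not say how exemplars are retrieved or how trajectories are distilled, so every name here (`StaticExemplarMemory`, `EvolvingSkillMemory`, `retrieve`, `distill`, `top`) is hypothetical, and character n-gram Jaccard similarity over SMILES strings is only a crude stand-in for a real molecular fingerprint.

```python
from dataclasses import dataclass, field


def ngrams(smiles: str, n: int = 2) -> set[str]:
    """Character n-grams of a SMILES string (assumed stand-in for a fingerprint)."""
    return {smiles[i:i + n] for i in range(len(smiles) - n + 1)}


def similarity(a: str, b: str) -> float:
    """Jaccard similarity between the n-gram sets of two SMILES strings."""
    ga, gb = ngrams(a), ngrams(b)
    union = ga | gb
    return len(ga & gb) / len(union) if union else 1.0


@dataclass
class StaticExemplarMemory:
    """Fixed store of (source, improved) molecule pairs, queried by similarity."""
    exemplars: list[tuple[str, str]]

    def retrieve(self, query: str, k: int = 1) -> list[tuple[str, str]]:
        # Rank exemplars by how similar their source molecule is to the query.
        ranked = sorted(self.exemplars,
                        key=lambda pair: similarity(query, pair[0]),
                        reverse=True)
        return ranked[:k]


@dataclass
class EvolvingSkillMemory:
    """Growing store of edit strategies, counted over successful trajectories."""
    skills: dict[str, int] = field(default_factory=dict)

    def distill(self, trajectory: list[str], success: bool) -> None:
        # Only successful trajectories contribute reusable strategies.
        if not success:
            return
        for edit in trajectory:
            self.skills[edit] = self.skills.get(edit, 0) + 1

    def top(self, k: int = 1) -> list[str]:
        # Most frequently successful edits first; ties broken alphabetically.
        return sorted(self.skills, key=lambda s: (-self.skills[s], s))[:k]


# Hypothetical usage with made-up exemplars and edit descriptions.
static = StaticExemplarMemory([("CCO", "CCN"), ("c1ccccc1", "c1ccccc1O")])
skills = EvolvingSkillMemory()
skills.distill(["add hydroxyl", "swap O for N"], success=True)
skills.distill(["add hydroxyl"], success=True)
skills.distill(["remove ring"], success=False)

nearest = static.retrieve("CCOC", k=1)
best_skill = skills.top(1)
```

In this sketch the static memory never changes after construction, while the skill memory grows only when a trajectory succeeds, which mirrors the abstract's split between cold-start grounding and strategies accumulated over time.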

2 Citations

0 Influential

22 Altmetric

112.0 Score

