2603.01274v1 Mar 01, 2026 cs.LG

GlassMol: Interpretable Molecular Property Prediction with Concept Bottleneck Models

O. Rivera (Citations: 3, h-index: 1)
Ziqing Wang (Citations: 35, h-index: 4)
Matthieu Dagommer (Citations: 8, h-index: 2)
Abhishek Pandey (Citations: 14, h-index: 2)
Kaize Ding (Citations: 137, h-index: 3)

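Machine learning accelerates molecular property prediction, but state-of-the-art large language models (LLMs) and graph neural networks often operate as black boxes. In drug discovery, where safety is critical, this opacity risks hiding false correlations or excluding human expert knowledge. Existing interpretability methods face a trade-off between effectiveness and trustworthiness: explanations may fail to reflect the model's actual reasoning, may degrade performance, or may lack domain grounding. Concept Bottleneck Models (CBMs) offer a solution by projecting inputs onto human-understandable concepts before producing an output, so that explanations are inherently faithful to the decision process. However, applying CBMs to chemistry poses three challenges: the Relevance Gap (selecting task-relevant concepts from a large descriptor space), the Annotation Gap (obtaining concept supervision for molecular data), and the Capacity Gap (performance degradation caused by the bottleneck constraint). This work introduces GlassMol, a model-agnostic CBM that addresses these gaps through automated concept curation and LLM-guided concept selection. Experiments on thirteen benchmarks show that GlassMol generally matches or exceeds the performance of black-box models, suggesting that interpretability does not degrade performance and challenging the commonly assumed trade-off. Code and related materials are available at https://github.com/walleio/GlassMol.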

Original Abstract

Machine learning accelerates molecular property prediction, yet state-of-the-art Large Language Models and Graph Neural Networks operate as black boxes. In drug discovery, where safety is critical, this opacity risks masking false correlations and excluding human expertise. Existing interpretability methods suffer from the effectiveness-trustworthiness trade-off: explanations may fail to reflect a model's true reasoning, degrade performance, or lack domain grounding. Concept Bottleneck Models (CBMs) offer a solution by projecting inputs to human-interpretable concepts before readout, ensuring that explanations are inherently faithful to the decision process. However, adapting CBMs to chemistry faces three challenges: the Relevance Gap (selecting task-relevant concepts from a large descriptor space), the Annotation Gap (obtaining concept supervision for molecular data), and the Capacity Gap (degrading performance due to bottleneck constraints). We introduce GlassMol, a model-agnostic CBM that addresses these gaps through automated concept curation and LLM-guided concept selection. Experiments across thirteen benchmarks demonstrate that GlassMol generally matches or exceeds black-box baselines, suggesting that interpretability does not sacrifice performance and challenging the commonly assumed trade-off. Code is available at https://github.com/walleio/GlassMol.
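
The concept-bottleneck idea described in the abstract (inputs are projected to human-interpretable concepts, and the readout sees only those concepts) can be illustrated with a minimal, standard-library Python sketch. This is not GlassMol's implementation: the class name `ConceptBottleneck`, the concept names `logp` and `h_bond_donors`, and all weights below are hypothetical values chosen by hand for illustration, and the sketch omits training, automated concept curation, and LLM-guided concept selection.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence


def dot(weights: Sequence[float], values: Sequence[float]) -> float:
    """Plain dot product, to keep the sketch free of third-party dependencies."""
    return sum(w * v for w, v in zip(weights, values))


@dataclass
class ConceptBottleneck:
    """Toy linear concept bottleneck: features -> concepts -> prediction.

    All names and weights are illustrative assumptions, not taken from GlassMol.
    """
    concept_names: List[str]
    concept_weights: List[List[float]]  # one row per concept, over input features
    readout_weights: List[float]        # one weight per concept
    readout_bias: float = 0.0

    def predict_concepts(self, features: Sequence[float]) -> List[float]:
        # Stage 1: project raw input features onto named concepts.
        return [dot(row, features) for row in self.concept_weights]

    def predict(self, features: Sequence[float],
                interventions: Optional[Dict[str, float]] = None) -> float:
        concepts = self.predict_concepts(features)
        # An expert may overwrite a predicted concept value before readout.
        for name, value in (interventions or {}).items():
            concepts[self.concept_names.index(name)] = value
        # Stage 2: the readout depends on the concepts only (the bottleneck).
        return dot(self.readout_weights, concepts) + self.readout_bias

    def explain(self, features: Sequence[float]) -> Dict[str, float]:
        # Per-concept contribution to the prediction (bias excluded).
        concepts = self.predict_concepts(features)
        return {name: w * c for name, w, c in
                zip(self.concept_names, self.readout_weights, concepts)}


# Hypothetical two-concept model over three made-up input features.
model = ConceptBottleneck(
    concept_names=["logp", "h_bond_donors"],
    concept_weights=[[1.0, 0.0, 0.5], [0.0, 2.0, 0.0]],
    readout_weights=[0.5, -1.0],
    readout_bias=0.25,
)
features = [2.0, 1.0, 4.0]

baseline = model.predict(features)
contributions = model.explain(features)
corrected = model.predict(features, interventions={"h_bond_donors": 0.0})
print(baseline, contributions, corrected)
```

Because the prediction is computed from the concept values alone, the per-concept contributions plus the bias sum exactly to the output, and overriding a concept (the `interventions` argument) changes the prediction in a traceable way. That structural property is what the abstract means by explanations being inherently faithful to the decision process.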

Citations: 1
Influential: 0
Altmetric: 22
Score: 111.0
