2603.14792v1 Mar 16, 2026 cs.LG

LaPro-DTA: Latent Dual-View Drug Representations and Salient Protein Feature Extraction for Generalizable Drug-Target Affinity Prediction

Zihan Dun

Citations: 0

h-index: 0

Yining Qian

Citations: 59

h-index: 5

Liuyi Xu

Citations: 2

h-index: 1

An-Yang Lu

Citations: 25

h-index: 2

Shuang Li

Citations: 49

h-index: 4

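Drug-target affinity (DTA) prediction is critical for accelerating drug discovery, but the performance of existing methods tends to degrade markedly in realistic cold-start settings, where the drug, the target, or the pair was not seen during training. This degradation stems mainly from overfitting to training instances and from information loss caused by irrelevant regions of the target sequence. To address these problems, this paper proposes LaPro-DTA, a framework for robust and generalizable DTA prediction. To counter overfitting, we design a latent dual-view drug representation mechanism. It combines an instance-level view, which captures fine-grained substructures under stochastic perturbation, with a distribution-level view, which extracts generalized chemical scaffolds through semantic remapping, so that the model learns transferable structural rules instead of memorizing specific samples. To mitigate information loss, we introduce a salient protein feature extraction strategy based on pattern-aware top-k pooling, which filters background noise and isolates high-response bioactive regions. A cross-view multi-head attention mechanism then fuses these refined features to model comprehensive interactions. In extensive experiments on benchmark datasets, LaPro-DTA significantly outperforms state-of-the-art methods, reducing MSE by 8% on the Davis dataset in the challenging unseen-drug setting. LaPro-DTA also provides interpretable insights into binding mechanisms.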

Original Abstract

Drug-target affinity prediction is pivotal for accelerating drug discovery, yet existing methods suffer from significant performance degradation in realistic cold-start scenarios (unseen drugs/targets/pairs), primarily driven by overfitting to training instances and information loss from irrelevant target sequences. In this paper, we propose LaPro-DTA, a framework designed to achieve robust and generalizable DTA prediction. To tackle overfitting, we devise a latent dual-view drug representation mechanism. It synergizes an instance-level view to capture fine-grained substructures with stochastic perturbation and a distribution-level view to distill generalized chemical scaffolds via semantic remapping, thereby enforcing the model to learn transferable structural rules rather than memorizing specific samples. To mitigate information loss, we introduce a salient protein feature extraction strategy using pattern-aware top-$k$ pooling, which effectively filters background noise and isolates high-response bioactive regions. Furthermore, a cross-view multi-head attention mechanism fuses these purified features to model comprehensive interactions. Extensive experiments on benchmark datasets demonstrate that LaPro-DTA significantly outperforms state-of-the-art methods, achieving an 8% MSE reduction on the Davis dataset in the challenging unseen-drug setting, while offering interpretable insights into binding mechanisms.
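The abstract does not say how the pattern-aware top-k pooling is implemented. As a rough illustration of the general idea only, the sketch below shows plain top-k pooling over a sequence: for each feature channel, it keeps the k strongest activations across positions and discards the rest. The function name `top_k_pool`, the input layout (one row per sequence position, one column per channel), and the toy values are all assumptions made for this example, not details taken from the paper.

```python
from typing import List


def top_k_pool(features: List[List[float]], k: int) -> List[List[float]]:
    """Generic top-k pooling (illustrative; not the paper's exact operator).

    features: one row per sequence position, one column per channel.
    Returns one row per channel holding that channel's k largest
    activations across all positions, in descending order.
    """
    if not features:
        raise ValueError("features must contain at least one position")
    if not 1 <= k <= len(features):
        raise ValueError("k must be between 1 and the number of positions")
    num_channels = len(features[0])
    if any(len(row) != num_channels for row in features):
        raise ValueError("all positions must have the same number of channels")

    pooled = []
    for c in range(num_channels):
        # Collect this channel's activation at every sequence position.
        column = [row[c] for row in features]
        # Keep only the k highest responses; low responses are dropped.
        pooled.append(sorted(column, reverse=True)[:k])
    return pooled


# Hypothetical toy input: 4 sequence positions, 2 channels.
toy = [
    [0.1, 0.9],
    [0.8, 0.2],
    [0.3, 0.7],
    [0.5, 0.0],
]
print(top_k_pool(toy, k=2))  # [[0.8, 0.5], [0.9, 0.7]]
```

Compared with max pooling (k = 1), keeping several high-response positions per channel retains more than one candidate region, while still dropping most low-response positions. How the paper makes this selection "pattern-aware" is not described in the abstract.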

0 Citations

0 Influential

2.5 Altmetric

12.5 Score
