2602.06706v1 Feb 06, 2026 cs.LG

SaDiT: Efficient Protein Backbone Design via Latent Structural Tokenization and Diffusion Transformers

Original Abstract

Generative models for de novo protein backbone design have achieved remarkable success in creating novel protein structures. However, these diffusion-based approaches remain computationally intensive and slower than desired for large-scale structural exploration. While recent efforts like Proteina have introduced flow matching to improve sampling efficiency, the potential of tokenization for structural compression and acceleration remains largely unexplored in the protein domain. In this work, we present SaDiT, a novel framework that accelerates protein backbone generation by integrating SaProt Tokenization with a Diffusion Transformer (DiT) architecture. SaDiT leverages a discrete latent space to represent protein geometry, significantly reducing the complexity of the generation process while maintaining theoretical SE(3) equivariance. To further enhance efficiency, we introduce an IPA Token Cache mechanism that optimizes the Invariant Point Attention (IPA) layers by reusing computed token states during iterative sampling. Experimental results demonstrate that SaDiT outperforms state-of-the-art models, including RFDiffusion and Proteina, in both computational speed and structural viability. We evaluate our model across unconditional backbone generation and fold-class conditional generation tasks, where SaDiT shows superior ability to capture complex topological features with high designability.
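
The abstract describes the IPA Token Cache only at a high level: token states computed in one sampling step are reused in later steps. The toy sketch below illustrates that general caching idea and nothing more. It is not the authors' implementation: `expensive_token_feature` is a hypothetical stand-in for a costly per-token computation such as an attention-layer projection, and `sample_with_cache`, the resampling rule, and all parameter values are assumptions made for illustration.

```python
import random


def expensive_token_feature(token_id: int) -> float:
    """Hypothetical stand-in for a costly per-token computation."""
    expensive_token_feature.calls += 1
    return float(token_id) ** 0.5


# Call counter, used only to show how much work the cache avoids.
expensive_token_feature.calls = 0


def sample_with_cache(length=16, vocab=8, steps=10, seed=0):
    """Toy iterative sampler over discrete tokens with a per-position cache.

    Each step resamples a quarter of the positions (an assumed toy rule),
    then recomputes features only where the token actually changed.
    """
    rng = random.Random(seed)
    tokens = [rng.randrange(vocab) for _ in range(length)]
    cache = {}  # position -> (token_id, feature)
    features = []
    for _ in range(steps):
        # Toy "denoising" update: only a subset of positions changes.
        for pos in rng.sample(range(length), max(1, length // 4)):
            tokens[pos] = rng.randrange(vocab)
        features = []
        for pos, tok in enumerate(tokens):
            cached = cache.get(pos)
            if cached is None or cached[0] != tok:
                # Cache miss: the token at this position is new or changed.
                cache[pos] = (tok, expensive_token_feature(tok))
            features.append(cache[pos][1])
    return tokens, features


if __name__ == "__main__":
    tokens, features = sample_with_cache()
    print("feature computations:", expensive_token_feature.calls)
    print("without cache:", 16 * 10)
```

In this toy setting the first step fills the cache for every position, and each later step recomputes at most the positions that were resampled, so the number of expensive calls stays well below `length * steps`. Whether SaDiT's cache is keyed this way, or invalidated by a different rule, is not stated in the abstract.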
