2603.16423v1 Mar 17, 2026 cs.CV

SF-Mamba: Rethinking State Space Model for Vision

Wei-Yao Wang

Citations: 3

h-index: 1

Masakazu Yoshimura

Citations: 112

h-index: 6

Teruaki Hayashi

Citations: 30

h-index: 3

Yukiko Hoshino

Citations: 14

h-index: 3

Takeshi Ohashi

Citations: 7

h-index: 1

Original Abstract

Mamba-based vision models have advanced in recent years as alternatives to Vision Transformers (ViTs), which suffer from quadratic complexity. While Mamba's recurrent scanning mechanism offers computational efficiency, it inherently limits non-causal interactions between image patches. Prior works have attempted to address this limitation through various multi-scan strategies; however, these approaches are inefficient because of suboptimal scan designs and frequent data rearrangement. Moreover, Mamba is relatively slow at the short token lengths commonly used in visual tasks. In pursuit of a truly efficient vision encoder, we rethink the scan operation for vision and the computational efficiency of Mamba. To this end, we propose SF-Mamba, a novel visual Mamba with two key components: auxiliary patch swapping, which encodes bidirectional information flow under a unidirectional scan, and batch folding with periodic state reset, which improves GPU parallelism. Extensive experiments on image classification, object detection, and instance and semantic segmentation consistently demonstrate that SF-Mamba significantly outperforms state-of-the-art baselines while improving throughput across different model sizes. We will release the source code after publication.
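The abstract names batch folding with periodic state reset but gives no implementation details, so the following is only an illustrative sketch of one plausible reading, not the authors' method. It assumes a scalar linear recurrence h_t = a_t * h_{t-1} + b_t * x_t as a stand-in for a state space scan, folds a batch of short sequences into one long sequence, and zeroes the state at every sequence boundary. The names `scan`, `fold`, `unfold`, and `reset_every` are hypothetical and chosen for this example.

```python
import math
import random


def scan(a, b, x, reset_every=None):
    """Sequential linear recurrence h_t = a_t * h_{t-1} + b_t * x_t, starting from h = 0.

    If reset_every is given, the state is zeroed at every index that is a
    multiple of reset_every (the assumed "periodic state reset").
    """
    h, out = 0.0, []
    for t, (a_t, b_t, x_t) in enumerate(zip(a, b, x)):
        if reset_every is not None and t % reset_every == 0:
            h = 0.0  # cut the dependency on the previous sequence
        h = a_t * h + b_t * x_t
        out.append(h)
    return out


def fold(batch):
    """Concatenate a batch of equal-length sequences into one long sequence."""
    return [v for seq in batch for v in seq]


def unfold(flat, length):
    """Split one long sequence back into chunks of the given length."""
    return [flat[i:i + length] for i in range(0, len(flat), length)]


def random_batch(rng, batch_size, length, lo, hi):
    return [[rng.uniform(lo, hi) for _ in range(length)] for _ in range(batch_size)]


rng = random.Random(0)
B, L = 4, 6  # toy batch size and token length (assumed values)
a = random_batch(rng, B, L, 0.5, 1.0)  # decay terms
b = random_batch(rng, B, L, 0.5, 1.0)  # input gates
x = random_batch(rng, B, L, 0.5, 1.0)  # inputs

# Reference: scan each sequence in the batch independently.
per_sequence = [scan(a[i], b[i], x[i]) for i in range(B)]

# Folded: one scan over B * L tokens, resetting the state every L tokens.
folded = unfold(scan(fold(a), fold(b), fold(x), reset_every=L), L)

# Folded without reset: state leaks from one sequence into the next.
leaky = unfold(scan(fold(a), fold(b), fold(x)), L)

matches = all(
    math.isclose(p, f) for ps, fs in zip(per_sequence, folded) for p, f in zip(ps, fs)
)
leaks = not math.isclose(per_sequence[1][0], leaky[1][0])
```

In this toy setting the folded scan with resets reproduces the per-sequence outputs, while the scan without resets carries state across sequence boundaries. A real implementation would use a parallel scan kernel on the GPU; the sequential loop here only demonstrates the equivalence.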

0 Citations

0 Influential

3 Altmetric

15.0 Score
