2602.19536v1 Feb 23, 2026 cs.CV

Fore-Mamba3D: Mamba-based Foreground-Enhanced Encoding for 3D Object Detection

Zhiwei Ning

Xuanang Gao

Xinzhong Zhu

Wei Liu

Runze Yang

Jiaxi Cao

Huiying Xu

Jie Yang

Original Abstract

Linear modeling methods such as Mamba have emerged as effective backbones for the 3D object detection task. However, previous Mamba-based methods apply bidirectional encoding to the whole non-empty voxel sequence, which contains abundant useless background information from the scene. Though directly encoding only the foreground voxels appears to be a plausible solution, it tends to degrade detection performance. We attribute this to response attenuation and restricted context representation when linear modeling is applied to foreground-only sequences. To address this problem, we propose a novel backbone, termed Fore-Mamba3D, that focuses on foreground enhancement by modifying the Mamba-based encoder. Foreground voxels are first sampled according to predicted scores. To counter the response attenuation in interactions between foreground voxels of different instances, we design a regional-to-global slide window (RGSW) that propagates information from regional splits to the entire sequence. Furthermore, a semantic-assisted and state spatial fusion module (SASFMamba) is proposed to enrich contextual representation by enhancing semantic and geometric awareness within the Mamba model. Our method emphasizes foreground-only encoding and alleviates the distance-based and causal dependencies of the linear autoregressive model. The superior performance across various benchmarks demonstrates the effectiveness of Fore-Mamba3D in the 3D object detection task.
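The abstract attributes the weaker foreground-only results to response attenuation and to the distance-based, causal dependencies of linear autoregression. The toy sketch below (assuming NumPy) illustrates both effects with a scalar linear recurrence h_t = a·h_{t-1} + b·x_t, next to a simple score-threshold rule for picking foreground voxels. It is not the authors' implementation. The function names, the 0.5 threshold, and the fixed decay a = 0.9 are hypothetical, and real Mamba layers use input-dependent (selective) parameters over multi-dimensional states.

```python
import numpy as np


def sample_foreground(features, scores, threshold=0.5):
    """Keep voxels whose predicted foreground score exceeds a threshold.

    Hypothetical rule: the paper only states that voxels are sampled
    'according to the predicted scores'; a fixed threshold is an assumption.
    Returns the kept features and their indices in the original sequence.
    """
    mask = scores > threshold
    return features[mask], np.nonzero(mask)[0]


def linear_scan(x, a=0.9, b=1.0):
    """Causal scalar linear recurrence h_t = a * h_{t-1} + b * x_t.

    Simplified stand-in for a state space model: a and b are fixed here,
    whereas Mamba makes them depend on the input.
    """
    h = np.zeros(len(x))
    state = 0.0
    for t, x_t in enumerate(x):
        state = a * state + b * x_t  # earlier inputs are scaled down by a at every step
        h[t] = state
    return h


def contribution(i, t, a=0.9):
    """Weight of input x_i in output h_t (with b = 1).

    Equals a ** (t - i) for i <= t, so it decays with distance, and is 0
    for i > t, because a forward scan cannot see later tokens (causality).
    """
    return a ** (t - i) if i <= t else 0.0


if __name__ == "__main__":
    # Impulse at position 0: the response fades geometrically along the sequence.
    impulse = np.eye(1, 10).ravel()
    print(np.round(linear_scan(impulse), 3))
    print(contribution(0, 9), contribution(5, 2))

    # Six voxels with one feature each and hypothetical predicted scores.
    feats = np.arange(6, dtype=float).reshape(6, 1)
    scores = np.array([0.1, 0.8, 0.3, 0.9, 0.6, 0.2])
    fg, idx = sample_foreground(feats, scores)
    print(idx.tolist())
```

With a = 0.9, an input nine steps back keeps only 0.9^9 ≈ 0.39 of its weight in the current state, and a later input contributes nothing in a single forward pass. This is why the ordering and spacing of voxels in a serialized foreground-only sequence can matter, which is the kind of dependency RGSW and SASFMamba are described as alleviating.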