2604.12502v1 Apr 14, 2026 cs.CV

SEATrack: Simple, Efficient, and Adaptive Multimodal Tracker

Weiming Hu

Citations: 28

h-index: 2

Ziteng Xue

Citations: 31

h-index: 3

Shihui Zhang

Citations: 12

h-index: 1

Zhipeng Zhang

Citations: 25

h-index: 3

Junbin Su

Citations: 2

h-index: 1

Kun Chen

Citations: 9

h-index: 2

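Parameter-efficient fine-tuning (PEFT) in multimodal tracking shows a worrying trend: recent performance gains often come with larger parameter counts, which fundamentally undermines PEFT's promise of efficiency. This work introduces SEATrack, a simple, efficient, and adaptive two-stream multimodal tracker designed to resolve this performance-efficiency dilemma. We first prioritize cross-modal alignment of matching responses, an overlooked but important factor that we argue is essential for balancing performance and efficiency. In particular, we find that modality-specific biases in existing two-stream methods produce conflicting matching attention maps, which hinders effective joint representation learning. To mitigate this, we propose AMG-LoRA, which combines Low-Rank Adaptation (LoRA) for domain adaptation with Adaptive Mutual Guidance (AMG) for dynamically refining and aligning attention maps. We also depart from conventional local fusion approaches by introducing a Hierarchical Mixture of Experts (HMoE) that enables efficient global relation modeling, balancing expressiveness and computational efficiency in multimodal fusion. With these innovations, SEATrack makes significant progress over state-of-the-art methods in both performance and efficiency on RGB-T, RGB-D, and RGB-E tracking tasks. (Source code: https://github.com/AutoLab-SAI-SJTU/SEATrack)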

Original Abstract

Parameter-efficient fine-tuning (PEFT) in multimodal tracking reveals a concerning trend: recent performance gains are often achieved at the cost of inflated parameter budgets, which fundamentally erodes PEFT's efficiency promise. In this work, we introduce SEATrack, a Simple, Efficient, and Adaptive two-stream multimodal tracker that tackles this performance-efficiency dilemma from two complementary perspectives. We first prioritize cross-modal alignment of matching responses, an underexplored yet pivotal factor that we argue is essential for breaking the trade-off. Specifically, we observe that modality-specific biases in existing two-stream methods generate conflicting matching attention maps, thereby hindering effective joint representation learning. To mitigate this, we propose AMG-LoRA, which seamlessly integrates Low-Rank Adaptation (LoRA) for domain adaptation with Adaptive Mutual Guidance (AMG) to dynamically refine and align attention maps across modalities. We then depart from conventional local fusion approaches by introducing a Hierarchical Mixture of Experts (HMoE) that enables efficient global relation modeling, effectively balancing expressiveness and computational efficiency in cross-modal fusion. Equipped with these innovations, SEATrack makes notable progress over state-of-the-art methods in balancing performance with efficiency across RGB-T, RGB-D, and RGB-E tracking tasks. Code is available at https://github.com/AutoLab-SAI-SJTU/SEATrack.
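For readers unfamiliar with the LoRA component named in the abstract, the following is a minimal sketch of a generic low-rank adapted linear layer, written in plain Python. It is not the authors' AMG-LoRA: the abstract does not specify how Adaptive Mutual Guidance operates, so that part is omitted entirely. The class name `LoRALinear`, the helper `matvec`, the dimensions, and the initialization values are all illustrative assumptions.

```python
import random


def matvec(matrix, vector):
    """Multiply a row-major matrix (a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Generic LoRA layer (hypothetical name): y = W x + (alpha / r) * B (A x).

    W stays frozen; only the low-rank factors A and B would be trained.
    """

    def __init__(self, d_in, d_out, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        # Frozen "pretrained" weight. Random here, purely for illustration.
        self.W = [[rng.gauss(0.0, 0.02) for _ in range(d_in)]
                  for _ in range(d_out)]
        # A has shape (rank, d_in) with a small random init.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)]
                  for _ in range(rank)]
        # B has shape (d_out, rank) and starts at zero, so the adapted
        # layer initially reproduces the frozen layer exactly.
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x):
        base = matvec(self.W, x)                     # frozen path
        update = matvec(self.B, matvec(self.A, x))   # low-rank path
        return [b + self.scale * u for b, u in zip(base, update)]

    def trainable_params(self):
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])

    def frozen_params(self):
        return len(self.W) * len(self.W[0])


# Assumed toy sizes: a 64x64 layer adapted with rank 4.
layer = LoRALinear(d_in=64, d_out=64, rank=4)
```

With these assumed sizes, the adapter holds 512 trainable values against 4096 frozen ones, which illustrates the kind of parameter budget that PEFT methods aim to keep small.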

1 Citations

0 Influential

30.45879734614 Altmetric

153.3 Score
