2602.09080v1 Feb 09, 2026 cs.LG

Looping Back to Move Forward: Recursive Transformers for Efficient and Flexible Large Multimodal Models

Ruihan Xu (Citations: 13, h-index: 3)
Yuting Gao (Citations: 818, h-index: 12)
Lan Wang (Citations: 4, h-index: 1)
Jianing Li (Citations: 150, h-index: 2)
Weihao Chen (Citations: 43, h-index: 3)
Qingpei Guo (Citations: 181, h-index: 7)
Ming Yang (Citations: 125, h-index: 6)
Shiliang Zhang (Citations: 0, h-index: 0)

Original Abstract

Large Multimodal Models (LMMs) have achieved remarkable success in vision-language tasks, yet their vast parameter counts are often underutilized during both training and inference. In this work, we embrace the idea of looping back to move forward: reusing model parameters through recursive refinement to extract stronger multimodal representations without increasing model size. We propose RecursiveVLM, a recursive Transformer architecture tailored for LMMs. Two key innovations enable effective looping: (i) a Recursive Connector that aligns features across recursion steps by fusing intermediate-layer hidden states and applying modality-specific projections, respecting the distinct statistical structures of vision and language tokens; (ii) a Monotonic Recursion Loss that supervises every step and guarantees performance improves monotonically with recursion depth. This design transforms recursion into an on-demand refinement mechanism: delivering strong results with few loops on resource-constrained devices and progressively improving outputs when more computation resources are available. Experiments show consistent gains of +3% over standard Transformers and +7% over vanilla recursive baselines, demonstrating that strategic looping is a powerful path toward efficient, deployment-adaptive LMMs.
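The abstract names two mechanisms, a Recursive Connector and a Monotonic Recursion Loss, without giving their exact form. The NumPy sketch below illustrates how such pieces could fit together; it is not the authors' implementation. Toy residual `tanh` layers stand in for real Transformer layers. The choice of fused layers (`FUSE_LAYERS`), fusion by averaging, re-entering each loop as `x + recursive_connector(...)`, and the hinge-penalty form of `monotonic_recursion_loss` are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8                 # toy hidden size (assumption)
NUM_LAYERS = 4        # layers in the shared, reused block (assumption)
FUSE_LAYERS = (1, 3)  # hypothetical: intermediate layers the connector fuses

# One weight matrix per layer; the same weights are reused on every loop.
layer_weights = [rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(NUM_LAYERS)]

# Hypothetical modality-specific projections for the connector.
proj = {
    "vision": rng.standard_normal((D, D)) / np.sqrt(D),
    "text": rng.standard_normal((D, D)) / np.sqrt(D),
}


def run_block(h):
    """Apply the shared layers once; return the final state and every layer's output."""
    states = []
    for w in layer_weights:
        h = h + np.tanh(h @ w)  # toy residual layer standing in for attention + MLP
        states.append(h)
    return h, states


def recursive_connector(states, is_vision):
    """Fuse selected intermediate states, then project each token by its modality."""
    fused = np.mean([states[i] for i in FUSE_LAYERS], axis=0)
    return np.where(is_vision[:, None], fused @ proj["vision"], fused @ proj["text"])


def recursive_forward(x, is_vision, num_loops):
    """Run the shared block num_loops times and return the output of every loop."""
    outputs = []
    h = x
    for _ in range(num_loops):
        h_out, states = run_block(h)
        outputs.append(h_out)
        # Assumption: the next loop starts from the input embeddings plus connector features.
        h = x + recursive_connector(states, is_vision)
    return outputs


def monotonic_recursion_loss(step_losses, margin=0.0):
    """Hypothetical form: mean per-step loss plus a hinge on every step that gets worse."""
    mean_loss = sum(step_losses) / len(step_losses)
    penalty = sum(
        max(0.0, cur - prev + margin) for prev, cur in zip(step_losses, step_losses[1:])
    )
    return mean_loss + penalty


# Usage: 3 vision tokens followed by 2 text tokens, supervised at every loop.
x = rng.standard_normal((5, D))
is_vision = np.array([True, True, True, False, False])
target = rng.standard_normal((5, D))
outs = recursive_forward(x, is_vision, num_loops=3)
step_losses = [float(np.mean((o - target) ** 2)) for o in outs]
print([round(l, 3) for l in step_losses], monotonic_recursion_loss(step_losses))
```

Every loop's output is returned and supervised, so inference can stop after any loop. Here the first-loop output is identical whether `num_loops` is 1 or 3, which is the property behind "few loops on constrained devices, more loops when compute allows." In this sketch, the hinge term only penalizes steps where the loss rises. It encourages monotonic improvement but does not enforce it.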
