2602.00780v1 Jan 31, 2026 cs.AI

Environment-Aware Adaptive Pruning with Interleaved Inference Orchestration for Vision-Language-Action Models

Yuting Huang, Yanyong Zhang, Leilei Ding, Zhipeng Tang, Jiajun Deng, Xinrui Lin, Haojie Ren, Jianmin Ji, Zenghua Zhu, Shuo Liu

Abstract

While Vision-Language-Action (VLA) models hold promise in embodied intelligence, their large parameter counts lead to substantial inference latency that hinders real-time manipulation, motivating parameter sparsification. However, as the environment evolves during VLA execution, the optimal sparsity patterns change accordingly. Static pruning lacks the adaptability required for environment dynamics, whereas fixed-interval dynamic layer pruning suffers from coarse granularity and high retraining overheads. To bridge this gap, we propose EcoVLA, a training-free, plug-and-play adaptive pruning framework that supports orthogonal combination with existing VLA acceleration methods. EcoVLA comprises two components: Environment-aware Adaptive Pruning (EAP) and Interleaved Inference Orchestration ($I^2O$). EAP is a lightweight adaptive channel pruning method that incorporates the temporal consistency of the physical environment to update sparsity patterns. $I^2O$ leverages the FLOPs bubbles inherent in VLA inference to schedule the pruning method in parallel, ensuring negligible impact on latency. Evaluated on diverse VLA models and benchmarks, EcoVLA delivers state-of-the-art performance, achieving up to 1.60$\times$ speedup with only a 0.4% drop in success rate, and further reaches 2.18$\times$ speedup with only a 0.5% degradation when combined with token pruning. We further validate the effectiveness of EcoVLA on real-world robots.
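The abstract describes EAP only at a high level: channel-level sparsity patterns are updated over time using the temporal consistency of the environment. The sketch below is one possible reading of that idea, not the authors' algorithm. It assumes that per-channel importance is the magnitude of a channel's activation, and that temporal consistency can be approximated by an exponential moving average across control steps. The function names (`update_importance`, `channel_mask`), the `momentum` value, and the toy activations are all hypothetical. The $I^2O$ scheduling component is not modeled here.

```python
from typing import List


def update_importance(ema: List[float], activations: List[float],
                      momentum: float = 0.9) -> List[float]:
    """Blend the previous per-channel importance with the current |activation|.

    Assumption: an exponential moving average stands in for the
    "temporal consistency" mentioned in the abstract.
    """
    return [momentum * e + (1.0 - momentum) * abs(a)
            for e, a in zip(ema, activations)]


def channel_mask(importance: List[float], sparsity: float) -> List[bool]:
    """Return a keep-mask that prunes the lowest-importance channels."""
    n_prune = int(len(importance) * sparsity)
    # Indices ordered from least to most important; the first n_prune are dropped.
    order = sorted(range(len(importance)), key=lambda i: importance[i])
    pruned = set(order[:n_prune])
    return [i not in pruned for i in range(len(importance))]


# Toy rollout (made-up numbers): channel 0 dominates early in the episode,
# channel 3 dominates after the scene changes.
ema = [0.0, 0.0, 0.0, 0.0]
early = [1.0, 0.2, 0.1, 0.0]
late = [0.0, 0.2, 0.1, 1.0]

for _ in range(30):
    ema = update_importance(ema, early)
mask_early = channel_mask(ema, sparsity=0.5)

for _ in range(30):
    ema = update_importance(ema, late)
mask_late = channel_mask(ema, sparsity=0.5)

print(mask_early)  # [True, True, False, False]
print(mask_late)   # [False, True, False, True]
```

In this toy run the kept channels shift from {0, 1} to {1, 3} once the inputs change, which illustrates why a mask fixed at the start of an episode (static pruning) could discard channels that become important later.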
