2602.15836v1 Jan 12, 2026 cs.RO

EdgeNav-QE: QLoRA Quantization and Dynamic Early Exit for LAM-based Navigation on Edge Devices

Shanshan Huang

Mengyun Liu

Jian Jiang

Abstract

Large Action Models (LAMs) have shown immense potential in autonomous navigation by bridging high-level reasoning with low-level control. However, deploying these multi-billion-parameter models on edge devices remains a significant challenge due to memory constraints and latency requirements. In this paper, we propose EdgeNav-QE, a novel framework that integrates Quantized Low-Rank Adaptation (QLoRA) with a dynamic early-exit (DEE) mechanism to optimize LAMs for real-time edge navigation. By quantizing the backbone to 4-bit precision and strategically placing early-exit branches, we enable the model to terminate inference early for simple navigation tasks while retaining full depth for complex decision-making. Experimental results in the Habitat-Sim environment with the Matterport3D dataset, using an OpenVLA-7B backbone, demonstrate that EdgeNav-QE reduces inference latency by 82.7% and memory footprint by 66.7% compared to full-precision baselines, while maintaining an 81.8% navigation success rate. Furthermore, it outperforms a state-of-the-art static early-exit method by 17.9% in latency, demonstrating the superiority of content-aware adaptive computation for safety-critical applications.
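The abstract does not state how an exit branch decides to stop. The sketch below illustrates one common choice: confidence-based dynamic early exit. Each exit branch emits action logits, and inference halts at the first branch whose maximum softmax probability reaches a threshold. The function names, the 0.9 threshold, and the toy "layers" are hypothetical and not taken from EdgeNav-QE. The 4-bit QLoRA quantization of the backbone is not modeled here.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_early_exit(
    x: Vector,
    layers: Sequence[Callable[[Vector], Vector]],
    exit_heads: Dict[int, Callable[[Vector], Vector]],
    final_head: Callable[[Vector], Vector],
    threshold: float = 0.9,  # hypothetical confidence threshold
) -> Tuple[int, int]:
    """Run layers in order and stop at the first exit branch that is confident enough.

    Returns (chosen action index, number of layers actually executed).
    Assumption: confidence = max softmax probability of the branch's action logits.
    """
    h = x
    for i, layer in enumerate(layers):
        h = layer(h)
        head = exit_heads.get(i)  # is there an exit branch after layer i?
        if head is not None:
            probs = softmax(head(h))
            conf = max(probs)
            if conf >= threshold:  # "easy" input: terminate early
                return probs.index(conf), i + 1
    # "Hard" input: no branch was confident, so use the full-depth head.
    probs = softmax(final_head(h))
    return probs.index(max(probs)), len(layers)


# Toy usage (hypothetical): every "layer" doubles the hidden vector,
# and a single exit branch after layer index 1 reads the hidden state as logits.
def double(h: Vector) -> Vector:
    return [2.0 * v for v in h]


toy_layers = [double] * 4
toy_exits = {1: lambda h: h}

print(dynamic_early_exit([3.0, 0.0, 0.0], toy_layers, toy_exits, final_head=lambda h: h))  # (0, 2)
print(dynamic_early_exit([0.1, 0.0, 0.0], toy_layers, toy_exits, final_head=lambda h: h))  # (0, 4)
```

The first input is confidently classified at the exit branch, so only 2 of 4 layers run. The second never clears the threshold and runs at full depth. A static early-exit baseline would instead stop at a fixed branch regardless of input. The threshold is the knob that trades accuracy for latency.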
