2604.24086v1 Apr 27, 2026 cs.RO

AsyncShield: A Plug-and-Play Edge Adapter for Asynchronous Cloud-based VLA Navigation

Zedong Chu

Citations: 91

h-index: 7

Shichao Xie

Citations: 153

h-index: 7

Xiaolong Wu

Citations: 134

h-index: 6

Yanfen Shen

Citations: 22

h-index: 2

Zhengbo Wang

Citations: 14

h-index: 2

Yingnan Guo

Citations: 19

h-index: 2

Mu Xu

Citations: 82

h-index: 6

Kai Yang

Citations: 306

h-index: 6

Xing Li

Citations: 23

h-index: 2

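Vision-Language-Action (VLA) models show strong zero-shot generalization in robot control, but their large parameter counts typically require cloud-based deployment. Cloud deployment, however, introduces network jitter and inference latency, which can cause spatiotemporal misalignment while the robot is continuously moving: an intent expressed in a past ego frame may become spatially incorrect in the current frame and lead to collisions. To address this, we propose AsyncShield, a plug-and-play asynchronous control framework. AsyncShield replaces conventional black-box time-series prediction with a deterministic, physics-based white-box spatial mapping. The system maintains a temporal pose buffer and uses kinematic transformations to accurately convert temporal lag into spatial pose offsets, restoring the VLA's original geometric intent. To balance intent-restoration fidelity against physical safety, edge adaptation is formulated as a constrained Markov decision process (CMDP). A reinforcement learning adapter, trained with the PPO-Lagrangian algorithm, dynamically trades off tracking the VLA intent against satisfying high-frequency LiDAR-based obstacle-avoidance constraints. In addition, a standardized universal sub-goal interface, domain randomization, and perception-level adaptation via Collision Radius Inflation allow AsyncShield to operate as a lightweight plug-and-play module. Simulation and real-world experiments show that, without fine-tuning the cloud-based foundation model, AsyncShield generalizes robustly in a zero-shot manner and improves both the success rate and the physical safety of asynchronous navigation.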
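
For readers unfamiliar with the CMDP and PPO-Lagrangian terms used above, the generic textbook form is sketched below. The abstract does not give the paper's reward, cost, or threshold definitions, so the symbols $J_R$ (expected return for tracking the VLA intent), $J_C$ (expected safety cost), $d$ (cost budget), and $\eta$ (multiplier step size) are illustrative assumptions rather than the authors' notation.

```latex
% Constrained objective: maximize return subject to a cost budget (generic CMDP form)
\max_{\theta} \; J_R(\pi_\theta)
\quad \text{s.t.} \quad J_C(\pi_\theta) \le d

% Lagrangian relaxation optimized by PPO-Lagrangian-style methods
\min_{\lambda \ge 0} \; \max_{\theta} \;
\mathcal{L}(\theta, \lambda) = J_R(\pi_\theta) - \lambda \bigl( J_C(\pi_\theta) - d \bigr)

% Dual ascent on the multiplier, projected onto \lambda \ge 0
\lambda \leftarrow \max\bigl( 0, \; \lambda + \eta \, ( J_C(\pi_\theta) - d ) \bigr)
```

In this generic form, the multiplier $\lambda$ grows while the cost budget is violated and shrinks toward zero otherwise, which is one way to obtain a dynamic trade-off between tracking and safety.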

Original Abstract

While Vision-Language-Action (VLA) models have been demonstrated possessing strong zero-shot generalization for robot control, their massive parameter sizes typically necessitate cloud-based deployment. However, cloud deployment introduces network jitter and inference latency, which can induce severe spatiotemporal misalignment in mobile navigation under continuous displacement, so that the stale intents expressed in past ego frames may become spatially incorrect in the current frame and lead to collisions. To address this issue, we propose AsyncShield, a plug-and-play asynchronous control framework. AsyncShield discards traditional black-box time-series prediction in favor of a deterministic physical white-box spatial mapping. By maintaining a temporal pose buffer and utilizing kinematic transformations, the system accurately converts temporal lag into spatial pose offsets to restore the VLA's original geometric intent. To balance intent restoration fidelity and physical safety, the edge adaptation is formulated as a constrained Markov decision process (CMDP). Solved via the PPO-Lagrangian algorithm, a reinforcement learning adapter dynamically trades off between tracking the VLA intent and responding to high-frequency LiDAR obstacle avoidance hard constraints. Furthermore, benefiting from a standardized universal sub-goal interface, domain randomization, and perception-level adaptation via Collision Radius Inflation, AsyncShield operates as a lightweight, plug-and-play module. Simulation and real-world experiments demonstrate that, without fine-tuning any cloud-based foundation models, the framework exhibits zero-shot and robust generalization capabilities, effectively improving the success rate and physical safety of asynchronous navigation.
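
The idea of converting temporal lag into a spatial pose offset can be illustrated with a minimal planar (SE(2)) sketch. This is not the authors' implementation: the class `PoseBuffer`, the function `reproject_subgoal`, the use of linear interpolation, and the assumption that the sub-goal is a 2D point in an ego frame with x forward and y left are all hypothetical choices made for illustration.

```python
import bisect
import math
from collections import deque


class PoseBuffer:
    """Hypothetical buffer of timestamped planar poses (t, x, y, yaw) in an odometry frame."""

    def __init__(self, maxlen=512):
        self._poses = deque(maxlen=maxlen)

    def push(self, t, x, y, yaw):
        # Assumes poses arrive with strictly increasing timestamps.
        self._poses.append((t, x, y, yaw))

    def lookup(self, t):
        """Return (x, y, yaw) at time t, linearly interpolated and clamped to the buffer range."""
        if not self._poses:
            raise ValueError("pose buffer is empty")
        times = [p[0] for p in self._poses]
        i = bisect.bisect_left(times, t)
        if i == 0:
            return self._poses[0][1:]
        if i == len(times):
            return self._poses[-1][1:]
        t0, x0, y0, yaw0 = self._poses[i - 1]
        t1, x1, y1, yaw1 = self._poses[i]
        a = (t - t0) / (t1 - t0)
        # Interpolate yaw along the shortest angular difference.
        dyaw = math.atan2(math.sin(yaw1 - yaw0), math.cos(yaw1 - yaw0))
        return (x0 + a * (x1 - x0), y0 + a * (y1 - y0), yaw0 + a * dyaw)


def reproject_subgoal(goal_xy, pose_then, pose_now):
    """Map a sub-goal from the ego frame at observation time into the current ego frame."""
    gx, gy = goal_xy
    x0, y0, yaw0 = pose_then
    x1, y1, yaw1 = pose_now
    # Old ego frame -> odometry frame.
    wx = x0 + math.cos(yaw0) * gx - math.sin(yaw0) * gy
    wy = y0 + math.sin(yaw0) * gx + math.cos(yaw0) * gy
    # Odometry frame -> current ego frame (inverse rigid transform).
    dx, dy = wx - x1, wy - y1
    return (math.cos(yaw1) * dx + math.sin(yaw1) * dy,
            -math.sin(yaw1) * dx + math.cos(yaw1) * dy)


if __name__ == "__main__":
    buf = PoseBuffer()
    buf.push(0.0, 0.0, 0.0, 0.0)   # pose when the image was sent to the cloud
    buf.push(1.0, 1.0, 0.0, 0.0)   # pose when the stale sub-goal arrives
    stale_goal = (2.0, 0.0)        # sub-goal expressed in the old ego frame
    print(reproject_subgoal(stale_goal, buf.lookup(0.0), buf.lookup(1.0)))
```

In this sketch, a sub-goal 2 m ahead of the robot at observation time is re-expressed as roughly 1 m ahead once the robot has driven 1 m forward during the round trip, so the geometric intent is preserved instead of being applied in the wrong frame.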

0 Citations

0 Influential

3.5 Altmetric

17.5 Score
