2604.17701v1 Apr 20, 2026 cs.IT

WISV: Wireless-Informed Semantic Verification for Distributed Speculative Decoding in Device-Edge LLM Inference

Jiangchao Yao (Citations: 3,031; h-index: 23)

Zixuan Liu (Citations: 90; h-index: 3)

Zhiyong Chen (Citations: 329; h-index: 9)

Nan Xue (Citations: 49; h-index: 3)

Shengkang Chen (Citations: 1; h-index: 1)

Meixia Tao (Citations: 512; h-index: 10)

Wenjun Zhang (Citations: 95; h-index: 7)

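Distributed device-edge speculative decoding improves resource utilization across heterogeneous nodes, but conventional token-level verification limits its performance. Such strict alignment causes unnecessary rejections, which shorten the accepted sequence length and increase the number of interaction rounds over unstable wireless links. This paper proposes Wireless-Informed Semantic Verification (WISV), a distributed speculative decoding framework whose channel-aware semantic acceptance policy goes beyond strict token-level matching. WISV integrates a lightweight decision head into the edge-side target LLM, which combines high-dimensional hidden representations with instantaneous channel state information (CSI) to evaluate speculative tokens dynamically. To balance verification fidelity against communication overhead, the authors design two tailored communication protocols: full-hidden upload and mismatch-first selective-hidden upload. In extensive simulations with a 1B drafter and an 8B target model, WISV achieves up to a 60.8% increase in accepted length, a 37.3% reduction in interaction rounds, and a 31.4% improvement in end-to-end latency compared with vanilla speculative decoding, with a negligible drop in task accuracy (<1%). Finally, WISV is validated on a hardware testbed consisting of an NVIDIA Jetson AGX Orin and an A40-equipped server, confirming its practical value for accelerating edge-deployed LLM inference.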

Original Abstract

While distributed device-edge speculative decoding enhances resource utilization across heterogeneous nodes, its performance is often bottlenecked by conventional token-level verification strategies. Such rigid alignment leads to excessive rejections, significantly diminishing the accepted sequence length and increasing interaction rounds under fluctuating wireless conditions. In this paper, we propose WISV (Wireless-Informed Semantic Verification), a novel distributed speculative decoding framework that goes beyond strict token-level matching via a channel-aware semantic acceptance policy. WISV integrates a lightweight decision head into the edge-side target LLM to dynamically evaluate speculative tokens by synthesizing high-dimensional hidden representations with instantaneous channel state information (CSI). To optimize the trade-off between verification fidelity and communication overhead, we further design two tailored communication protocols: full-hidden upload and mismatch-first selective-hidden upload. Extensive simulations using a 1B drafter and an 8B target model demonstrate that WISV achieves up to a 60.8% increase in accepted length, a 37.3% reduction in interaction rounds, and a 31.4% improvement in end-to-end latency compared to vanilla speculative decoding across tested settings, while maintaining a negligible task accuracy drop (<1%). Finally, we validate WISV on a hardware testbed comprising an NVIDIA Jetson AGX Orin and an A40-equipped server, confirming its real-world efficacy in accelerating edge-deployed LLM inference.
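
The contrast between token-level verification and a channel-aware semantic acceptance policy can be illustrated with a toy sketch. Nothing below comes from the paper's implementation: the `decision_head` function, its fixed weights, the scalar `similarity` feature (standing in for hidden representations), the use of SNR in dB as the CSI input, and the assumption that a worse channel should make acceptance more lenient are all hypothetical choices made for illustration.

```python
import math


def vanilla_accepted_length(draft, target):
    """Token-level verification: accept the draft prefix up to the first mismatch."""
    n = 0
    for d, t in zip(draft, target):
        if d != t:
            break
        n += 1
    return n


def decision_head(similarity, snr_db, w_sim=8.0, w_snr=-0.2, bias=-4.0):
    """Toy logistic head giving the probability of accepting a mismatched token.

    similarity: hypothetical semantic-similarity feature in [0, 1], a stand-in
                for the hidden representations mentioned in the abstract.
    snr_db:     channel quality in dB, a stand-in for instantaneous CSI.
    The weights are illustrative constants, not learned and not from the paper.
    The negative SNR weight encodes the assumption that extra interaction
    rounds cost more on a poor channel, so acceptance becomes more lenient.
    """
    z = w_sim * similarity + w_snr * snr_db + bias
    return 1.0 / (1.0 + math.exp(-z))


def semantic_accepted_length(draft, target, similarities, snr_db, threshold=0.5):
    """Accept exact matches, plus mismatches that the decision head approves."""
    n = 0
    for d, t, s in zip(draft, target, similarities):
        if d != t and decision_head(s, snr_db) < threshold:
            break
        n += 1
    return n


draft = ["the", "cat", "sat", "on", "a", "mat"]
target = ["the", "cat", "sat", "on", "the", "rug"]
similarities = [1.0, 1.0, 1.0, 1.0, 0.9, 0.3]  # made-up per-token features

strict_len = vanilla_accepted_length(draft, target)
poor_channel_len = semantic_accepted_length(draft, target, similarities, snr_db=5.0)
good_channel_len = semantic_accepted_length(draft, target, similarities, snr_db=20.0)
print(strict_len, poor_channel_len, good_channel_len)
```

In this made-up example, strict matching stops at the first mismatch ("a" vs. "the"), while the toy head accepts that near-equivalent token when the channel is poor and falls back to strict behavior when the channel is good. The semantically distant final token is rejected in every case.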
