2602.00152v1 Jan 29, 2026 cs.CV

Real-Time Human Activity Recognition on Edge Microcontrollers: Dynamic Hierarchical Inference with Multi-Spectral Sensor Fusion

Boyu Li

Yonghui Wu

Kuang Zuo

Lincong Li

Abstract

The demand for accurate on-device pattern recognition in edge applications is intensifying, yet existing approaches struggle to reconcile accuracy with computational constraints. To address this challenge, a resource-aware hierarchical network based on multi-spectral fusion and interpretable modules, namely the Hierarchical Parallel Pseudo-image Enhancement Fusion Network (HPPI-Net), is proposed for real-time, on-device Human Activity Recognition (HAR). Deployed on an ARM Cortex-M4 microcontroller for low-power real-time inference, HPPI-Net achieves 96.70% accuracy while utilizing only 22.3 KiB of RAM and 439.5 KiB of ROM after optimization. HPPI-Net employs a two-layer architecture. The first layer extracts preliminary features using Fast Fourier Transform (FFT) spectrograms, while the second layer selectively activates either a dedicated module for stationary activity recognition or a parallel LSTM-MobileNet network (PLMN) for dynamic states. PLMN fuses FFT, Wavelet, and Gabor spectrograms through three parallel LSTM encoders and refines the concatenated features using Efficient Channel Attention (ECA) and Depthwise Separable Convolution (DSC), thereby offering channel-level interpretability while substantially reducing multiply-accumulate operations. Compared with MobileNetV3, HPPI-Net improves accuracy by 1.22% and reduces RAM usage by 71.2% and ROM usage by 42.1%. These results demonstrate that HPPI-Net achieves a favorable accuracy-efficiency trade-off and provides explainable predictions, establishing a practical solution for wearable, industrial, and smart home HAR on memory-constrained edge platforms.
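
The two-layer control flow described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`fft_spectrogram`, `is_dynamic`, `classify`), the frame length, hop, the energy threshold, and the 50 Hz sampling rate are all assumptions made for illustration, and the hand-written energy rule only stands in for the learned first layer. The last two helpers use the standard multiply-accumulate (MAC) counts for a regular convolution and a depthwise separable one, which is the general reason DSC reduces MACs; they are not the paper's measurements.

```python
import numpy as np


def fft_spectrogram(signal, frame_len=32, hop=16):
    """Magnitude spectrogram from mean-removed, Hann-windowed, overlapping frames."""
    window = np.hanning(frame_len)
    starts = range(0, len(signal) - frame_len + 1, hop)
    frames = []
    for s in starts:
        frame = signal[s:s + frame_len]
        # Remove the per-frame offset (e.g. gravity) so a constant input has zero energy.
        frames.append((frame - frame.mean()) * window)
    # Result shape: (n_frames, frame_len // 2 + 1)
    return np.abs(np.fft.rfft(np.stack(frames), axis=1))


def is_dynamic(spec, threshold=0.5):
    """Hypothetical stage-1 gate: mean non-DC spectral energy against an assumed threshold."""
    return float(np.mean(spec[:, 1:] ** 2)) > threshold


def classify(signal, static_head, dynamic_head):
    """Run stage 1, then execute exactly one of the two stage-2 branches."""
    spec = fft_spectrogram(signal)
    if is_dynamic(spec):
        return "dynamic", dynamic_head(spec)
    return "static", static_head(spec)


def standard_conv_macs(h, w, c_in, c_out, k):
    """MACs of a k x k convolution producing an h x w x c_out output."""
    return h * w * c_in * c_out * k * k


def dsc_macs(h, w, c_in, c_out, k):
    """MACs of a depthwise k x k convolution followed by a 1 x 1 pointwise convolution."""
    return h * w * c_in * k * k + h * w * c_in * c_out


# Usage with placeholder stage-2 heads (hypothetical labels, no trained models).
t = np.arange(128) / 50.0                        # assumed 50 Hz sampling
still = np.full(128, 1.0)                        # constant reading, e.g. gravity only
walking = 1.0 + np.sin(2 * np.pi * 2.0 * t)      # 2 Hz periodic motion

still_result = classify(still, lambda s: "sitting", lambda s: "walking")
walking_result = classify(walking, lambda s: "sitting", lambda s: "walking")
mac_ratio = dsc_macs(16, 16, 32, 64, 3) / standard_conv_macs(16, 16, 32, 64, 3)
```

Only the selected branch runs for each window, so in this sketch a stationary input never pays for the dynamic branch. For the MAC helpers, the ratio of depthwise separable to standard cost equals 1/c_out + 1/k², which is about 0.127 for the 3 x 3, 64-output-channel layer used above.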
