2603.21508v1 Mar 23, 2026 cs.LG

Optimizing Feature Extraction for On-device Model Inference with User Behavior Sequences

Guihai Chen (Citations: 52; h-index: 4)
Zhenzhe Zheng (Citations: 1,950; h-index: 26)
Fan Wu (Citations: 43; h-index: 4)
Chen Gong (Citations: 73; h-index: 4)
Yiliu Chen (Citations: 186; h-index: 2)
Shengjie Wang (Citations: 24; h-index: 2)

Abstract

Machine learning models are widely integrated into modern mobile apps to analyze user behavior and deliver personalized services. Ensuring low-latency on-device model execution is critical for maintaining a high-quality user experience. While prior research has primarily focused on accelerating model inference with given input features, we identify an overlooked bottleneck in real-world on-device model execution pipelines: extracting input features from raw application logs. In this work, we explore a new direction, feature extraction optimization, by analyzing and eliminating redundant extraction operations across different model features and across consecutive model inferences. We then introduce AutoFeature, an automated feature extraction engine designed to accelerate the on-device feature extraction process without compromising model inference accuracy. AutoFeature comprises three core designs: (1) graph abstraction, which formulates the extraction workflows of different input features as one directed acyclic graph, (2) graph optimization, which identifies and fuses redundant operation nodes across different features within the graph, and (3) efficient caching, which minimizes operations on raw data that overlaps between consecutive model inferences. We implement a system prototype of AutoFeature and integrate it into five industrial mobile services spanning the search, video, and e-commerce domains. Online evaluations show that AutoFeature reduces end-to-end on-device model execution latency by 1.33x to 3.93x during the daytime and by 1.43x to 4.53x at night.
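
The abstract gives no implementation details, so the following is only a minimal Python sketch of the two kinds of redundancy it describes, not AutoFeature's actual design. Every name here (`FeatureEngine`, `specs`, the `(timestamp, event_type, value)` log format, the feature names) is a hypothetical chosen for illustration. The sketch assumes each feature is a filter by event type, followed by a time window and an aggregate: features that filter on the same event type share one filter pass (a stand-in for fusing redundant nodes), and filtered rows are kept between calls so that a later inference only scans newly appended log rows (a stand-in for caching across consecutive inferences).

```python
from collections import defaultdict


class FeatureEngine:
    """Toy extractor: shared filter pass plus a cache of filtered rows."""

    def __init__(self, specs):
        # specs: feature name -> (event_type, window_seconds, aggregate_fn)
        # (hypothetical format, assumed for this sketch)
        self.specs = specs
        self.wanted = {etype for etype, _, _ in specs.values()}
        self.max_window = max(window for _, window, _ in specs.values())
        # event_type -> [(timestamp, value), ...] kept across inferences
        self.cache = defaultdict(list)

    def extract(self, new_logs, now):
        # Shared filter step: one pass over the NEW rows feeds every feature,
        # instead of one filter pass per feature over the full log.
        for ts, etype, value in new_logs:
            if etype in self.wanted:
                self.cache[etype].append((ts, value))

        # Evict cached rows that no feature window can reach any more.
        cutoff = now - self.max_window
        for etype in self.cache:
            self.cache[etype] = [(ts, v) for ts, v in self.cache[etype] if ts > cutoff]

        # Per-feature window and aggregate steps read from the shared cache.
        features = {}
        for name, (etype, window, aggregate) in self.specs.items():
            values = [v for ts, v in self.cache[etype] if now - window < ts <= now]
            features[name] = aggregate(values)
        return features


specs = {
    "clicks_60s": ("click", 60, len),
    "clicks_300s": ("click", 300, len),
    "watch_time_300s": ("play", 300, sum),
}
engine = FeatureEngine(specs)

# First inference sees the whole log so far.
first = engine.extract(
    [(10, "click", 1), (200, "play", 30), (250, "click", 1)], now=300
)
# Second inference passes only the rows appended since the first call.
second = engine.extract([(320, "click", 1), (330, "play", 12)], now=340)
print(first)
print(second)
```

In this toy version the two `click` features reuse the same filtered rows, and the second call touches two new rows rather than rescanning all five. A real engine would operate on a general operation graph rather than a fixed filter, window, aggregate pipeline.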

Citations: 0; Influential: 0; Altmetric: 13; Score: 65.0
