2601.17050v1 Jan 21, 2026 cs.CV


Single-Pixel Vision-Language Model for Intrinsic Privacy-Preserving Behavioral Intelligence

Hongjun An

Citations: 95

h-index: 4

Xuelong Li

Citations: 11

h-index: 2

Yiliang Song

Citations: 6

h-index: 2

Jiawei Shao

Citations: 105

h-index: 4

Zhe Sun

Citations: 87

h-index: 5


Original Abstract

Adverse social interactions, such as bullying, harassment, and other illicit activities, pose significant threats to individual well-being and public safety, leaving profound impacts on physical and mental health. However, these critical events frequently occur in privacy-sensitive environments like restrooms and changing rooms, where conventional surveillance is prohibited or severely restricted by stringent privacy regulations and ethical concerns. Here, we propose the Single-Pixel Vision-Language Model (SP-VLM), a novel framework that reimagines secure environmental monitoring. It achieves intrinsic privacy-by-design by capturing human dynamics through inherently low-dimensional single-pixel modalities and inferring complex behavioral patterns via seamless vision-language integration. Building on this framework, we demonstrate that single-pixel sensing intrinsically suppresses identity recoverability, rendering state-of-the-art face recognition systems ineffective below a critical sampling rate. We further show that SP-VLM can nonetheless extract meaningful behavioral semantics, enabling robust anomaly detection, people counting, and activity understanding from severely degraded single-pixel observations. Combining these findings, we identify a practical sampling-rate regime in which behavioral intelligence emerges while personal identity remains strongly protected. Together, these results point to a human-rights-aligned pathway for safety monitoring that can support timely intervention without normalizing intrusive surveillance in privacy-sensitive spaces.
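The abstract relies on the notion of a sampling rate without defining it. As a rough illustration only, the sketch below assumes the convention common in single-pixel imaging: the scene is masked by a sequence of patterns, a single detector records one scalar per pattern, and the sampling rate is the number of measurements divided by the number of scene pixels. The function name `single_pixel_measurements`, the random binary masks, and the toy 8x8 scene are all hypothetical choices for this example; the paper's actual patterns, hardware, and rate definition may differ.

```python
import random


def single_pixel_measurements(image, sampling_rate, seed=0):
    """Simulate single-pixel sensing of a flattened grayscale image.

    Assumption (not from the paper): random binary masks, and
    sampling_rate = number of measurements / number of pixels.
    """
    n_pixels = len(image)
    # Number of masks M = round(sampling_rate * N), with at least one.
    n_measurements = max(1, round(sampling_rate * n_pixels))
    rng = random.Random(seed)
    measurements = []
    for _ in range(n_measurements):
        # Each mask passes (1) or blocks (0) every pixel at random.
        pattern = [rng.randint(0, 1) for _ in range(n_pixels)]
        # The detector sees only the total light that passes the mask.
        measurements.append(sum(p * x for p, x in zip(pattern, image)))
    return measurements


# Hypothetical 8x8 scene (64 pixels), flattened row by row.
scene = [((r * 8 + c) % 7) / 6 for r in range(8) for c in range(8)]
low = single_pixel_measurements(scene, sampling_rate=0.05)
high = single_pixel_measurements(scene, sampling_rate=0.5)
print(len(low), len(high))  # 3 32
```

At a 5% rate the 64-pixel scene is summarized by only three scalars, which gives some intuition for why fine detail such as facial identity would be hard to recover at low rates while coarse scene dynamics might still leave a trace in the measurements.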

2 Citations

0 Influential

2.5 Altmetric

14.5 Score
