2604.00986v2 Apr 01, 2026 cs.CR

Do Phone-Use Agents Respect Your Privacy?

Canwen Xu (Citations: 92, h-index: 4)
Xidong Wang (Citations: 1,064, h-index: 11)
Shunian Chen (Citations: 1,534, h-index: 13)
Tongxu Luo (Citations: 501, h-index: 8)
Zhengyang Tang (Citations: 663, h-index: 8)
Ke Ji (Citations: 836, h-index: 9)
Zihan Ye (Citations: 13, h-index: 2)
Xinyu Wang (Citations: 0, h-index: 0)
Yi Guo (Citations: 152, h-index: 4)
Chenxing Li (Citations: 3, h-index: 1)
Jingyuan Hu (Citations: 1, h-index: 1)
Jiaxi Bi (Citations: 19, h-index: 1)
Zeyu Qin (Citations: 94, h-index: 4)
Shaobo Wang (Citations: 13, h-index: 1)
X. Lai (Citations: 89, h-index: 2)
Pengyuan Lyu (Citations: 3,522, h-index: 17)
Junyi Li (Citations: 8, h-index: 2)
Chengquan Zhang (Citations: 30, h-index: 4)
Han Hu (Citations: 2, h-index: 1)
Ming Yan (Citations: 16, h-index: 2)
Benyou Wang (Citations: 656, h-index: 7)
Ziniu Li (Citations: 325, h-index: 9)

Abstract

We study whether phone-use agents respect privacy while completing benign mobile tasks. This question has remained hard to answer because privacy-compliant behavior is not operationalized for phone-use agents, and ordinary apps do not reveal exactly what data agents type into which form entries during execution. To make this question measurable, we introduce MyPhoneBench, a verifiable evaluation framework for privacy behavior in mobile agents. We operationalize privacy-respecting phone use as permissioned access, minimal disclosure, and user-controlled memory through a minimal privacy contract, iMy, and pair it with instrumented mock apps plus rule-based auditing that make unnecessary permission requests, deceptive re-disclosure, and unnecessary form filling observable and reproducible. Across five frontier models on 10 mobile apps and 300 tasks, we find that task success, privacy-compliant task completion, and later-session use of saved preferences are distinct capabilities, and no single model dominates all three. Evaluating success and privacy jointly reshuffles the model ordering relative to either metric alone. The most persistent failure mode across models is simple data minimization: agents still fill optional personal entries that the task does not require. These results show that privacy failures arise from over-helpful execution of benign tasks, and that success-only evaluation overestimates the deployment readiness of current phone-use agents. All code, mock apps, and agent trajectories are publicly available at https://github.com/FreedomIntelligence/MyPhoneBench.
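To make the data-minimization failure mode concrete, the following is a minimal sketch of what a rule-based audit over logged form writes might look like. It is not the MyPhoneBench implementation: the `FormWrite` and `TaskSpec` records, the `audit_writes` and `privacy_compliant_success` functions, and the `over_disclosure` label are all hypothetical names chosen for illustration, and the sketch assumes an instrumented app can log every value the agent types together with the entry it was typed into.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FormWrite:
    """One value the agent typed into a form entry (hypothetical log record)."""
    app: str
    field_name: str
    value: str


@dataclass(frozen=True)
class TaskSpec:
    """Hypothetical task description: the entries the task actually requires."""
    required_fields: frozenset


def audit_writes(task: TaskSpec, writes: list) -> list:
    """Flag every non-empty write to an entry the task does not require."""
    violations = []
    for w in writes:
        if not w.value.strip():
            continue  # an empty entry discloses nothing
        if w.field_name not in task.required_fields:
            violations.append(f"over_disclosure:{w.field_name}")
    return violations


def privacy_compliant_success(task_succeeded: bool, violations: list) -> bool:
    """Joint criterion: the task succeeded and the audit found no violations."""
    return task_succeeded and not violations


# Illustrative trajectory: the agent fills an optional birthday entry.
task = TaskSpec(required_fields=frozenset({"name", "phone"}))
writes = [
    FormWrite("booking", "name", "Ada"),
    FormWrite("booking", "phone", "555-0100"),
    FormWrite("booking", "birthday", "1990-01-01"),
    FormWrite("booking", "notes", ""),
]
violations = audit_writes(task, writes)
print(violations)                                   # ['over_disclosure:birthday']
print(privacy_compliant_success(True, violations))  # False
```

In this toy trajectory the task itself succeeds, yet the joint criterion fails because of one optional entry. This is the kind of gap between success-only and joint evaluation that the abstract describes.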

Citations: 1 · Influential: 0 · Altmetric: 42.951858789481 · Score: 215.8
