2605.00642v1 May 01, 2026 cs.AI

Learn where to Click from Yourself: On-Policy Self-Distillation for GUI Grounding

Huawen Shen (Citations: 78, h-index: 6)

Yan Zhang (Citations: 36, h-index: 4)

Daiqing Wu (Citations: 55, h-index: 5)

Can Ma (Citations: 92, h-index: 5)

Yu ZHOU, Nankai University, China (Citations: 2,414, h-index: 24)

Abstract

Graphical User Interface (GUI) grounding maps natural language instructions to the visual coordinates of target elements and serves as a core capability for autonomous GUI agents. Recent reinforcement learning methods (e.g., GRPO) have achieved strong performance, but they rely on expensive multiple rollouts and suffer from sparse signals on hard samples. These limitations make on-policy self-distillation (OPSD), which provides dense token-level supervision from a single rollout, a promising alternative. However, its applicability to GUI grounding remains unexplored. In this paper, we present GUI-SD, the first OPSD framework tailored for GUI grounding. First, it constructs a visually enriched privileged context for the teacher using a target bounding box and a Gaussian soft mask, providing informative guidance without leaking exact coordinates. Second, it employs entropy-guided distillation, which adaptively weights tokens based on digit significance and teacher confidence, concentrating optimization on the most impactful and reliable positions. Extensive experiments on six representative GUI grounding benchmarks show that GUI-SD consistently outperforms GRPO-based methods and naive OPSD in both accuracy and training efficiency. Code and training data are available at https://zhangyan-ucas.github.io/GUI-SD/.
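
The abstract names two mechanisms without giving formulas: a Gaussian soft mask built from the target bounding box, and token weights driven by digit significance and teacher confidence. The following is a minimal NumPy sketch of how such pieces could look, not the authors' implementation. Every function name, the `sigma_scale` and `decay` parameters, the normalized-entropy confidence measure, and the use of forward KL as the per-token loss are assumptions made for illustration only.

```python
import numpy as np


def gaussian_soft_mask(height, width, box, sigma_scale=0.5):
    """Soft mask in (0, 1] that peaks at the center of box = (x1, y1, x2, y2).

    Assumption: the standard deviation along each axis is proportional to the
    box size, so the mask highlights the target region without a hard edge.
    """
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    sx = max((x2 - x1) * sigma_scale, 1.0)
    sy = max((y2 - y1) * sigma_scale, 1.0)
    ys, xs = np.mgrid[0:height, 0:width]
    return np.exp(-0.5 * (((xs - cx) / sx) ** 2 + ((ys - cy) / sy) ** 2))


def entropy_guided_weights(teacher_probs, digit_ranks, decay=0.5):
    """Per-token weights from teacher confidence and digit significance.

    teacher_probs: array of shape (T, V), each row a distribution over V tokens.
    digit_ranks:   length-T sequence, 0 for the most significant coordinate digit.
    Assumption: confidence = 1 - normalized entropy, significance = decay ** rank.
    """
    p = np.clip(teacher_probs, 1e-12, 1.0)
    entropy = -(p * np.log(p)).sum(axis=1)
    confidence = np.clip(1.0 - entropy / np.log(p.shape[1]), 0.0, 1.0)
    significance = decay ** np.asarray(digit_ranks, dtype=float)
    w = confidence * significance
    return w / max(w.sum(), 1e-12)


def weighted_distillation_loss(teacher_probs, student_probs, weights):
    """Weighted sum of per-token KL(teacher || student); forward KL is an assumption."""
    t = np.clip(teacher_probs, 1e-12, 1.0)
    s = np.clip(student_probs, 1e-12, 1.0)
    kl = (t * (np.log(t) - np.log(s))).sum(axis=1)
    return float((weights * kl).sum())


# Hypothetical example: a 100x200 screenshot and a three-digit coordinate.
mask = gaussian_soft_mask(100, 200, box=(50, 20, 90, 40))
one_hot = np.eye(10)[3]          # teacher is certain about this digit
uniform = np.full(10, 0.1)       # teacher is maximally uncertain
teacher = np.stack([one_hot, uniform, one_hot])
weights = entropy_guided_weights(teacher, digit_ranks=[0, 1, 2])
student = np.full((3, 10), 0.1)  # untrained student: uniform everywhere
loss = weighted_distillation_loss(teacher, student, weights)
```

In this sketch the uncertain middle digit receives almost no weight, while the confident leading digit dominates the loss, which is one plausible reading of "concentrating optimization on the most impactful and reliable positions".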
