2605.06664v1 May 07, 2026 cs.CV

BAMI: A Training-Free Approach to Mitigating Bias in GUI Grounding

BAMI: Training-Free Bias Mitigation in GUI Grounding

Liang Tang (Citations: 16, h-index: 3)
Yiqiang Yan (Citations: 14, h-index: 3)
Borui Zhang, Tsinghua University (Citations: 567, h-index: 8)
Jiwen Lu (Citations: 1,272, h-index: 17)
Wenzhao Zheng (Citations: 3,570, h-index: 25)
Bo Zhang (Citations: 2, h-index: 1)
Bo Wang (Citations: 1, h-index: 1)
Yuhao Cheng (Citations: 139, h-index: 7)
Jie Zhou (Citations: 119, h-index: 7)

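GUI grounding is an essential capability that lets GUI agents carry out actions such as clicking and dragging. In complex scenarios such as the ScreenSpot-Pro benchmark, however, existing models often perform suboptimally. Using the **Masked Prediction Distribution (MPD)** attribution method proposed in this paper, we identify two main sources of error: first, high image resolution induces a precision bias; second, intricate interface elements cause an ambiguity bias. To address these problems, this paper proposes **Bias-Aware Manipulation Inference (BAMI)**, which mitigates both biases through two key manipulations: coarse-to-fine focus and candidate selection. Extensive experiments show that BAMI substantially improves the accuracy of a range of GUI grounding models without any training. For example, applying the method to the TianXi-Action-7B model raises its accuracy on the ScreenSpot-Pro benchmark from 51.9% to 57.8%. Ablation studies across different parameter settings further confirm the robustness of the approach, underscoring its stability and effectiveness. Code and related materials are available at https://github.com/Neur-IO/BAMI.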

Original Abstract

GUI grounding is a critical capability for enabling GUI agents to execute tasks such as clicking and dragging. However, in complex scenarios like the ScreenSpot-Pro benchmark, existing models often suffer from suboptimal performance. Utilizing the proposed Masked Prediction Distribution (MPD) attribution method, we identify that the primary sources of errors are twofold: high image resolution (leading to precision bias) and intricate interface elements (resulting in ambiguity bias). To address these challenges, we introduce Bias-Aware Manipulation Inference (BAMI), which incorporates two key manipulations, coarse-to-fine focus and candidate selection, to effectively mitigate these biases. Our extensive experimental results demonstrate that BAMI significantly enhances the accuracy of various GUI grounding models in a training-free setting. For instance, applying our method to the TianXi-Action-7B model boosts its accuracy on the ScreenSpot-Pro benchmark from 51.9% to 57.8%. Furthermore, ablation studies confirm the robustness of the BAMI approach across diverse parameter configurations, highlighting its stability and effectiveness. Code is available at https://github.com/Neur-IO/BAMI.
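The abstract names coarse-to-fine focus but does not describe its mechanics, so the following is only a generic sketch of the idea, not the authors' implementation. It assumes a hypothetical `predict` callable that runs a grounding model on a crop and returns a click point in crop coordinates; the crop ratio `scale`, the number of `rounds`, and the toy `noisy_predict` stand-in (whose error grows with crop size, imitating a precision bias) are all illustrative assumptions.

```python
from typing import Callable, Tuple

Point = Tuple[float, float]
Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in full-image pixels


def crop_box_around(point: Point, image_size: Tuple[int, int], scale: float) -> Box:
    """Crop window of `scale` times the image size (scale <= 1), centred on
    `point` and shifted where needed so it stays inside the image."""
    width, height = image_size
    crop_w = max(1, round(width * scale))
    crop_h = max(1, round(height * scale))
    left = min(max(round(point[0] - crop_w / 2), 0), width - crop_w)
    top = min(max(round(point[1] - crop_h / 2), 0), height - crop_h)
    return (left, top, left + crop_w, top + crop_h)


def to_full_image(point_in_crop: Point, box: Box) -> Point:
    """Map a point predicted inside a crop back to full-image coordinates."""
    return (box[0] + point_in_crop[0], box[1] + point_in_crop[1])


def coarse_to_fine(
    predict: Callable[[Box], Point],  # hypothetical model wrapper (assumption)
    image_size: Tuple[int, int],
    scale: float = 0.5,               # assumed shrink ratio per round
    rounds: int = 2,                  # assumed number of refinement rounds
) -> Point:
    """Predict on the full image, then re-predict on progressively smaller
    crops centred on the previous estimate."""
    box: Box = (0, 0, image_size[0], image_size[1])
    point = to_full_image(predict(box), box)
    current_scale = scale
    for _ in range(rounds):
        box = crop_box_around(point, image_size, current_scale)
        point = to_full_image(predict(box), box)
        current_scale *= scale
    return point


# Toy stand-in for a model: it finds the target but with an error equal to
# 1% of the crop size, so smaller crops give more precise answers.
TARGET: Point = (3000.0, 1500.0)


def noisy_predict(box: Box) -> Point:
    left, top, right, bottom = box
    return (TARGET[0] - left + 0.01 * (right - left),
            TARGET[1] - top + 0.01 * (bottom - top))


if __name__ == "__main__":
    full = (3840, 2160)
    print("single pass:   ", to_full_image(noisy_predict((0, 0, *full)), (0, 0, *full)))
    print("coarse-to-fine:", coarse_to_fine(noisy_predict, full))
```

In this toy setup the refined estimate lands closer to the target than the single full-resolution pass, because each crop shrinks the region over which the simulated error is spread. A real model could also lose the target if an early estimate falls far enough away that the crop excludes it, which is one reason a separate candidate selection step may be useful.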

1 Citation

0 Influential

42.897207708399 Altmetric

215.5 Score
