2604.05226v1 Apr 06, 2026 cs.RO

RoboPlayground: Democratizing Robotic Evaluation through Structured Physical Domains

Authors:
Y. Wang (citations: 518; h-index: 7)
Carter Ung (citations: 11; h-index: 1)
E. Gubarev (citations: 8; h-index: 1)
Christopher Tan (citations: 11; h-index: 1)
S. Srinivasa (citations: 25,603; h-index: 77)
Dieter Fox (citations: 128; h-index: 2)

Original Abstract

Evaluation of robotic manipulation systems has largely relied on fixed benchmarks authored by a small number of experts, where task instances, constraints, and success criteria are predefined and difficult to extend. This paradigm limits who can shape evaluation and obscures how policies respond to user-authored variations in task intent, constraints, and notions of success. We argue that evaluating modern manipulation policies requires reframing evaluation as a language-driven process over structured physical domains. We present RoboPlayground, a framework that enables users to author executable manipulation tasks using natural language within a structured physical domain. Natural language instructions are compiled into reproducible task specifications with explicit asset definitions, initialization distributions, and success predicates. Each instruction defines a structured family of related tasks, enabling controlled semantic and behavioral variation while preserving executability and comparability. We instantiate RoboPlayground in a structured block manipulation domain and evaluate it along three axes. A user study shows that the language-driven interface is easier to use and imposes lower cognitive workload than programming-based and code-assist baselines. Evaluating learned policies on language-defined task families reveals generalization failures that are not apparent under fixed benchmark evaluations. Finally, we show that task diversity scales with contributor diversity rather than task count alone, enabling evaluation spaces to grow continuously through crowd-authored contributions. Project Page: https://roboplayground.github.io
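The abstract says instructions are compiled into task specifications with explicit asset definitions, initialization distributions, and success predicates. One way to picture such a specification is as a small record with those three parts. The sketch below is illustrative only and is not RoboPlayground's actual format: `TaskSpec`, its field names, the block names, `BLOCK_SIZE`, and the tolerance are all hypothetical, and the language-to-specification compilation step is not shown.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

Position = Tuple[float, float, float]
State = Dict[str, Position]  # block name -> (x, y, z) position


@dataclass(frozen=True)
class TaskSpec:
    """Hypothetical task specification; field names are illustrative only."""
    instruction: str                                         # natural language source
    assets: Tuple[str, ...]                                  # explicit asset definitions
    sample_initial_state: Callable[[random.Random], State]   # initialization distribution
    success: Callable[[State], bool]                         # success predicate


BLOCK_SIZE = 0.05  # assumed cube edge length in metres


def sample_tabletop(rng: random.Random) -> State:
    """Place each block at a random spot on the table, resting at z = 0."""
    return {
        name: (rng.uniform(-0.3, 0.3), rng.uniform(-0.3, 0.3), 0.0)
        for name in ("red_block", "blue_block")
    }


def red_on_blue(state: State, tol: float = 0.01) -> bool:
    """True when the red block rests centred on top of the blue block."""
    rx, ry, rz = state["red_block"]
    bx, by, bz = state["blue_block"]
    aligned = abs(rx - bx) <= tol and abs(ry - by) <= tol
    stacked = abs(rz - (bz + BLOCK_SIZE)) <= tol
    return aligned and stacked


spec = TaskSpec(
    instruction="Stack the red block on the blue block.",
    assets=("red_block", "blue_block"),
    sample_initial_state=sample_tabletop,
    success=red_on_blue,
)

# Seeding the generator makes the sampled initial state reproducible.
initial = spec.sample_initial_state(random.Random(0))
```

Under these assumptions, keeping the success check as a plain function of state lets the same instruction be scored identically across different policies and different sampled initial states, which is the kind of comparability the abstract emphasizes.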
