2605.30326v1 May 28, 2026 cs.RO

RoboWits: Unexpected Challenges for Robotic Creative Problem Solving

Chuang Gan
Thomas L. Griffiths
Yejin Choi
Chun-Tse Lin
Hongxin Zhang
Feng Yu
Zhehuan Chen
David Held

The ability to reason, adapt, and creatively solve problems under unexpected challenges is essential for robots operating in real-world environments. However, current robotic benchmarks primarily emphasize skill-level execution and offer limited insight into such cognitive reasoning capabilities. We introduce RoboWits, a bimanual robotic benchmark designed to systematically evaluate cognitive reasoning, creative tool use, and robustness to unexpected conditions. To enable scalable construction of high-quality, reasoning-centric unexpected scenarios, we propose an automated task generation pipeline formulated as a cooperative multi-agent framework, with agents for seed task generation and verification, metric generation, scene generation, and task mutation. Using this pipeline, we curated 30 diverse seed tasks and 208 tasks with mutations and graded difficulty, spanning geometry-, material-, and assembly-based reasoning. We benchmark popular robot policies, pre-trained vision-language-action models (VLAs), and oracle-state planners. Our results reveal a significant performance gap: while pre-trained VLAs achieve preliminary success on seed tasks after single-task fine-tuning, they struggle on mutated tasks, indicating brittleness on manipulation tasks that require reasoning, strategy adaptation, and robustness to deceptive or constrained environments. The project page is available at https://umass-embodied-agi.github.io/RoboWits.
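The abstract describes task generation as a cooperative pipeline of agents for seed task generation and verification, metric generation, scene generation, and task mutation. The sketch below shows one way such stages could be chained; it is not the RoboWits implementation. Every class, field, and function name (`Task`, `TaskPipeline`, `propose_seed`, `verify_seed`, `write_metric`, `build_scene`, `mutate`) is hypothetical, each agent is stood in for by a plain Python callable, and the toy stubs at the bottom replace what would presumably be model-backed agents.

```python
import itertools
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Task:
    """A benchmark task; field names are hypothetical, not RoboWits' schema."""
    name: str
    description: str
    metric: Optional[str] = None   # success check produced by the metric agent
    scene: Optional[dict] = None   # scene spec produced by the scene agent
    parent: Optional[str] = None   # seed task this one was mutated from


@dataclass
class TaskPipeline:
    """Chains the agent roles named in the abstract (illustrative assumption).

    Each role is an arbitrary callable here; in practice it would wrap an agent.
    """
    propose_seed: Callable[[], Task]
    verify_seed: Callable[[Task], bool]
    write_metric: Callable[[Task], str]
    build_scene: Callable[[Task], dict]
    mutate: Callable[[Task], List[Task]]

    def _complete(self, task: Task) -> Task:
        # Attach a success metric and a scene specification to a task.
        task.metric = self.write_metric(task)
        task.scene = self.build_scene(task)
        return task

    def run(self, n_seeds: int, max_proposals: int = 1000) -> List[Task]:
        tasks: List[Task] = []
        accepted = 0
        for _ in range(max_proposals):
            if accepted == n_seeds:
                break
            seed = self.propose_seed()
            if not self.verify_seed(seed):
                continue  # discard seeds the verifier rejects
            accepted += 1
            tasks.append(self._complete(seed))
            # Each verified seed yields mutated variants, linked back to it.
            for variant in self.mutate(seed):
                variant.parent = seed.name
                tasks.append(self._complete(variant))
        return tasks


# Toy stand-ins for the agents (hypothetical, for illustration only).
counter = itertools.count()


def propose_seed() -> Task:
    return Task(name=f"seed_{next(counter)}", description="stack the blocks")


pipeline = TaskPipeline(
    propose_seed=propose_seed,
    verify_seed=lambda t: not t.name.endswith("1"),  # toy verifier rejects seed_1
    write_metric=lambda t: f"check({t.name})",
    build_scene=lambda t: {"objects": ["block"]},
    mutate=lambda t: [
        Task(t.name + "_heavy", t.description),
        Task(t.name + "_slippery", t.description),
    ],
)
tasks = pipeline.run(n_seeds=2)
```

Keeping each agent behind a callable lets a stub be swapped for a model-backed agent without touching the control flow. The abstract does not say whether mutated tasks are re-verified, so this sketch only verifies seeds.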
