2604.23018v1 Apr 24, 2026 cs.CV

AmaraSpatial-10K: A Spatially and Semantically Aligned 3D Dataset for Spatial Computing and Embodied AI

Mohammadreza Salehi

Alex Perkins

Igor P. Maurell

Ashkan Dabbagh

Raymond Wong

Original Abstract

Web-scale 3D asset collections are abundant, but rarely deployment-ready. Assets ship with arbitrary metric scale, incorrect pivots and forward axes, brittle geometry, and textures that do not support relighting, which limits their utility for embodied AI, robotics simulation, game development, and AR/VR. We present AmaraSpatial-10K, a dataset of over 10,000 synthetic 3D assets designed for downstream use rather than volume alone. Each asset is released as a metric-scaled, semantically anchored .glb with separated PBR material maps, a convex collision hull, a paired reference image, and rich multi-sentence text metadata. The dataset spans indoor objects, vehicles, architecture, creatures, and props under a unified spatial convention. Alongside the dataset, we introduce an evaluation suite for 3D asset banks. The suite comprises a continuous Scale Plausibility Score (SPS) with an LLM-as-Judge interval protocol, an LLM Concept Density score for metadata, an anchor-error metric, and a cross-modal CLIP coherence protocol, and we use it to audit AmaraSpatial-10K alongside matched subsets from Objaverse, HSSD, ABO, and GSO. Compared with Objaverse-sourced assets, we demonstrate that AmaraSpatial-10K substantially improves text-based retrieval precision (CLIP Recall@5 of 0.612 vs 0.181, a 3.4x improvement with median rank falling from 267 to 3), and we establish that it satisfies the spatial and semantic prerequisites for physics-aware scene composition and embodied-AI asset banks, leaving those downstream evaluations to future work. AmaraSpatial-10K is publicly available on Hugging Face.
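The retrieval figures quoted in the abstract (Recall@5 and median rank) can be illustrated with a minimal sketch. It assumes the common setup in which each text query has exactly one correct asset and a similarity matrix of text-to-asset scores (for example, CLIP cosine similarities) is already available; the abstract does not spell out the paper's exact protocol, so the function name `retrieval_metrics` and the tie-handling rule below are illustrative assumptions, not the authors' implementation.

```python
from statistics import median


def retrieval_metrics(similarity, k=5):
    """Return (recall@k, median rank) for a square similarity matrix.

    similarity[i][j] is the score between text query i and asset j.
    Assumption: the correct asset for query i sits at index i.
    """
    ranks = []
    for i, row in enumerate(similarity):
        true_score = row[i]
        # Rank 1 means no other asset scored strictly higher than the correct one.
        ranks.append(1 + sum(1 for s in row if s > true_score))
    # Fraction of queries whose correct asset lands in the top k.
    recall_at_k = sum(r <= k for r in ranks) / len(ranks)
    return recall_at_k, median(ranks)


# Toy 3x3 example: queries 0 and 2 retrieve their asset first, query 1 ranks it third.
toy = [
    [0.9, 0.1, 0.2],
    [0.8, 0.3, 0.5],
    [0.1, 0.2, 0.7],
]
recall_at_1, med_rank = retrieval_metrics(toy, k=1)

# The "3.4x" figure in the abstract is the ratio of the two reported Recall@5 values.
improvement = round(0.612 / 0.181, 1)
```

Under this definition a median rank of 3 means that, for half of the queries, the correct asset appears within the top three results.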
