2603.23447v1 Mar 24, 2026 cs.CV

3DCity-LLM: Empowering Multi-modality Large Language Models for 3D City-scale Perception and Understanding

Ouyang Jie (Citations: 186, h-index: 8)

Haoning Wu (Citations: 273, h-index: 3)

Wenyu Ke (Citations: 16, h-index: 2)

Yiping Chen (Citations: 63, h-index: 2)

Jinpeng Li (Citations: 3, h-index: 1)

Yang Luo (Citations: 4, h-index: 2)

Li Liu (Citations: 270, h-index: 3)

Hongchao Fan (Citations: 62, h-index: 2)

Zhongjie He (Citations: 177, h-index: 7)

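Multimodal large language models perform well in object-centric and indoor settings, but extending them to 3D city-scale environments remains difficult. To close this gap, we propose 3DCity-LLM, a unified framework for vision-language perception and understanding at 3D city scale. 3DCity-LLM uses a coarse-to-fine feature encoding strategy with three parallel branches: one for the target object, one for inter-object relationships, and one for the global scene. To support large-scale training, we introduce the 3DCity-LLM-1.2M dataset, which contains approximately 1.2 million high-quality samples. The dataset covers seven representative task categories, from fine-grained object analysis to multi-faceted scene planning, and has undergone strict quality control. It integrates explicit 3D numerical information and diverse user-oriented simulations, which improves the diversity and realism of question answering in urban scenarios. We also apply a multi-dimensional protocol based on text-similarity metrics and LLM-based semantic assessment to ensure a faithful and comprehensive evaluation of all methods. Extensive experiments on two benchmarks show that 3DCity-LLM significantly outperforms existing state-of-the-art methods and offers a promising direction for advancing spatial reasoning and urban intelligence. The source code and dataset are available at https://github.com/SYSU-3DSTAILab/3D-City-LLM.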

Original Abstract

While multi-modality large language models excel in object-centric or indoor scenarios, scaling them to 3D city-scale environments remains a formidable challenge. To bridge this gap, we propose 3DCity-LLM, a unified framework designed for 3D city-scale vision-language perception and understanding. 3DCity-LLM employs a coarse-to-fine feature encoding strategy comprising three parallel branches for the target object, inter-object relationships, and the global scene. To facilitate large-scale training, we introduce the 3DCity-LLM-1.2M dataset, which comprises approximately 1.2 million high-quality samples across seven representative task categories, ranging from fine-grained object analysis to multi-faceted scene planning. This strictly quality-controlled dataset integrates explicit 3D numerical information and diverse user-oriented simulations, enriching the question-answering diversity and realism of urban scenarios. Furthermore, we apply a multi-dimensional protocol based on text-similarity metrics and LLM-based semantic assessment to ensure faithful and comprehensive evaluations for all methods. Extensive experiments on two benchmarks demonstrate that 3DCity-LLM significantly outperforms existing state-of-the-art methods, offering a promising and meaningful direction for advancing spatial reasoning and urban intelligence. The source code and dataset are available at https://github.com/SYSU-3DSTAILab/3D-City-LLM.
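The abstract mentions text-similarity metrics as one half of the evaluation protocol but does not name them. As an illustration only, the sketch below computes a ROUGE-L-style F1 score (longest common subsequence over whitespace tokens) between a model answer and a reference answer. The choice of ROUGE-L, the function names `lcs_length` and `rouge_l_f1`, and the lowercase whitespace tokenization are all assumptions, not details taken from the paper. The LLM-based semantic assessment is not sketched here.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(prediction, reference):
    """ROUGE-L-style F1 between two strings (hypothetical helper).

    Assumption: tokens are lowercase whitespace-separated words; the
    paper's actual metrics and tokenization may differ.
    """
    pred = prediction.lower().split()
    ref = reference.lower().split()
    if not pred or not ref:
        return 0.0
    lcs = lcs_length(pred, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(pred)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)
```

A score like this rewards word overlap in the right order but cannot tell whether a numerically wrong answer is phrased similarly to the reference, which is one plausible reason to pair it with a semantic assessment.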

0 Citations

0 Influential

32.95879734614 Altmetric

164.8 Score
