2604.08884v1 Apr 10, 2026 cs.CV


HM-Bench: A Comprehensive Benchmark for Multimodal Large Language Models in Hyperspectral Remote Sensing

Juepeng Zheng, Zurong Mai, Yuhang Chen, Jianxi Huang, Haohuan Fu, Qingmei Li, Zjin Liao, Tianyu Bi, Haoyuan Liang, Rui Su, Zi Qian, Xinyue Zhang, Yibin Wen, Xiaoyang Fan, Chan Tsz Ho, Yutong Lu


Original Abstract

While multimodal large language models (MLLMs) have made significant strides in natural image understanding, their ability to perceive and reason over hyperspectral images (HSI), a vital modality in remote sensing, remains underexplored. The high dimensionality and intricate spectral-spatial properties of HSI pose unique challenges for models trained primarily on RGB data. To address this gap, we introduce the Hyperspectral Multimodal Benchmark (HM-Bench), the first benchmark designed specifically to evaluate MLLMs on HSI understanding. We curate a large-scale dataset of 19,337 question-answer pairs across 13 task categories, ranging from basic perception to spectral reasoning. Because existing MLLMs cannot natively process raw hyperspectral cubes, we propose a dual-modality evaluation framework that transforms HSI data into two complementary representations: PCA-based composite images and structured textual reports. This approach enables a systematic comparison of how each representation affects model performance. Extensive evaluations of 18 representative MLLMs reveal significant difficulties with complex spatial-spectral reasoning tasks. Furthermore, our results show that visual inputs generally outperform textual inputs, highlighting the importance of grounding in spectral-spatial evidence for effective HSI understanding. The dataset and appendix are available at https://github.com/HuoRiLi-Yu/HM-Bench.
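The abstract names PCA-based composite images as one of the two representations given to MLLMs. As a rough illustration of what such a composite can look like, the sketch below projects every pixel spectrum onto the top principal components and rescales each component to 0..255, producing a false-colour image. It is a minimal sketch under stated assumptions, not HM-Bench's actual pipeline: the function name `pca_composite`, the nested-list cube layout `[row][col][band]`, the three-component default, per-component min-max scaling, and the power-iteration eigen-solver are all hypothetical choices made to keep the example dependency-free. A real pipeline would more likely use a library SVD.

```python
import math
import random
from typing import List, Tuple

# Hypothetical layout: a hyperspectral cube as nested lists indexed [row][col][band].
Cube = List[List[List[float]]]


def _dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def _top_components(cov: List[List[float]], k: int, iters: int = 500,
                    seed: int = 0) -> Tuple[List[List[float]], List[float]]:
    """Top-k eigenpairs of a symmetric covariance matrix via power iteration
    with deflation (adequate for small band counts; use an SVD in practice)."""
    n = len(cov)
    a = [row[:] for row in cov]              # working copy, deflated in place
    tol = 1e-9 * (sum(cov[i][i] for i in range(n)) or 1.0)
    rng = random.Random(seed)                # fixed seed -> deterministic output
    vecs, vals = [], []
    for _ in range(k):
        v = [rng.uniform(0.1, 1.0) for _ in range(n)]
        for _ in range(iters):
            w = [_dot(row, v) for row in a]
            norm = math.sqrt(_dot(w, w))
            if norm < tol:                   # no variance left in this direction
                v = [0.0] * n
                break
            v = [x / norm for x in w]
        lam = _dot(v, [_dot(row, v) for row in a])  # Rayleigh quotient
        if max(v, key=abs) < 0:              # resolve PCA's sign ambiguity
            v = [-x for x in v]
        vecs.append(v)
        vals.append(lam)
        for i in range(n):                   # deflate: A <- A - lam * v v^T
            for j in range(n):
                a[i][j] -= lam * v[i] * v[j]
    return vecs, vals


def pca_composite(cube: Cube, k: int = 3) -> Tuple[List[List[List[int]]], List[float]]:
    """Hypothetical helper: project each pixel spectrum onto the top-k principal
    components and min-max scale every component to 0..255 (a k-channel
    false-colour composite). Also returns explained-variance ratios."""
    h, w = len(cube), len(cube[0])
    pixels = [px for row in cube for px in row]          # h*w spectra
    n, bands = len(pixels), len(pixels[0])
    means = [sum(p[b] for p in pixels) / n for b in range(bands)]
    centered = [[p[b] - means[b] for b in range(bands)] for p in pixels]
    cov = [[sum(c[i] * c[j] for c in centered) / max(n - 1, 1)
            for j in range(bands)] for i in range(bands)]
    comps, vals = _top_components(cov, k)
    total = sum(cov[i][i] for i in range(bands)) or 1.0  # total variance
    ratios = [max(lam, 0.0) / total for lam in vals]
    channels = []
    for comp in comps:
        scores = [_dot(c, comp) for c in centered]
        lo, hi = min(scores), max(scores)
        span = hi - lo
        channels.append([round(255 * (s - lo) / span) if span > 0 else 0
                         for s in scores])
    # Reassemble the flat per-pixel channels into [row][col][channel].
    return ([[[ch[r * w + c] for ch in channels] for c in range(w)]
             for r in range(h)], ratios)
```

For a 2×2 cube whose spectra all lie along one direction, such as `[[[0, 0, 0], [1, 2, 2]], [[2, 4, 4], [3, 6, 6]]]`, the first channel rises 0, 85, 170, 255 across the four pixels and the first explained-variance ratio is close to 1.0. Because principal components are defined only up to sign, the sketch flips each one so that its largest loading is positive; without that step the composite's colours could invert between runs or implementations.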
