2605.05854v1 May 07, 2026 cs.AI

AirQualityBench: A Realistic Evaluation Benchmark for Global Air Quality Forecasting

Zhen-Qiang Zhou (Citations: 214, h-index: 10)
Huiling Zhao (Citations: 97, h-index: 5)
Yang Wang (Citations: 876, h-index: 17)
Xinghong Xu (Citations: 11, h-index: 1)
Xu Wang (Citations: 736, h-index: 17)
Yudong Zhang (Citations: 270, h-index: 10)

Abstract

Air-quality forecasting models are commonly evaluated on regional, preprocessed, and normalized datasets, where missing observations are removed or artificially completed. Such protocols simplify comparison but hide the conditions that dominate real monitoring networks: uneven global coverage, structured missingness, heterogeneous pollutant scales, and deployment cost. We introduce **AirQualityBench**, a global multi-pollutant benchmark designed to evaluate forecasting models under these realistic conditions. The benchmark contains hourly observations from 3,720 monitoring stations over 2021–2025, covers six major pollutants, and preserves provider-native observation masks. Rather than imputing a dense data tensor, AirQualityBench exposes missingness as part of the forecasting problem and reports errors on valid future observations after inverse transformation to physical concentration scales. Evaluating representative spatio-temporal models under this unified protocol shows that strong performance on sanitized datasets does not reliably transfer to global, fragmented monitoring streams. AirQualityBench therefore serves as a realistic testbed for scalable, mask-aware, and physically interpretable air-quality forecasting. All benchmark data, code, evaluation scripts, and baseline implementations are available on GitHub: https://github.com/Star-Learning/AirQualityBench
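
The scoring rule described above (map predictions back to physical concentrations, then measure error only where a future observation actually exists) can be sketched as follows. This is a minimal illustration, not the benchmark's own evaluation script: it assumes per-pollutant z-score normalization, a (horizon, station, pollutant) array layout, and NaN placeholders at unobserved targets. The function names `inverse_zscore` and `masked_errors`, and the toy statistics in the example, are hypothetical.

```python
import numpy as np


def inverse_zscore(x_norm, mean, std):
    """Map z-score normalized values back to physical concentration units.

    x_norm has shape (T, N, P): forecast horizon, stations, pollutants.
    mean and std have shape (P,) and broadcast over the last axis
    (assumption: per-pollutant statistics fitted on the training split).
    """
    return x_norm * std + mean


def masked_errors(pred_norm, target_phys, valid_mask, mean, std):
    """Per-pollutant MAE and RMSE in physical units, scored only where valid_mask is True.

    target_phys may hold NaN at unobserved positions; those entries are
    excluded from the sums rather than imputed.
    """
    pred_phys = inverse_zscore(pred_norm, mean, std)
    diff = pred_phys - target_phys

    # Replace every masked-out position with 0 so NaN targets cannot leak into the sums.
    abs_err = np.where(valid_mask, np.abs(diff), 0.0)
    sq_err = np.where(valid_mask, diff ** 2, 0.0)

    # Number of valid future observations per pollutant (summed over horizon and stations).
    counts = valid_mask.sum(axis=(0, 1))

    with np.errstate(invalid="ignore", divide="ignore"):
        mae = abs_err.sum(axis=(0, 1)) / counts
        rmse = np.sqrt(sq_err.sum(axis=(0, 1)) / counts)
    # A pollutant with no valid targets yields NaN instead of a misleading 0.
    return mae, rmse


if __name__ == "__main__":
    # Toy example with hypothetical statistics for two pollutants.
    mean = np.array([20.0, 40.0])
    std = np.array([10.0, 20.0])
    T, N, P = 2, 3, 2

    target = np.full((T, N, P), np.nan)  # NaN marks "no observation"
    mask = np.zeros((T, N, P), dtype=bool)
    for (t, n, p), value in {(0, 0, 0): 25.0, (1, 2, 0): 15.0, (0, 1, 1): 60.0}.items():
        target[t, n, p] = value
        mask[t, n, p] = True

    pred_norm = np.zeros((T, N, P))  # normalized 0 means "predict the training mean"
    mae, rmse = masked_errors(pred_norm, target, mask, mean, std)
    print(mae, rmse)
```

Computing errors after the inverse transform keeps each pollutant's error in its own concentration units, and averaging only over masked-in entries means a station that reported nothing contributes nothing, rather than an imputed value the model could be rewarded for matching.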
