2604.04145v1 Apr 05, 2026 cs.AI

Solar-VLM: Improving Solar Power Forecasting with Multimodal Vision-Language Models

Solar-VLM: Multimodal Vision-Language Models for Augmented Solar Power Forecasting

Haoran Pei (Citations: 31, h-index: 3)

Hang Fan (Citations: 39, h-index: 4)

Runze Liang (Citations: 3, h-index: 1)

Weican Liu (Citations: 26, h-index: 3)

Long Cheng (Citations: 13, h-index: 2)

Wei Wei (Citations: 451, h-index: 5)

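Photovoltaic (PV) power forecasting plays an important role in power system operation and market participation. Because PV output is highly sensitive to weather conditions and cloud movement, accurate forecasting requires drawing on multiple information sources to model complex spatiotemporal dependencies effectively. Although recent work has advanced AI-based forecasting methods, most of them do not combine time-series data, satellite imagery, and text-based weather information within a single unified framework. This paper proposes Solar-VLM, a large-language-model-based framework for multimodal PV power forecasting. First, modality-specific encoders are developed to extract complementary features from heterogeneous inputs. The time-series encoder adopts a patch-based design to capture temporal patterns in the multivariate observations at each site. The visual encoder uses a Qwen-based vision backbone to extract cloud information from satellite images. The text encoder extracts historical weather characteristics from textual descriptions. Second, a cross-site feature fusion mechanism is introduced to capture spatial dependencies among geographically distributed PV stations. Specifically, a Graph Learner models inter-station correlations with a graph attention network built on a K-nearest-neighbor (KNN) graph, and a cross-site attention module further promotes adaptive information exchange among sites. Experiments on data from eight PV stations in a northern province of China demonstrate the effectiveness of the proposed framework. The proposed model is publicly available at https://github.com/rhp413/Solar-VLM.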

Original Abstract

Photovoltaic (PV) power forecasting plays a critical role in power system dispatch and market participation. Because PV generation is highly sensitive to weather conditions and cloud motion, accurate forecasting requires effective modeling of complex spatiotemporal dependencies across multiple information sources. Although recent studies have advanced AI-based forecasting methods, most fail to fuse temporal observations, satellite imagery, and textual weather information in a unified framework. This paper proposes Solar-VLM, a large-language-model-driven framework for multimodal PV power forecasting. First, modality-specific encoders are developed to extract complementary features from heterogeneous inputs. The time-series encoder adopts a patch-based design to capture temporal patterns from multivariate observations at each site. The visual encoder, built upon a Qwen-based vision backbone, extracts cloud-cover information from satellite images. The text encoder distills historical weather characteristics from textual descriptions. Second, to capture spatial dependencies across geographically distributed PV stations, a cross-site feature fusion mechanism is introduced. Specifically, a Graph Learner models inter-station correlations through a graph attention network constructed over a K-nearest-neighbor (KNN) graph, while a cross-site attention module further facilitates adaptive information exchange among sites. Finally, experiments conducted on data from eight PV stations in a northern province of China demonstrate the effectiveness of the proposed framework. Our proposed model is publicly available at https://github.com/rhp413/Solar-VLM.
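Two of the building blocks named in the abstract, patch-based slicing of a site's time series and a K-nearest-neighbor graph over stations, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `make_patches` and `knn_adjacency`, the patch length, stride, value of K, the use of Euclidean distance on 2-D coordinates, and the symmetrised adjacency are all assumptions made for illustration, since the abstract does not specify them. The graph attention network and cross-site attention that would consume these outputs are not shown.

```python
import numpy as np


def make_patches(series: np.ndarray, patch_len: int, stride: int) -> np.ndarray:
    """Split a (time, features) array into overlapping patches.

    Returns an array of shape (num_patches, patch_len, features).
    patch_len and stride are hypothetical hyperparameters.
    """
    num_steps = series.shape[0]
    if num_steps < patch_len:
        raise ValueError("series is shorter than one patch")
    starts = range(0, num_steps - patch_len + 1, stride)
    return np.stack([series[s:s + patch_len] for s in starts])


def knn_adjacency(coords: np.ndarray, k: int) -> np.ndarray:
    """Build a symmetric 0/1 adjacency matrix linking each site to its k nearest sites.

    coords has shape (num_sites, 2). Euclidean distance is an assumption;
    the abstract does not state which distance the KNN graph uses.
    """
    diffs = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a site is never its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]
    adj = np.zeros(dist.shape, dtype=int)
    rows = np.repeat(np.arange(coords.shape[0]), k)
    adj[rows, neighbours.ravel()] = 1
    return np.maximum(adj, adj.T)  # keep an edge if either endpoint chose it


# Synthetic stand-ins: 96 time steps of 5 variables, and 8 station locations.
rng = np.random.default_rng(0)
series = rng.random((96, 5))
coords = rng.random((8, 2))

patches = make_patches(series, patch_len=16, stride=8)
adj = knn_adjacency(coords, k=3)
print(patches.shape)  # (11, 16, 5)
print(adj.shape)      # (8, 8)
```

In a full model, each patch would typically be projected to an embedding before entering the time-series encoder, and the adjacency matrix would define which station pairs a graph attention layer is allowed to attend over.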

0 Citations

0 Influential

22.5 Altmetric

112.5 Score
