2602.22059v1 Feb 25, 2026 cs.CV

NESTOR: A Nested MOE-based Neural Operator for Large-Scale PDE Pre-Training

Dengdi Sun (citations: 846, h-index: 12)
Xiaoyan Zhou (citations: 0, h-index: 0)
Hao Si (citations: 2, h-index: 1)
Wanli Lyu (citations: 2, h-index: 1)
Jin Tang (citations: 5, h-index: 1)
Bin Luo (citations: 46, h-index: 4)
Xiao Wang (citations: 3, h-index: 1)

Abstract

Neural operators have emerged as an efficient paradigm for solving PDEs, overcoming the limitations of traditional numerical methods and significantly improving computational efficiency. However, due to the diversity and complexity of PDE systems, existing neural operators typically rely on a single network architecture, which limits their capacity to fully capture heterogeneous features and complex system dependencies. This constraint poses a bottleneck for large-scale PDE pre-training based on neural operators. To address these challenges, we propose a large-scale PDE pre-trained neural operator based on a nested Mixture-of-Experts (MoE) framework. In particular, the image-level MoE is designed to capture global dependencies, while the token-level Sub-MoE focuses on local dependencies. Our model can selectively activate the most suitable expert networks for a given input, thereby enhancing generalization and transferability. We conduct large-scale pre-training on twelve PDE datasets from diverse sources and successfully transfer the model to downstream tasks. Extensive experiments demonstrate the effectiveness of our approach.
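
The two routing levels described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the class names (`NestedMoE`, `SubMoE`), the mean-pooled input to the image-level gate, the linear experts, the random weights, and top-k softmax gating are all assumptions chosen only to show how an image-level router could select an expert that is itself a token-level mixture.

```python
import math
import random


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def matvec(w, x):
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def rand_matrix(rng, rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def top_k(probs, k):
    # Keep the k most probable experts and renormalise their weights.
    idx = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in idx)
    return [(i, probs[i] / total) for i in idx]


class SubMoE:
    """Token-level router (assumed form): each token picks its own sub-experts."""

    def __init__(self, rng, dim, n_sub, k=1):
        self.gate = rand_matrix(rng, n_sub, dim)
        self.experts = [rand_matrix(rng, dim, dim) for _ in range(n_sub)]
        self.k = k

    def forward(self, tokens):
        out = []
        for tok in tokens:
            y = [0.0] * len(tok)
            for i, w in top_k(softmax(matvec(self.gate, tok)), self.k):
                ey = matvec(self.experts[i], tok)
                y = [a + w * b for a, b in zip(y, ey)]
            out.append(y)
        return out


class NestedMoE:
    """Image-level router (assumed form): one decision per input, from pooled tokens."""

    def __init__(self, rng, dim, n_experts, n_sub, k=1):
        self.gate = rand_matrix(rng, n_experts, dim)
        self.experts = [SubMoE(rng, dim, n_sub) for _ in range(n_experts)]
        self.k = k

    def route(self, tokens):
        # Mean-pool over tokens so the gate sees a global summary of the input.
        pooled = [sum(col) / len(tokens) for col in zip(*tokens)]
        return top_k(softmax(matvec(self.gate, pooled)), self.k)

    def forward(self, tokens):
        out = [[0.0] * len(tokens[0]) for _ in tokens]
        for i, w in self.route(tokens):
            ys = self.experts[i].forward(tokens)
            out = [[a + w * b for a, b in zip(o, y)] for o, y in zip(out, ys)]
        return out


rng = random.Random(0)
model = NestedMoE(rng, dim=4, n_experts=3, n_sub=2, k=1)
tokens = [[rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(6)]
selected = model.route(tokens)   # [(expert_index, weight)] chosen for the whole input
output = model.forward(tokens)   # one output vector per token
```

In this sketch only the selected image-level expert runs, and inside it each token is processed only by its own selected sub-experts, which is the sense in which a nested mixture activates experts selectively.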
