2604.01762v1 Apr 02, 2026 cs.LG

FourierMoE: Fourier Mixture-of-Experts Adaptation of Large Language Models

Juyong Jiang

Fan Wang

Jing Tang

Sunghun Kim

Honggang Qi

Original Abstract

Parameter-efficient fine-tuning (PEFT) has emerged as a crucial paradigm for adapting large language models (LLMs) under constrained computational budgets. However, standard PEFT methods often struggle in multi-task fine-tuning settings, where diverse optimization objectives induce task interference and limited parameter budgets lead to representational deficiency. While recent approaches incorporate mixture-of-experts (MoE) to alleviate these issues, they predominantly operate in the spatial domain, which may introduce structural redundancy and parameter overhead. To overcome these limitations, we reformulate adaptation in the spectral domain. Our spectral analysis reveals that different tasks exhibit distinct frequency energy distributions, and that LLM layers display heterogeneous frequency sensitivities. Motivated by these insights, we propose FourierMoE, which integrates the MoE architecture with the inverse discrete Fourier transform (IDFT) for frequency-aware adaptation. Specifically, FourierMoE employs a frequency-adaptive router to dispatch tokens to experts specialized in distinct frequency bands. Each expert learns a set of conjugate-symmetric complex coefficients, preserving complete phase and amplitude information while theoretically guaranteeing lossless IDFT reconstruction into real-valued spatial weights. Extensive evaluations across 28 benchmarks, multiple model architectures, and scales demonstrate that FourierMoE consistently outperforms competitive baselines in both single-task and multi-task settings while using significantly fewer trainable parameters. These results highlight the promise of spectral-domain expert adaptation as an effective and parameter-efficient paradigm for LLM fine-tuning.
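The abstract describes FourierMoE only at a high level. The sketch below illustrates the one mechanism it states precisely: if each expert's complex coefficients are placed together with their conjugates at the mirrored frequencies (-k1, -k2), the spectrum is Hermitian-symmetric and the 2D inverse DFT yields a real-valued weight update without discarding anything. All other details are assumptions made for illustration. These include the radial frequency bands, the random coefficient locations, and the plain top-k softmax router, which stands in for the paper's frequency-adaptive router. The function names `expert_delta_w`, `band_locations`, and `fourier_moe_forward` are hypothetical and are not the authors' code.

```python
import numpy as np


def expert_delta_w(coeffs, rows, cols, shape):
    """Hypothetical expert: place complex coefficients at (rows, cols) and their
    conjugates at the mirrored frequencies (-rows, -cols) mod shape, then apply a
    2D inverse DFT. The spectrum is Hermitian, so the result is real up to round-off."""
    n1, n2 = shape
    spec = np.zeros(shape, dtype=np.complex128)
    np.add.at(spec, (rows, cols), coeffs)                           # learned coefficients
    np.add.at(spec, ((-rows) % n1, (-cols) % n2), np.conj(coeffs))  # conjugate mirrors
    dw = np.fft.ifft2(spec)
    assert np.max(np.abs(dw.imag)) < 1e-10  # only floating-point round-off remains
    return dw.real


def band_locations(shape, num_experts, n_coeff, rng):
    """Assumed banding scheme: give each expert random spectral locations drawn
    from its own radial frequency band (illustrative, not from the paper)."""
    f1 = np.abs(np.fft.fftfreq(shape[0]))
    f2 = np.abs(np.fft.fftfreq(shape[1]))
    radius = np.sqrt(f1[:, None] ** 2 + f2[None, :] ** 2)
    edges = np.linspace(0.0, radius.max() + 1e-9, num_experts + 1)
    locs = []
    for e in range(num_experts):
        cand = np.argwhere((radius >= edges[e]) & (radius < edges[e + 1]))
        pick = rng.choice(len(cand), size=min(n_coeff, len(cand)), replace=False)
        locs.append((cand[pick, 0], cand[pick, 1]))
    return locs


def fourier_moe_forward(x, w0, router_w, experts, top_k=2):
    """x: (tokens, d_in); w0: frozen weight (d_out, d_in); router_w: (d_in, E).
    Uses standard top-k softmax gating as a stand-in router (an assumption)."""
    logits = x @ router_w                                   # (tokens, E)
    top = np.argsort(-logits, axis=1)[:, :top_k]            # chosen experts per token
    top_logits = np.take_along_axis(logits, top, axis=1)
    gates = np.exp(top_logits - top_logits.max(axis=1, keepdims=True))
    gates /= gates.sum(axis=1, keepdims=True)               # softmax over the top-k
    dws = [expert_delta_w(c, r, s, w0.shape) for (c, r, s) in experts]
    y = x @ w0.T                                            # frozen base projection
    for t in range(x.shape[0]):
        for j in range(top_k):
            y[t] += gates[t, j] * (dws[top[t, j]] @ x[t])   # gated spectral experts
    return y


rng = np.random.default_rng(0)
d, E, n_coeff, tokens = 16, 4, 8, 5
locs = band_locations((d, d), E, n_coeff, rng)
experts = [(rng.standard_normal(len(r)) + 1j * rng.standard_normal(len(r)), r, s)
           for r, s in locs]
x = rng.standard_normal((tokens, d))
w0 = rng.standard_normal((d, d))
router_w = rng.standard_normal((d, E))
y = fourier_moe_forward(x, w0, router_w, experts, top_k=2)
print(y.shape, y.dtype)  # (5, 16) float64
```

Adding each coefficient together with its conjugate mirror is exactly the condition under which the inverse DFT of a complex array is real. For that reason, taking `.real` in this sketch removes only round-off, and the forward DFT of the update recovers the full complex spectrum, including both phase and amplitude.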
