2604.00801v1 Apr 01, 2026 cs.LG

Routing-Free Mixture-of-Experts

Jinru Han

Sikuan Yan

Volker Tresp

Yunpu Ma

Yilun Liu

Abstract

Standard Mixture-of-Experts (MoE) models rely on centralized routing mechanisms that introduce rigid inductive biases. We propose Routing-Free MoE, which eliminates all hard-coded centralized designs, including external routers, Softmax, Top-K and load balancing. Instead, all activation functionality is encapsulated within the individual experts and optimized directly through continuous gradient flow, so that each expert determines its activation entirely on its own. We introduce a unified adaptive load-balancing framework that simultaneously optimizes expert-balancing and token-balancing objectives through a configurable interpolation, allowing flexible and customizable resource allocation. Extensive experiments show that Routing-Free MoE can consistently outperform baselines, with better scalability and robustness. We analyze its behavior in detail and offer insights that may facilitate future MoE design and optimization.
