2603.04772v1 Mar 05, 2026 cs.CL

TSEmbed: How to Achieve Task Scalability in Universal Multimodal Embeddings

TSEmbed: Unlocking Task Scaling in Universal Multimodal Embeddings

Ziwei Xie

Citations: 9,543

h-index: 5

Changwang Zhang

Citations: 215

h-index: 10

Yebo Wu

University of Macau

Citations: 115

h-index: 6

Fenglin Liu

Citations: 745

h-index: 10

Zhiyuan Liu

Citations: 13

h-index: 3

Li Li

Citations: 2,708

h-index: 4

Jun Wang

Citations: 109

h-index: 6

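Multimodal large language models (MLLMs) have strong reasoning capabilities, but conflicts between tasks make them hard to adapt into universal embedding models. To address this, we propose TSEmbed, a universal multimodal embedding framework that combines Mixture-of-Experts (MoE) with Low-Rank Adaptation (LoRA) to explicitly separate conflicting task objectives. We also introduce Expert-Aware Negative Sampling (EANS), a new strategy that uses expert routing distributions as an intrinsic indicator of semantic similarity. EANS dynamically prioritizes informative hard negatives that share the same expert activation patterns as the query, which improves the model's discriminative power and refines embedding boundaries. In addition, to ensure training stability, we design a two-stage learning paradigm that first strengthens expert specialization and then optimizes representations through EANS. TSEmbed achieves state-of-the-art performance on both the Massive Multimodal Embedding Benchmark (MMEB) and real-world industrial production datasets, laying a foundation for task-level scaling in universal multimodal embeddings.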
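To make the MoE plus LoRA combination concrete, here is a minimal pure-Python sketch of a layer in which a router mixes several low-rank adapters on top of a frozen base weight. The abstract does not specify the layer design, so everything here is an assumption for illustration: the function name `moe_lora_forward`, the dense softmax gating (the paper may use top-k or a different router), and the `scale` parameter are hypothetical and should not be read as the authors' implementation.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


def moe_lora_forward(x, base_w, router_w, experts, scale=1.0):
    """Assumed form: y = W x + scale * sum_i g_i * B_i (A_i x).

    base_w   : frozen base weight, shape (d_out, d_in)
    router_w : router weight, shape (num_experts, d_in)
    experts  : list of (A_i, B_i) with A_i (r, d_in) and B_i (d_out, r)
    Returns the output vector and the routing distribution g.
    """
    gates = softmax(matvec(router_w, x))  # dense soft routing (assumption)
    y = matvec(base_w, x)                 # frozen base projection
    for gate, (a, b) in zip(gates, experts):
        delta = matvec(b, matvec(a, x))   # low-rank update B_i (A_i x)
        y = [yi + scale * gate * di for yi, di in zip(y, delta)]
    return y, gates
```

The returned routing distribution is the quantity a routing-aware negative sampling scheme could reuse, which is why the sketch exposes it alongside the output.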

Original Abstract

Despite the exceptional reasoning capabilities of Multimodal Large Language Models (MLLMs), their adaptation into universal embedding models is significantly impeded by task conflict. To address this, we propose TSEmbed, a universal multimodal embedding framework that synergizes Mixture-of-Experts (MoE) with Low-Rank Adaptation (LoRA) to explicitly disentangle conflicting task objectives. Moreover, we introduce Expert-Aware Negative Sampling (EANS), a novel strategy that leverages expert routing distributions as an intrinsic proxy for semantic similarity. By dynamically prioritizing informative hard negatives that share expert activation patterns with the query, EANS effectively sharpens the model's discriminative power and refines embedding boundaries. To ensure training stability, we further devise a two-stage learning paradigm that solidifies expert specialization before optimizing representations via EANS. TSEmbed achieves state-of-the-art performance on both the Massive Multimodal Embedding Benchmark (MMEB) and real-world industrial production datasets, laying a foundation for task-level scaling in universal multimodal embeddings.
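The idea of using expert routing distributions as a proxy for semantic similarity can be sketched as a reweighted contrastive loss. The abstract does not give the actual EANS formula, so the choices below are assumptions made for illustration only: cosine similarity between routing distributions, exponential weights normalized to a mean of 1, an InfoNCE-style loss, and the hypothetical names `eans_weights`, `weighted_info_nce`, `sharpness`, and `temperature`.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def eans_weights(query_route, neg_routes, sharpness=1.0):
    """Weight each negative by how closely its routing matches the query's.

    Weights are proportional to exp(sharpness * cosine) and rescaled so
    their mean is 1 (an assumed normalization, not taken from the paper).
    """
    raw = [math.exp(sharpness * cosine(query_route, r)) for r in neg_routes]
    mean = sum(raw) / len(raw)
    return [w / mean for w in raw]


def weighted_info_nce(pos_sim, neg_sims, weights, temperature=0.05):
    """InfoNCE-style loss in which each negative term carries a weight."""
    pos = math.exp(pos_sim / temperature)
    neg = sum(w * math.exp(s / temperature) for w, s in zip(weights, neg_sims))
    return -math.log(pos / (pos + neg))


# Usage: the first negative shares the query's routing pattern, so it
# receives the larger weight and contributes more to the loss.
query_route = [0.9, 0.1]
neg_routes = [[0.9, 0.1], [0.1, 0.9]]
weights = eans_weights(query_route, neg_routes)
loss = weighted_info_nce(pos_sim=0.9, neg_sims=[0.8, 0.2], weights=weights)
```

Under these assumptions, a negative that is both similar in embedding space and routed like the query is penalized more strongly than under uniform weighting, which is one plausible reading of "prioritizing informative hard negatives".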

3 Citations

0 Influential

5 Altmetric

28.0 Score
