2605.05676v1 May 07, 2026 cs.CL

Decomposing the Basic Abilities of Large Language Models: Mitigating Cross-Task Interference in Multi-Task Instruct-Tuning

Ximing Li (Citations: 425, h-index: 12)
Bing Wang (Citations: 360, h-index: 10)
C. Li (Citations: 497, h-index: 13)
Jinjin Chi (Citations: 239, h-index: 8)
Gang Niu (Citations: 111, h-index: 5)
Masashi Sugiyama (Citations: 47, h-index: 4)

Abstract

The strong recent performance of large language models (LLMs) has been driven largely by multi-task instruct-tuning. This training paradigm, however, suffers from cross-task interference: different tasks produce conflicting gradients over the parameters they share. Previous methods mitigate the issue by isolating task-specific parameters, e.g., through task-specific neuron selection or mixture-of-experts. In this paper, we show empirically that cross-task interference persists under these solutions because many parameters are still shared across tasks, and we propose a new solution, Basic Abilities Decomposition for multi-task Instruct-Tuning (BADIT). Specifically, we find empirically that certain parameters are consistently co-activated and that these co-activated parameters naturally organize into base groups. By analogy, this suggests that LLMs encode several orthogonal basic abilities and that any task can be represented as a linear combination of them. Accordingly, BADIT decomposes LLM parameters into orthogonal high-singular-value LoRA experts that represent basic abilities, and dynamically enforces their orthogonality during training via spherical clustering of rank-1 components. Extensive experiments on the SuperNI benchmark with 6 LLMs show that BADIT outperforms state-of-the-art (SOTA) methods and reduces the degree of cross-task interference.
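To make the decomposition idea concrete, here is a minimal NumPy sketch. It is not the authors' implementation, and the abstract does not specify these details. It assumes a single weight matrix `W`, takes its top singular directions as rank-1 components, groups them by spherical (cosine-based) k-means, and packs each group into a LoRA-style pair `(B, A)`. The clustering feature (each component's absolute activation over a batch of sample inputs), the function names, and all sizes are hypothetical choices made for illustration.

```python
import numpy as np

def spherical_kmeans(X, k, n_iter=50, seed=0):
    """Cluster rows of X by cosine similarity, keeping centroids on the unit sphere."""
    rng = np.random.default_rng(seed)
    X = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-12)
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = np.argmax(X @ centroids.T, axis=1)
        for j in range(k):
            total = X[labels == j].sum(axis=0)
            norm = np.linalg.norm(total)
            if norm > 0:  # leave the centroid unchanged if its cluster is empty
                centroids[j] = total / norm
    return np.argmax(X @ centroids.T, axis=1)

def decompose_into_experts(W, inputs, rank, n_experts, seed=0):
    """Split the top-`rank` SVD components of W into `n_experts` LoRA-style factors."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    U, S, Vt = U[:, :rank], S[:rank], Vt[:rank]  # keep high-singular-value part
    # Assumption: describe each rank-1 component by how strongly it responds
    # to each sample input, then group components with similar profiles.
    activation = np.abs(Vt @ inputs.T)           # shape (rank, n_samples)
    labels = spherical_kmeans(activation, n_experts, seed=seed)
    experts = []
    for j in range(n_experts):
        idx = np.flatnonzero(labels == j)
        B = U[:, idx] * S[idx]                   # shape (out_dim, r_j)
        A = Vt[idx]                              # shape (r_j, in_dim)
        experts.append((B, A))
    return experts, labels

rng = np.random.default_rng(0)
W = rng.standard_normal((32, 48))        # toy weight matrix (out_dim, in_dim)
inputs = rng.standard_normal((200, 48))  # toy activations entering the layer
experts, labels = decompose_into_experts(W, inputs, rank=16, n_experts=4)
```

Because every expert is built from a disjoint set of singular vectors, the experts in this sketch are mutually orthogonal at initialization and sum to the rank-16 truncation of `W`. Gradient updates would generally break that property, which is presumably why the abstract describes enforcing orthogonality dynamically during training.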

