2601.08146v2 Jan 13, 2026 cs.CL

Mechanisms are Transferable: Data-Efficient Low-Resource Adaptation via Circuit-Targeted Supervised Fine-Tuning

Khumaisa Nur'aini

Monash University Indonesia

Citations: 58

h-index: 2

Alham Fikri Aji

MBZUAI

Citations: 8,673

h-index: 37

Ayu Purwarianti

Citations: 206

h-index: 8

Derry Wijaya

Citations: 28

h-index: 3

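Adapting large language models (LLMs) to low-resource languages is hard: labeled data is scarce, full-model fine-tuning is unstable, and continued cross-lingual tuning can cause catastrophic forgetting. We propose Circuit-Targeted Supervised Fine-Tuning (CT-SFT), a counterfactual-free adaptation of CD-T (Contextual Decomposition Transformer). CT-SFT uses a label-balanced mean baseline and task-directional relevance scoring to identify a sparse set of task-relevant attention heads in a proxy-language checkpoint. It then transfers to the target language by updating only those heads (plus LayerNorm) through head-level gradient masking. On NusaX-Senti and XNLI, CT-SFT improves cross-lingual accuracy over continued full fine-tuning while updating only a small subset of the model's parameters. We observe a trade-off between editing and preserving: harder transfers favor editing the circuit heads, whereas easier transfers often favor near-zero updates (that is, updating low-relevance heads), which preserves the source mechanism. CT-SFT also substantially reduces catastrophic forgetting, retaining proxy/source-language competence during transfer.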

Original Abstract

Adapting LLMs to low-resource languages is difficult: labeled data is scarce, full-model fine-tuning is unstable, and continued cross-lingual tuning can cause catastrophic forgetting. We propose Circuit-Targeted Supervised Fine-Tuning (CT-SFT): a counterfactual-free adaptation of CD-T (Contextual Decomposition Transformer) that uses a label-balanced mean baseline and task-directional relevance scoring to identify a sparse set of task-relevant attention heads in a proxy-language checkpoint, then transfer learns to a target language by updating only those heads (plus LayerNorm) via head-level gradient masking. Across NusaX-Senti and XNLI, CT-SFT improves cross-lingual accuracy over continued full fine-tuning while updating only a small subset of model parameters. We find an editing-preserving trade-off: harder transfers favor editing circuit heads, while easier transfers often favor near-zero (i.e., low-relevance heads) updates, preserving the source mechanism. CT-SFT also substantially reduces catastrophic forgetting, preserving proxy/source-language competence during transfer.
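The abstract names two ingredients without giving implementation details: a label-balanced mean baseline and head-level gradient masking. The NumPy sketch below illustrates one plausible reading of each and is not the authors' implementation. All function names, the weight layout (rows grouped by head, shape `(num_heads * head_dim, d_model)`), and the plain SGD update are assumptions made for illustration; the relevance scoring that selects the heads is not shown.

```python
import numpy as np


def label_balanced_mean(activations, labels):
    """Baseline that averages per-label means, so each label counts equally.

    Assumption: the baseline is the unweighted mean of class-wise mean
    activations; the paper may define it differently.
    """
    activations = np.asarray(activations, dtype=float)
    labels = np.asarray(labels)
    class_means = [activations[labels == c].mean(axis=0) for c in np.unique(labels)]
    return np.mean(class_means, axis=0)


def head_row_mask(selected_heads, num_heads, head_dim):
    """Return a 0/1 vector marking the weight rows owned by the selected heads."""
    mask = np.zeros(num_heads * head_dim)
    for h in selected_heads:
        mask[h * head_dim:(h + 1) * head_dim] = 1.0
    return mask


def masked_sgd_step(weight, grad, selected_heads, num_heads, head_dim, lr=0.1):
    """Apply one SGD step that only changes rows of the selected heads.

    Hypothetical layout: `weight` has shape (num_heads * head_dim, d_model).
    Gradients of all other heads are zeroed, so those rows stay frozen.
    """
    mask = head_row_mask(selected_heads, num_heads, head_dim)
    return weight - lr * grad * mask[:, None]


if __name__ == "__main__":
    # Three examples of label 0 and one of label 1: a plain mean would be
    # dominated by label 0, the balanced mean weights both labels equally.
    acts = [[0.0, 0.0], [2.0, 2.0], [4.0, 4.0], [10.0, 10.0]]
    print(label_balanced_mean(acts, [0, 0, 0, 1]))

    # Two heads of size 2; only head 1 is treated as part of the circuit.
    w = np.ones((4, 3))
    g = np.ones((4, 3))
    print(masked_sgd_step(w, g, selected_heads=[1], num_heads=2, head_dim=2, lr=0.5))
```

In a real training loop the same mask would typically be applied to the gradients of the attention projection matrices before the optimizer step, with LayerNorm parameters left unmasked, as the abstract describes.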

0 Citations

0 Influential

18.5 Altmetric

92.5 Score
