arXiv:2601.07177v1 · Jan 12, 2026 · cs.CR


Safe-FedLLM: Delving into the Safety of Federated Large Language Models

Ming Tao, Yu Tian, Wenxuan Tu, Yue Yang, Xue Yang, Xiangyan Tang


Abstract

Federated learning (FL) addresses data privacy and data silo issues in large language models (LLMs). Most prior work focuses on improving the training efficiency of federated LLMs, while security in open environments, particularly defense against malicious clients, has been largely overlooked. To investigate the safety of LLMs during FL, we conduct preliminary experiments that analyze potential attack surfaces and defensible characteristics from the perspective of Low-Rank Adaptation (LoRA) weights. We identify two key properties: 1) LLMs in FL are vulnerable to attacks from malicious clients, and 2) LoRA weights exhibit distinct behavioral patterns that simple classifiers can filter. Based on these properties, we propose Safe-FedLLM, a probe-based defense framework for federated LLMs that constructs defenses across three dimensions: Step-Level, Client-Level, and Shadow-Level. The core concept of Safe-FedLLM is to perform probe-based discrimination on the LoRA weights each client trains locally during FL, treating them as high-dimensional behavioral features and using lightweight classification models to determine whether they possess malicious attributes. Extensive experiments demonstrate that Safe-FedLLM effectively enhances the defense capability of federated LLMs without compromising performance on benign data. Notably, our method suppresses the impact of malicious data without significantly slowing training, and remains effective even when many clients are malicious. Our code is available at: https://github.com/dmqx/Safe-FedLLM.
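The abstract describes the core mechanism only at a high level: flatten each client's locally trained LoRA weights into a feature vector and let a lightweight classifier decide whether the update looks malicious before it is aggregated. The following is a minimal sketch of that idea, not the authors' implementation, and it rests on several assumptions not taken from the paper: LoRA updates arrive as plain (A, B) matrices per module, the probe is a logistic regression trained on labeled reference updates, flagged updates are simply dropped before an unweighted FedAvg-style mean, and the synthetic "malicious" updates are just mean-shifted noise. The names `flatten_lora`, `LogisticProbe`, `filter_and_average`, and `fake_update` are hypothetical. The sketch does not model the Step-, Client-, or Shadow-Level defenses; see the linked repository for the actual method.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]
# Assumed layout: module name -> (A, B) LoRA factors, with A of shape r x d_in and B of shape d_out x r.
LoraUpdate = Dict[str, Tuple[Matrix, Matrix]]


def flatten_lora(update: LoraUpdate) -> List[float]:
    """Concatenate all A and B entries (modules in sorted order) into one feature vector."""
    features: List[float] = []
    for name in sorted(update):
        a, b = update[name]
        for matrix in (a, b):
            for row in matrix:
                features.extend(row)
    return features


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class LogisticProbe:
    """Hypothetical lightweight probe: logistic regression fit by full-batch gradient descent."""

    def __init__(self, dim: int, lr: float = 0.1, epochs: int = 200) -> None:
        self.w = [0.0] * dim
        self.b = 0.0
        self.lr = lr
        self.epochs = epochs

    def predict_proba(self, x: Sequence[float]) -> float:
        """Estimated probability that update vector x is malicious."""
        z = sum(wi * xi for wi, xi in zip(self.w, x)) + self.b
        return _sigmoid(z)

    def fit(self, xs: Sequence[Sequence[float]], ys: Sequence[int]) -> "LogisticProbe":
        n = len(xs)
        for _ in range(self.epochs):
            grad_w = [0.0] * len(self.w)
            grad_b = 0.0
            for x, y in zip(xs, ys):
                err = self.predict_proba(x) - y  # d(log-loss)/d(logit)
                for j, xj in enumerate(x):
                    grad_w[j] += err * xj
                grad_b += err
            self.w = [wj - self.lr * gj / n for wj, gj in zip(self.w, grad_w)]
            self.b -= self.lr * grad_b / n
        return self


def filter_and_average(
    updates: Sequence[List[float]], probe: LogisticProbe, threshold: float = 0.5
) -> Tuple[List[float], List[int]]:
    """Drop updates the probe flags as malicious, then take an unweighted mean of the rest."""
    kept = [i for i, u in enumerate(updates) if probe.predict_proba(u) < threshold]
    if not kept:
        raise ValueError("probe rejected every client update")
    dim = len(updates[0])
    avg = [sum(updates[i][j] for i in kept) / len(kept) for j in range(dim)]
    return avg, kept


# Toy demo on synthetic data: "malicious" updates are simply mean-shifted noise (assumption).
random.seed(0)


def fake_update(shift: float, rank: int = 2, d: int = 4) -> LoraUpdate:
    a = [[random.gauss(shift, 0.1) for _ in range(d)] for _ in range(rank)]
    b = [[random.gauss(shift, 0.1) for _ in range(rank)] for _ in range(d)]
    return {"q_proj": (a, b)}


train_x = [flatten_lora(fake_update(0.0)) for _ in range(20)]
train_x += [flatten_lora(fake_update(1.0)) for _ in range(20)]
train_y = [0] * 20 + [1] * 20
probe = LogisticProbe(dim=len(train_x[0])).fit(train_x, train_y)

# One aggregation round: clients 0-2 benign, clients 3-4 malicious.
round_updates = [flatten_lora(fake_update(0.0)) for _ in range(3)]
round_updates += [flatten_lora(fake_update(1.0)) for _ in range(2)]
avg, kept = filter_and_average(round_updates, probe)
print("kept clients:", kept)
```

One design caveat: averaging the A and B factors separately, as this sketch does, is a common simplification in federated LoRA, but it is not equivalent to averaging the products BA. A real probe would also be trained on updates produced by actual benign and harmful fine-tuning rather than on synthetic noise.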
