2603.18908v1 Mar 19, 2026 cs.AI

Secure Linear Alignment of Large Language Models

Abstract

Language models increasingly appear to learn similar representations, despite differences in training objectives, architectures, and data modalities. This emerging compatibility between independently trained models introduces new opportunities for cross-model alignment to downstream objectives. Moreover, it unlocks new potential application domains, such as settings where security, privacy, or competitive constraints prohibit direct data or model sharing. In this work, we propose a privacy-preserving framework that exploits representational convergence to enable cross-silo inference between independent language models. The framework learns an affine transformation over a shared public dataset and applies homomorphic encryption to protect client queries during inference. By encrypting only the linear alignment and classification operations, the method achieves sub-second inference latency while maintaining strong security guarantees. We support this framework with an empirical investigation into representational convergence, in which we learn linear transformations between the final hidden states of independent models. We evaluate these cross-model mappings on embedding classification and out-of-distribution detection, observing minimal performance degradation across model pairs. Additionally, we show for the first time that linear alignment sometimes enables text generation across independently trained models.
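
As a rough illustration of the alignment step described in the abstract, the sketch below fits an affine map between two sets of hidden states by ordinary least squares. It is a minimal sketch under stated assumptions, not the paper's implementation: the abstract does not say which solver the authors use, the embeddings here are synthetic stand-ins for final hidden states computed on a shared public dataset, and the names `fit_affine_map`, `apply_affine_map`, `hidden_a`, and `hidden_b` are hypothetical. The homomorphic encryption step is not shown.

```python
import numpy as np


def fit_affine_map(source, target):
    """Least-squares fit of target ~= source @ W + b (hypothetical helper)."""
    # Append a constant column so the bias b is learned jointly with W.
    ones = np.ones((source.shape[0], 1))
    augmented = np.hstack([source, ones])
    solution, *_ = np.linalg.lstsq(augmented, target, rcond=None)
    return solution[:-1], solution[-1]


def apply_affine_map(embeddings, W, b):
    """Map embeddings from the source model's space into the target's."""
    return embeddings @ W + b


# Synthetic stand-ins for the final hidden states of two independent models
# on the same public inputs. Assumption: they are related by a noisy affine map.
rng = np.random.default_rng(0)
n_public, d_a, d_b = 500, 16, 12
hidden_a = rng.normal(size=(n_public, d_a))
true_W = rng.normal(size=(d_a, d_b))
true_b = rng.normal(size=d_b)
noise = 0.01 * rng.normal(size=(n_public, d_b))
hidden_b = hidden_a @ true_W + true_b + noise

W, b = fit_affine_map(hidden_a, hidden_b)

# Check the learned map on held-out queries that were not used for fitting.
query = rng.normal(size=(5, d_a))
expected = query @ true_W + true_b
mapped = apply_affine_map(query, W, b)
relative_error = np.linalg.norm(mapped - expected) / np.linalg.norm(expected)
```

Once `W` and `b` are fixed, inference needs only a matrix product and an addition per query. That matches the abstract's point that only the linear alignment and classification operations have to be evaluated under encryption.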
