2603.14756v1 Mar 16, 2026 cs.CL

Towards Privacy-Preserving Machine Translation at the Inference Stage: A New Task and Benchmark

Wei Shao

Lemao Liu

Yinqiao Li

Guoping Huang

Shuming Shi

Linqi Song

Abstract

Current online translation services require sending user text to cloud servers, which poses a risk of privacy leakage when the text contains sensitive information. This risk hinders the use of online translation services in privacy-sensitive scenarios. One way to mitigate it is to introduce privacy protection mechanisms that target the inference stage of translation models. However, compared with NLP subfields such as text classification and summarization, the machine translation research community has explored privacy protection during the inference stage only to a limited extent. There is no clearly defined privacy protection task for the inference stage, and there are no dedicated evaluation datasets, metrics, or reference benchmark methods. The absence of these elements has seriously constrained in-depth exploration of this direction. To bridge this gap, this paper proposes a novel "Privacy-Preserving Machine Translation" (PPMT) task, which aims to protect the private information in text during the model inference stage. For this task, we constructed three benchmark test datasets, designed corresponding evaluation metrics, and proposed a series of benchmark methods as a starting point. The definition of privacy is complex and diverse. Since named entities often carry a large amount of personal private information and commercial secrets, we focus our research on protecting only the privacy of named entities in the text. We expect this work to provide a new perspective and a solid foundation for the privacy protection problem in machine translation.
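The abstract does not describe the benchmark methods themselves. As a rough illustration of what an inference-stage mechanism for named entities can look like, here is a minimal sketch of a common placeholder-masking baseline. This is an assumption for illustration, not necessarily one of the paper's methods: the client replaces each entity with a placeholder before the text leaves the device and restores the entities in the returned translation. The entity list is assumed to come from a local NER step, and `toy_translate` is a hypothetical stand-in for a cloud MT API.

```python
import re
from typing import Callable, Dict, List, Tuple

# Assumption: placeholder masking is an illustrative baseline, not the paper's method.


def mask_entities(text: str, entities: List[str]) -> Tuple[str, Dict[str, str]]:
    """Replace each named entity with a numbered placeholder such as [ENT0].

    Returns the masked text and a placeholder -> entity mapping that stays on the client.
    """
    # Longest entities first (ties broken alphabetically) so that, e.g.,
    # "New York City" is preferred over "New York" at the same position.
    unique = sorted({e for e in entities if e}, key=lambda e: (-len(e), e))
    if not unique:
        return text, {}
    to_placeholder = {ent: f"[ENT{i}]" for i, ent in enumerate(unique)}
    pattern = re.compile("|".join(re.escape(e) for e in unique))
    masked = pattern.sub(lambda m: to_placeholder[m.group(0)], text)
    return masked, {ph: ent for ent, ph in to_placeholder.items()}


def unmask_entities(text: str, mapping: Dict[str, str]) -> str:
    """Put the original entities back in place of their placeholders."""
    if not mapping:
        return text
    pattern = re.compile("|".join(re.escape(ph) for ph in mapping))
    return pattern.sub(lambda m: mapping[m.group(0)], text)


def protected_translate(text: str, entities: List[str],
                        translate: Callable[[str], str]) -> str:
    """Only the masked text is passed to `translate` (the untrusted cloud service)."""
    masked, mapping = mask_entities(text, entities)
    return unmask_entities(translate(masked), mapping)


def toy_translate(s: str) -> str:
    # Hypothetical stand-in for a cloud MT API: a tiny English -> German word lookup
    # that happens to leave the placeholders untouched.
    return s.replace(" met ", " traf ").replace(" at ", " bei ")


if __name__ == "__main__":
    sentence = "Alice Chen met Bob at Acme Corp in Berlin."
    names = ["Alice Chen", "Bob", "Acme Corp", "Berlin"]
    print(mask_entities(sentence, names)[0])  # [ENT0] met [ENT3] at [ENT1] in [ENT2].
    print(protected_translate(sentence, names, toy_translate))
    # Alice Chen traf Bob bei Acme Corp in Berlin.
```

Two caveats limit this baseline. It only works if the MT system copies placeholders through unchanged, which real neural MT systems do not guarantee. Entities also come back untranslated, so names that should be transliterated or translated (for example, some organization names) are restored in the source language.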
