2601.12535v2 Jan 18, 2026 cs.CL

Improving Low-Resource Machine Translation via Round-Trip Reinforcement Learning

Alham Fikri Aji

MBZUAI

Citations: 8,673

h-index: 37

Ahmed Adel Attia

University of Maryland

Citations: 94

h-index: 5

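Low-resource machine translation (MT) has drawn growing attention as more parallel data is collected from low-resource language communities, yet many potential methods for improving low-resource MT remain largely unexplored. In this work, we use the No Language Left Behind (NLLB) model family to explore improving translation in low-resource settings through self-supervised, reinforcement-learning-based fine-tuning with round-trip bootstrapping. Our approach translates English into the target low-resource language and then back into English, and uses a function combining chrF++ and BLEU, computed on the reconstructed English sentences, as the reward. Using the NLLB-MD dataset, we evaluated both the 600M- and 1.3B-parameter NLLB models and observed consistent improvements for Central Aymara, Friulian, Wolof, and Russian. Qualitative inspection of the translation outputs showed improved fluency and semantic fidelity. We argue that the method can benefit further from scale, allowing models to use their pretrained knowledge more effectively and to keep improving themselves. The code is available on GitHub: https://github.com/Copticoder/thesis-nllb-bootstrap-grpo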

Original Abstract

Low-resource machine translation (MT) has gained increasing attention as parallel data from low-resource language communities is collected, but many potential methods for improving low-resource MT remain unexplored. We investigate a self-supervised, reinforcement-learning-based fine-tuning approach for translation in low-resource settings, using round-trip bootstrapping with the No Language Left Behind (NLLB) family of models. Our approach translates English into a target low-resource language and then back into English, using a combination of chrF++ and BLEU as the reward function on the reconstructed English sentences. Using the NLLB-MD dataset, we evaluate both the 600M and 1.3B parameter NLLB models and observe consistent improvements for the following languages: Central Aymara, Friulian, Wolof, and Russian. Qualitative inspection of translation outputs indicates increased fluency and semantic fidelity. We argue that our method can further benefit from scale, enabling models to increasingly leverage their pretrained knowledge and continue self-improving. The code is available on GitHub: https://github.com/Copticoder/thesis-nllb-bootstrap-grpo.
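
The round-trip reward described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: `simple_chrf_pp` and `simple_bleu` are simplified, dependency-free stand-ins for chrF++ and BLEU, the equal weights `w_chrf` and `w_bleu` are assumptions (the abstract does not say how the two metrics are combined), and `to_target` / `to_english` are hypothetical callables standing in for the NLLB forward and backward translation passes.

```python
import math
from collections import Counter


def ngrams(seq, n):
    """Count n-grams of a string (characters) or a list (words)."""
    return Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))


def f_beta(hyp_counts, ref_counts, beta):
    """F-beta score from clipped n-gram overlap."""
    overlap = sum((hyp_counts & ref_counts).values())
    hyp_total = sum(hyp_counts.values())
    ref_total = sum(ref_counts.values())
    if overlap == 0 or hyp_total == 0 or ref_total == 0:
        return 0.0
    p, r = overlap / hyp_total, overlap / ref_total
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)


def simple_chrf_pp(hyp, ref, char_n=6, word_n=2, beta=2.0):
    """Simplified chrF++-style score in [0, 1] (NOT the official metric).

    Averages F-beta over character n-grams (spaces removed) and word
    n-grams, skipping orders for which the reference has no n-grams.
    """
    scores = []
    for n in range(1, char_n + 1):
        ref_counts = ngrams(ref.replace(" ", ""), n)
        if ref_counts:
            scores.append(f_beta(ngrams(hyp.replace(" ", ""), n), ref_counts, beta))
    for n in range(1, word_n + 1):
        ref_counts = ngrams(ref.split(), n)
        if ref_counts:
            scores.append(f_beta(ngrams(hyp.split(), n), ref_counts, beta))
    return sum(scores) / len(scores) if scores else 0.0


def simple_bleu(hyp, ref, max_n=4):
    """Simplified sentence-level BLEU in [0, 1] with add-one smoothing."""
    h, r = hyp.split(), ref.split()
    if not h:
        return 0.0
    log_precision = 0.0
    for n in range(1, max_n + 1):
        hyp_counts, ref_counts = ngrams(h, n), ngrams(r, n)
        overlap = sum((hyp_counts & ref_counts).values())
        total = sum(hyp_counts.values())
        log_precision += math.log((overlap + 1) / (total + 1))
    brevity = 1.0 if len(h) >= len(r) else math.exp(1 - len(r) / len(h))
    return brevity * math.exp(log_precision / max_n)


def round_trip_reward(source_en, to_target, to_english, w_chrf=0.5, w_bleu=0.5):
    """Translate English -> target -> English and score the reconstruction.

    The weights are assumed values chosen for illustration only.
    """
    intermediate = to_target(source_en)
    reconstructed = to_english(intermediate)
    reward = (w_chrf * simple_chrf_pp(reconstructed, source_en)
              + w_bleu * simple_bleu(reconstructed, source_en))
    return reward, intermediate, reconstructed
```

With identity functions passed as both translators, the reconstruction matches the source and the reward reaches its maximum. A backward pass that drops part of the sentence lowers it. In a fine-tuning loop, this scalar would serve as the reward for the sampled intermediate translation.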

0 Citations

0 Influential

38.5 Altmetric

192.5 Score
