2602.14763v1 Feb 16, 2026 cs.CL

Unlocking Reasoning Capability on Machine Translation in Large Language Models

Sara Rajaee

Citations: 54

h-index: 5

Sebastian Vincent

Citations: 10

h-index: 1

Marzieh Fadaee

University of Amsterdam, Zeta Alpha Vector

Citations: 2,845

h-index: 23

Kelly Marchisio

Citations: 2

h-index: 1

Tom Kocmi

Citations: 463

h-index: 8

Alexandre Berard

Cohere

Citations: 1,145

h-index: 16

Original Abstract

Reasoning-oriented large language models (RLMs) achieve strong gains on tasks such as mathematics and coding by generating explicit intermediate reasoning. However, their impact on machine translation (MT) remains underexplored. We systematically evaluate several open- and closed-weights RLMs on the WMT24++ benchmark and find that enabling explicit reasoning consistently degrades translation quality across languages and models. Analysis reveals that MT reasoning traces are highly linear, lacking revision, self-correction and exploration of alternative translations, which limits their usefulness. Furthermore, injecting higher-quality reasoning traces from stronger models does not reliably improve weaker models' performance. To address this mismatch, we propose a structured reasoning framework tailored to translation, based on multi-step drafting, adequacy refinement, fluency improvement, and selective iterative revision. We curate a synthetic dataset of dynamic structured reasoning traces and post-train a large reasoning model on this data. Experiments show significant improvements over standard translation fine-tuning and injected generic reasoning baselines. Our findings demonstrate that reasoning must be task-structured to benefit MT.
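The staged framework named in the abstract (drafting, adequacy refinement, fluency improvement, selective iterative revision) can be pictured as a small pipeline that records a trace of intermediate translations. The sketch below is only an illustration under stated assumptions: the abstract does not specify an API, so `Trace`, `structured_translate`, the `max_rounds` parameter, and the toy stage functions are all hypothetical, and the drafting stage is collapsed into a single step for brevity. In practice each stage would presumably be carried out by a model rather than a lookup table.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# All names here are hypothetical; the abstract names the stages, not an API.
Step = Callable[[str, str], str]     # (source, current translation) -> new translation
Critic = Callable[[str, str], bool]  # (source, translation) -> True if revision is needed


@dataclass
class Trace:
    """Records the translation after every stage, in order."""
    source: str
    steps: List[Tuple[str, str]] = field(default_factory=list)

    @property
    def final(self) -> str:
        return self.steps[-1][1] if self.steps else ""


def structured_translate(source: str, draft: Step, refine_adequacy: Step,
                         improve_fluency: Step, needs_revision: Critic,
                         max_rounds: int = 2) -> Trace:
    trace = Trace(source)
    current = draft(source, "")
    trace.steps.append(("draft", current))
    current = refine_adequacy(source, current)
    trace.steps.append(("adequacy", current))
    current = improve_fluency(source, current)
    trace.steps.append(("fluency", current))
    # Selective revision: only iterate while the critic still flags a problem.
    rounds = 0
    while rounds < max_rounds and needs_revision(source, current):
        current = improve_fluency(source, refine_adequacy(source, current))
        trace.steps.append(("revision", current))
        rounds += 1
    return trace


# Toy stand-ins for the stages (assumptions for illustration only).
GLOSSARY = {"the": "le", "cat": "chat", "sleeps": "dort"}
EXTRA = {"black": "noir"}


def toy_draft(source: str, _current: str) -> str:
    # Word-by-word draft; unknown words are left untranslated.
    return " ".join(GLOSSARY.get(w, w) for w in source.split())


def toy_adequacy(source: str, current: str) -> str:
    # Fill in words the draft missed.
    return " ".join(EXTRA.get(w, w) for w in current.split())


def toy_fluency(source: str, current: str) -> str:
    # Fix adjective-noun order for the target language.
    return current.replace("noir chat", "chat noir")


def toy_needs_revision(source: str, current: str) -> bool:
    # Flag the output if any source word survives untranslated.
    return any(w in source.split() for w in current.split())


trace = structured_translate("the black cat sleeps", toy_draft, toy_adequacy,
                             toy_fluency, toy_needs_revision)
for name, text in trace.steps:
    print(f"{name}: {text}")
```

With these toy stages the critic finds nothing left to fix, so the revision loop is skipped; a critic that keeps flagging the output would add at most `max_rounds` revision entries to the trace.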

1 Citation

0 Influential

11.5 Altmetric

58.5 Score
