2603.09392v1 Mar 10, 2026 cs.CV

ICDAR 2025 Competition on End-to-End Document Image Machine Translation Towards Complex Layouts

Lu Xiang

Yang Zhao

Yu Zhou

Chengqing Zong

Yaping Zhang

Yupu Liang

Zhiyang Zhang

Zhiyuan Chen

Abstract

Document Image Machine Translation (DIMT) seeks to translate text embedded in document images from one language to another by jointly modeling both textual content and page layout, bridging optical character recognition (OCR) and natural language processing (NLP). The DIMT 2025 Challenge advances research on end-to-end document image translation, a rapidly evolving area within multimodal document understanding. The competition features two tracks, OCR-free and OCR-based, each with two subtasks for small (less than 1B parameters) and large (greater than 1B parameters) models. Participants submit a single unified DIMT system, with the option to incorporate provided OCR transcripts. Running from December 10, 2024 to April 20, 2025, the competition attracted 69 teams and 27 valid submissions in total. Track 1 had 34 teams and 13 valid submissions, while Track 2 had 35 teams and 14 valid submissions. In this report, we present the challenge motivation, dataset construction, task definitions, evaluation protocol, and a summary of results. Our analysis shows that large-model approaches establish a promising new paradigm for translating complex-layout document images and highlight substantial opportunities for future research.
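The track and subtask structure described in the abstract can be sketched as a small routing function. This is only an illustration, not the organizers' tooling: the `Submission` class, the `subtask` function, and the track labels `"ocr_free"` and `"ocr_based"` are hypothetical names. The abstract also does not say how a model with exactly 1B parameters is classified, so the sketch assumes it counts as large.

```python
from dataclasses import dataclass

ONE_BILLION = 1_000_000_000
TRACKS = {"ocr_free", "ocr_based"}  # hypothetical labels for the two tracks


@dataclass(frozen=True)
class Submission:
    team: str
    track: str       # one of TRACKS
    num_params: int  # total parameter count of the submitted system


def subtask(sub: Submission) -> str:
    """Return a 'track/size' label for a submission.

    Assumption: a model with exactly 1B parameters is treated as large;
    the abstract only states 'less than 1B' and 'greater than 1B'.
    """
    if sub.track not in TRACKS:
        raise ValueError(f"unknown track: {sub.track!r}")
    size = "small" if sub.num_params < ONE_BILLION else "large"
    return f"{sub.track}/{size}"


# Participation figures reported in the abstract, checked for consistency.
teams = {"track_1": 34, "track_2": 35}
valid_submissions = {"track_1": 13, "track_2": 14}
total_teams = sum(teams.values())
total_valid = sum(valid_submissions.values())
```

For example, `subtask(Submission("demo", "ocr_free", 300_000_000))` falls into the small-model subtask of the OCR-free track, and the per-track counts sum to the reported totals of 69 teams and 27 valid submissions.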
