2604.27272v1 Apr 29, 2026 cs.CL

When 2D Tasks Meet 1D Serialization: On Serialization Friction in Structured Tasks

Y. Bengio (Citations: 1,243; h-index: 11)
Diji Yang (Citations: 281; h-index: 7)
Yunkai Zhang (Citations: 22; h-index: 3)
Chung-Hsiang Lo (Citations: 2; h-index: 1)
Lu Li (Citations: 117; h-index: 5)
Tianyu Zhang (Citations: 4; h-index: 1)
Yi Zhang (Citations: 5,302; h-index: 16)

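Large language models (LLMs) typically process structured inputs as 1D token sequences. This is natural for prose, but for tasks that depend on explicit 2D structure, such as row-column alignment and local neighborhoods, linearization can impose an additional representational burden. This work refers to the phenomenon as "serialization friction" and analyzes it on synthetic tasks with explicit 2D structure: matrix transpose, Conway's Game of Life, and LU decomposition. To do so, it compares a text-only language model pathway with a vision-augmented pathway built on the same language backbone, where the vision-augmented pathway receives the same content rendered in a task-faithful 2D layout. Across the tasks and settings studied, the visual pathway consistently outperforms the textual pathway. The gap often widens at larger dimensions, and error patterns under serialization become increasingly spatially structured. These results indicate that the relationship between input representation and model performance warrants further study, and suggest that preserving the task-relevant 2D layout is a promising direction for structured 2D tasks.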

Original Abstract

Large language models (LLMs) conventionally process structured inputs as 1D token sequences. While natural for prose, such linearization may introduce additional representational burden for tasks whose computation depends directly on explicit 2D structure, because row--column alignment and local neighborhoods are no longer directly expressed in the input. We study this setting, which we refer to as serialization friction, on a small diagnostic testbed of synthetic tasks with explicit 2D structure: matrix transpose, Conway's Game of Life, and LU decomposition. To examine this question, we compare a text-only language pathway over serialized inputs with a vision-augmented pathway, built on the same language backbone, that receives the same underlying content rendered in task-faithful 2D layout, yielding a system-level comparison between two end-to-end input pathways. Across the tasks and settings we study, the visual pathway consistently outperforms the textual pathway; the gap often widens at larger dimensions, and error patterns under serialization become increasingly spatially structured. These findings indicate that the relationship between input representation and model performance on such tasks warrants further investigation, and suggest that preserving task-relevant 2D layout is a promising direction for structured 2D tasks.
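The effect of linearization on row-column alignment can be illustrated with a minimal sketch. It assumes a plain row-major flattening, which is only one possible serialization; the abstract does not specify the format the paper uses, and the function names below are hypothetical. Distances are counted in cells, not in tokenizer tokens.

```python
def serialize_row_major(grid):
    """Flatten a 2D grid into a 1D list of cells, row by row (assumed format)."""
    return [cell for row in grid for cell in row]


def flat_index(r, c, n_cols):
    """Position of cell (r, c) in the row-major sequence."""
    return r * n_cols + c


def vertical_neighbor_gap(n_cols):
    """Sequence distance between a cell and the cell directly below it."""
    return flat_index(1, 0, n_cols) - flat_index(0, 0, n_cols)


grid = [[1, 2, 3],
        [4, 5, 6]]

# In 2D, 1 and 4 are adjacent; after flattening they are n_cols cells apart.
flat = serialize_row_major(grid)

# Transpose, one of the three tasks named in the abstract, swaps rows and columns.
transposed = [list(col) for col in zip(*grid)]
```

Under this assumed format, horizontally adjacent cells stay adjacent in the sequence, while vertically adjacent cells drift apart linearly with the number of columns, which is one way to picture why a gap might grow with dimension.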
