2606.09585v1 Jun 08, 2026 cs.AI

Optical Reasoning: Rethinking Images as an Expressive Reasoning Medium Beyond Text

Dongjie Cheng
Dongjie Cheng
Citations: 4
h-index: 1
Wenjie Li
Wenjie Li
Citations: 18
h-index: 2
Yongqing Li
Yongqing Li
Citations: 260
h-index: 7
Heming Xia
Heming Xia
Citations: 421
h-index: 7
Yutong Bian
Yutong Bian
Citations: 6
h-index: 1

Chain-of-Thought (CoT) improves the performance of Large Language Models (LLMs) and has been extended to Multimodal Large Language Models (MLLMs). More recent work further moves from text-based multimodal reasoning toward interleaved-modal reasoning, where intermediate steps can incorporate both textual rationales and visual evidence. In this work, we propose a bolder and more ambitious idea: could images alone serve as the reasoning medium for both language and multimodal tasks? To explore this, we propose optical reasoning, which treats images as a standalone reasoning medium. We instantiate this concept with two variants: typographic-based optical reasoning, which optimizes visual layouts for compact rationale rendering, and graphical-based optical reasoning, which composes text and graphical elements into structured visual rationales. Across mathematical, scientific, and interleaved-modal reasoning benchmarks, optical reasoning can match or even exceed traditional text reasoning while reducing reasoning tokens by an average of 28.57% on language tasks and 16% on multimodal tasks, achieving 1.96 times the token efficiency of text reasoning. These results show that images can effectively and efficiently encode rationales while providing a unified visual canvas for reasoning.

0 Citations
0 Influential
3.5 Altmetric
17.5 Score
Original PDF

No Analysis Report Yet

This paper hasn't been analyzed by Gemini yet.

Log in to request an AI analysis.

댓글

댓글을 작성하려면 로그인하세요.

아직 댓글이 없습니다. 첫 번째 댓글을 남겨보세요!