2601.21164v1 Jan 29, 2026 cs.AI

Concise Geometric Description as a Bridge: Unleashing the Potential of LLM for Plane Geometry Problem Solving

Dian Li (citations: 39, h-index: 3)
Jiahong Yan (citations: 17, h-index: 2)
Guoliang Kang (citations: 43, h-index: 3)
Jingyun Wang (citations: 46, h-index: 3)
Xiaohan Wang (citations: 10, h-index: 1)
Gang Liu (citations: 37, h-index: 3)

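Plane Geometry Problem Solving (PGPS) is a multimodal reasoning task that aims to solve a plane geometry problem from a geometric diagram and the problem's textual description. Large Language Models (LLMs) have strong reasoning ability, but they cannot process visual diagrams, which limits their direct application to PGPS. Existing work typically fine-tunes Multimodal LLMs (MLLMs) end-to-end on large-scale PGPS data to improve visual understanding and reasoning at the same time. Such joint optimization, however, can degrade the base LLM's inherent reasoning ability. This work observes that an LLM can itself be a strong PGPS solver when the visual information is appropriately expressed as a textual description. The authors propose training an MLLM Interpreter to generate a geometric description of the diagram and using an off-the-shelf LLM to perform the reasoning. Specifically, they choose Conditional Declaration Language (CDL) as the geometric description, because its conciseness makes the MLLM Interpreter easier to train. The Interpreter is fine-tuned to generate CDL through CoT (Chain-of-Thought)-augmented SFT (supervised fine-tuning) followed by GRPO. Instead of the conventional solution-based reward, which compares the reasoning result with the ground-truth answer, they design CDL matching rewards that give more direct and denser guidance for CDL generation and so make GRPO training more effective. To support training, they construct a new dataset, Formalgeo7k-Rec-CoT, by manually reviewing Formalgeo7k v2 and adding CoT annotations. Extensive experiments on Formalgeo7k-Rec-CoT, Unigeo, and MathVista show that the method, fine-tuned on only 5.5k examples, performs favorably against leading open-source and closed-source MLLMs.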
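
The two-stage design described above (an MLLM Interpreter that turns the diagram into CDL, followed by a text-only LLM that reasons over it) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `build_solver_prompt`, `solve`, the prompt wording, and the `interpreter` and `llm` callables are all hypothetical names introduced here, and any CDL strings passed in are placeholders rather than examples taken from the paper.

```python
from typing import Callable, List


def build_solver_prompt(cdl_statements: List[str], question: str) -> str:
    """Format the interpreter's CDL output and the problem text as one text-only prompt.

    The prompt wording is an assumption made for this sketch.
    """
    diagram = "\n".join(f"- {statement}" for statement in cdl_statements)
    return (
        "Diagram description (CDL):\n"
        f"{diagram}\n\n"
        f"Problem: {question}\n"
        "Solve step by step and state the final answer."
    )


def solve(
    image_path: str,
    question: str,
    interpreter: Callable[[str], List[str]],  # hypothetical: diagram image -> CDL statements
    llm: Callable[[str], str],                # hypothetical: text prompt -> answer text
) -> str:
    """Stage 1: describe the diagram as CDL. Stage 2: reason with a text-only LLM."""
    cdl_statements = interpreter(image_path)
    return llm(build_solver_prompt(cdl_statements, question))
```

Because the reasoning model only ever sees text, it can be swapped for any off-the-shelf LLM without retraining the Interpreter, which is the separation the abstract argues for.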

Original Abstract

Plane Geometry Problem Solving (PGPS) is a multimodal reasoning task that aims to solve a plane geometric problem based on a geometric diagram and textual problem descriptions. Although Large Language Models (LLMs) possess strong reasoning skills, their direct application to PGPS is hindered by their inability to process visual diagrams. Existing works typically fine-tune Multimodal LLMs (MLLMs) end-to-end on large-scale PGPS data to enhance visual understanding and reasoning simultaneously. However, such joint optimization may compromise the base LLM's inherent reasoning capability. In this work, we observe that an LLM itself is potentially a powerful PGPS solver when visual information is appropriately formulated as textual descriptions. We propose to train an MLLM Interpreter to generate geometric descriptions for the visual diagram, and an off-the-shelf LLM is utilized to perform reasoning. Specifically, we choose Conditional Declaration Language (CDL) as the geometric description, as its conciseness eases the MLLM Interpreter training. The MLLM Interpreter is fine-tuned via CoT (Chain-of-Thought)-augmented SFT followed by GRPO to generate CDL. Instead of using a conventional solution-based reward that compares the reasoning result with the ground-truth answer, we design CDL matching rewards, which provide more direct and denser guidance for CDL generation, to facilitate more effective GRPO training. To support training, we construct a new dataset, Formalgeo7k-Rec-CoT, by manually reviewing Formalgeo7k v2 and incorporating CoT annotations. Extensive experiments on Formalgeo7k-Rec-CoT, Unigeo, and MathVista show that our method (fine-tuned on only 5.5k data) performs favorably against leading open-source and closed-source MLLMs.
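
To make the idea of a CDL matching reward concrete, the sketch below scores a generated CDL description by its overlap with the reference description and then converts a group of such rewards into group-relative advantages, as GRPO does. The abstract does not specify the exact reward, so the F1-over-statements form, the whitespace normalization, and the function names `cdl_matching_reward` and `group_relative_advantages` are assumptions made for illustration only. The use of the population standard deviation and the `eps` value are likewise assumptions.

```python
import re
from statistics import mean, pstdev
from typing import Iterable, List


def _normalize(statement: str) -> str:
    """Remove all whitespace so formatting differences do not affect matching."""
    return re.sub(r"\s+", "", statement)


def cdl_matching_reward(predicted: Iterable[str], reference: Iterable[str]) -> float:
    """Assumed reward: F1 score between predicted and reference CDL statement sets."""
    pred = {_normalize(s) for s in predicted if s.strip()}
    ref = {_normalize(s) for s in reference if s.strip()}
    hits = len(pred & ref)
    if hits == 0:
        return 0.0  # also covers an empty prediction or an empty reference
    precision = hits / len(pred)
    recall = hits / len(ref)
    return 2 * precision * recall / (precision + recall)


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Standardize rewards within one sampled group (population std assumed)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]
```

Unlike a solution-based reward, which is typically all-or-nothing on the final answer, a score of this kind changes with every statement the Interpreter gets right or wrong, which is one way to read the abstract's claim of "more direct and denser guidance".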

Metrics: 0 citations, 0 influential citations, Altmetric 1.5, score 7.5
