2601.21164v2 Jan 29, 2026 cs.AI

Concise Geometric Description as a Bridge: Unleashing the Potential of LLM for Plane Geometry Problem Solving

Dian Li

Jiahong Yan

Guoliang Kang

Jingyun Wang

Xiaohan Wang

Gang Liu

Original Abstract

Plane Geometry Problem Solving (PGPS) is a multimodal reasoning task that aims to solve a plane geometry problem based on a geometric diagram and a textual problem description. Although Large Language Models (LLMs) possess strong reasoning skills, their direct application to PGPS is hindered by their inability to process visual diagrams. Existing works typically fine-tune Multimodal LLMs (MLLMs) end-to-end on large-scale PGPS data to enhance visual understanding and reasoning simultaneously. However, such joint optimization may compromise the base LLM's inherent reasoning capability. In this work, we observe that an LLM itself is potentially a powerful PGPS solver when the visual information is appropriately formulated as a textual description. We propose to train an MLLM Interpreter to generate geometric descriptions of the visual diagram, and to use an off-the-shelf LLM to perform the reasoning. Specifically, we choose Conditional Declaration Language (CDL) as the geometric description, since its conciseness eases MLLM Interpreter training. The MLLM Interpreter is fine-tuned via CoT (Chain-of-Thought)-augmented SFT (Supervised Fine-Tuning) followed by GRPO (Group Relative Policy Optimization) to generate CDL. Instead of using a conventional solution-based reward that compares the reasoning result with the ground-truth answer, we design CDL matching rewards that provide more direct and denser guidance for CDL generation, making GRPO training more effective. To support training, we construct a new dataset, Formalgeo7k-Rec-CoT, by manually reviewing Formalgeo7k v2 and incorporating CoT annotations. Extensive experiments on Formalgeo7k-Rec-CoT, Unigeo, and MathVista show that our method (fine-tuned on only 5.5k samples) performs favorably against leading open-source and closed-source MLLMs.
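
The abstract does not spell out the exact form of the CDL matching reward. The sketch below assumes it is an F1 score between the set of CDL statements generated by the interpreter and the annotated set; this is our assumption, not the paper's stated formula. It illustrates why such a reward is denser than a binary answer-match reward: partially correct descriptions earn partial credit. The `group_relative_advantages` helper applies the standard GRPO normalization of each rollout's reward against the mean and standard deviation of its sampling group. All function names are hypothetical, and the CDL strings are FormalGeo-style examples chosen only for illustration.

```python
from statistics import mean, pstdev


def normalize(stmt: str) -> str:
    # Hypothetical normalization: drop all whitespace so that
    # "Collinear( BDC )" and "Collinear(BDC)" count as the same statement.
    return "".join(stmt.split())


def cdl_match_reward(predicted: list[str], reference: list[str]) -> float:
    """Assumed CDL matching reward: F1 over sets of normalized CDL statements."""
    pred = {normalize(s) for s in predicted if s.strip()}
    ref = {normalize(s) for s in reference if s.strip()}
    if not pred and not ref:
        return 1.0  # nothing to describe and nothing described
    hits = len(pred & ref)
    if hits == 0:
        return 0.0
    precision = hits / len(pred)  # fraction of generated statements that are correct
    recall = hits / len(ref)      # fraction of annotated statements recovered
    return 2 * precision * recall / (precision + recall)


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages: each reward standardized within its sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Illustrative FormalGeo-style annotation for one diagram (hypothetical example).
reference = [
    "Shape(AB,BC,CA)",
    "Collinear(BDC)",
    "Equal(LengthOfLine(AB),5)",
    "Equal(MeasureOfAngle(ABC),40)",
]

# Four sampled interpreter outputs (one GRPO group) for the same diagram.
group = [
    list(reference),                                   # fully correct
    ["Shape(AB,BC,CA)", "Collinear(BDC)"],             # correct but incomplete
    ["Shape(AB,BC,CA)", "Equal(LengthOfLine(AB),7)"],  # one correct, one wrong
    [],                                                # empty output
]

rewards = [cdl_match_reward(out, reference) for out in group]
print([round(r, 3) for r in rewards])  # [1.0, 0.667, 0.333, 0.0]
print([round(a, 2) for a in group_relative_advantages(rewards)])
```

Under a solution-based reward, the second and third rollouts would likely both score 0 if their missing or wrong facts led the downstream LLM to an incorrect answer. The assumed matching reward instead ranks them, so the group-relative advantages still carry a learning signal.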
