2601.15737v1 Jan 22, 2026 cs.AI

PhysProver: 물리학을 위한 자동 정리 증명의 발전

PhysProver: Advancing Automatic Theorem Proving for Physics

Hanning Zhang

Citations: 285

h-index: 6

Ruida Wang

Citations: 215

h-index: 6

Rui Pan

Citations: 419

h-index: 11

Wenyuan Wang

Citations: 5

h-index: 2

Bingxu Meng

Citations: 4

h-index: 2

Tong Zhang

Citations: 68

h-index: 5

검증 가능한 언어와 거대 언어 모델(LLM)의 결합은 정리 증명을 위한 엄밀한 기초를 제공함으로써 수학 및 컴퓨터 과학 커뮤니티에 지대한 영향을 미쳤습니다. 이 분야의 최근 발전은 파운데이션 모델과 정교한 에이전트 시스템을 통해 형식적 수학 추론의 한계를 LLM의 자연어 능력에 근접하도록 확장하고 있습니다. 그러나 유사한 문제 해결 및 정리 증명 프레임워크에 크게 의존함에도 불구하고, 형식적 물리학 추론에는 상대적으로 관심이 부족했습니다. 이 문제를 해결하기 위해 본 논문은 물리학 도메인에서 형식적 정리 증명을 향상시키기 위한 최초의 접근 방식을 제시합니다. 우리는 이를 위해 전용 데이터셋인 PhysLeanData를 구축했습니다. 이 데이터셋은 PhysLean에서 추출한 정리들과 추측 기반의 형식 데이터 생성 파이프라인을 통해 생성된 데이터로 구성됩니다. 학습 과정에서 우리는 강력한 오픈 소스 수학 정리 증명기인 DeepSeek-Prover-V2-7B를 활용하고, 검증 가능한 보상을 활용한 강화 학습(RLVR)을 적용하여 PhysProver 모델을 학습시켰습니다. 포괄적인 실험 결과, PhysProver는 약 5천 개의 학습 샘플만으로 여러 하위 도메인에서 전체적으로 2.4%의 성능 향상을 달성했습니다. 또한, 형식적 물리학 학습 후 MiniF2F-Test 벤치마크에서 1.3%의 성능 향상이 관찰되었는데, 이는 물리학 도메인을 넘어선 상당한 일반화 능력과 형식적 수학 능력의 향상을 나타냅니다. 이러한 결과는 수학 도메인 외부로 형식적 증명기를 확장하는 패러다임을 제공하는 우리 접근 방식의 효과와 효율성을 강조합니다. 후속 연구를 장려하기 위해 데이터셋과 모델을 커뮤니티에 공개할 예정입니다.

Original Abstract

The combination of verifiable languages and LLMs has significantly influenced both the mathematical and computer science communities because it provides a rigorous foundation for theorem proving. Recent advancements in the field provide foundation models and sophisticated agentic systems pushing the boundaries of formal mathematical reasoning to approach the natural language capability of LLMs. However, little attention has been given to the formal physics reasoning, which also heavily relies on similar problem-solving and theorem-proving frameworks. To solve this problem, this paper presents, to the best of our knowledge, the first approach to enhance formal theorem proving in the physics domain. We compose a dedicated dataset PhysLeanData for the task. It is composed of theorems sampled from PhysLean and data generated by a conjecture-based formal data generation pipeline. In the training pipeline, we leverage DeepSeek-Prover-V2-7B, a strong open-source mathematical theorem prover, and apply Reinforcement Learning with Verifiable Rewards (RLVR) to train our model PhysProver. Comprehensive experiments demonstrate that, using only $\sim$5K training samples, PhysProver achieves an overall 2.4\% improvement in multiple sub-domains. Furthermore, after formal physics training, we observe 1.3\% gains on the MiniF2F-Test benchmark, which indicates non-trivial generalization beyond physics domains and enhancement for formal math capability as well. The results highlight the effectiveness and efficiency of our approach, which provides a paradigm for extending formal provers outside mathematical domains. To foster further research, we will release both our dataset and model to the community.

2 Citations

0 Influential

5.5 Altmetric

29.5 Score

Original PDF

No Analysis Report Yet

This paper hasn't been analyzed by Gemini yet.

댓글을 작성하려면 로그인하세요.

아직 댓글이 없습니다. 첫 번째 댓글을 남겨보세요!