2601.16282v1 Jan 22, 2026 cs.CL

Automated Generation of Literature-Based Scientific Theories: Application at Scale

Generating Literature-Driven Scientific Theories at Scale

Daniel S. Weld

Citations: 163

h-index: 8

Doug Downey

Citations: 49

h-index: 4

P. Jansen

Citations: 9

h-index: 1

Peter Clark

Citations: 51

h-index: 2

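Recent work on automated scientific discovery has focused mainly on agents that generate scientific experiments, while systems that carry out higher-level scientific activities such as theory building remain comparatively underexplored. This work formulates the problem of synthesizing theories, composed of qualitative and quantitative laws, from large corpora of scientific literature. Using 13,700 source papers to generate 2,900 theories, we analyze how literature-grounded versus parametric-knowledge generation, and accuracy-focused versus novelty-focused generation objectives, affect the properties of the resulting theories. The experiments show that, compared with generation from parametric LLM memory, the proposed literature-based method produces theories that are significantly better both at matching existing evidence and at predicting results reported in 4,600 subsequently written papers.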

Original Abstract

Contemporary automated scientific discovery has focused on agents for generating scientific experiments, while systems that perform higher-level scientific activities such as theory building remain underexplored. In this work, we formulate the problem of synthesizing theories consisting of qualitative and quantitative laws from large corpora of scientific literature. We study theory generation at scale, using 13.7k source papers to synthesize 2.9k theories, examining how generation using literature-grounding versus parametric knowledge, and accuracy-focused versus novelty-focused generation objectives change theory properties. Our experiments show that, compared to using parametric LLM memory for generation, our literature-supported method creates theories that are significantly better at both matching existing evidence and at predicting future results from 4.6k subsequently-written papers.
