2602.20459v1 Feb 24, 2026 cs.AI

PreScience: A Benchmark for Forecasting Scientific Contributions

Anirudh Ajith

Citations: 752

h-index: 5

Jay DeYoung

Northeastern University

Citations: 1,503

h-index: 15

Nadav Kunievsky

Citations: 12

h-index: 2

Austin C. Kozlowski

Citations: 154

h-index: 4

Oyvind Tafjord

Citations: 13,925

h-index: 33

Daniel S. Weld

Citations: 163

h-index: 8

Tom Hope

Citations: 164

h-index: 5

Doug Downey

Citations: 49

h-index: 4

Amanpreet Singh

Citations: 482

h-index: 7

James Evans

Citations: 77

h-index: 2

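Can AI systems trained on the scientific record accumulated up to a given point in time predict the scientific developments that follow? Such a capability could help researchers find collaborators, identify impactful research directions, and anticipate which problems and methods will become important. We introduce PreScience, a scientific forecasting benchmark. PreScience decomposes the research process into four interdependent generative tasks: collaborator prediction, prior work selection, contribution generation, and impact prediction. PreScience is a carefully curated dataset of 98,000 recent AI-related research papers, with disambiguated author identities, temporally aligned scholarly metadata, and a structured graph of co-authors' publication histories and citations spanning 502,000 papers in total. We develop baselines and evaluations for each task, including LACERScore, a new LLM-based measure of contribution similarity. LACERScore outperforms existing metrics and approximates inter-annotator agreement. Substantial room for improvement remains in each task; for example, in contribution generation, frontier LLMs achieve only moderate similarity to the ground-truth contributions (GPT-5 averages 5.6 on a 1-10 scale). When these tasks are composed into a 12-month simulation of scientific production, the resulting synthetic corpus is less diverse and less novel than human-authored research from the same period.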

Original Abstract

Can AI systems trained on the scientific record up to a fixed point in time forecast the scientific advances that follow? Such a capability could help researchers identify collaborators and impactful research directions, and anticipate which problems and methods will become central next. We introduce PreScience -- a scientific forecasting benchmark that decomposes the research process into four interdependent generative tasks: collaborator prediction, prior work selection, contribution generation, and impact prediction. PreScience is a carefully curated dataset of 98K recent AI-related research papers, featuring disambiguated author identities, temporally aligned scholarly metadata, and a structured graph of companion author publication histories and citations spanning 502K total papers. We develop baselines and evaluations for each task, including LACERScore, a novel LLM-based measure of contribution similarity that outperforms previous metrics and approximates inter-annotator agreement. We find that substantial headroom remains in each task -- e.g., in contribution generation, frontier LLMs achieve only moderate similarity to the ground truth (GPT-5 averages 5.6 on a 1-10 scale). When composed into a 12-month end-to-end simulation of scientific production, the resulting synthetic corpus is systematically less diverse and less novel than human-authored research from the same period.

2 Citations

0 Influential

16.5 Altmetric

84.5 Score

