2604.12243v1 Apr 14, 2026 cs.CL

Continuous Knowledge Metabolism: Generating Scientific Hypotheses from Evolving Literature

Jin Tao (citations: 2; h-index: 1)

Yubo Wang, University of Waterloo (citations: 2,588; h-index: 9)

Xiaoyu Liu (citations: 27; h-index: 1)

Menglin Yang (citations: 54; h-index: 3)

Abstract

Scientific hypothesis generation requires tracking how knowledge evolves, not just what is currently known. We introduce Continuous Knowledge Metabolism (CKM), a framework that processes scientific literature through sliding time windows and incrementally updates a structured knowledge base as new findings arrive. We present CKM-Lite, an efficient variant that achieves strong predictive coverage through incremental accumulation, outperforming batch processing on hit rate (+2.8%, p=0.006), hypothesis yield (+3.6, p<0.001), and best-match alignment (+0.43, p<0.001) while reducing token cost by 92%. To understand what drives these differences, we develop CKM-Full, an instrumented variant that categorizes each new finding as novel, confirming, or contradicting, detects knowledge change signals, and conditions hypothesis generation on the full evolution trajectory. Analyzing 892 hypotheses generated by CKM-Full across 50 research topics, alongside parallel runs of the other variants, we report four empirical observations: (1) incremental processing outperforms the batch baseline across predictive and efficiency metrics; (2) change-aware instrumentation is associated with higher LLM-judged novelty (Cohen's d=3.46) but lower predictive coverage, revealing a quality-coverage trade-off; (3) a field's trajectory stability is associated with hypothesis success (r=-0.28, p=0.051), suggesting boundary conditions for literature-based prediction; (4) knowledge convergence signals are associated with a nearly 5x higher hit rate than contradiction signals, pointing to differential predictability across change types. These findings suggest that the character of generated hypotheses is shaped not only by how much literature is processed, but also by how it is processed. They further indicate that evaluation frameworks must account for the quality-coverage trade-off rather than optimize for a single metric.
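The abstract describes incremental processing only at a high level, so the following is a minimal sketch of the general idea, not the authors' implementation. It walks through findings in consecutive fixed-length time windows (a simplification of sliding windows), updates a small knowledge base one finding at a time, and labels each finding as novel, confirming, or contradicting. The `Finding` and `KnowledgeBase` types, the `metabolize` function, the boolean `supports` stance, and the rule that the newest evidence overwrites the stored stance are all assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Finding:
    """Hypothetical record for one extracted finding."""
    published: date
    claim: str      # assumed normalized claim identifier
    supports: bool  # True if the paper supports the claim, False if it disputes it


@dataclass
class KnowledgeBase:
    """Toy knowledge base: the current stance per claim plus a labeled history."""
    stance: Dict[str, bool] = field(default_factory=dict)
    history: List[Tuple[date, str, str]] = field(default_factory=list)

    def update(self, finding: Finding) -> str:
        """Label a finding against the current state, then absorb it."""
        if finding.claim not in self.stance:
            label = "novel"
        elif self.stance[finding.claim] == finding.supports:
            label = "confirming"
        else:
            label = "contradicting"
        # Assumption: the most recent evidence replaces the stored stance.
        self.stance[finding.claim] = finding.supports
        self.history.append((finding.published, finding.claim, label))
        return label


def metabolize(
    findings: List[Finding], window_days: int
) -> Tuple[KnowledgeBase, List[Tuple[date, Dict[str, int]]]]:
    """Process findings chronologically, summarizing labels per time window."""
    kb = KnowledgeBase()
    ordered = sorted(findings, key=lambda f: f.published)
    if not ordered:
        return kb, []
    width = timedelta(days=window_days)
    window_start = ordered[0].published
    counts: Counter = Counter()
    summary: List[Tuple[date, Dict[str, int]]] = []
    for finding in ordered:
        # Close every window that ends before this finding was published.
        while finding.published >= window_start + width:
            summary.append((window_start, dict(counts)))
            counts = Counter()
            window_start += width
        counts[kb.update(finding)] += 1
    summary.append((window_start, dict(counts)))
    return kb, summary
```

Because each finding is compared only against the accumulated state and is never re-read, the work per window scales with the number of new findings rather than with the whole corpus. The per-window label counts are one plausible place to look for change signals, for example a window dominated by contradicting findings.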
