2602.07943v1 Feb 08, 2026 cs.AI

IV Co-Scientist: Multi-Agent LLM Framework for Causal Instrumental Variable Discovery

Ivaxi Sheth

Mario Fritz

Zhijing Jin

Bryan Wilder

D. Janzing

Abstract

In the presence of confounding between an endogenous variable and the outcome, instrumental variables (IVs) are used to isolate the causal effect of the endogenous variable. Identifying valid instruments requires interdisciplinary knowledge, creativity, and contextual understanding, making it a non-trivial task. In this paper, we investigate whether large language models (LLMs) can aid in this task. We adopt a two-stage evaluation framework. First, we test whether LLMs can recover well-established instruments from the literature, assessing their ability to replicate standard reasoning. Second, we evaluate whether LLMs can identify and avoid instruments that have been empirically or theoretically discredited. Building on these results, we introduce IV Co-Scientist, a multi-agent system that proposes, critiques, and refines IVs for a given treatment-outcome pair. We also introduce a statistical test to contextualize consistency in the absence of ground truth. Our results show the potential of LLMs to discover valid instrumental variables from a large observational database.
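To make concrete what a valid instrument buys, here is a minimal sketch on simulated data; it is not the paper's method, and the data-generating process, coefficients, and function names (`simulate`, `ols_slope`, `iv_wald_estimate`) are hypothetical choices for illustration. An unobserved confounder `u` drives both the treatment `x` and the outcome `y`, so ordinary least squares is biased. The instrument `z` shifts `x` (relevance), is independent of `u`, and affects `y` only through `x` (exclusion), so the single-instrument IV (Wald) ratio Cov(z, y) / Cov(z, x) recovers the true effect.

```python
import numpy as np


def ols_slope(x, y):
    """OLS slope of y on x (with intercept): Cov(x, y) / Var(x)."""
    cx = x - x.mean()
    return np.dot(cx, y - y.mean()) / np.dot(cx, cx)


def iv_wald_estimate(z, x, y):
    """Just-identified IV (Wald / 2SLS with one instrument): Cov(z, y) / Cov(z, x)."""
    cz = z - z.mean()
    return np.dot(cz, y - y.mean()) / np.dot(cz, x - x.mean())


def simulate(n=200_000, beta=2.0, seed=0):
    """Hypothetical linear DGP with an unobserved confounder u."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=n)                       # unobserved confounder
    z = rng.normal(size=n)                       # instrument: independent of u
    x = 1.0 * z + 1.5 * u + rng.normal(size=n)   # endogenous treatment
    y = beta * x + 3.0 * u + rng.normal(size=n)  # z enters y only through x
    return z, x, y


if __name__ == "__main__":
    z, x, y = simulate()
    print(f"OLS slope:   {ols_slope(x, y):.3f}")
    print(f"IV estimate: {iv_wald_estimate(z, x, y):.3f}")
```

Under this assumed process, Var(x) = 4.25 and Cov(x, y) = 2 × 4.25 + 3 × 1.5 = 13, so OLS converges to about 3.06, while the IV estimate converges to the true effect of 2.0. If `z` also loaded on `u` or entered `y` directly, the IV ratio would be biased as well, which is why screening out discredited instruments matters.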
