arXiv:2603.19005v1 [cs.LG], Mar 19, 2026

AgentDS Technical Report: Benchmarking the Future of Human-AI Collaboration in Domain-Specific Data Science

Ashish Kundu (citations: 91, h-index: 7)

Charles Fleming (citations: 42, h-index: 3)

An Luo (citations: 26, h-index: 2)

Jin Du (citations: 27, h-index: 2)

Xun Xian (citations: 164, h-index: 8)

R. Specht (citations: 29, h-index: 2)

Fangqiao Tian (citations: 34, h-index: 3)

Xuan Bi (citations: 95, h-index: 7)

Jayanth Srinivasa (citations: 625, h-index: 14)

Mingyi Hong (citations: 94, h-index: 7)

Tianxiao Li (citations: 14, h-index: 2)

Jie Ding (citations: 79, h-index: 6)

Ganghua Wang (citations: 85, h-index: 6)

R. Zhang (citations: 7, h-index: 2)

Galin L. Jones (citations: 6,056, h-index: 27)

Original Abstract

Data science plays a critical role in transforming complex data into actionable insights across numerous domains. Recent developments in large language models (LLMs) and artificial intelligence (AI) agents have significantly automated data science workflows. However, it remains unclear to what extent AI agents can match the performance of human experts on domain-specific data science tasks, and in which aspects human expertise continues to provide advantages. We introduce AgentDS, a benchmark and competition designed to evaluate both AI agents and human-AI collaboration in domain-specific data science. AgentDS consists of 17 challenges across six industries: commerce, food production, healthcare, insurance, manufacturing, and retail banking. We conducted an open competition involving 29 teams and 80 participants, enabling systematic comparison between human-AI collaborative approaches and AI-only baselines. Our results show that current AI agents struggle with domain-specific reasoning. AI-only baselines perform near or below the median of competition participants, while the strongest solutions arise from human-AI collaboration. These findings challenge the narrative of complete automation by AI and underscore the enduring importance of human expertise in data science, while illuminating directions for the next generation of AI. The AgentDS website is available at https://agentds.org/ and the open-source datasets are available at https://huggingface.co/datasets/lainmn/AgentDS.

Citations: 2 (influential: 0), Altmetric: 33.5, Score: 169.5
