2605.03544v1 May 05, 2026 cs.CV

DALPHIN: Benchmarking Digital Pathology AI Copilots Against Pathologists on an Open Multicentric Dataset

Mingzhe Lu

Citations: 1

h-index: 1

Carlijn M. Lems

Citations: 21

h-index: 2

S. Moonemans

Citations: 11

h-index: 2

Natálie Klubíčková

Citations: 0

h-index: 0

B. Brattoli

Citations: 81

h-index: 4

Seokhwi Kim

Citations: 766

h-index: 12

Verónica Vilaplana

Citations: 3

h-index: 1

Laura Pons

Citations: 8

h-index: 1

Sapir Hochman

Citations: 0

h-index: 0

Mauricio Eduardo Suárez-Franck

Citations: 0

h-index: 0

P. Fernández

Citations: 27

h-index: 3

Julius Drachneris

Citations: 83

h-index: 4

D. Petroška

Citations: 300

h-index: 11

R. Augulis

Citations: 319

h-index: 11

A. Laurinavičius

Citations: 2,512

h-index: 29

Domingos Oliveira

Citations: 111

h-index: 4

D. Montezuma

Citations: 860

h-index: 17

Anouk B. Bouwmeester

Citations: 2

h-index: 1

Dominique van Midden

Citations: 17

h-index: 3

A. Vos

Citations: 264

h-index: 4

Shoko Vos

Citations: 36

h-index: 3

J. V. Ipenburg

Citations: 73

h-index: 3

M. Balkenhol

Citations: 5,839

h-index: 19

K. Winkler

Citations: 0

h-index: 0

Iris D. Nagtegaal

Citations: 35

h-index: 3

K. Hebeda

Citations: 3,951

h-index: 32

U. Flucke

Citations: 42

h-index: 3

K. Grunberg

Citations: 271

h-index: 5

Josef Skopal

Citations: 14

h-index: 2

B. Chohan

Citations: 65

h-index: 5

J. Temprana-Salvador

Citations: 262

h-index: 10

E. Munari

Citations: 33

h-index: 2

L. Cima

Citations: 40

h-index: 3

Giulia Querzoli

Citations: 90

h-index: 5

Yosamin M. Gonzalez Belisario

Citations: 0

h-index: 0

Jaeike W. Faber

Citations: 162

h-index: 7

G J Leenders

Citations: 165

h-index: 2

J. Thusen

Citations: 41

h-index: 2

L. Brosens

Citations: 5,953

h-index: 40

R. D. Krijger

Citations: 8

h-index: 1

P. Wesseling

Citations: 5

h-index: 1

Sandrine Florquin

Citations: 13

h-index: 2

Mateusz Maniewski

Citations: 1

h-index: 1

Adam Kowalewski

Citations: 10

h-index: 2

Robert Barna

Citations: 28

h-index: 3

Dina G. Tiniakos

Citations: 205

h-index: 6

J. L. Gros

Citations: 3

h-index: 1

R. Donders

Citations: 2,435

h-index: 22

J. S. Maurits

Citations: 67

h-index: 4

Chengkuan Chen

Citations: 2,213

h-index: 8

Faisal Mahmood

Citations: 2

h-index: 1

J. Laak

Citations: 17,046

h-index: 28

Nadieh Khalili

Citations: 95

h-index: 3

Frédérique Meeuwsen

Citations: 15

h-index: 2

Francesco Ciompi

Citations: 45

h-index: 3

Taebum Lee

Citations: 156

h-index: 5

Original Abstract

Foundation models with visual question answering capabilities for digital pathology are emerging. Such unprecedented technology requires independent benchmarking to assess its potential in assisting pathologists in routine diagnostics. We created DALPHIN, the first multicentric open benchmark for pathology AI copilots, comprising 1236 images from 300 cases, spanning 130 rare to common diagnoses, 6 countries, and 14 subspecialties. The DALPHIN design and dataset are introduced alongside a human performance benchmark of 31 pathologists from 10 countries with varying expertise. We report results for two general-purpose copilots (GPT-5, Gemini 2.5 Pro) and one pathology-specific copilot (PathChat+) for sequential and independent answer generation. We observed no statistically significant difference from expert-level performance in four of six tasks for PathChat, two of six for Gemini, and one of six for GPT. DALPHIN is publicly released with sequestered, indirectly accessible ground truth to foster robust and enduring benchmarking. Data, methods, and the evaluation platform are accessible through dalphin.grand-challenge.org.

0 Citations

0 Influential

20 Altmetric

100.0 Score
