2603.19957v1 Mar 20, 2026 cs.CV

HiPath: Hierarchical Vision-Language Alignment for Structured Pathology Report Prediction

Guang Yang (citations: 26, h-index: 2)

Zhenxuan Zhang (citations: 76, h-index: 5)

Rui Yuan (citations: 5, h-index: 1)

Anbang Wang (citations: 19, h-index: 3)

Liwei Hu (citations: 6, h-index: 2)

Xian-Sheng Hua (citations: 17, h-index: 3)

Yaya Peng (citations: 9, h-index: 1)

Jiawei Luo (citations: 118, h-index: 5)

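Pathology reports are structured, multi-level documents that record diagnostic conclusions, histological grades, and ancillary test results for one or more anatomical sites. Existing pathology vision-language models (VLMs), however, reduce this output to a single flat label or to free-form text. This paper presents HiPath, a lightweight VLM framework built on frozen UNI2 and Qwen3 backbones that makes structured report prediction its primary training objective. Three trainable modules with 15M parameters in total address the problem: (1) a Hierarchical Patch Aggregator (HiPA) for multi-image visual encoding, (2) Hierarchical Contrastive Learning (HiCL) for cross-modal alignment via optimal transport, and (3) Slot-based Masked Diagnosis Prediction (Slot-MDP) for structured diagnosis generation. Trained on 749K real-world Chinese pathology cases collected from three hospitals, HiPath reaches 68.9% strict accuracy and 74.7% clinically acceptable accuracy with a 97.3% safety rate, outperforming every baseline that uses the same frozen backbones. In cross-hospital evaluation, strict accuracy drops by only 3.4 percentage points while the safety rate holds at 97.1%, which supports the model's ability to generalise.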

Original Abstract

Pathology reports are structured, multi-granular documents encoding diagnostic conclusions, histological grades, and ancillary test results across one or more anatomical sites; yet existing pathology vision-language models (VLMs) reduce this output to a flat label or free-form text. We present HiPath, a lightweight VLM framework built on frozen UNI2 and Qwen3 backbones that treats structured report prediction as its primary training objective. Three trainable modules totalling 15M parameters address complementary aspects of the problem: a Hierarchical Patch Aggregator (HiPA) for multi-image visual encoding, Hierarchical Contrastive Learning (HiCL) for cross-modal alignment via optimal transport, and Slot-based Masked Diagnosis Prediction (Slot-MDP) for structured diagnosis generation. Trained on 749K real-world Chinese pathology cases from three hospitals, HiPath achieves 68.9% strict and 74.7% clinically acceptable accuracy with a 97.3% safety rate, outperforming all baselines under the same frozen backbone. Cross-hospital evaluation confirms generalisation with only a 3.4pp drop in strict accuracy while maintaining 97.1% safety.
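The abstract says HiCL aligns the two modalities "via optimal transport" but gives no formulation. As a rough illustration of what such an alignment can look like, the sketch below runs entropy-regularised optimal transport (Sinkhorn iterations) between a few toy patch embeddings and toy text-token embeddings. It is not the authors' HiCL module: the cosine cost, the uniform marginals, the regularisation strength `eps`, and every name in the code (`sinkhorn`, `cosine_cost`, `patches`, `tokens`) are assumptions made for this example.

```python
import math


def cosine_cost(x, y):
    """Cost = 1 - cosine similarity between two vectors (assumed cost choice)."""
    dot = sum(p * q for p, q in zip(x, y))
    norm_x = math.sqrt(sum(p * p for p in x))
    norm_y = math.sqrt(sum(q * q for q in y))
    return 1.0 - dot / (norm_x * norm_y)


def sinkhorn(cost, eps=0.5, n_iters=100):
    """Entropy-regularised OT plan between uniform marginals.

    cost: n x m list of lists; returns an n x m transport plan whose rows
    sum to 1/n and whose columns sum to 1/m (up to numerical tolerance).
    """
    n, m = len(cost), len(cost[0])
    a = [1.0 / n] * n  # uniform mass on the visual side (assumption)
    b = [1.0 / m] * m  # uniform mass on the text side (assumption)
    # Gibbs kernel: small cost -> large kernel value.
    K = [[math.exp(-c / eps) for c in row] for row in cost]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


# Toy 2-D embeddings standing in for patch and text-token features.
patches = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
tokens = [[1.0, 0.1], [0.1, 1.0]]

cost = [[cosine_cost(p, t) for t in tokens] for p in patches]
plan = sinkhorn(cost)

for row in plan:
    print([round(x, 4) for x in row])
```

Each entry `plan[i][j]` is the share of mass moved from patch `i` to token `j`, so it can be read as a soft correspondence between the two sets. A contrastive objective could then weight patch-token similarities by this plan, though whether HiPath does so is not stated in the abstract.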
