2603.01124v1 Mar 01, 2026 cs.CV

ClinCoT: Clinical-Aware Visual Chain-of-Thought for Medical Vision Language Models

Yulong Li

Xiwei Liu

Xinlin Zhuang

Yutong Xie

I. Razzak

Jianxu Chen

Haolin Yang

Xuhui Li

Original Abstract

Medical Vision-Language Models have shown promising potential in clinical decision support, yet they remain prone to factual hallucinations due to insufficient grounding in localized pathological evidence. Existing medical alignment methods primarily operate at the response level through preference optimization, improving output correctness but leaving intermediate reasoning weakly connected to visual regions. Although chain-of-thought (CoT) enhances multimodal reasoning, it remains largely text-centric, limiting effective integration of clinical visual cues. To address this gap, we propose ClinCoT, a clinical-aware visual chain-of-thought framework that transforms preference optimization from response-level correction to visual-driven reasoning. We introduce an automatic data generation pipeline that constructs clinically grounded preference pairs by reasoning over hypothesis-driven region proposals. Multiple Med-LLM evaluators rank and score each response, and these rankings serve as supervision to train the target model. We further introduce a scoring-based margin-aware optimization strategy that incorporates both preference ranking and score difference to refine region-level reasoning trajectories. To maintain alignment as the model's policy evolves during training, we adopt an iterative learning scheme that dynamically regenerates preference data. Extensive experiments on three medical VQA and report generation benchmarks demonstrate that ClinCoT consistently improves factual grounding and achieves superior performance compared with existing preference-based alignment methods.
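
The abstract does not give the exact form of the scoring-based margin-aware objective, so the following is only a minimal sketch of one plausible reading: a DPO-style pairwise loss whose required margin grows with the evaluator score difference. The function names, the mean aggregation over evaluators, and the hyperparameters `beta` and `gamma` are all assumptions made for illustration and are not taken from the paper.

```python
import math
from statistics import mean


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def aggregate_scores(evaluator_scores: list[float]) -> float:
    """Combine scores from several evaluators (assumption: a simple mean)."""
    return mean(evaluator_scores)


def margin_aware_preference_loss(
    policy_logp_chosen: float,
    policy_logp_rejected: float,
    ref_logp_chosen: float,
    ref_logp_rejected: float,
    score_chosen: float,
    score_rejected: float,
    beta: float = 0.1,   # hypothetical temperature on the implicit reward
    gamma: float = 1.0,  # hypothetical weight on the score-difference margin
) -> float:
    """DPO-style loss with a margin proportional to the score gap.

    The ranking decides which response is 'chosen'; the score difference
    sets how far apart the two implicit rewards are pushed.
    """
    reward_gap = beta * (
        (policy_logp_chosen - ref_logp_chosen)
        - (policy_logp_rejected - ref_logp_rejected)
    )
    margin = gamma * (score_chosen - score_rejected)
    return -log_sigmoid(reward_gap - margin)


# Usage: two reasoning trajectories scored by three hypothetical evaluators.
s_chosen = aggregate_scores([8.0, 9.0, 7.0])
s_rejected = aggregate_scores([5.0, 6.0, 4.0])
loss = margin_aware_preference_loss(-12.0, -15.0, -13.0, -14.0, s_chosen, s_rejected)
```

Under this reading, a pair with a larger score gap incurs a larger loss at the same log-probabilities, so the policy is pushed harder to separate clearly better trajectories from clearly worse ones. In an iterative scheme like the one described, the pairs and scores would be regenerated from the updated policy between rounds.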
