2602.10619v1 Feb 11, 2026 cs.CV

Improving Medical Visual Reinforcement Fine-Tuning via Perception and Reasoning Augmentation

Guangjing Yang (Citations: 13, h-index: 2)

Zhangyuan Yu (Citations: 0, h-index: 0)

Ziyuan Qin (Citations: 11, h-index: 2)

Xinyuan Song (Citations: 10, h-index: 1)

Huahui Yi (Citations: 286, h-index: 7)

Qingbo Kang (Citations: 307, h-index: 10)

Jun Gao (Citations: 65, h-index: 4)

Yiyue Li (Citations: 186, h-index: 7)

Chenlin Du (Citations: 29, h-index: 3)

Qicheng Lao (Citations: 1,098, h-index: 18)

Abstract

While recent advances in Reinforcement Fine-Tuning (RFT) have shown that rule-based reward schemes can enable effective post-training for large language models, their extension to cross-modal, vision-centric domains remains largely underexplored. This limitation is especially pronounced in the medical imaging domain, where effective performance requires both robust visual perception and structured reasoning. In this work, we address this gap by proposing VRFT-Aug, a visual reinforcement fine-tuning framework tailored for the medical domain. VRFT-Aug introduces a series of training strategies designed to augment both perception and reasoning, including prior knowledge injection, perception-driven policy refinement, medically informed reward shaping, and behavioral imitation. Together, these methods aim to stabilize and improve the RFT process. Through extensive experiments across multiple medical datasets, we show that our approaches consistently outperform both standard supervised fine-tuning and RFT baselines. Moreover, we provide empirically grounded insights and practical training heuristics that can be generalized to other medical image tasks. We hope this work contributes actionable guidance and fresh inspiration for the ongoing effort to develop reliable, reasoning-capable models for high-stakes medical applications.
