2601.20689v1 Jan 28, 2026 cs.CV

Decoupling Perception and Calibration: Label-Efficient Image Quality Assessment Framework

Xinyue Li, Zhiming Xu, Zhichao Zhang, Xiongkuo Min, Yitong Chen, Guangtao Zhai, Shubo Xu

Abstract

Recent multimodal large language models (MLLMs) have demonstrated strong capabilities in image quality assessment (IQA) tasks. However, adapting such large-scale models is computationally expensive and still relies on substantial Mean Opinion Score (MOS) annotations. We argue that for MLLM-based IQA, the core bottleneck lies not in the quality perception capacity of MLLMs, but in MOS scale calibration. Therefore, we propose LEAF, a Label-Efficient Image Quality Assessment Framework that distills perceptual quality priors from an MLLM teacher into a lightweight student regressor, enabling MOS calibration with minimal human supervision. Specifically, the teacher conducts dense supervision through point-wise judgments and pair-wise preferences, with an estimate of decision reliability. Guided by these signals, the student learns the teacher's quality perception patterns through joint distillation and is calibrated on a small MOS subset to align with human annotations. Experiments on both user-generated and AI-generated IQA benchmarks demonstrate that our method significantly reduces the need for human annotations while maintaining strong MOS-aligned correlations, making lightweight IQA practical under limited annotation budgets.
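The abstract does not give the loss functions, so the following is only a minimal sketch of how the described pipeline could look, under stated assumptions: the point-wise term is a reliability-weighted squared error, the pair-wise term is a reliability-weighted logistic (Bradley-Terry style) loss, and MOS calibration is a least-squares linear map fitted on a small labeled subset. All function names, the `lam` trade-off weight, and the data layout are hypothetical and not taken from the paper.

```python
import math
from typing import Sequence, Tuple

# One teacher preference: (i, j, p, w) means the teacher prefers image i over
# image j with probability p, and w is its estimated reliability (assumed form).
Preference = Tuple[int, int, float, float]


def pointwise_loss(student: Sequence[float], teacher: Sequence[float],
                   weight: Sequence[float]) -> float:
    """Reliability-weighted squared error between student and teacher scores."""
    total = sum(w * (s - t) ** 2 for s, t, w in zip(student, teacher, weight))
    return total / max(sum(weight), 1e-12)


def pairwise_loss(student: Sequence[float],
                  prefs: Sequence[Preference]) -> float:
    """Reliability-weighted logistic loss on the teacher's pair-wise preferences."""
    total, norm = 0.0, 0.0
    for i, j, p, w in prefs:
        # Student's implied probability that image i is better than image j.
        q = 1.0 / (1.0 + math.exp(-(student[i] - student[j])))
        q = min(max(q, 1e-7), 1.0 - 1e-7)  # keep log() finite
        total += -w * (p * math.log(q) + (1.0 - p) * math.log(1.0 - q))
        norm += w
    return total / max(norm, 1e-12)


def joint_distillation_loss(student: Sequence[float], teacher: Sequence[float],
                            weight: Sequence[float],
                            prefs: Sequence[Preference],
                            lam: float = 1.0) -> float:
    """Point-wise term plus lam times the pair-wise term (lam is hypothetical)."""
    return pointwise_loss(student, teacher, weight) + lam * pairwise_loss(student, prefs)


def fit_linear_calibration(raw: Sequence[float],
                           mos: Sequence[float]) -> Tuple[float, float]:
    """Least-squares (a, b) so that a * raw + b approximates MOS on a small subset."""
    n = len(raw)
    mean_x = sum(raw) / n
    mean_y = sum(mos) / n
    var_x = sum((x - mean_x) ** 2 for x in raw)
    if var_x == 0.0:
        raise ValueError("raw scores must not all be identical")
    a = sum((x - mean_x) * (y - mean_y) for x, y in zip(raw, mos)) / var_x
    return a, mean_y - a * mean_x


if __name__ == "__main__":
    # Toy numbers for illustration only; they are not results from the paper.
    student_scores = [0.2, 0.8, 0.5]
    teacher_scores = [0.1, 0.9, 0.5]
    reliability = [1.0, 0.5, 1.0]
    preferences = [(1, 0, 0.9, 1.0), (1, 2, 0.7, 0.6)]
    print(joint_distillation_loss(student_scores, teacher_scores,
                                  reliability, preferences))
    # Calibrate raw student scores to MOS using three labeled images.
    print(fit_linear_calibration([0.2, 0.5, 0.8], [2.0, 3.1, 4.2]))
```

In this sketch the two stages stay separate, mirroring the decoupling the abstract argues for: the distillation loss needs only teacher outputs, while the calibration step is the only place human MOS labels are used.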

