2601.11670v2 Jan 16, 2026 cs.LG

A Confidence-Variance Theory for Pseudo-Label Selection in Semi-Supervised Learning

Pan Liu (Citations: 9, h-index: 2)

Jinshi Liu (Citations: 5, h-index: 1)

Lei He (Citations: 3, h-index: 1)

Original Abstract

Most pseudo-label selection strategies in semi-supervised learning rely on fixed confidence thresholds, implicitly assuming that prediction confidence reliably indicates correctness. In practice, deep networks are often overconfident: high-confidence predictions can still be wrong, while informative low-confidence samples near decision boundaries are discarded. This paper introduces a Confidence-Variance (CoVar) theory framework that provides a principled joint reliability criterion for pseudo-label selection. Starting from the entropy minimization principle, we derive a reliability measure that combines maximum confidence (MC) with residual-class variance (RCV), which characterizes how probability mass is distributed over non-maximum classes. The derivation shows that reliable pseudo-labels should have both high MC and low RCV, and that the influence of RCV increases as confidence grows, thereby correcting overconfident but unstable predictions. From this perspective, we cast pseudo-label selection as a spectral relaxation problem that maximizes separability in a confidence-variance feature space, and design a threshold-free selection mechanism to distinguish high- from low-reliability predictions. We integrate CoVar as a plug-in module into representative semi-supervised semantic segmentation and image classification methods. Across PASCAL VOC 2012, Cityscapes, CIFAR-10, and Mini-ImageNet with varying label ratios and backbones, it consistently improves over strong baselines, indicating that combining confidence with residual-class variance provides a more reliable basis for pseudo-label selection than fixed confidence thresholds. (Code: https://github.com/ljs11528/CoVar_Pseudo_Label_Selection.git)
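
To make the two quantities in the abstract concrete, here is a minimal sketch of how MC and RCV could be computed from softmax outputs. The abstract does not give the exact formulas, so this sketch assumes RCV is the plain (population) variance of the probabilities assigned to the non-maximum classes. The function name `confidence_variance_features` is hypothetical and is not taken from the authors' repository. The sketch does not cover how the paper combines the two terms or the spectral selection step.

```python
import numpy as np

def confidence_variance_features(probs):
    """Return (MC, RCV) for each row of an (N, C) array of class probabilities.

    Assumption: RCV is taken as the variance of the C-1 probabilities left
    after removing the arg-max class. The paper's exact definition may differ.
    """
    probs = np.asarray(probs, dtype=float)
    n = probs.shape[0]

    # Maximum confidence: the largest class probability per sample.
    mc = probs.max(axis=1)

    # Mask out the arg-max class so only the residual classes remain.
    mask = np.ones_like(probs, dtype=bool)
    mask[np.arange(n), probs.argmax(axis=1)] = False
    residual = probs[mask].reshape(n, -1)  # shape (N, C-1)

    # Residual-class variance: spread of the mass over non-maximum classes.
    rcv = residual.var(axis=1)
    return mc, rcv

# Two predictions with the same confidence but different residual structure.
mc, rcv = confidence_variance_features([
    [0.7, 0.1, 0.1, 0.1],  # residual mass spread evenly
    [0.7, 0.3, 0.0, 0.0],  # residual mass concentrated on one rival class
])
```

Both rows have MC = 0.7, so a fixed confidence threshold cannot tell them apart. Under the assumed definition the second row has the larger RCV, because a single competing class holds all of the remaining mass. By the abstract's criterion (high MC and low RCV), that makes it the less reliable pseudo-label.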
