A Scalable Curiosity-Driven Game-Theoretic Framework for Long-Tail Multi-Label Learning in Data Mining
The long-tail distribution, where a few head labels dominate while rare tail labels abound, poses a persistent challenge for large-scale Multi-Label Classification (MLC) in real-world data mining applications. Existing resampling and reweighting strategies often disrupt inter-label dependencies or require brittle hyperparameter tuning, especially as the label space expands to tens of thousands of labels. To address this, we propose Curiosity-Driven Game-Theoretic Multi-Label Learning (CD-GTMLL), a scalable cooperative framework that recasts long-tail MLC as a multi-player game: each sub-predictor ("player") specializes in a partition of the label space and collaborates to maximize global accuracy while pursuing intrinsic curiosity rewards based on tail-label rarity and inter-player disagreement. This mechanism adaptively injects learning signal into under-represented tail labels without manual balancing or tuning. We further provide a theoretical analysis showing that CD-GTMLL converges to a tail-aware equilibrium, and we formally link its optimization dynamics to improvements in the Rare-F1 metric. Extensive experiments on 7 benchmarks, including extreme multi-label classification datasets with more than 30,000 labels, demonstrate that CD-GTMLL consistently surpasses state-of-the-art methods, with gains of up to +1.6% P@3 on Wiki10-31K. Ablation studies confirm that both game-theoretic cooperation and curiosity-driven exploration contribute to robust tail performance. By integrating game theory with curiosity mechanisms, CD-GTMLL not only improves efficiency in resource-constrained settings but also paves the way for more adaptive learning on imbalanced data in domains such as e-commerce and healthcare.
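To make the curiosity signal concrete, the following is a minimal hypothetical sketch of the kind of intrinsic reward the abstract describes: a per-label bonus combining tail-label rarity with inter-player disagreement. The paper does not publish this formula; the function name `curiosity_reward`, the mixing weight `alpha`, and the specific rarity and disagreement terms are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def curiosity_reward(probs, label_freq, alpha=0.5, eps=1e-8):
    """Illustrative per-label curiosity reward (not the paper's exact formula).

    probs:      (n_players, n_labels) predicted probabilities from each
                sub-predictor ("player") for the labels in one partition.
    label_freq: (n_labels,) empirical label frequencies in (0, 1].
    alpha:      assumed mixing weight between rarity and disagreement.
    """
    # Rarity term: rarer (tail) labels earn a larger bonus.
    rarity = -np.log(label_freq + eps)
    rarity = rarity / rarity.max()                 # normalize to [0, 1]

    # Disagreement term: variance of player predictions per label,
    # so labels the players dispute attract more exploration.
    disagreement = probs.var(axis=0)
    disagreement = disagreement / (disagreement.max() + eps)

    return alpha * rarity + (1 - alpha) * disagreement  # shape (n_labels,)
```

Under this sketch, a tail label (low `label_freq`) or a contested label (high prediction variance across players) receives a larger intrinsic reward, which is the adaptive signal the abstract attributes to under-represented labels.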