arXiv:2601.08430v2 [cs.AI], Jan 13, 2026

RubricHub: A Comprehensive and Highly Discriminative Rubric Dataset via Automated Coarse-to-Fine Generation

Authors (with the citation counts and h-index shown on the listing page):

Sunzhu Li (Citations: 30, h-index: 2)
Jiale Zhao (Citations: 33, h-index: 2)
Miteto Wei (Citations: 6, h-index: 1)
Huimin Ren (Citations: 6, h-index: 1)
Yang Zhou (Citations: 30, h-index: 2)
Jingwen Yang (Citations: 29, h-index: 2)
Kaike Zhang (Citations: 6, h-index: 1)
Wei Chen (Citations: 7, h-index: 1)
Shunyu Liu (Citations: 555, h-index: 9)

Abstract

Reinforcement Learning with Verifiable Rewards (RLVR) has driven substantial progress in reasoning-intensive domains like mathematics. However, optimizing open-ended generation remains challenging due to the lack of ground truth. While rubric-based evaluation offers a structured proxy for verification, existing methods suffer from scalability bottlenecks and coarse criteria, resulting in a supervision ceiling effect. To address this, we propose an automated Coarse-to-Fine Rubric Generation framework. By combining principle-guided synthesis, multi-model aggregation, and difficulty evolution, our approach produces comprehensive and highly discriminative criteria capable of capturing subtle nuances. Based on this framework, we introduce RubricHub, a large-scale (~110k) multi-domain dataset. We validate its utility through a two-stage post-training pipeline comprising Rubric-based Rejection Sampling Fine-Tuning (RuFT) and Reinforcement Learning (RuRL). Experimental results demonstrate that RubricHub unlocks significant performance gains: our post-trained Qwen3-14B achieves state-of-the-art (SOTA) results on HealthBench (69.3), surpassing proprietary frontier models such as GPT-5. Our code is available at https://github.com/teqkilla/RubricHub.
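The abstract names rubric-based rejection sampling but gives no formula, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a rubric is a list of weighted criteria, that a response's score is the weighted fraction of criteria it satisfies, and that sampling keeps the best candidate only if it clears a threshold. The names `Criterion`, `rubric_score`, `rejection_sample`, and the default threshold of 0.6 are all hypothetical, and the keyword checks stand in for what would in practice be a model-based grader.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Criterion:
    """One rubric item (hypothetical structure, not taken from the paper)."""
    description: str
    weight: float
    check: Callable[[str], bool]  # stand-in for an LLM-based grader


def rubric_score(response: str, rubric: List[Criterion]) -> float:
    """Weighted fraction of rubric criteria that the response satisfies."""
    total = sum(c.weight for c in rubric)
    if total <= 0:
        raise ValueError("rubric must have positive total weight")
    earned = sum(c.weight for c in rubric if c.check(response))
    return earned / total


def rejection_sample(
    candidates: List[str],
    rubric: List[Criterion],
    threshold: float = 0.6,  # assumed cut-off, chosen for illustration only
) -> Optional[str]:
    """Return the highest-scoring candidate, or None if none clears the threshold."""
    if not candidates:
        return None
    best = max(candidates, key=lambda r: rubric_score(r, rubric))
    return best if rubric_score(best, rubric) >= threshold else None


# Toy usage with keyword checks standing in for a real grader.
rubric = [
    Criterion("Recommends seeing a doctor", 2.0, lambda r: "doctor" in r.lower()),
    Criterion("Avoids a definitive diagnosis", 1.0, lambda r: "definitely" not in r.lower()),
]
candidates = [
    "You definitely have the flu.",
    "It could be a cold; see a doctor if symptoms persist.",
]
kept = rejection_sample(candidates, rubric)
```

In this toy setup the kept responses would form the fine-tuning set, and the same scalar score could in principle serve as a reward signal in a later reinforcement learning stage; how RubricHub actually aggregates criteria is not stated in the abstract.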

