2604.25325v1 Apr 28, 2026 cs.SE

R³-SQL: Ranking, Reward, and Resampling for Text-to-SQL

R$^3$-SQL: Ranking Reward and Resampling for Text-to-SQL

Yuxiong He

Citations: 1,644

h-index: 19

Zhewei Yao

Citations: 124

h-index: 5

Hojae Han

Citations: 334

h-index: 6

Yeonseok Jeong

Citations: 14

h-index: 2

Seung-won Hwang

Citations: 2

h-index: 1

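Recent Text-to-SQL systems generate multiple candidate SQL queries and rank them to produce a final prediction. Existing methods, however, have two limitations. First, they often assign inconsistent scores to functionally equivalent SQL queries even when their execution results are identical. Second, ranking cannot recover when the correct SQL is absent from the candidate pool. This paper proposes R³-SQL, a Text-to-SQL framework that addresses both problems with a unified reward for ranking and resampling. R³-SQL first groups candidates by execution result and ranks at the group level to ensure consistency within each group. Each group's score combines a pairwise preference across groups with a pointwise utility based on the best rank within the group and the group's size, capturing relative preference, consistency, and candidate quality. To improve candidate recall, R³-SQL introduces agentic resampling, which evaluates the generated candidate pool and selectively resamples when the correct SQL is likely absent. R³-SQL achieves 75.03 execution accuracy on the BIRD-dev dataset, the best among existing methods that use models with disclosed sizes, and shows consistent gains across five benchmarks.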

Original Abstract

Modern Text-to-SQL systems generate multiple candidate SQL queries and rank them to select a final prediction. However, existing methods face two limitations. First, they often score functionally equivalent SQL queries inconsistently despite identical execution results. Second, ranking cannot recover when the correct SQL is absent from the candidate pool. We propose R$^3$-SQL, a Text-to-SQL framework that addresses both issues through a unified reward for ranking and resampling. R$^3$-SQL first groups candidates by execution result and ranks groups for consistency. To score each group, it combines a pairwise preference across groups with a pointwise utility from the best group rank and size, capturing relative preference, consistency, and candidate quality. To improve candidate recall, R$^3$-SQL introduces agentic resampling, which judges the generated candidate pool and selectively resamples when the correct SQL is likely absent. R$^3$-SQL achieves 75.03 execution accuracy on BIRD-dev, a new state of the art among methods using models with disclosed sizes, with consistent gains across five benchmarks.
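
The grouping and resampling ideas in the abstract can be illustrated with a small sketch. This is not the paper's implementation: the abstract gives no formulas, so the scoring rule below (best member score plus a size-share bonus), the `weight_size` and `threshold` parameters, the per-candidate scores, and all function names are assumptions made for illustration. The pairwise preference term across groups is omitted entirely, and a return value of `None` merely stands in for the decision to resample.

```python
import sqlite3
from collections import defaultdict

def run_query(conn, sql):
    """Return an order-insensitive key for the result rows, or None if the SQL fails."""
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None
    return tuple(sorted(rows, key=repr))

def group_by_execution(conn, candidates):
    """Map each distinct execution result to the candidates that produce it."""
    groups = defaultdict(list)
    for sql in candidates:
        key = run_query(conn, sql)
        if key is not None:  # candidates that do not execute are dropped
            groups[key].append(sql)
    return dict(groups)

def score_groups(groups, candidate_score, weight_size=0.5):
    """Assumed pointwise utility: best member score plus a bonus for group size share."""
    total = sum(len(members) for members in groups.values())
    scores = {}
    for key, members in groups.items():
        best = max(candidate_score[sql] for sql in members)
        scores[key] = best + weight_size * len(members) / total
    return scores

def select_or_resample(groups, scores, candidate_score, threshold=0.8):
    """Return the best member of the top group, or None to signal resampling."""
    if not scores:
        return None
    best_key = max(scores, key=scores.get)
    if scores[best_key] < threshold:  # pool judged unlikely to contain a correct query
        return None
    return max(groups[best_key], key=candidate_score.get)

# Toy database and candidate pool (all values are made up for the example).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])

candidates = [
    "SELECT name FROM t WHERE id > 1",
    "SELECT name FROM t WHERE id >= 2",  # same result as the first candidate
    "SELECT name FROM t WHERE id > 2",
    "SELECT nme FROM t",                 # fails to execute (unknown column)
]
# Hypothetical per-candidate scores, e.g. from a reward model.
candidate_score = dict(zip(candidates, [0.6, 0.7, 0.65, 0.9]))

groups = group_by_execution(conn, candidates)
scores = score_groups(groups, candidate_score)
chosen = select_or_resample(groups, scores, candidate_score)
print(chosen)
```

Because the first two candidates return the same rows, they share one group and therefore one score, which is the consistency property the abstract describes. Raising `threshold` above every group score makes `select_or_resample` return `None`, the point at which a real system would request new candidates.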
