2602.16720v1 Feb 11, 2026 cs.DB

APEX-SQL: Talking to the data via Agentic Exploration for Text-to-SQL

Bowen Cao

Weibin Liao

Dong Fang

Haitao Li

Wai Lam

Yushi Sun

Abstract

Text-to-SQL systems powered by Large Language Models have excelled on academic benchmarks but struggle in complex enterprise environments. The primary limitation lies in their reliance on static schema representations, which fails to resolve semantic ambiguity and to scale effectively to large, complex databases. To address this, we propose APEX-SQL, an Agentic Text-to-SQL Framework that shifts the paradigm from passive translation to agentic exploration. Our framework employs a hypothesis-verification loop to ground model reasoning in real data. In the schema linking phase, we use logical planning to verbalize hypotheses, dual-pathway pruning to reduce the search space, and parallel data profiling to validate column roles against real data, followed by global synthesis to ensure topological connectivity. For SQL generation, we introduce a deterministic mechanism to retrieve exploration directives, allowing the agent to effectively explore data distributions, refine hypotheses, and generate semantically accurate SQL queries. Experiments on BIRD (70.65% execution accuracy) and Spider 2.0-Snow (51.01% execution accuracy) demonstrate that APEX-SQL outperforms competitive baselines with reduced token consumption. Further analysis reveals that agentic exploration acts as a performance multiplier, unlocking the latent reasoning potential of foundation models in enterprise settings. Ablation studies confirm the critical contributions of each component in ensuring robust and accurate data analysis.
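The abstract names "parallel data profiling to validate column roles against real data" but gives no implementation details. What follows is a minimal, hypothetical sketch of that general idea only. It assumes a SQLite database and Python's standard sqlite3 module. Candidate (table, column) pairs are profiled concurrently. A hypothesis such as "the literal 'Canada' in the question is stored in this column" is then accepted only if the data actually contains that value. The helper names (`profile_column`, `verify_hypotheses`), the literal-match criterion, and the demo schema are illustrative assumptions, not APEX-SQL's actual code.

```python
# Hypothetical sketch: NOT APEX-SQL's implementation. Names, the profiling
# statistics, and the literal-match test are illustrative assumptions.
import os
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor
from contextlib import closing
from dataclasses import dataclass


@dataclass
class ColumnProfile:
    table: str
    column: str
    row_count: int
    null_fraction: float
    distinct_count: int
    samples: list
    literal_found: bool


def quote_ident(name: str) -> str:
    # Identifiers cannot be bound as SQL parameters, so quote them explicitly.
    return '"' + name.replace('"', '""') + '"'


def profile_column(db_path: str, table: str, column: str, literal: str) -> ColumnProfile:
    """Profile one candidate column and test whether it contains the question literal."""
    # Each worker opens its own connection; sqlite3 connections are not shared across threads by default.
    with closing(sqlite3.connect(db_path)) as conn:
        t, c = quote_ident(table), quote_ident(column)
        rows, nulls, distinct = conn.execute(
            f"SELECT COUNT(*), COALESCE(SUM({c} IS NULL), 0), COUNT(DISTINCT {c}) FROM {t}"
        ).fetchone()
        samples = [r[0] for r in conn.execute(
            f"SELECT DISTINCT {c} FROM {t} WHERE {c} IS NOT NULL ORDER BY {c} LIMIT 3"
        )]
        # Case-insensitive exact match (SQLite's LOWER folds ASCII letters only).
        (found,) = conn.execute(
            f"SELECT EXISTS(SELECT 1 FROM {t} WHERE LOWER(CAST({c} AS TEXT)) = LOWER(?))",
            (literal,),
        ).fetchone()
    return ColumnProfile(table, column, rows, nulls / rows if rows else 0.0,
                         distinct, samples, bool(found))


def verify_hypotheses(db_path, candidates, literal, max_workers=4):
    """Profile (table, column) candidates concurrently; keep those whose data holds the literal."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        profiles = list(pool.map(
            lambda tc: profile_column(db_path, tc[0], tc[1], literal), candidates))
    return [p for p in profiles if p.literal_found], profiles


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        with closing(sqlite3.connect(path)) as conn:
            conn.executescript("""
                CREATE TABLE customers(id INTEGER PRIMARY KEY, name TEXT, country TEXT);
                CREATE TABLE orders(id INTEGER PRIMARY KEY, customer_id INTEGER, status TEXT);
                INSERT INTO customers VALUES (1, 'Ana', 'Canada'), (2, 'Bo', 'Mexico'), (3, 'Cy', NULL);
                INSERT INTO orders VALUES (1, 1, 'shipped'), (2, 2, 'pending');
            """)
            conn.commit()
        candidates = [("customers", "name"), ("customers", "country"), ("orders", "status")]
        verified, _ = verify_hypotheses(path, candidates, "canada")
        print([(p.table, p.column) for p in verified])  # [('customers', 'country')]
```

Each worker thread opens its own connection to a file-backed database, which avoids sharing one sqlite3 connection across threads. Beyond the accepted columns, the returned profiles (sample values, null fraction, distinct count) are the kind of evidence a hypothesis-verification loop could feed back to the model when a hypothesis is rejected. This feedback step is an assumption about how such a loop might be wired, not something the abstract specifies.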
