2601.15728v2 Jan 22, 2026 cs.AI

Benchmarking Text-to-Python against Text-to-SQL: The Impact of Explicit Logic and Ambiguity

Chenyu Hou (Citations: 221, h-index: 7)
Bin Cao (Citations: 7, h-index: 1)
Ruizhe Li (Citations: 4, h-index: 1)
Han Hu (Citations: 1,207, h-index: 3)

Original Abstract

While Text-to-SQL remains the dominant approach for database interaction, real-world analytics increasingly require the flexibility of general-purpose programming languages such as Python, together with libraries such as Pandas, to manage file-based data and complex analytical workflows. Despite this growing need, the reliability of Text-to-Python in core data retrieval remains underexplored relative to the mature SQL ecosystem. To address this gap, we introduce BIRD-Python, a benchmark designed for cross-paradigm evaluation. We systematically refined the original dataset to reduce annotation noise and align execution semantics, thereby establishing a consistent and standardized baseline for comparison. Our analysis reveals a fundamental paradigmatic divergence: whereas SQL leverages implicit DBMS behaviors through its declarative structure, Python requires explicit procedural logic, making it highly sensitive to underspecified user intent. To mitigate this challenge, we propose the Logic Completion Framework (LCF), which resolves ambiguity by incorporating latent domain knowledge into the generation process. Experimental results show that (1) performance differences primarily stem from missing domain context rather than inherent limitations in code generation, and (2) when these gaps are addressed, Text-to-Python achieves performance parity with Text-to-SQL. These findings establish Python as a viable foundation for analytical agents, provided that systems effectively ground ambiguous natural language inputs in executable logical specifications. Resources are available at https://anonymous.4open.science/r/Bird-Python-43B7/.
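The divergence between implicit DBMS behavior and explicit procedural logic can be made concrete with a small sketch. The abstract does not say which implicit behaviors the authors studied, so the two shown here (NULL-skipping in `AVG` and SQLite's default ASCII case-insensitive `LIKE`) are illustrative assumptions. The `students` table, its rows, and the helper names are hypothetical and are not taken from BIRD-Python.

```python
import sqlite3

# Hypothetical data: one missing score and mixed-case names.
rows = [("Alice", 90.0), ("bob", None), ("ALICIA", 70.0)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT, score REAL)")
conn.executemany("INSERT INTO students VALUES (?, ?)", rows)

# SQL side: the engine silently skips NULLs in AVG, and SQLite's LIKE
# ignores ASCII case by default. Neither rule appears in the query text.
sql_avg = conn.execute("SELECT AVG(score) FROM students").fetchone()[0]
sql_names = [
    r[0]
    for r in conn.execute("SELECT name FROM students WHERE name LIKE 'ali%'")
]


def python_avg(records):
    """Average of non-missing scores; the NULL rule must be spelled out."""
    scores = [score for _, score in records if score is not None]
    return sum(scores) / len(scores) if scores else None


def python_prefix_match(records, prefix):
    """Case-insensitive prefix match; the case rule must be spelled out."""
    return [name for name, _ in records if name.lower().startswith(prefix.lower())]


# A literal translation of the question that omits the case rule
# returns nothing, because str.startswith is case-sensitive.
naive_names = [name for name, _ in rows if name.startswith("ali")]

print(sql_avg, python_avg(rows))
print(sorted(sql_names), sorted(python_prefix_match(rows, "ali")))
print(naive_names)
```

In this sketch the Python version agrees with the SQL result only once the missing-value and case-folding rules are written explicitly. This is one way to read the abstract's claim that Python is sensitive to underspecified intent.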
