2603.02081v1 Mar 02, 2026 cs.DB

GenDB: The Next Generation of Query Processing -- Synthesized, Not Engineered

Original Abstract

Traditional query processing relies on engines that are carefully optimized and engineered by many experts. However, new techniques and user requirements evolve rapidly, and existing systems often cannot keep pace. At the same time, these systems are difficult to extend due to their internal complexity, and developing new systems requires substantial engineering effort and cost. In this paper, we argue that recent advances in Large Language Models (LLMs) are starting to shape the next generation of query processing systems. We propose using LLMs to synthesize execution code for each incoming query, instead of continuously building, extending, and maintaining complex query processing engines. As a proof of concept, we present GenDB, an LLM-powered agentic system that generates instance-optimized and customized query execution code tailored to specific data, workloads, and hardware resources. We implemented an early prototype of GenDB that uses Claude Code Agent as the underlying component in the multi-agent system, and we evaluate it on OLAP workloads. We use queries from the well-known TPC-H benchmark and also construct a new benchmark designed to reduce potential data leakage from LLM training data. We compare GenDB with state-of-the-art query engines, including DuckDB, Umbra, MonetDB, ClickHouse, and PostgreSQL. GenDB achieves significantly better performance than these systems. Finally, we discuss the current limitations of GenDB and outline future extensions and related research challenges.
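
To make the idea of per-query synthesized code concrete, the following is a minimal illustrative sketch, not output from GenDB and not the authors' method. It shows what a hand-specialized routine for a single query might look like, using the well-known TPC-H Q6 predicate (ship date within 1994, discount between 0.05 and 0.07, quantity below 24) as the example. The function name `q6_revenue`, the list-per-column layout, and the choice of Python are all assumptions made for illustration; the abstract does not state what language or data layout the generated code uses.

```python
from datetime import date


def q6_revenue(shipdate, discount, quantity, extendedprice):
    """Query-specific code for a TPC-H Q6-style aggregate.

    Assumption: each argument is one column of the lineitem table,
    stored as a plain Python list of equal length (hypothetical layout).
    """
    # The filter constants are baked into the code instead of being
    # interpreted from a generic query plan at run time.
    lo, hi = date(1994, 1, 1), date(1995, 1, 1)
    revenue = 0.0
    # Single fused pass: filter and aggregate in one loop, touching only
    # the four columns this query needs.
    for d, disc, qty, price in zip(shipdate, discount, quantity, extendedprice):
        if lo <= d < hi and 0.05 <= disc <= 0.07 and qty < 24:
            revenue += price * disc
    return revenue


if __name__ == "__main__":
    # Tiny made-up sample, for illustration only.
    shipdate = [date(1994, 3, 1), date(1995, 1, 1), date(1994, 12, 31)]
    discount = [0.06, 0.06, 0.05]
    quantity = [10, 10, 23]
    extendedprice = [100.0, 100.0, 200.0]
    print(q6_revenue(shipdate, discount, quantity, extendedprice))
```

The point of the sketch is only the contrast with a general-purpose engine: a specialized routine can hard-code constants, skip unused columns, and fuse filtering with aggregation, which is the kind of instance-specific tailoring the abstract describes.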
