DuckDB on xNVMe
By: Marius Ottosen, Magnus Keinicke Parlo, Philippe Bonnet
Potential Business Impact:
Makes computer data storage much faster.
DuckDB is designed for portability. It is also designed to run anywhere, and possibly in contexts where it can be specialized for performance, e.g., as a cloud service or on a smart device. In this paper, we consider the way DuckDB interacts with local storage. Our long term research question is whether and how SSDs could be co-designed with DuckDB. As a first step towards vertical integration of DuckDB and programmable SSDs, we consider whether and how DuckDB can access NVMe SSDs directly. By default, DuckDB relies on the POSIX file interface. In contrast, we rely on the xNVMe library and explore how it can be leveraged in DuckDB. We leverage the block-based nature of the DuckDB buffer manager to bypass the synchronous POSIX I/O interface, the file system and the block manager. Instead, we directly issue asynchronous I/Os against the SSD logical block address space. Our preliminary experimental study compares different ways to manage asynchronous I/Os atop xNVMe. The speed-up we observe over the DuckDB baseline is significant, even for the simplest scan query over a TPC-H table. As expected, the speed-up increases with the scale factor, and the Linux NVMe passthru improves performance. Future work includes a more thorough experimental study, a flexible solution that combines raw NVMe access and legacy POSIX file interface as well the co-design of DuckDB and SSDs.
Similar Papers
MobilityDuck: Mobility Data Management with DuckDB
Databases
Makes analyzing moving things super fast.
Multi-Queue SSD I/O Modeling & Its Implications for Data Structure Design
Data Structures and Algorithms
Makes computer storage faster by understanding new parts.
HDDB: Efficient In-Storage SQL Database Search Using Hyperdimensional Computing on Ferroelectric NAND Flash
Hardware Architecture
Lets computers search data super fast using noisy memory.