Apache Arrow ADBC Database Drivers

Confessions of a Data Guy
Jan 16, 2026

Key Takeaways

  • ADBC transports Arrow RecordBatches directly between app and database
  • Removes row‑by‑row marshaling, cutting serialization costs
  • Python ADBC reaches ~275k rows/sec, beating psycopg2
  • Provides near‑COPY performance with far simpler code
  • Integrates with DuckDB, Polars, Spark, Pandas ecosystems

Summary

Apache Arrow’s ADBC (Arrow Database Connectivity) introduces a modern, columnar‑native driver that can replace or complement traditional ODBC/JDBC stacks. By moving Arrow RecordBatches end‑to‑end, it eliminates row‑by‑row marshaling and dramatically reduces serialization overhead. Benchmarks show Python ADBC achieving roughly 275 k rows per second—near the speed of PostgreSQL’s COPY command but with far simpler code. The driver already integrates with popular tools like DuckDB, Polars, Spark, and Pandas, signaling broader adoption across the data ecosystem.

Pulse Analysis

The data‑access layer has long been dominated by ODBC and JDBC, standards that were designed for row‑oriented databases and require heavyweight drivers, DSN configuration, and often platform‑specific quirks. While these protocols remain ubiquitous, they impose serialization penalties that become noticeable at scale, especially when moving large, column‑oriented datasets between analytical tools and storage engines. Apache Arrow’s rise—thanks to its in‑memory columnar format and language‑agnostic bindings—has already reshaped data interchange in engines such as DuckDB, Spark, and Polars, setting the stage for a driver built on the same principles.

Arrow Database Connectivity (ADBC) leverages Arrow’s RecordBatch format to stream data directly from an application to a database without intermediate row materialization. This design cuts out multiple copy and conversion steps, delivering near‑COPY ingestion speeds while keeping client code concise. In a recent Python benchmark, ADBC posted roughly 275 k rows per second, outperforming traditional psycopg2 inserts and approaching the performance of PostgreSQL’s COPY command, yet with a fraction of the implementation complexity. The driver’s C‑based API also ensures cross‑platform compatibility, making it a viable drop‑in for existing ETL pipelines that already rely on Arrow‑compatible libraries.

For enterprises, ADBC represents a strategic upgrade to data pipelines: faster load times, reduced CPU and memory footprints, and a unified data format that eases integration across heterogeneous tools. As more database vendors expose ADBC endpoints and analytics platforms adopt Arrow natively, organizations can expect smoother end‑to‑end workflows, lower operational overhead, and the ability to scale analytics workloads without resorting to custom serialization hacks. Early adopters who modernize their ingestion layer with ADBC are poised to gain a competitive edge in real‑time analytics and cost‑effective data engineering.
