Apache Arrow ADBC Database Drivers

Big Data

Confessions of a Data Guy • January 16, 2026

Why It Matters

ADBC speeds data ingestion and extraction while simplifying code, giving enterprises a more efficient path to high‑performance analytics pipelines.

Key Takeaways

  • ADBC transports Arrow RecordBatches directly between app and database
  • Removes row‑by‑row marshaling, cutting serialization costs
  • Python ADBC reaches ~275k rows/sec, beating psycopg2
  • Provides near‑COPY performance with far simpler code
  • Integrates with DuckDB, Polars, Spark, and Pandas ecosystems

Pulse Analysis

The data‑access layer has long been dominated by ODBC and JDBC, standards that were designed for row‑oriented databases and require heavyweight drivers, DSN configuration, and often platform‑specific quirks. While these protocols remain ubiquitous, they impose serialization penalties that become noticeable at scale, especially when moving large, column‑oriented datasets between analytical tools and storage engines. Apache Arrow’s rise—thanks to its in‑memory columnar format and language‑agnostic bindings—has already reshaped data interchange in engines such as DuckDB, Spark, and Polars, setting the stage for a driver built on the same principles.

Arrow Database Connectivity (ADBC) leverages Arrow’s RecordBatch format to stream data directly from an application to a database without intermediate row materialization. This design cuts out multiple copy and conversion steps, delivering near‑COPY ingestion speeds while keeping client code concise. In a recent Python benchmark, ADBC posted roughly 275 k rows per second, outperforming traditional psycopg2 inserts and approaching the performance of PostgreSQL’s COPY command, yet with a fraction of the implementation complexity. The driver’s C‑based API also ensures cross‑platform compatibility, making it a viable drop‑in for existing ETL pipelines that already rely on Arrow‑compatible libraries.

For enterprises, ADBC represents a strategic upgrade to data pipelines: faster load times, reduced CPU and memory footprints, and a unified data format that eases integration across heterogeneous tools. As more database vendors expose ADBC endpoints and analytics platforms adopt Arrow natively, organizations can expect smoother end‑to‑end workflows, lower operational overhead, and the ability to scale analytics workloads without resorting to custom serialization hacks. Early adopters who modernize their ingestion layer with ADBC are poised to gain a competitive edge in real‑time analytics and cost‑effective data engineering.

Apache Arrow ADBC Database Drivers

Anyone who’s been around for more than a decade or so in the programming, development, and data world might get a slight eye twitch when the words “database driver” appear. Before the modern times we live in came along, the entire data world was driven by SQL Server and Oracle, with just a sprinkling of Postgres and MySQL, à la AWS. That’s just the way it was.

Part of that joy was dealing with database drivers, such as JDBC and ODBC, which are language‑neutral, OS‑level standards for accessing databases.

How it works

Your application → ODBC Driver Manager → ODBC Driver → Database

  • API based on C

  • Very common on Windows (but exists on Linux/macOS)

  • Used by tools like Excel, Power BI, Tableau, and many BI/ETL tools

  • Requires configuring DSNs (Data Source Names) or connection strings (see the sketch below)
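For anyone who never had the pleasure of setting one of these up, here is a minimal sketch of the classic ODBC path from Python using pyodbc, assuming a DSN has already been configured in the driver manager. The DSN name, credentials, and the `sales` table are invented for illustration.

```python
import pyodbc

# Connect through the ODBC Driver Manager via a pre-configured DSN
# (DSN name, credentials, and the "sales" table are hypothetical).
conn = pyodbc.connect("DSN=warehouse;UID=analyst;PWD=secret")
cursor = conn.cursor()

# Every row comes back through the driver's row-oriented C API,
# one marshaled record at a time.
cursor.execute("SELECT id, amount FROM sales WHERE amount > ?", 100)
for row in cursor.fetchall():
    print(row.id, row.amount)

cursor.close()
conn.close()
```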

Ahhh … the good (bad) old days.

Apache Arrow has quietly eaten the data world, small bite by small bite. It’s lightweight, columnar, fast, and has bindings across all popular programming languages. And it requires little to no serialization and deserialization, depending on which tools are passing the Arrow data around.

Image: “Apache Arrow is eating the world”

I’m sure you fine folk can think of another area that pulls and pushes a lot of tabular data, eh?

Yeah, database drivers.

A tale as old as time… we push data into a database, we pull it back in, we push it, we pull it. Forever till you’re dead and buried.

Apache ADBC (Arrow Database Connectivity) is a modern database driver standard built on Apache Arrow’s columnar in‑memory format. It’s designed to replace or complement ODBC/JDBC in environments where Arrow‑native tools are already in use.

Instead of row‑by‑row marshaling, ADBC moves Arrow RecordBatches end‑to‑end, drastically reducing copies, conversions, and overhead.

At a high level:

Application → Arrow / ADBC → Database

(no row materialization in between)
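As a rough illustration of that read path, here is a hedged sketch using the Postgres ADBC driver’s DBAPI layer in Python. The connection URI and table are hypothetical, and this is a sketch of the general pattern, not the article’s own code.

```python
import adbc_driver_postgresql.dbapi

# Hypothetical connection URI and table, for illustration only.
URI = "postgresql://analyst:secret@localhost:5432/warehouse"

with adbc_driver_postgresql.dbapi.connect(URI) as conn:
    with conn.cursor() as cursor:
        cursor.execute("SELECT id, amount FROM sales")
        # The result arrives as an Arrow table -- no per-row Python
        # objects are materialized along the way.
        table = cursor.fetch_arrow_table()

print(table.schema)
print(table.num_rows)
```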

Arrow is already everywhere in the data landscape, yet many people are simply unaware of it. It underpins DuckDB, Polars, Spark, DataFusion, Pandas (via interop), and Flight / Flight SQL. So it only makes sense to add Arrow at the database driver layer, since pushing and pulling tabular data through drivers is a big part of what we deal with every day.
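Polars, for example, can route a database read through an ADBC driver with a one-line switch. A hedged sketch, assuming an ADBC driver for Postgres is installed alongside Polars (the URI and query are invented for illustration):

```python
import polars as pl

# Hypothetical connection URI and query.
URI = "postgresql://analyst:secret@localhost:5432/warehouse"

df = pl.read_database_uri(
    query="SELECT id, amount FROM sales",
    uri=URI,
    engine="adbc",  # use an ADBC driver instead of the default connectorx engine
)
print(df.head())
```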

Performance without exotic tricks

This is what my benchmark shows clearly:

| Method                 | Rows/sec | Notes             |
|------------------------|----------|-------------------|
| psycopg2 (row inserts) | ~79 k    | Slow, simple      |
| psycopg2 + COPY        | ~194 k   | Faster, complex   |
| Python ADBC (Arrow)    | ~275 k   | Simple + fast     |
| Polars + ADBC          | ~215 k   | Solid             |
| DuckDB                 | ~1.15 M  | Still the king 🐐 |

ADBC delivers near‑COPY performance with much simpler code.

Image: Code block showing Python commands for loading data into a table and appending to that table
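Since that code is only shown as an image here, the snippet below is a hedged reconstruction of what Arrow-native loading and appending can look like through the ADBC DBAPI cursor. The URI, table name, and data are invented; see the author’s repo for the actual benchmark code.

```python
import pyarrow as pa
import adbc_driver_postgresql.dbapi

# Hypothetical connection URI and sample data.
URI = "postgresql://analyst:secret@localhost:5432/warehouse"
batch = pa.table({
    "id": pa.array([1, 2, 3], type=pa.int64()),
    "amount": pa.array([10.5, 20.0, 7.25], type=pa.float64()),
})

with adbc_driver_postgresql.dbapi.connect(URI) as conn:
    with conn.cursor() as cursor:
        # Load the Arrow table into a brand-new database table...
        cursor.adbc_ingest("sales", batch, mode="create")
        # ...and append more Arrow data to it on subsequent runs.
        cursor.adbc_ingest("sales", batch, mode="append")
    conn.commit()
```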

Check the GitHub repo for the full code

Very interesting times we live in; it’s nice to see Arrow via ADBC drivers creeping into new parts of the data stack. It’s nothing but a brighter future from here.
