Skip‑scan can dramatically reduce I/O for queries on non‑leading columns when the leading key has few distinct values, directly affecting latency and cost in production workloads. Understanding when it applies helps DBAs choose indexes that fully exploit PostgreSQL 18’s optimizer improvements.
The skip‑scan addition in PostgreSQL 18 refines how the query planner navigates multicolumn B‑tree structures. Rather than reading every leaf page, the optimizer now isolates each distinct value of the leading column and performs a focused seek on the secondary key. This approach mirrors index‑only scans but reduces unnecessary page reads when the leading column’s cardinality is low, delivering a more efficient execution path without altering the underlying index definition.
Benchmark data underscores the conditional nature of the benefit. In a test where the leading column "bid" had only ten distinct values, the skip‑scan reduced buffer hits from 845 to 39 and lifted transactions per second from roughly 2,500 to 14,400. Conversely, when the leading column "aid" was unique for every row, the planner correctly avoided skip‑scan, as the cost of one million seeks outweighed sequential access. The planner’s cost model, informed by updated statistics, therefore selects the optimal strategy on a case‑by‑case basis.
For practitioners, the practical takeaway is twofold. First, monitor the "Index Searches" metric in EXPLAIN (ANALYZE); a count above one confirms skip‑scan activation. Second, maintain accurate column statistics and consider low‑cardinality leading columns when designing multicolumn indexes, especially for workloads that filter on non‑leading attributes. While older PostgreSQL versions already performed full‑index scans when advantageous, the native skip‑scan in v18 offers a measurable edge for the right data distributions, making it a valuable tool in modern performance‑tuning arsenals.
PostgreSQL 18 introduces native "Skip Scan" for multicolumn B-tree indexes, a major optimizer enhancement. However, a common misconception is that pre-v18 versions purely resort to sequential scans when the leading column isn't filtered. In reality, the cost-based planner has long been capable of leveraging such indexes when efficient.
How it works under the hood:
In pre-v18 versions, if an index is significantly smaller than the table (the heap), scanning the entire index to find matching rows—even without utilizing the tree structure to "skip"—can be cheaper than a sequential table scan. This is sometimes also referred as a Full-Index Scan. While this lacks v18’s ability to hop between distinct leading keys, it effectively still speeds up queries on non-leading columns for many common workloads.
Notably, we would see why this v18 improvement is not a game-changer for all workloads, and why you shouldn't assume speed-up for all kinds of datasets.
To understand when the new v18 Skip Scan helps, we tested two distinct scenarios.
The Setup: We created an index on (bid, abalance) and ran the following query:
SELECT COUNT(*) FROM pgbench_accounts WHERE abalance = -2187;
Note: We did not run a read-write workload, so abalance is 0 for all rows. The query returns 0 rows.
Data Statistics:
Table Rows: 1,000,000
Leading Column (bid): ~10 distinct values (Low Cardinality).
Filter Column (abalance): 1 distinct value (Uniform).
Results:
Version
Strategy
TPS
Avg Latency
Buffers Hit
v17
Index Only Scan (Full)
~2,511
0.40 ms
845
v18
Index Only Scan (Skip Scan)
~14,382
0.07 ms
39
Why the massive 5.7x gain?
Pre-v18 Postgres sees that bid is not constrained. It decides its only option is to read the entire index (all leaf pages) to find rows where abalance = -2187. It scans 1,000,000 items (845 pages).
Postgres v18 uses Skip Scan. It knows bid comes first. instead of scanning sequentially, it:
Seeks to the first bid (e.g., 1). Checks for abalance = -2187.
"Skips" to the next unique bid (e.g., 2). Checks for abalance = -2187.
Repeats for all 10 branches.
It effectively turns one giant scan into 10 small seeks. The EXPLAIN output confirms this with Index Searches: 11 (10 branches + 1 stop condition) and only 39 buffer hits.
v18 Skip Scan Plan (Scenario A):
Index Only Scan using pgbench_accounts_bid_abalance_idx on public.pgbench_accounts ...
Index Cond: (pgbench_accounts.abalance = '-2187'::integer)
Heap Fetches: 0
Index Searches: 11
Buffers: shared hit=39
The Setup: We used the same query, but against an index on (aid, abalance).
Data Statistics:
aid): 1,000,000 distinct values (100% Unique / High Cardinality).Results:
Version
Strategy
TPS
Avg Latency
Buffers Hit
v17
Index Only Scan (Full)
~55.8
17.92 ms
2735
v18
Index Only Scan (Full)
~51.2
19.52 ms
2741
Why no improvement? If v18 attempted to "Skip Scan" here, it would have to skip 1,000,000 times (once for every unique aid). Performing 1 million seeks is significantly heavier than simply reading the 1 million index entries linearly (which benefits from sequential I/O and page pre-fetching). The planner correctly estimated this cost and fell back to the standard "Index Only Scan" used in v17.
v18 Plan (Identical to v17):
Index Only Scan using pgbench_accounts_aid_abalance_idx on public.pgbench_accounts ...
Index Cond: (pgbench_accounts.abalance = '-2187'::integer)
Heap Fetches: 0
Index Searches: 1
Buffers: shared hit=2741
You might notice that the node type in the EXPLAIN output remains Index Scan or Index Only Scan in both cases. PostgreSQL 18 does not introduce a special "Skip Scan" node.
Instead, you must look at the Index Searches line (visible with EXPLAIN (ANALYZE)):
v17 (and older): The Index Searches row does not appear.
v18 (Standard Scan): Index Searches: 1. The scan started at one point and read sequentially (as seen in Scenario B).
v18 (Skip Scan): Index Searches: > 1. The engine performed multiple seeks (as seen in Scenario A which had 11).
In our Scenario A (Success), Index Searches: 11 tells us it performed ~11 hops, confirming the feature was used.
Note: We use PostgreSQL 13 in this section to meaningfully demonstrate that even older versions (long before v18) could efficiently utilize multicolumn indexes for these types of queries, to clarify that older Postgres versions were capable of avoiding a Full Table Scan (and instead fallback to Full Index Scan) in such scenarios.
PostgreSQL 13 (a running cluster)
A pgbench-initialized database (run pgbench -i to create pgbench_accounts, pgbench_branches, pgbench_tellers, and pgbench_history)
Reasonable data loaded (default pgbench -i -s 10 or similar)
Follow these steps in your terminal to set up the test environment. This assumes you have PostgreSQL 13 (or a similar version) installed and its command-line tools are in your PATH.
createdb testdb # Create a new database
pgbench -i -s 10 testdb # Initialize it with 1 million rows
To simulate some database activity, run a standard 10-second read-write benchmark.
pgbench -T 10 testdb
Now, connect to the database with psql to run the SQL commands that demonstrate the index scan behavior.
SQL steps (run in psql):
psql testdb
-- create a multicolumn index (first column aid, second column abalance)
testdb=# CREATE INDEX CONCURRENTLY IF NOT EXISTS pgbench_accounts_aid_abalance_idx
ON pgbench_accounts(aid, abalance);
testdb=# select abalance, count(*) from pgbench_accounts group by abalance order by count(*) desc limit 5;
abalance | count
----------+--------
0 | 995151
-2187 | 4
-4621 | 4
-4030 | 4
-2953 | 4
(5 rows)
-- run an explain on a selection that filters only on the second column (abalance)
testdb=# EXPLAIN (ANALYSE, VERBOSE, COSTS)
SELECT COUNT(*) FROM pgbench_accounts WHERE abalance = -2187;
Aggregate (cost=18480.77..18480.78 rows=1 width=8) (actual time=18.672..18.674 rows=1 loops=1)
Output: count(*)
-> Index Only Scan using pgbench_accounts_aid_abalance_idx on public.pgbench_accounts (cost=0.42..18480.69 rows=33 width=0) (actual time=8.583..18.659 rows=4 loops=1)
Output: aid, abalance
Index Cond: (pgbench_accounts.abalance = '-2187'::integer)
Heap Fetches: 0
Planning Time: 0.328 ms
Execution Time: 18.732 ms
(8 rows)
Notes on the output above:
Index Only Scan using pgbench_accounts_aid_abalance_idx along with Index Cond: (pgbench_accounts.abalance = '-2187'::integer) — this demonstrates the planner chose an index scan that probes the multicolumn btree for entries where the second column matches the predicate.Why this is not the v18 "skip-scan" feature in full:
Index Cond to the second column. That can be effective for many queries and data layouts, but it lacks the specialized skip-scan internals that v18 adds for certain other cases.If you need queries on a non-leading column to be fast on older Postgres versions, create the right index and keep statistics current (ANALYZE). The planner may prefer an index scan over a seq scan when selectivity and costs align.
Consider partial or expression indexes when appropriate; they let you make an index that directly serves the filter you need.
When portability across versions is important, test on the earliest Postgres version you need to support; planner behavior can vary by release, statistics, and data distribution.
Postgres v18's documented skip-scan addition for B-tree indexes is a welcome and useful optimizer enhancement, specifically for low-cardinality leading columns. However, for high-cardinality leading columns (like our first example), the standard Full Index Scan remains optimal, and pre-v18 versions handle them just fine.
PostgreSQL 18 release notes (skip-scan, indexes): https://www.postgresql.org/docs/18/release-18.html
nbtree skip-scan optimization: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=92fe23d93
Further optimize nbtree search scan key comparisons: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=8a510275d
Comments
Want to join the conversation?
Loading comments...