
Robins Tharakan: The "Skip Scan" You Already Had Before v18

Planet PostgreSQL (aggregator) • February 4, 2026

Why It Matters

Skip‑scan can dramatically reduce I/O for queries on non‑leading columns when the leading key has few distinct values, directly affecting latency and cost in production workloads. Understanding when it applies helps DBAs choose indexes that fully exploit PostgreSQL 18’s optimizer improvements.

Key Takeaways

  • PostgreSQL 18 adds native skip‑scan for multicolumn B‑trees.
  • Low‑cardinality leading columns saw a ~5.7× throughput gain in benchmarks.
  • High‑cardinality leading columns see no performance improvement.
  • `Index Searches` > 1 in EXPLAIN (ANALYZE) signals skip‑scan usage.
  • Proper statistics and index choice remain critical across versions.

Pulse Analysis

The skip‑scan addition in PostgreSQL 18 refines how the query planner navigates multicolumn B‑tree structures. Rather than reading every leaf page, the optimizer now isolates each distinct value of the leading column and performs a focused seek on the secondary key. This approach mirrors index‑only scans but reduces unnecessary page reads when the leading column’s cardinality is low, delivering a more efficient execution path without altering the underlying index definition.

Benchmark data underscores the conditional nature of the benefit. In a test where the leading column "bid" had only ten distinct values, the skip‑scan reduced buffer hits from 845 to 39 and lifted transactions per second from roughly 2,500 to 14,400. Conversely, when the leading column "aid" was unique for every row, the planner correctly avoided skip‑scan, as the cost of one million seeks outweighed sequential access. The planner’s cost model, informed by updated statistics, therefore selects the optimal strategy on a case‑by‑case basis.

For practitioners, the practical takeaway is twofold. First, monitor the "Index Searches" metric in EXPLAIN (ANALYZE); a count above one confirms skip‑scan activation. Second, maintain accurate column statistics and consider low‑cardinality leading columns when designing multicolumn indexes, especially for workloads that filter on non‑leading attributes. While older PostgreSQL versions already performed full‑index scans when advantageous, the native skip‑scan in v18 offers a measurable edge for the right data distributions, making it a valuable tool in modern performance‑tuning arsenals.

Robins Tharakan: The "Skip Scan" You Already Had Before v18

PostgreSQL 18 introduces native "Skip Scan" for multicolumn B-tree indexes, a major optimizer enhancement. However, a common misconception is that pre-v18 versions must resort to sequential scans when the leading column isn't filtered. In reality, the cost-based planner has long been capable of leveraging such indexes when doing so is efficient.

How it works under the hood:

In pre-v18 versions, if an index is significantly smaller than the table (the heap), scanning the entire index to find matching rows—even without utilizing the tree structure to "skip"—can be cheaper than a sequential table scan. This is sometimes also referred to as a Full-Index Scan. While this lacks v18’s ability to hop between distinct leading keys, it still effectively speeds up queries on non-leading columns for many common workloads.
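As a rough page-count illustration (not the planner's actual cost model): the (aid, abalance) index measured later in this post touches ~2,735 pages, while the heap-page figure below is an assumed size for a scale-10 pgbench_accounts table, not a measurement.

```python
# Rough sketch of why a full-index scan can beat a seq scan pre-v18.
# index_pages comes from this post's Buffers figures (~2,735); heap_pages
# is an ASSUMED size for a scale-10 pgbench_accounts, not measured here.
index_pages = 2_735
heap_pages = 16_400  # assumption; check with pg_relation_size() in practice

seq_scan_pages = heap_pages          # a seq scan reads the whole heap
full_index_scan_pages = index_pages  # a full-index scan reads the whole index

# The index is roughly 6x smaller, so reading all of it is cheaper even
# without any ability to "skip" between leading-key values.
print(full_index_scan_pages < seq_scan_pages)
```

In practice the planner weighs random vs. sequential page costs and selectivity too, but the size gap alone already favors the index here.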

Below, we'll see why this v18 improvement is not a game-changer for all workloads, and why you shouldn't assume a speed-up for every kind of dataset.

Benchmarks: v17 vs v18

To understand when the new v18 Skip Scan helps, we tested two distinct scenarios.

Scenario A: Low Cardinality (The "Success" Case)

The Setup: We created an index on (bid, abalance) and ran the following query:


SELECT COUNT(*) FROM pgbench_accounts WHERE abalance = -2187;

Note: We did not run a read-write workload, so abalance is 0 for all rows. The query returns 0 rows.

Data Statistics:

  • Table Rows: 1,000,000

  • Leading Column (bid): ~10 distinct values (Low Cardinality).

  • Filter Column (abalance): 1 distinct value (Uniform).

Results:

Version | Strategy                    | TPS     | Avg Latency | Buffers Hit
--------|-----------------------------|---------|-------------|------------
v17     | Index Only Scan (Full)      | ~2,511  | 0.40 ms     | 845
v18     | Index Only Scan (Skip Scan) | ~14,382 | 0.07 ms     | 39

Why the massive 5.7x gain?

Pre-v18 Postgres sees that bid is not constrained. It decides its only option is to read the entire index (all leaf pages) to find rows where abalance = -2187. It scans 1,000,000 items (845 pages).

Postgres v18 uses Skip Scan. It knows bid comes first. Instead of scanning sequentially, it:

  1. Seeks to the first bid (e.g., 1). Checks for abalance = -2187.

  2. "Skips" to the next unique bid (e.g., 2). Checks for abalance = -2187.

  3. Repeats for all 10 branches.

It effectively turns one giant scan into 10 small seeks. The EXPLAIN output confirms this with Index Searches: 11 (10 branches + 1 stop condition) and only 39 buffer hits.

v18 Skip Scan Plan (Scenario A):


 Index Only Scan using pgbench_accounts_bid_abalance_idx on public.pgbench_accounts ...

   Index Cond: (pgbench_accounts.abalance = '-2187'::integer)

   Heap Fetches: 0

   Index Searches: 11

   Buffers: shared hit=39
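The seek-per-branch behaviour can be sketched in miniature with a sorted Python list standing in for the index's leaf level. The row counts here are illustrative, not pgbench's, and the "probe" counter is a stand-in for work done, not PostgreSQL's actual cost unit:

```python
import bisect

# Toy composite index: sorted (bid, abalance) pairs, like a B-tree leaf level.
# 10 distinct bids (low cardinality), abalance = 0 everywhere, mirroring
# Scenario A. Sizes are illustrative only.
N_ROWS = 100_000
N_BIDS = 10
index = sorted((bid, 0) for bid in range(N_BIDS)
               for _ in range(N_ROWS // N_BIDS))

def full_scan(index, target):
    """Pre-v18 style: examine every index entry, keep matches."""
    probes = len(index)  # every entry is read
    rows = sum(1 for _, bal in index if bal == target)
    return rows, probes

def skip_scan(index, target):
    """v18 style: one focused seek per distinct leading value."""
    probes, rows, pos = 0, 0, 0
    while pos < len(index):
        bid = index[pos][0]
        # Seek directly to (bid, target) within this bid's range.
        lo = bisect.bisect_left(index, (bid, target))
        hi = bisect.bisect_right(index, (bid, target))
        rows += hi - lo
        probes += 1  # one "Index Search" per distinct bid
        # Skip past all remaining entries for this bid.
        pos = bisect.bisect_right(index, (bid, float("inf")))
    return rows, probes

rows_full, probes_full = full_scan(index, -2187)
rows_skip, probes_skip = skip_scan(index, -2187)
print(probes_full, probes_skip)  # 100000 entries examined vs 10 seeks
```

The same answer falls out of 10 cheap seeks instead of one pass over every entry, which is exactly the shape of the buffer-hit drop in the table above.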

Scenario B: High Cardinality (The "Failure" Case)

The Setup: We used the same query, but against an index on (aid, abalance).

Data Statistics:

  • Leading Column (aid): 1,000,000 distinct values (100% Unique / High Cardinality).

Results:

Version | Strategy               | TPS   | Avg Latency | Buffers Hit
--------|------------------------|-------|-------------|------------
v17     | Index Only Scan (Full) | ~55.8 | 17.92 ms    | 2735
v18     | Index Only Scan (Full) | ~51.2 | 19.52 ms    | 2741

Why no improvement? If v18 attempted to "Skip Scan" here, it would have to skip 1,000,000 times (once for every unique aid). Performing 1 million seeks is significantly heavier than simply reading the 1 million index entries linearly (which benefits from sequential I/O and page pre-fetching). The planner correctly estimated this cost and fell back to the standard "Index Only Scan" used in v17.

v18 Plan (Identical to v17):


 Index Only Scan using pgbench_accounts_aid_abalance_idx on public.pgbench_accounts ...

   Index Cond: (pgbench_accounts.abalance = '-2187'::integer)

   Heap Fetches: 0

   Index Searches: 1

   Buffers: shared hit=2741
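The same toy model makes the trade-off explicit: with a unique leading column, a skip scan degenerates into one seek per row. Counts are illustrative, not pgbench's actual figures:

```python
# Scenario B analogue: the leading column (aid) is unique, so a skip scan
# would need one seek per distinct value, i.e. one per row.
N_ROWS = 100_000
index = [(aid, 0) for aid in range(N_ROWS)]  # already sorted by aid

distinct_leading = len({aid for aid, _ in index})
skip_seeks = distinct_leading  # one tree descent per distinct aid
full_reads = len(index)        # one linear pass over all entries

# Equal counts, but each seek is a random descent through the tree while
# the full scan reads leaf pages sequentially; the linear pass wins, which
# is why the planner falls back to it.
print(skip_seeks, full_reads)
```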

How to Identify a Skip Scan

You might notice that the node type in the EXPLAIN output remains Index Scan or Index Only Scan in both cases. PostgreSQL 18 does not introduce a special "Skip Scan" node.

Instead, you must look at the Index Searches line (visible with EXPLAIN (ANALYZE)):

  • v17 (and older): The Index Searches line does not appear.

  • v18 (Standard Scan): Index Searches: 1. The scan started at one point and read sequentially (as seen in Scenario B).

  • v18 (Skip Scan): Index Searches: > 1. The engine performed multiple seeks (as seen in Scenario A which had 11).

In our Scenario A (Success), Index Searches: 11 tells us it performed ~11 hops, confirming the feature was used.
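These rules are easy to automate. A hypothetical helper (the function name and the abbreviated plan texts below are ours, not PostgreSQL's) might look like:

```python
import re

def classify_scan(plan_text: str) -> str:
    """Classify an EXPLAIN (ANALYZE) dump by its 'Index Searches' line."""
    m = re.search(r"Index Searches:\s*(\d+)", plan_text)
    if m is None:
        return "pre-v18 (no Index Searches line)"
    return "skip scan" if int(m.group(1)) > 1 else "standard scan"

# Abbreviated from this post's Scenario A and B plans.
plan_a = """Index Only Scan using pgbench_accounts_bid_abalance_idx ...
   Index Searches: 11
   Buffers: shared hit=39"""
plan_b = """Index Only Scan using pgbench_accounts_aid_abalance_idx ...
   Index Searches: 1
   Buffers: shared hit=2741"""

print(classify_scan(plan_a))  # skip scan
print(classify_scan(plan_b))  # standard scan
```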

How to reproduce:

Note: We use PostgreSQL 13 in this section to show that even versions long before v18 could efficiently use multicolumn indexes for these types of queries: rather than a Full Table Scan, older Postgres falls back to a Full-Index Scan in such scenarios. You will need:

  • PostgreSQL 13 (a running cluster)

  • A pgbench-initialized database (run pgbench -i to create pgbench_accounts, pgbench_branches, pgbench_tellers, and pgbench_history)

  • Reasonable data loaded (default pgbench -i -s 10 or similar)

Steps to reproduce the example:

Follow these steps in your terminal to set up the test environment. This assumes you have PostgreSQL 13 (or a similar version) installed and its command-line tools are in your PATH.

Step 1: Create and Initialize the Database


createdb testdb                # Create a new database

pgbench -i -s 10 testdb         # Initialize it with 1 million rows

Step 2: Run a Quick Read-Write Test

To simulate some database activity, run a standard 10-second read-write benchmark.


pgbench -T 10 testdb

Step 3: Connect and Run the Test Query

Now, connect to the database with psql to run the SQL commands that demonstrate the index scan behavior.

SQL steps (run in psql):


psql testdb

-- create a multicolumn index (first column aid, second column abalance)

testdb=# CREATE INDEX CONCURRENTLY IF NOT EXISTS pgbench_accounts_aid_abalance_idx

  ON pgbench_accounts(aid, abalance);



testdb=# select abalance, count(*) from pgbench_accounts group by abalance order by count(*) desc limit 5;

 abalance | count

----------+--------

        0 | 995151

    -2187 |      4

    -4621 |      4

    -4030 |      4

    -2953 |      4

(5 rows)



-- run an explain on a selection that filters only on the second column (abalance)

testdb=# EXPLAIN (ANALYSE, VERBOSE, COSTS)

SELECT COUNT(*) FROM pgbench_accounts WHERE abalance = -2187;



Aggregate  (cost=18480.77..18480.78 rows=1 width=8) (actual time=18.672..18.674 rows=1 loops=1)

   Output: count(*)

   ->  Index Only Scan using pgbench_accounts_aid_abalance_idx on public.pgbench_accounts  (cost=0.42..18480.69 rows=33 width=0) (actual time=8.583..18.659 rows=4 loops=1)

         Output: aid, abalance

         Index Cond: (pgbench_accounts.abalance = '-2187'::integer)

         Heap Fetches: 0

 Planning Time: 0.328 ms

 Execution Time: 18.732 ms

(8 rows)

Notes on the output above:

  • The important line is Index Only Scan using pgbench_accounts_aid_abalance_idx along with Index Cond: (pgbench_accounts.abalance = '-2187'::integer) — this demonstrates the planner chose an index scan that probes the multicolumn btree for entries where the second column matches the predicate.

Why this is not the v18 "skip-scan" feature in full:

  • The v18 skip-scan adds optimizer logic to efficiently iterate distinct values of leading columns and probe the remainder of the index; it's a targeted algorithmic addition. What we show above is the planner choosing an index scan over a multicolumn index and applying the Index Cond to the second column. That can be effective for many queries and data layouts, but it lacks the specialized skip-scan internals that v18 adds for certain other cases.

Production Tips

  • If you need queries on a non-leading column to be fast on older Postgres versions, create the right index and keep statistics current (ANALYZE). The planner may prefer an index scan over a seq scan when selectivity and costs align.

  • Consider partial or expression indexes when appropriate; they let you make an index that directly serves the filter you need.

  • When portability across versions is important, test on the earliest Postgres version you need to support; planner behavior can vary by release, statistics, and data distribution.

Conclusion

Postgres v18's documented skip-scan addition for B-tree indexes is a welcome and useful optimizer enhancement, specifically for low-cardinality leading columns. However, for high-cardinality leading columns (like our first example), the standard Full Index Scan remains optimal, and pre-v18 versions handle them just fine.

References

  • PostgreSQL 18 release notes (skip-scan, indexes): https://www.postgresql.org/docs/18/release-18.html

  • nbtree skip-scan optimization: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=92fe23d93

  • Further optimize nbtree search scan key comparisons: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=8a510275d
