Radim Marek: PostgreSQL Statistics: Why Queries Run Slow

Planet PostgreSQL
Feb 26, 2026

Why It Matters

Out‑of‑date statistics lead to costly mis‑plans, directly affecting application performance and resource utilization. Maintaining accurate stats is essential for reliable cost‑based optimization in PostgreSQL environments.

Key Takeaways

  • Stale pg_class/pg_statistic entries cause misestimated row counts.
  • ANALYZE samples 300 * default_statistics_target rows.
  • MCV and histogram guide equality vs range selectivity.
  • Correlation influences index versus sequential scan cost.
  • Extended stats capture column interdependencies for better selectivity estimates.

Pulse Analysis

Accurate planner statistics are the backbone of PostgreSQL’s cost‑based optimizer. The planner reads table‑level metadata from pg_class—relpages, reltuples, relallvisible—to gauge size and I/O cost, then consults column‑level data in pg_statistic for selectivity. When these numbers drift from reality, the optimizer may favor nested loops, index scans, or hash joins that are sub‑optimal, inflating CPU and I/O consumption. Understanding the statistical pipeline helps DBAs diagnose why a query that once ran in milliseconds now stalls, and highlights the importance of timely ANALYZE runs.
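The metadata described above is visible directly from SQL. As a minimal sketch (using a hypothetical table named orders), you can read the table-level numbers from pg_class and the friendlier per-column view pg_stats, which exposes pg_statistic in readable form:

```sql
-- Table-level planner metadata: size, row estimate, visibility map coverage
SELECT relname, relpages, reltuples, relallvisible
FROM pg_class
WHERE relname = 'orders';

-- Column-level statistics via the human-readable pg_stats view
SELECT attname, null_frac, avg_width, n_distinct, correlation
FROM pg_stats
WHERE tablename = 'orders';
```

Comparing reltuples against an actual COUNT(*) is a quick way to see how far the planner's view has drifted from reality.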

ANALYZE works by sampling a statistically significant subset of rows, typically 300 × default_statistics_target (default 100) rows, and then computing per‑column metrics such as null_frac, avg_width, n_distinct, most_common_vals, and histogram_bounds. MCV lists give the planner precise selectivity for frequent equality predicates, while histograms approximate range predicates through bucketed distributions. Correlation values inform the planner whether an index scan will behave like sequential I/O. For columns with skewed distributions or many distinct values, raising the statistics target or defining column‑specific targets can sharpen estimates without overwhelming the planner.
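Raising the statistics target can be done per column, which keeps the cost of ANALYZE contained. A sketch, again assuming a hypothetical orders table with a skewed customer_id column:

```sql
-- Raise the target for one skewed column; ANALYZE will now sample
-- 300 * 500 = 150,000 rows for this table instead of 30,000
ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 500;
ANALYZE orders;

-- Inspect the resulting MCV list and histogram buckets
SELECT most_common_vals, most_common_freqs, histogram_bounds
FROM pg_stats
WHERE tablename = 'orders' AND attname = 'customer_id';
```

A larger MCV list gives the planner exact frequencies for more of the frequent values, at the cost of slightly longer ANALYZE runs and planning time.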

In practice, the best defense against stale statistics is a well‑tuned autovacuum configuration that triggers ANALYZE after a defined percentage of rows change. For bulk data loads, schedule manual ANALYZE or use the VACUUM (ANALYZE) command. Consider extended statistics for column groups that exhibit strong dependencies, and create expression indexes for computed predicates that otherwise lack stats. Monitoring pg_stat_user_tables for last_analyze timestamps and reltuples drift can alert teams before performance regressions surface, ensuring the optimizer continues to choose the most efficient execution paths.
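These defenses can be combined in a few statements. A sketch with hypothetical table and column names (orders, city, zip):

```sql
-- Trigger auto-ANALYZE after 2% of rows change instead of the 10% default
ALTER TABLE orders SET (autovacuum_analyze_scale_factor = 0.02);

-- Capture the dependency between two correlated columns
CREATE STATISTICS orders_city_zip (dependencies, ndistinct)
  ON city, zip FROM orders;
ANALYZE orders;  -- extended statistics are populated by the next ANALYZE

-- Monitor statistics freshness across user tables
SELECT relname, last_analyze, last_autoanalyze, n_mod_since_analyze
FROM pg_stat_user_tables
ORDER BY last_autoanalyze NULLS FIRST;
```

Tables that surface at the top of the monitoring query, with NULL or old analyze timestamps and a high n_mod_since_analyze, are the likely candidates for a manual ANALYZE.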

