PG18 Hacktober: 31 Days of New Features
Planning ahead: PG18 retains planner stats during upgrades

Welcome to Day 2 of the PG18 Hacktober!
Upgrading PostgreSQL major versions has always been a balancing act: you get shiny new features, but you also risk a temporary performance dip while your database “re-learns” how to optimize queries. With PostgreSQL 18, that compromise is finally history.

What does the documentation say?

PostgreSQL now keeps planner statistics through a major version upgrade, which helps an upgraded cluster reach expected performance more quickly after the upgrade. See the statistics section in pg_upgrade for more information.

Let’s dive into how pg_upgrade just got smarter, faster, and friendlier for production workloads.

Why planner statistics matter

One of PostgreSQL’s greatest strengths is its cost-based query planner, which chooses the most efficient way to execute your queries. To do this, it relies on planner statistics: information about table sizes, data distribution, and correlations between columns.

These stats are built via the ANALYZE command. Without them, the planner is essentially flying blind and tends to generate inefficient plans. On busy systems, this often meant painful slowdowns right after a major upgrade until ANALYZE completed.

PostgreSQL 18: Faster upgrades, better performance

With PostgreSQL 18, planner statistics are now preserved through a major version upgrade. That means:

  • Immediate performance: Your freshly upgraded cluster starts off close to peak performance right away.
  • Minimal disruption: No more long waits for statistics collection to catch up.
  • Smoother operations: Significantly smoother upgrades for large or mission-critical databases.

In short: PostgreSQL 18 delivers upgrades without the post-upgrade slowdown, making them far more production friendly.

A new approach in the upgrade

  • vacuumdb --all --missing-stats-only

Since pg_upgrade now preserves planner statistics, you no longer need to run a full ANALYZE after an upgrade. Let’s compare the pg_upgrade workflow when upgrading to PG17 versus PG18.

Let’s take a PG16 instance and check the stats.
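For reference, the test objects used in this walkthrough could have been created with something like the following (a hypothetical setup; the original DDL isn’t shown in this post, but the schema, table, and row count match what appears below):

```shell
# Hypothetical setup for the planner_stats_test schema used in this post
psql -d test_db <<'SQL'
CREATE SCHEMA planner_stats_test;
CREATE TABLE planner_stats_test.tbl_test (
    id    integer PRIMARY KEY,
    value text
);
-- 10,000 rows, matching the n_live_tup shown below
INSERT INTO planner_stats_test.tbl_test
SELECT g, md5(g::text) FROM generate_series(1, 10000) AS g;
ANALYZE planner_stats_test.tbl_test;
SQL
```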

#Connect to the PG16 database 
[postgres@ip-172-31-11-103 ~]$ psql -d test_db
psql (18.0, server 16.10)
Type "help" for help.

#Check the live_tuples and last_analyze
test_db=# SELECT relname, last_analyze, n_live_tup
FROM pg_stat_user_tables
WHERE schemaname = 'planner_stats_test';
-[ RECORD 1 ]+-----------------------------
relname      | tbl_test
last_analyze | 2025-10-02 06:59:50.87828+00
n_live_tup   | 10000

#Check the pg_stats output
test_db=# SELECT schemaname,tablename,attname,n_distinct,most_common_vals,correlation FROM pg_stats WHERE schemaname = 'planner_stats_test' AND tablename = 'tbl_test';
-[ RECORD 1 ]----+-------------------
schemaname       | planner_stats_test
tablename        | tbl_test
attname          | id
n_distinct       | -1
most_common_vals |
correlation      | 1
-[ RECORD 2 ]----+-------------------
schemaname       | planner_stats_test
tablename        | tbl_test
attname          | value
n_distinct       | -1
most_common_vals |
correlation      | -0.026743531

Upgrade to PG17

Let’s upgrade the PG16 cluster to PG17 and check the upgrade output as well as the stats. Here we are using the --link flag instead of the --swap flag because we need to keep PG16 in a recoverable state.
Note: Consider only the last part of the upgrade command output.

Upgrade Complete
----------------
Optimizer statistics are not transferred by pg_upgrade.
Once you start the new server, consider running:
    /usr/pgsql-17/bin/vacuumdb --all --analyze-in-stages
Running this script will delete the old cluster's data files:
    ./delete_old_cluster.sh

Here, we can see in the output that “Optimizer statistics are not transferred by pg_upgrade”. Let’s connect to PG17 and check the stats without running ANALYZE.

#Starting the PG17 without analyzing
[postgres@ip-172-31-11-103 ~]$ /usr/pgsql-17/bin/pg_ctl -D /var/lib/pgsql/17/data/ start
waiting for server to start....2025-10-02 07:32:07.891 UTC [4132] LOG:  redirecting log output to logging collector process
2025-10-02 07:32:07.891 UTC [4132] HINT:  Future log output will appear in directory "log".
 done
server started

#Checking the stats
test_db=# \x
Expanded display is on.
test_db=# SELECT relname, last_analyze, n_live_tup
FROM pg_stat_user_tables
WHERE schemaname = 'planner_stats_test';
-[ RECORD 1 ]+---------
relname      | tbl_test
last_analyze |
n_live_tup   | 0

test_db=# SELECT schemaname,tablename,attname,n_distinct,most_common_vals,correlation FROM pg_stats WHERE schemaname = 'planner_stats_test' AND tablename =
'tbl_test';
(0 rows)

test_db=#

In PG17, without analyzing post-upgrade, we can see that the stats were not transferred. Let’s check the same for PG18.
Note: Consider only the last part of the upgrade command output.

Upgrade Complete
----------------
Some statistics are not transferred by pg_upgrade.
Once you start the new server, consider running these two commands:
    /usr/pgsql-18/bin/vacuumdb --all --analyze-in-stages --missing-stats-only
    /usr/pgsql-18/bin/vacuumdb --all --analyze-only
Running this script will delete the old cluster's data files:
    ./delete_old_cluster.sh

Here, the output states that “Some statistics are not transferred by pg_upgrade”. In addition, it now suggests the --missing-stats-only flag. Let’s connect to PG18 and check the stats without running ANALYZE.

#Start PG18 without running the --missing-stats-only step
[postgres@ip-172-31-11-103 ~]$ /usr/pgsql-18/bin/pg_ctl -D /var/lib/pgsql/18/data/ start
waiting for server to start....2025-10-02 07:55:43.013 UTC [4893] LOG:  redirecting log output to logging collector process
2025-10-02 07:55:43.013 UTC [4893] HINT:  Future log output will appear in directory "log".
 done
server started

#Connect to the PG18 instances and check the stats
[postgres@ip-172-31-11-103 ~]$ psql -d test_db
psql (18.0)
Type "help" for help.

test_db=# \x
Expanded display is on.
test_db=# SELECT relname, last_analyze, n_live_tup
FROM pg_stat_user_tables
WHERE schemaname = 'planner_stats_test';
-[ RECORD 1 ]+---------
relname      | tbl_test
last_analyze |
n_live_tup   | 0

test_db=# SELECT schemaname,tablename,attname,n_distinct,most_common_vals,correlation FROM pg_stats WHERE schemaname = 'planner_stats_test' AND tablename = 'tbl_test';
-[ RECORD 1 ]----+-------------------
schemaname       | planner_stats_test
tablename        | tbl_test
attname          | id
n_distinct       | -1
most_common_vals |
correlation      | 1
-[ RECORD 2 ]----+-------------------
schemaname       | planner_stats_test
tablename        | tbl_test
attname          | value
n_distinct       | -1
most_common_vals |
correlation      | -0.026743531

Here, we can see that the statistics were transferred even without running ANALYZE.

Let’s compare vacuumdb --analyze-in-stages (old option) vs. the new --missing-stats-only flag.

Old option: vacuumdb --analyze-in-stages

#Using the old option --analyze-in-stages
[postgres@ip-172-31-11-103 ~]$ time /usr/pgsql-17/bin/vacuumdb --all --analyze-in-stages
vacuumdb: processing database "postgres": Generating minimal optimizer statistics (1 target)
vacuumdb: processing database "template1": Generating minimal optimizer statistics (1 target)
vacuumdb: processing database "test_db": Generating minimal optimizer statistics (1 target)
vacuumdb: processing database "postgres": Generating medium optimizer statistics (10 targets)
vacuumdb: processing database "template1": Generating medium optimizer statistics (10 targets)
vacuumdb: processing database "test_db": Generating medium optimizer statistics (10 targets)
vacuumdb: processing database "postgres": Generating default (full) optimizer statistics
vacuumdb: processing database "template1": Generating default (full) optimizer statistics
vacuumdb: processing database "test_db": Generating default (full) optimizer statistics

real    0m0.974s
user    0m0.004s
sys     0m0.017s

#Check the stats post analyzing
[postgres@ip-172-31-11-103 ~]$ psql -d test_db
psql (18.0, server 17.6)
Type "help" for help.

test_db=# \x
Expanded display is on.
test_db=# SELECT relname, last_analyze, n_live_tup                                                                                                          FROM pg_stat_user_tables
WHERE schemaname = 'planner_stats_test';
-[ RECORD 1 ]+------------------------------
relname      | tbl_test
last_analyze | 2025-10-02 08:14:16.064645+00
n_live_tup   | 10000

test_db=# SELECT schemaname,tablename,attname,n_distinct,most_common_vals,correlation FROM pg_stats WHERE schemaname = 'planner_stats_test' AND tablename = 'tbl_test';
-[ RECORD 1 ]----+-------------------
schemaname       | planner_stats_test
tablename        | tbl_test
attname          | id
n_distinct       | -1
most_common_vals |
correlation      | 1
-[ RECORD 2 ]----+-------------------
schemaname       | planner_stats_test
tablename        | tbl_test
attname          | value
n_distinct       | -1
most_common_vals |
correlation      | -0.026743531

New option: vacuumdb --missing-stats-only

#Using --analyze-in-stages --missing-stats-only, then --analyze-only
[postgres@ip-172-31-11-103 ~]$ time /usr/pgsql-18/bin/vacuumdb --all --analyze-in-stages --missing-stats-only
vacuumdb: processing database "postgres": Generating minimal optimizer statistics (1 target)
vacuumdb: processing database "template1": Generating minimal optimizer statistics (1 target)
vacuumdb: processing database "test_db": Generating minimal optimizer statistics (1 target)
vacuumdb: processing database "postgres": Generating medium optimizer statistics (10 targets)
vacuumdb: processing database "template1": Generating medium optimizer statistics (10 targets)
vacuumdb: processing database "test_db": Generating medium optimizer statistics (10 targets)
vacuumdb: processing database "postgres": Generating default (full) optimizer statistics
vacuumdb: processing database "template1": Generating default (full) optimizer statistics
vacuumdb: processing database "test_db": Generating default (full) optimizer statistics

real    0m0.143s
user    0m0.004s
sys     0m0.006s

[postgres@ip-172-31-11-103 ~]$ time /usr/pgsql-18/bin/vacuumdb --all --analyze-only
vacuumdb: vacuuming database "postgres"
vacuumdb: vacuuming database "template1"
vacuumdb: vacuuming database "test_db"

real    0m0.415s
user    0m0.005s
sys     0m0.004s

#Check the stats post analyze
[postgres@ip-172-31-11-103 ~]$ psql -d test_db
psql (18.0)
Type "help" for help.

test_db=# \x
Expanded display is on.
test_db=# SELECT relname, last_analyze, n_live_tup                                                                                                          FROM pg_stat_user_tables
WHERE schemaname = 'planner_stats_test';
-[ RECORD 1 ]+------------------------------
relname      | tbl_test
last_analyze | 2025-10-02 08:15:42.768555+00
n_live_tup   | 10000

test_db=# SELECT schemaname,tablename,attname,n_distinct,most_common_vals,correlation FROM pg_stats WHERE schemaname = 'planner_stats_test' AND tablename = 'tbl_test';
-[ RECORD 1 ]----+-------------------
schemaname       | planner_stats_test
tablename        | tbl_test
attname          | value
n_distinct       | -1
most_common_vals |
correlation      | -0.026743531
-[ RECORD 2 ]----+-------------------
schemaname       | planner_stats_test
tablename        | tbl_test
attname          | id
n_distinct       | -1
most_common_vals |
correlation      | 1

Comparing the time taken on this small instance (around 3.5 MB): the new approach (--missing-stats-only followed by --analyze-only) took 0.558s in total, while the old approach (--analyze-in-stages) took 0.974s, making the new approach approximately 42.7% faster in the above example.
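The arithmetic behind that comparison can be sketched quickly (timings taken from the runs above):

```shell
# Compare total post-upgrade ANALYZE time: old vs new approach (seconds)
awk 'BEGIN {
    old = 0.974            # --analyze-in-stages (PG17-style)
    new = 0.143 + 0.415    # --missing-stats-only run + --analyze-only run (PG18-style)
    printf "new total: %.3fs, improvement: %.1f%%\n", new, (old - new) / old * 100
}'
# prints: new total: 0.558s, improvement: 42.7%
```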

  • Running ANALYZE on the whole cluster can be time-consuming and resource-intensive, especially for large databases.
  • The --missing-stats-only option means vacuumdb will only analyze relations that lack statistics, skipping the rest. This makes the post-upgrade workflow much leaner and avoids redundant work.

As mentioned above, PG18 gives us immediate performance. Here we have captured the execution plan of the same query on both PG17 and PG18, without running ANALYZE after the upgrade.

#In PG17 
test_db=# EXPLAIN (ANALYZE,BUFFERS) SELECT * FROM planner_stats_test.tbl_test WHERE id < 100;
                                                        QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on tbl_test  (cost=67.84..196.29 rows=3556 width=36) (actual time=0.033..0.729 rows=99 loops=1)
   Recheck Cond: (id < 100)
   Heap Blocks: exact=1
   Buffers: shared read=3
   ->  Bitmap Index Scan on tbl_test_pkey  (cost=0.00..66.95 rows=3556 width=0) (actual time=0.019..0.020 rows=99 loops=1)
         Index Cond: (id < 100)
         Buffers: shared read=2
 Planning:
   Buffers: shared hit=22 read=1
 Planning Time: 0.134 ms
 Execution Time: 0.789 ms
(11 rows)

#PG18
test_db=# EXPLAIN (ANALYZE,BUFFERS) SELECT * FROM planner_stats_test.tbl_test WHERE id < 100;
                                                          QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 Index Scan using tbl_test_pkey on tbl_test  (cost=0.29..10.02 rows=99 width=37) (actual time=0.016..0.068 rows=99.00 loops=1)
   Index Cond: (id < 100)
   Index Searches: 1
   Buffers: shared hit=2 read=1
 Planning:
   Buffers: shared hit=28 read=6
 Planning Time: 0.185 ms
 Execution Time: 0.126 ms
(8 rows)  

In the above example, PG18 delivers immediate performance: the query runs roughly 84% faster (0.126 ms vs 0.789 ms). This is the kind of gain we can expect PG18 to bring to our production workloads.

  • pg_upgrade --no-statistics

What if you don’t want statistics to carry over? PostgreSQL 18 introduces a new pg_upgrade flag: --no-statistics. This gives you backward compatibility with the old behavior, useful if you want a clean slate or prefer to regenerate statistics yourself. Flexibility is key.
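A hypothetical invocation, reusing the directory layout from this walkthrough, might look like this:

```shell
# Upgrade without carrying planner statistics over, then rebuild them manually
/usr/pgsql-18/bin/pg_upgrade \
    --old-datadir /var/lib/pgsql/16/data --new-datadir /var/lib/pgsql/18/data \
    --old-bindir /usr/pgsql-16/bin --new-bindir /usr/pgsql-18/bin \
    --no-statistics

# Start the new cluster and regenerate statistics from scratch
/usr/pgsql-18/bin/pg_ctl -D /var/lib/pgsql/18/data start
/usr/pgsql-18/bin/vacuumdb --all --analyze-in-stages
```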

  • Checksums & upgrades: A Helpful New Initdb Option

Another subtle but important improvement: PostgreSQL 18 lets you disable data checksums at cluster initialization with initdb --no-data-checksums.

#Running the upgrade check when the new cluster was initialized without --no-data-checksums
[postgres@ip-172-31-11-103 global]$ /usr/pgsql-18/bin/pg_upgrade --old-datadir /var/lib/pgsql/16/data --new-datadir /var/lib/pgsql/18/data --old-bindir /usr/pgsql-16/bin/ --new-bindir /usr/pgsql-18/bin/ --check
Performing Consistency Checks on Old Live Server
------------------------------------------------
Checking cluster versions                                     ok

old cluster does not use data checksums but the new one does
Failure, exiting
#Note: We will encounter this error if we initialize the new PG18 major version without --no-data-checksums

#Using the initdb with --no-data-checksums
[postgres@ip-172-31-11-103 global]$ /usr/pgsql-18/bin/initdb -D /var/lib/pgsql/18/data --no-data-checksums
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "C.UTF-8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

creating directory /var/lib/pgsql/18/data ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
selecting default "max_connections" ... 100
selecting default "shared_buffers" ... 128MB
selecting default time zone ... UTC
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... ok

initdb: warning: enabling "trust" authentication for local connections
initdb: hint: You can change this by editing pg_hba.conf or using the option -A, or --auth-local and --auth-host, the next time you run initdb.

Success. You can now start the database server using:

    /usr/pgsql-18/bin/pg_ctl -D /var/lib/pgsql/18/data -l logfile start

[postgres@ip-172-31-11-103 global]$ /usr/pgsql-18/bin/pg_upgrade --old-datadir /var/lib/pgsql/16/data --new-datadir /var/lib/pgsql/18/data --old-bindir /usr/pgsql-16/bin/ --new-bindir /usr/pgsql-18/bin/ --check
Performing Consistency Checks on Old Live Server
------------------------------------------------
Checking cluster versions                                     ok
Checking database connection settings                         ok
Checking database user is the install user                    ok
Checking for prepared transactions                            ok
Checking for contrib/isn with bigint-passing mismatch         ok
Checking data type usage                                      ok
Checking for not-null constraint inconsistencies              ok
Checking for presence of required libraries                   ok
Checking database user is the install user                    ok
Checking for prepared transactions                            ok
Checking for new cluster tablespace directories               ok

*Clusters are compatible*
[postgres@ip-172-31-11-103 global]$

Why is this relevant? Because pg_upgrade requires matching checksum settings between old and new clusters. If your existing cluster doesn’t use checksums, you can now match that setup during initdb and perform the upgrade seamlessly.
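A quick way to verify the checksum setting on each side before running pg_upgrade (a sketch, assuming the paths used in this walkthrough) is pg_controldata:

```shell
# "Data page checksum version: 0" means checksums are disabled;
# a non-zero version means they are enabled
/usr/pgsql-16/bin/pg_controldata /var/lib/pgsql/16/data | grep -i checksum
/usr/pgsql-18/bin/pg_controldata /var/lib/pgsql/18/data | grep -i checksum
```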

If you want to dive deeper:
PG18 Hacktober: 31 Days of New Features Unveiling pg_upgrade --swap option:
https://opensource-db.com/pg18-hacktober-31-days-of-new-features-unveiling-pg_upgrade-swap-option/

If you want to understand how PostgreSQL preserves the optimizer statistics, see the commit.

TL;DR
The improvements to pg_upgrade in PostgreSQL 18 are all about making upgrades less disruptive:

  • Preserved planner statistics → faster recovery to optimal performance.
  • --no-statistics option → control and backward compatibility.
  • --no-data-checksums → easier upgrades from non-checksum clusters.

Upgrading no longer has to mean starting from scratch. With PostgreSQL 18, your database is ready to run at full speed the moment the upgrade completes.

Stay tuned for Day 3 of PG18Hacktober, where we’ll keep exploring what’s new in PostgreSQL 18!
