PostgreSQL 14 New Features – DBA Lifesavers

We have covered a considerable amount of PostgreSQL 14 features in our previous posts, they are:

Well, today is going to be a fun ride – New DBA Features in PG14, we are calling them Lifesavers, as they’d save a lot of time and effort for most of my DBA friends working on some of the complex PostgreSQL Ecosystems. Following are some of the interesting features added to PostgreSQL 14 for DBAs.

  • Table partitioning improvements
  • Data Corruption/anamolies detection
  • Native Logical replication enhancements
  • Better Indexing strategies

Table partitioning improvements

DETACH PARTITION CONCURRENTLY

Now users can detach the partitions without blocking other sessions by using the CONCURRENTLY. So users are not blocked from running queries on the table while detach is in progress.

Below is the syntax for detaching the partition

ALTER TABLE table_name DETACH PARTITION partition_name [ FINALIZE | CONCURRENTLY ]

A few important points to note:

  • Detach partition runs in two transactions, it cannot be used in a transaction block
  • doesn’t work when a DEFAULT partition exists
  • In case the second transaction is canceled or a crash occurs, there’s ALTER TABLE .. DETACH PARTITION .. FINALIZE, which executes the final steps.

A new column inhdetachpending is added to the pg_inherits which represents DETACH state of the partition.

When the first transaction is executed (as the detach creates two transactions), it marks the catalog view pg_inherits.inhdetachpending column to true for that partition, and users who are running queries after that statement can’t see the data from that partition.

Use the following SQL to track the change”

SELECT
  parent.relname      AS parent,
  child.relname       AS child,
  inhdetachpending AS detach_status
FROM pg_inherits
  JOIN pg_class parent            ON pg_inherits.inhparent = parent.oid
  JOIN pg_class child             ON pg_inherits.inhrelid   = child.oid
  JOIN pg_namespace nmsp_parent   ON nmsp_parent.oid  = parent.relnamespace
  JOIN pg_namespace nmsp_child    ON nmsp_child.oid   = child.relnamespace
WHERE parent.relname='<table-name>';

Data Corruption/anomalies detection

A new contrib module pg_surgery can be used to forcibly overwrite the visibility information for corrupted tuples, potentially allowing it to be recovered or removed when that would otherwise fail. It also allows forcing a tuple to freeze.

postgres=# create table test(id int, name varchar);
CREATE TABLE
postgres=# 
postgres=# insert into test values (1,'test');
INSERT 0 1
postgres=# select xmin, ctid, * FROM test;
 xmin | ctid  | id | name 
------+-------+----+------
  748 | (0,1) |  1 | test
(1 row)


postgres=# select heap_force_freeze('test'::regclass, ARRAY['(0, 1)']::tid[]);
 heap_force_freeze 
-------------------
 
(1 row)

postgres=# select xmin, ctid, * FROM test;
 xmin | ctid  | id | name 
------+-------+----+------
    2 | (0,1) |  1 | test
(1 row)

postgres=# select heap_force_kill('test'::regclass, ARRAY['(0, 1)']::tid[]);
 heap_force_kill 
-----------------
 
(1 row)

postgres=# select xmin, ctid, * FROM test;
 xmin | ctid | id | name 
------+------+----+------
(0 rows)

NOTE: These functions are unsafe by design and using them may corrupt (or further corrupt) your database. For example, these functions can easily be used to make a table inconsistent with its own indexes, to cause UNIQUE or FOREIGN KEY constraint violations, or even to make tuples visible which, when read, will cause a database server crash. They should be used with great caution and only as a last resort.

Native Logical replication enhancements

PostgreSQL 14 introduced a lot of new features around Native logical replication to improve the performance and reduce replication delay. Below are some of the interesting improvements.

  1. Logical streaming replication
  2. Improved Initial table sync
  3. Logical decoding of two-phase commits
  4. Binary transfer mode

Logical streaming replication

Now the logical replication will stream in the in-progress transactions, this will reduce the lag for streaming large transactions. In older versions, the transactions were streamed only at commit time which leads to a large apply lag for large transactions. With this feature, apply lag will be reduced and improve performance.

To use this feature set the streaming ON when creating the subscription:

CREATE SUBSCRIPTION mysub CONNECTION 'host=localhost dbname=postgres user=postgres password=******' PUBLICATION mypub WITH (streaming = on);

Improved Initial table sync

In logical replication, initial table synchronization involves copying the initial snapshot of the table by the table sync worker, and then the table is brought up to a synchronized state with the main apply worker. This whole work use to be done in a single transaction using a temporary replication slot which has major drawbacks:

(a) The slot will hold the WAL till the entire sync is complete.
(b) Any error during the sync phase will roll back the entire copy which is painful for large copies.
(c) There is a risk of exceeding the CID limit.

Below improvements are added in PG14

  1. Allow multiple transactions in tablesync phase.
  2. Used permanent slots and origins to track the progress of tablesync

Logical decoding of two-phase commits

This will allow us to decode the transactions at prepare time and send the same to the output plugin instead of doing it at commit time. This will allow the plugins to decipher the transaction at prepare time and route it to another node if required.

This has two advantages

(a) allows two-phase distributed transactions across multiple nodes via logical replication
(b) reduces the apply-lag by sending and replaying the transaction on another node at prepare time.

Binary transfer mode

This feature provides an option during Create/Alter Subscription to allow data from publishers to be sent in binary format. The default value of this option is false. Even when this option is enabled, only data types that have binary send and receive functions will be transferred in binary.

When doing cross-version replication, if the subscriber lacks a binary receive function for the type, the data transfer will fail, and this option can’t be used. This mode is generally faster.

CREATE SUBSCRIPTION mysub CONNECTION 'host=localhost port=5432 dbname=postgres' PUBLICATION mypub1 WITH (binary = true);

Better Indexing strategies

Below are some of the interesting features added for indexes in PostgreSQL 14.

  • BRIN Indexes
  • GiST and Include feature for SP-GiST
  • REINDEX partition Tables and Indexes

BRIN Indexes

New operator classes are added to the BRIN indexes.

Bloom operator class

Bloom operator class {data type}_bloom_ops can accept below parameters

n_distinct_per_range: Defines the estimated number of distinct non-null values in the block range, used by BRIN
bloom indexes for sizing of the Bloom filter.

false_positive_rate: Defines the desired false positive rate used by BRIN bloom indexes for sizing of the Bloom filter.

Minmax-multi operator class

minmax operator class {data type}_minmax_ops can accept below parameter:

values_per_range: Defines the maximum number of values stored by BRIN minmax indexes to summarize a block range. Each value may represent either a point, or a boundary of an interval.

GiST and SP-Gist

GiST index by sorting: There’s a new index build method that is faster and produces smaller indexes, but so far it’s only supported for the point data type.

GiST access method: sortsupport – If it is defined, the GiST index is built by sorting all data to the order defined by the sortsupport’s comparator function, and packing the tuples in that order to GiST pages.

INCLUDE clause can be specified for SP-Gist indexes. The optional INCLUDE clause specifies a list of columns that will be included in the index as non-key columns. A non-key column cannot be used in an index scan search qualification, and it is disregarded for purposes of any uniqueness or exclusion constraint enforced by the index. However, an index-only scan can return the contents of non-key columns without having to visit the index’s table, since they are available directly from the index entry. Thus, the addition of non-key columns allows index-only scans to be used for queries that otherwise could not use them.

REINDEX partition Tables and Indexes

REINDEX rebuilds an index or table using the data stored, replacing the old copy of the table/index. With PostgreSQL 14, REINDEX works on partition tables and indexes. Also, you can mention the TABLESPACE clause with the REINDEX command.

Points to note:

  • Both the CONCURRENTLY and the non-concurrent cases are supported
  • When using TABLESPACE with a REINDEX command that targets a partitioned table or index, all the indexes of the leaf partitions are moved to the new tablespace. The tablespace references of the non-leaf, partitioned tables in pg_class.reltablespace are not changed
  • This requires an extra ALTER TABLE SET TABLESPACE.

Now that the DBA in me is happy digging into these new features, let’s continue tomorrow to look at some of the Developer centric features that were part of PG14. Stay tuned!

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>