This is the third installment of 9-part Blog series talking about some of the PostgreSQL 14 New features.
In the first part and second parts, we discussed a few of the important System-level & Data handling performance enhancements introduced in the PostgreSQL 14 major version. We’d like to conclude the Performance enhancements in PostgreSQL 14 major version with this part.
The topic for today is Workload Performance enhancements released with PostgreSQL 14 that we admired and tested out with great results.
Parallelization of ForeignScan
To lay the groundwork, let’s look at the following architecture:
To set the context, we have a bunch of API servers sending in Writes & Reads via the Connection concentration layer – PgBouncer. The PostgreSQL cluster hosting all the application data.
A few important points we would like to highlight:
- We can setup snapshots of the primary DB server into new instances
- Setup a FDW connection with the primary DB and the new instances
- Our idea is that the primary instance should only have data for the most recent period (as identified to be 6 months) and the other shards to have older data.
- A one-time deletion script to remove the older monthly logical partitions on the primary instance – Set up a foreign table (postgres_fdw) link with the smaller shards.
- Another one-time deletion script that deleted partitions for the current year from the smaller shards
- Dropping partitions in Postgres is fast, so, the maintenance window should also be small.
While this architecture has been complimenting the heterogeneous database platform set up, we always felt limited with the postgres_fdw approach – PG 13 doesn’t have Parallelization of queries across the node.
Now, Foreign data wrappers used to work with federated workloads across PostgreSQL and other databases can now leverage query parallelism in PostgreSQL 14. Having this feature in PG-14 is a major step forward in the direction of PostgreSQL Sharding using FDWs. It opens up the whole arena of working with the OLAP/Analytical queries as the target for multiple sharded clusters.
The important point is the async_capable ‘true’ option. This enables parallel scans on foreign tables across all foreign tables created for that server. This option controls whether postgres_fdw allows foreign tables to be scanned concurrently for asynchronous execution. It can be specified for a foreign table or a foreign server. A table-level option overrides a server-level option. The default is false.
An example for creating an FDW server with the new option:
CREATE SERVER postgres_fdw_server01 FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host 'xx.xx.xx.xx', dbname 'postgres', async_capable 'true'); -- this is the new option
The commit message is a good read to understand more about the code/internals.
PostgreSQL 14 offers the ability to pipeline queries to a database, which can greatly increase performance across high-latency connections or for workloads with a high number of tiny write operations (INSERT/UPDATE/DELETE). Due to the fact that pipeline mode is a client-side feature, you can use it with any current PostgreSQL database using either the version 14 client or a client driver created using version 14 of libpq.
In PostgreSQL, libpq is a set of library functions that allow client programs to pass queries to the postgres backend server and to receive the results of these queries.
Taking advantage of the pipeline mode, a client will wait less for the server, since multiple queries/results can be sent/received in a single network transaction. Now, in the world of ORM, this is a huge improvement as application workloads need to be split into multiple transactions.
Object-Relational Mapping – ORM is a technique that lets you query and manipulate data from a database using an object-oriented paradigm. When talking about ORM, most people are referring to a library that implements the Object-Relational Mapping technique.
libpq pipeline mode allows applications to send a query without having to read the result of the previously sent query. Taking advantage of the pipeline mode, a client will wait less for the server, since multiple queries/results can be sent/received in a single network transaction. This is actually a huge improvement as workloads tend to be broken up into a lot of little things being sent to the database.
This wraps up the discussion about our personal favorite PostgreSQL 14 Performance improvements. In our next part tomorrow, we’ll discuss the new configuration parameters introduced as part of PostgreSQL 14.