This is the second installment of a 9-part blog series on some of the new features in PostgreSQL 14.
In the first part, we discussed a few of the important system-level performance enhancements introduced in the PostgreSQL 14 major version. Mind you, the features discussed in this blog series are the ones we favour most and use in our daily activities. It goes without saying that PostgreSQL 14 shipped with more than 200 small, medium, and large performance enhancements.
In this part, let’s delve into some of the interesting features around Data handling and modification performance enhancements.
We are big fans of FOREIGN DATA WRAPPERs!
So the first data handling performance enhancement we are excited about is:
postgres_fdw: Optimization of Bulk INSERT
postgres_fdw can now bulk insert data into foreign tables and import table partitions with the IMPORT FOREIGN SCHEMA directive.
What is a Foreign Data Wrapper?
PostgreSQL has a useful feature called Foreign Data Wrappers (FDWs): extensions that allow you to access a table or schema in one database from another. FDWs are part of the SQL standard SQL/MED ("SQL Management of External Data"). When you run a query against a foreign table, the Foreign Data Wrapper queries the external data source and returns the results as if they were coming from a table in your own database.
FDWs serve a variety of use-cases:
- Customized data segregation: data may be stored across databases (both PostgreSQL and a few other supported RDBMSs) while remaining readable and writable through the FDWs.
- Data security using ACLs: DBAs can build customized access control around the segregated data.
- Heterogeneous data sources: the ability to perform aggregations across multiple data sources.
To learn more about FDWs and the various generic SQL and database-specific wrappers (such as Oracle and MySQL), spend some time on the Postgres Wiki page for FDWs.
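As a refresher on the moving parts: besides the foreign server and foreign table (shown later in this post), postgres_fdw needs the extension installed and a user mapping that tells it which remote credentials to use. A minimal sketch, where the role name and credentials are hypothetical placeholders:

```sql
-- One-time setup on the local database.
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

-- Map the current local role to a role on the remote server;
-- 'postgres_fdw_server01' must already exist (see CREATE SERVER below),
-- and user/password here are placeholders.
CREATE USER MAPPING FOR CURRENT_USER
    SERVER postgres_fdw_server01
    OPTIONS (user 'remote_user', password 'remote_password');
```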
Now, let’s look at the Bulk INSERT performance optimization. This feature of postgres_fdw is added to the PostgreSQL Core.
The key to this feature is the new remote option batch_size.
batch_size specifies the number of rows postgres_fdw should insert in each insert operation. It can be specified for a foreign table or a foreign server. The option specified on a table overrides an option specified for the server.
An example of creating a FOREIGN SERVER and a FOREIGN TABLE with the new option:

```sql
-- Foreign server
CREATE SERVER postgres_fdw_server01
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'xx.xx.xx.xx', batch_size '50');  -- this is the new option

-- Foreign table
CREATE FOREIGN TABLE my_remote_table (
    col1 integer,
    col2 char,
    col3 text
) SERVER postgres_fdw_server01
  OPTIONS (table_name 'my_local_table', batch_size '50');
```
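With batch_size set, a multi-row insert into the foreign table is shipped to the remote server in batches of 50 rows rather than one round trip per row. A quick sketch (the generate_series payload is purely illustrative):

```sql
-- Inserts 1000 rows; with batch_size '50', postgres_fdw groups them
-- into batches of 50 per remote round trip.
INSERT INTO my_remote_table (col1, col2, col3)
SELECT i, 'x', 'row ' || i
FROM generate_series(1, 1000) AS s(i);
```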
FDW Support for TRUNCATE
We now have a new FDW API callback for TRUNCATE. The TRUNCATE command has been extended so that it accepts foreign tables as targets and invokes that API. So, when using postgres_fdw, we can issue a TRUNCATE command against foreign tables and have it executed on the remote server.
A quick example using the Foreign table created in the above code:
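Continuing with the foreign table defined earlier, the truncation is a one-liner (a sketch; it assumes the mapped remote role has TRUNCATE privileges on the underlying table):

```sql
-- Truncates my_local_table on the remote server via the foreign table.
TRUNCATE my_remote_table;
```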
Handling UPDATEs and DELETEs
Before delving into the actual performance enhancement around the planner changes to UPDATE and DELETE, let's look at an important point from the PostgreSQL documentation:
UPDATE, DELETE, SELECT FOR UPDATE, and SELECT FOR SHARE commands behave the same as SELECT in terms of searching for target rows: they will only find target rows that were committed as of the command start time. However, such a target row might have already been updated (or deleted or locked) by another concurrent transaction by the time it is found. In this case, the would-be updater will wait for the first updating transaction to commit or roll back (if it is still in progress). If the first updater rolls back, then its effects are negated and the second updater can proceed with updating the originally found row. If the first updater commits, the second updater will ignore the row if the first updater deleted it, otherwise it will attempt to apply its operation to the updated version of the row. The search condition of the command (the WHERE clause) is re-evaluated to see if the updated version of the row still matches the search condition. If so, the second updater proceeds with its operation using the updated version of the row. In the case of SELECT FOR UPDATE and SELECT FOR SHARE, this means it is the updated version of the row that is locked and returned to the client.
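The behaviour described above can be sketched with two concurrent sessions (the accounts table and its values are hypothetical):

```sql
-- Hypothetical table: accounts(id int primary key, balance numeric)

-- Session 1
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;

-- Session 2 (concurrently) blocks here, because session 1 holds the row lock:
--   SELECT * FROM accounts WHERE id = 1 FOR UPDATE;

-- Session 1
COMMIT;

-- Session 2 now unblocks, re-evaluates its WHERE clause, and locks and
-- returns the UPDATED version of the row (balance already reduced by 100),
-- exactly as the documentation describes.
```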
Always remember that UPDATE does not update tuples in place; it INSERTs a new version of the tuple and leaves a dead tuple behind for VACUUM to remove later.
Now, for the planner changes that speed up UPDATE and DELETE:
- For UPDATE, the subplan now only delivers the new values of the changed columns (i.e., the expressions computed in the query’s SET clause) plus row identity information such as CTID.
- For inherited UPDATE/DELETE, instead of generating a separate subplan for each target relation, we now generate a single subplan that looks just like a SELECT's plan, then update the tables on top of that.
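One way to observe the second change is to run EXPLAIN on an UPDATE of a partitioned table. The table below is a hypothetical example, and the exact plan shape varies with version, data, and settings:

```sql
-- Hypothetical partitioned table
CREATE TABLE measurements (
    ts  timestamptz NOT NULL,
    val numeric
) PARTITION BY RANGE (ts);

CREATE TABLE measurements_2021 PARTITION OF measurements
    FOR VALUES FROM ('2021-01-01') TO ('2022-01-01');
CREATE TABLE measurements_2022 PARTITION OF measurements
    FOR VALUES FROM ('2022-01-01') TO ('2023-01-01');

-- On PostgreSQL 14 this shows a single Update node driven by one
-- SELECT-like subplan (an Append over the partition scans), instead of
-- one full subplan per partition as in earlier versions.
EXPLAIN UPDATE measurements SET val = val * 2;
```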
Take a look at the commit message of Rework planning and execution of UPDATE and DELETE for more code-level changes.
In our next part tomorrow, we'll discuss the performance enhancements and features specifically optimizing workloads.