“Readers don’t block readers, writers don’t block writers”
Is a common principle in database management that emphasizes the importance of concurrency and scalability. The principle states that readers should be able to read data simultaneously without blocking each other, and writers should be able to write to the database without blocking each other.
PostgreSQL’s Multi-version Concurrency Control (MVCC) is a powerful feature that allows multiple transactions to access and modify data concurrently without interfering with each other. MVCC provides a consistent and isolated view of the database to each transaction, which allows for high concurrency and scalability. However, as transactions modify and delete data, they can leave behind obsolete or dead rows that occupy space in the database and may cause performance degradation over time. This is where PostgreSQL Autovacuum comes in – it is a background process that automatically manages the database’s storage and removes obsolete or dead data, ensuring that the database remains healthy and performant. In this blog, we will explore the inner workings of Autovacuum and how it can be tuned to optimize performance and prevent issues in PostgreSQL databases.
The rows which are marked as deleted are called dead rows. They serve no real purpose, but continue to occupy the disk space and could eventually lead to query slowness as the queries need to scan even the dead tuples.
What is vacuum ?
PostgreSQL implemented vacuuming to clean these dead rows and reclaim the space for reuse by new rows. There is a background process called ‘autovacuum’ which automatically does the vacuuming at regular intervals based on the pre-defined configuration parameters .
What is the problem ?
We have a table in our database which gets high DML transactions, and hence generates a large number of dead tuples.
To make the autovacuum process more aggressive, I have modified the table level autovacuum parameters
autovacuum_vacuum_threshold = 50000
autovacuum_vacuum_scale_factor = 0
So, as soon as the count of the dead tuples reaches 50000, the autovacuum process should kick in and clean the rows. I thought it was all set, and that is when I noticed something peculiar. With the above settings, I was hoping I would never see in excess of 50000 dead tuples at any point of time; but contrary to my belief, the reality was completely different – the table had accumulated more than 300K dead tuples. The autovacuum process was not cleaning the dead tuples, resulting in a database bloat, as well as causing a gradual increase in the query execution times.
What is causing the problem ?
I set out to do a deep-dive analysis of the issue to understand why autovacuum was failing to clean up the dead rows. This is what I found in the log file:
2022-10-17 03:49:41.355 CDT [2469] LOG: 00000: automatic vacuum of
table "postgres.public.vacuum_test": index scans: 0
pages: 0 removed, 123457 remain, 0 skipped due to pins, 0 skipped frozen
tuples: 0 removed, 9985983 remain, 8000 are dead but not yet
removable, oldest xmin: 794
index scan bypassed: 26 pages from table (0.02% of total) have 2000
dead item identifiers
avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s
buffer usage: 311 hits, 0 misses, 0 dirtied
WAL usage: 1 records, 0 full page images, 281 bytes
system usage: CPU: user: 0.02 s, system: 0.00 s, elapsed: 0.02 s
2022-10-17 03:49:41.355 CDT [2469] LOCATION: heap_vacuum_rel, vacuumlazy.c:864
As you can see there are 8000 dead rows but they are not getting cleaned.
To get further insight into the issue, I ran the following query on pg_stat_activity with the backend_xmin (i.e., oldest xmin) to pin-point why the autovacuum process was not succeeding in removing the dead tuples.
postgres=# select * from pg_stat_activity where backend_xmin = 794;
-[ RECORD 1 ]----+------------------------------------
datid | 14483
datname | postgres
pid | 2027
leader_pid |
usesysid | 10
usename | postgres
application_name | psql
client_addr |
client_hostname |
client_port | -1
backend_start | 2022-10-17 03:47:10.307107-05
xact_start | 2022-10-17 03:47:12.279856-05
query_start | 2022-10-17 03:47:30.34069-05
state_change | 2022-10-17 03:47:30.358111-05
wait_event_type | Client
wait_event | ClientRead
state | idle in transaction
backend_xid |
backend_xmin | 794
query_id |
query | explain analyze select * from test;
backend_type | client backend
This is when I noticed that one of the queries in a transaction was idle as reflected in the state ‘idle in transaction’, and this was preventing the cleanup of the dead tuples.
For the uninitiated, here’s what happens when a transaction is started:
- Postgres creates a transaction id for the transaction.
- It creates a snapshot of all data involved in the transaction.
- For each of the tuples, Postgres adds the transaction id to the list of transactions that involve the tuple, and still being processed (i.e., open transactions)
This is what happens during an autovacuum process:
- For each row that is marked for deletion (i.e., dead tuples), Postgres fetches the list of active (i.e., open) transaction ids involving the selected row, then selects the minimum id among those and sets it as the xmin id.
- The autovacuum process adds the location of each of the rows that is not part of any other active transaction to the free space map so that it can be reused by other transactions.
If the row is part of any other active transaction (i.e., it is still being referred by another transaction), it is not added to the free space map, and hence cannot be overwritten by a new record.
As you can see from the above, if there is a long-running transaction on a heavily updated table, the dead tuples can quickly add up, causing the database to bloat, and ultimately result in performance deterioration.
What is the solution ?
The solution is to make sure transactions are gracefully closed, or force close the transactions which remain in pending state for a long period of time by updating the value of the parameter “idle_in_transaction_session_timeout”. This would also automatically kill the preexisting sessions that remain idle in transaction for a time longer than the interval defined. This needs to be set cautiously according to the maximum transaction time in your environment.
Here’s what we did on this database:
postgres=# SET idle_in_transaction_session_timeout = '300s';
It’s important to note that this setting does not terminate the transaction itself, but rather the session that is associated with the transaction. If a session is terminated due to an idle in transaction session timeout, any work that was performed as part of the transaction will be rolled back.
Conclusion
In conclusion, Autovacuum is a critical tool for maintaining the health and performance of PostgreSQL databases. By “automagically” managing the database statistics and removing obsolete or dead data, Autovacuum helps to performance optimization by providing Optimizer, the very important part – Statistics.
While Autovacuum is generally effective in keeping databases running smoothly, it can also introduce overhead and cause issues in certain situations. Careful tuning and monitoring of Autovacuum parameters is therefore recommended to ensure optimal performance.
Overall, Autovacuum is an essential component of PostgreSQL’s powerful and flexible database management system. When used correctly, it can help to ensure that PostgreSQL databases are reliable, performant, and scalable.
Request a Database System Architecture Health Check Quote
We urge you to reach us,right away if you’ve noticed that the performance of your database has been slowly getting worse or if you’ve been trying to fix some annoying problems! We can quickly bring it back up to speed on any platform.
For Business Enquiries Contact
Raviteja Reddy K
Customer Success Manager – OpenSource DB
Hyderabad, India | +91-8886 818 173 | https://opensource-db.com/
https://www.linkedin.com/in/ravi-tej-reddy/
GET IN TOUCH