Managing PostgreSQL LOs: A Practical Perspective

In our previous blog, we explored how to handle Large Objects (LOs) in PostgreSQL and the challenges they bring, including orphaned LOs. The importance of managing LOs efficiently became evident during a recent operational exercise that involved using tools like vacuumlo to clean up orphaned LOs and reclaim space.

Observations from a recent LO cleanup exercise:

Before starting the cleanup process, we ensured data safety by performing a full physical backup using pg_basebackup on the witness server. This backup was taken in TAR format, providing a recovery path in case anything went wrong during the upcoming operations.
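The exact invocation depends on the environment; a minimal sketch of such a backup, with illustrative host, user, and directory values, would be:

pg_basebackup -h witness-host -U repl_user -D /pgdb/backups/base_backup -Ft -z -Xs -P

Here -Ft produces tar-format output, -z compresses the tar files, -Xs streams the WAL needed for a consistent restore, and -P prints progress.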

Archiving considerations before cleanup:

Before initiating any heavy maintenance operations like vacuumlo or VACUUM FULL, we checked the archive settings to prevent unnecessary WAL file buildup during the process.

SHOW archive_command;
The output was:

cp %p /pgdb/pgarchive/%f

We then disabled WAL archiving temporarily to avoid unnecessary disk space usage during cleanup.

ALTER SYSTEM SET archive_command = '/bin/true';
SELECT pg_reload_conf();
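It is worth noting that /bin/true reports every WAL segment as successfully archived while copying nothing, so the archive chain has a gap for the duration of the maintenance. Once the work is done, archiving can be restored to its original setting (the command shown above) and a fresh base backup taken:

ALTER SYSTEM SET archive_command = 'cp %p /pgdb/pgarchive/%f';
SELECT pg_reload_conf();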

vacuumlo execution time: 

As explored in the previous blog, the vacuumlo utility removes orphaned LOs. With the archive command disabled and a backup safely stored, we proceeded to identify and remove orphaned LOs from the database.

time vacuumlo -n -v testdb

The -n (dry-run) option reports the number of orphaned large objects that would be removed from the database, without actually removing them.

nohup time vacuumlo -v testdb &

Keep in mind that vacuumlo can be time-consuming, especially on large databases, so make sure to schedule some downtime while working on this exercise.

Database size reduction after VACUUM FULL:

Once the orphaned LOs were removed, we ran a VACUUM FULL operation to reclaim storage space from bloated tables, using parallel jobs to speed up the process.

nohup time vacuumdb -f -j 6 -v -d testdb &
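To quantify the reduction, the database size can be checked before and after the operation:

SELECT pg_size_pretty(pg_database_size('testdb'));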

This significantly reduced the database size from 460GB to 8GB, demonstrating the impact of thorough maintenance. This exercise resulted in better resource utilization and improved system performance. The overall data directory size decreased from 575GB to 301GB. To monitor which table was being processed during the operation, we queried pg_stat_activity:

SELECT pid, usename, datname, state, query 
FROM pg_stat_activity 
WHERE state = 'active';
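On PostgreSQL 12 and later, the pg_stat_progress_cluster view offers finer-grained, phase-level visibility into VACUUM FULL (and CLUSTER) operations; a quick check might look like:

SELECT pid, datname, relid::regclass AS table_name, command, phase
FROM pg_stat_progress_cluster;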

Orphaned Large Objects cleanup:

After dropping a table that referenced LOs, we ran vacuumlo again to identify and clean up the orphaned LOs. The dry run confirmed that 9 large objects were ready for removal, and the cleanup operation completed successfully, underscoring the importance of periodic LO cleanup to prevent disk space from being consumed by unreferenced objects.
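For a quick manual sanity check, orphaned LOs can also be spotted by comparing pg_largeobject_metadata against the column that is supposed to reference them. The table and column names below (documents, file_oid) are purely illustrative:

SELECT lo.oid
FROM pg_largeobject_metadata lo
WHERE NOT EXISTS (
    SELECT 1 FROM documents d WHERE d.file_oid = lo.oid
);

vacuumlo automates this scan across every column of type oid or lo in the database, so a query like this is mainly useful for inspecting a single table.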

Conclusion:

This experience highlighted several important aspects of working with LOs:

  • Timely cleanup: It's crucial to run a utility like vacuumlo periodically to avoid the accumulation of orphaned large objects (see the scheduling sketch after this list).
  • Archiving considerations: Disabling archive_command during intensive maintenance tasks prevents unnecessary WAL archive buildup and reduces disk space usage.
  • Post-cleanup monitoring: After executing VACUUM FULL, monitoring the system's performance and disk usage is essential to ensure that the cleanup operations were effective.
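
As a concrete example of timely cleanup, a weekly dry-run report can be scheduled via cron; the schedule, user, and log path below are illustrative assumptions:

# /etc/cron.d entry: every Sunday at 02:00, report (but do not remove) orphaned LOs in testdb
0 2 * * 0 postgres vacuumlo -n -v testdb >> /var/log/vacuumlo_report.log 2>&1

Reviewing the report before running the actual removal keeps the destructive step under human control.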

A big thank you to my colleague Shashidhar for sharing detailed insights that were instrumental in executing a smooth and effective cleanup process. His understanding of PostgreSQL Large Object (LO) management helped us navigate the intricacies involved and contributed significantly to creating a leaner, more efficient database environment, free from unnecessary bloat and potential performance issues.
