Efficient Management of PostgreSQL Large Objects

When working with relational databases like PostgreSQL, the norm is to manage structured data organized into rows and columns. However, many real-world applications need to store and manage large unstructured data—such as multimedia files and documents—that doesn’t fit neatly into conventional columns. PostgreSQL offers support for Large Objects, also known as LOs, to cater to such needs. This guide dives into how PostgreSQL handles LOs, the associated challenges, and how tools like vacuumlo can assist in maintaining database hygiene.

What are Large Objects in PostgreSQL?

Large Objects (LOs) in PostgreSQL are used to store binary data too large for convenient inline column storage—a single LO can hold up to 4 TB. Examples include high-resolution images, PDFs, audio/video files, and other binary formats. Unlike BYTEA, which stores binary data inline within a table, LOs are stored separately and referenced by an Object Identifier (OID). This separation keeps table rows small and allows streaming-style, piecewise access to sizable data.

PostgreSQL internally manages LOs using the system catalog tables pg_largeobject and pg_largeobject_metadata. Each LO is broken into smaller chunks to optimize storage. These chunks are stored as rows in pg_largeobject, while metadata such as access permissions and comments reside in pg_largeobject_metadata. This chunking mechanism means that writing to an LO at an offset (say 1,000,000) doesn’t require allocating all the preceding space; PostgreSQL stores only the chunks that were actually written and returns zeroes for unwritten portions.
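This sparse behavior can be demonstrated with the server-side functions lo_create, lo_put, and lo_get (a sketch—the OID shown is just an example; lo_create(0) assigns whatever OID is free):

```sql
-- Create an empty large object; lo_create(0) picks an unused OID.
SELECT lo_create(0);                          -- e.g. returns 55600

-- Write 4 bytes at offset 1,000,000; no preceding space is allocated.
SELECT lo_put(55600, 1000000, '\xdeadbeef');

-- Unwritten regions read back as zeroes.
SELECT lo_get(55600, 0, 4);                   -- \x00000000
```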

PostgreSQL provides a read/write API for creating, modifying, and deleting large objects, closely resembling traditional file operations. Large objects can be added in multiple ways, including programmatically creating them or importing files directly into the database.
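For instance, the server-side API can be driven from SQL much like file descriptors (a sketch; the OID and the descriptor value 0 are illustrative, and a descriptor is only valid for the duration of the transaction):

```sql
BEGIN;
-- Open for read+write: INV_READ (262144) | INV_WRITE (131072).
SELECT lo_open(55547, 262144 | 131072);   -- returns a descriptor, e.g. 0
SELECT lowrite(0, 'hello');               -- write at the current position
SELECT lo_lseek(0, 0, 0);                 -- seek back to the start (SEEK_SET)
SELECT loread(0, 5);                      -- reads back 'hello'
SELECT lo_close(0);
COMMIT;
```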

Here is how I added large objects to a table.

Before inserting an LO into a table, enable the lo extension in your database:
CREATE EXTENSION lo;

In this table I’m creating an OID column that stores a reference to the LO. I have some .txt files in my /tmp directory, a location the Postgres server can read, and imported those files into the table.

test_db=# CREATE TABLE image_data (id SERIAL PRIMARY KEY, image_oid OID);

test_db=# INSERT INTO image_data (image_oid) VALUES (lo_import('/tmp/sample.txt'));
test_db=# SELECT * FROM image_data;
 id | image_oid 
----+-----------
  4 |     55547
  5 |     55554
  6 |     55555
  7 |     55556
  8 |     55557
  9 |     55563
 10 |     55564
(7 rows)
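The reverse of lo_import is lo_export, which writes a large object back out to a file on the server’s filesystem (the output path below is just an example):

```sql
SELECT lo_export(image_oid, '/tmp/sample_copy.txt')
FROM image_data
WHERE id = 4;
```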

There is a catch, however: PostgreSQL does not automatically delete the large object data when a referencing row is deleted, leaving behind data that still exists in the system but is no longer used.

The Problem of Orphaned Large Objects
One major caveat of using LOs is that PostgreSQL does not automatically delete them when their referencing row is removed. If you delete a row or drop a table containing an LO reference, the large object itself remains in the pg_largeobject table, unlinked and unused. Over time, these orphaned LOs can consume substantial disk space.
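One way to avoid orphans altogether is the lo_manage trigger that ships with the lo extension enabled earlier. It requires the referencing column to use the lo type (a domain over oid) rather than plain OID; the table and column names below are hypothetical:

```sql
CREATE TABLE managed_docs (id serial PRIMARY KEY, doc_oid lo);

-- lo_manage unlinks the old large object whenever the referencing
-- row is deleted or its OID is updated to a different value.
CREATE TRIGGER t_doc_oid
    BEFORE UPDATE OR DELETE ON managed_docs
    FOR EACH ROW
    EXECUTE FUNCTION lo_manage(doc_oid);
```

On PostgreSQL versions before 11, write EXECUTE PROCEDURE instead of EXECUTE FUNCTION. Note that the trigger does not fire on DROP TABLE or TRUNCATE, so vacuumlo remains useful even with it in place.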

Manual LO Cleanup Challenges
Cleaning up orphaned LOs requires deliberate administrative action. Plain VACUUM reclaims dead tuples inside pg_largeobject like any other table, but it has no way of knowing whether a large object is still referenced, so orphaned LOs quietly build up over time, especially in systems with frequent updates or deletions. The result is wasted disk space and potential performance degradation.

Identifying Orphaned Large Objects
You can identify orphaned LOs by comparing the list of all LOs in pg_largeobject against those referenced in user-defined tables.
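A sketch of such a check against the image_data table from earlier (in practice you would repeat the NOT EXISTS clause for every table and column that stores LO OIDs):

```sql
SELECT m.oid
FROM pg_largeobject_metadata m
WHERE NOT EXISTS (
    SELECT 1 FROM image_data d WHERE d.image_oid = m.oid
);
-- Any OID returned is unreferenced and can be removed with:
-- SELECT lo_unlink(<oid>);
```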

vacuumlo is a PostgreSQL client utility that scans the large objects in a database and deletes the ones that are no longer referenced by any table. It helps clean up orphaned LOs and reclaim space, much like VACUUM removes dead tuples in regular tables.

For example: after inserting some LOs, suppose I DELETE the rows that reference them, or drop the table entirely. Only the OIDs referencing the LOs are removed; the large objects themselves remain in pg_largeobject. Once the table is dropped, those objects are orphaned and must be removed explicitly.

test_db=# DROP TABLE image_data ;
DROP TABLE

postgres@pgtest:~$ vacuumlo -n -v test_db
Password: 
Connected to database "test_db"
Test run: no large objects will be removed!
Checking file_oid in public.my_docs
Would remove 9 large objects from database "test_db".


The -n flag above performs a dry run: it lists the orphaned large objects without removing them, even after the table that referenced them has been dropped. Running vacuumlo without -n actually removes the orphans and reclaims the space they occupy.

postgres@pgtest:~$ vacuumlo -v test_db
Password:
Connected to database "test_db"
Checking file_oid in public.my_docs
Successfully removed 9 large objects from database "test_db".

In PostgreSQL, another way to store large binary data is the bytea datatype, which stores the binary data directly in table rows and relies on TOAST for compression and out-of-line storage.

PostgreSQL’s Large Objects (LOs) are a powerful way to handle large binary data, but they require careful management. Orphaned LOs can accumulate over time, and tools like vacuumlo help clean up unused data. In this post, we covered how to insert, manage, and clean LOs efficiently.

In the next blog, we’ll explore LOs vs. bytea, and how PostgreSQL’s TOAST mechanism fits into the picture.
