Exploring the depths of PostgreSQL's Memory: A Practical Guide to pg_buffercache

Memory management is crucial to the performance of PostgreSQL databases. How PostgreSQL allocates and utilizes memory directly impacts query speed, throughput, and overall system stability. While the PostgreSQL configuration file postgresql.conf provides parameters to control memory settings, understanding what’s actually happening with that allocated memory is a different challenge entirely.

PostgreSQL’s performance depends heavily on how effectively it uses memory, particularly the shared buffer cache. PostgreSQL excels at managing memory internally, but its operations largely occur behind the scenes. You configure the settings, but you don’t automatically see what’s actually residing in memory at any given moment. This is where pg_buffercache becomes an invaluable tool.

pg_buffercache is a contrib extension that provides direct visibility into PostgreSQL’s shared buffer cache, the primary memory area where PostgreSQL caches recently accessed table and index pages. It doesn’t modify how memory works or change PostgreSQL’s caching behavior; instead, it simply reveals how memory is currently being utilized, transforming a system into something you can observe, measure, and optimize.

Why Shared Buffers Matter:

The shared buffer cache exists to minimize disk I/O. Disk access is orders of magnitude slower than memory access. PostgreSQL tries to keep frequently accessed data blocks in memory so that future queries can read them directly from RAM instead of hitting disk.

The size of this cache is controlled by the shared_buffers parameter, but without pg_buffercache, you can’t easily answer questions like:

Which tables are actually using my memory?
Are indexes dominating my cache?
How much of my cache is dirty?
Is my cache mostly hot or mostly idle?

Visibility Into Shared Buffers

Shared buffer cache exists to minimize disk I/O by keeping frequently accessed data in memory, where it can be retrieved orders of magnitude faster than reading from disk. Without pg_buffercache, this cache operates as essentially a black box. You can configure its size through the shared_buffers parameter, but you have limited insight into what’s actually stored inside or how effectively that space is being used.

pg_buffercache changes this by exposing critical details about cache contents:

Which tables and indexes are occupying memory: See exactly which database objects have pages cached, and understand your cache composition.
How many pages each object is using: Quantify the cache footprint of individual tables and indexes to identify major memory consumers.
Whether pages are dirty: Identify modified pages that haven’t yet been written to disk, which helps you understand write patterns and checkpoint behavior.
Usage density patterns: Observe which portions of your cache are actively used versus sitting idle.

How do we use pg_buffercache?

Step 1: Create the pg_buffercache Extension

To enable buffer cache visibility in PostgreSQL, first create the pg_buffercache extension. This extension allows us to inspect the contents of PostgreSQL’s shared buffer cache.

postgres=# CREATE EXTENSION pg_buffercache;
CREATE EXTENSION

The pg_buffercache extension has been created successfully.

Step 2: View Real-Time Buffer Cache Data

Once the extension is enabled, PostgreSQL exposes real-time information about the data currently stored in shared memory. Using the pg_buffercache view, we can:

See which tables and indexes are cached in memory.
Analyze how shared buffers are being utilized across different database objects.
Understand real-time memory usage patterns and identify potential optimization opportunities.

Understanding the pg_buffercache Schema:

postgres=# \d pg_buffercache 
                 View "public.pg_buffercache"
      Column      |   Type   | Collation | Nullable | Default 
------------------+----------+-----------+----------+---------
 bufferid         | integer  |           |          | 
 relfilenode      | oid      |           |          | 
 reltablespace    | oid      |           |          | 
 reldatabase      | oid      |           |          | 
 relforknumber    | smallint |           |          | 
 relblocknumber   | bigint   |           |          | 
 isdirty          | boolean  |           |          | 
 usagecount       | smallint |           |          | 
 pinning_backends | integer  |           |          |

Summary:

bufferid: Unique identifier for each buffer slot (1 to shared_buffers count).
relfilenode: Internal file identifier for the relation (table/index).
reldatabase: Database OID (0 means shared across all databases).
isdirty: Whether the page has been modified but not yet written to disk.
usagecount: How frequently this buffer is accessed (higher = more important to keep cached).
pinning_backends: Number of backends currently using this buffer.

Examples: Viewing Buffer Cache Contents

SELECT * FROM pg_buffercache;
 bufferid | relfilenode | reltablespace | reldatabase | relforknumber | relblocknumber | isdirty | usagecount | pinning_backends 
----------+-------------+---------------+-------------+---------------+----------------+---------+------------+------------------
        1 |        1262 |          1664 |           0 |             0 |              0 | f       |          5 |                0
        2 |        1260 |          1664 |           0 |             0 |              0 | f       |          5 |                0
        3 |        1259 |          1663 |           1 |             0 |              0 | f       |          5 |                0
        4 |        1259 |          1663 |           1 |             0 |              1 | f       |          5 |                0

This output shows which relations currently reside in shared buffers and how actively they are being used.

Step 3: Analyze Buffer Cache Usage

PostgreSQL versions 14 and above provide a summarized view of buffer cache usage:

postgres=# SELECT * FROM pg_buffercache_summary();
 buffers_used | buffers_unused | buffers_dirty | buffers_pinned |  usagecount_avg   
--------------+----------------+---------------+----------------+-------------------
         2976 |          13408 |          1464 |              0 | 1.971774193548387
(1 row)

Summary:

buffers_used: Number of buffers currently holding valid data.
buffers_unused: Free buffer slots that have never been used or were released.
buffers_dirty: Modified pages waiting to be flushed to disk.
buffers_pinned: Buffers currently locked by active queries.
usagecount_avg: Average access frequency across cached pages.

PostgreSQL Buffer Cache Internals (Important Insights):

Understanding how PostgreSQL manages its buffer cache internally helps you interpret pg_buffercache data more effectively.

Structure and Management:

Hash Table & Array: The cache is managed as a set of buffers, with a hash table used for rapid lookups to determine if a required page is already in memory.
Pin Count: When a process accesses a page, it “pins” the buffer, incrementing the pin count. While a buffer is pinned, it cannot be evicted, ensuring data consistency for concurrent operations.
Dirty Buffers: When data is modified in memory but not yet written to disk, the buffer is marked as “dirty.” The Background Writer process periodically flushes these dirty pages to disk, preventing the cache from becoming bloated with modified data.

The Clock-Sweep Eviction Algorithm:

When the cache is full, PostgreSQL must choose a victim buffer to evict to make room for new data. It uses a Clock-Sweep algorithm:

Usage Count: Each buffer has a usage count (0-5). A higher count means the page is popular and should stay in memory.
The Sweep: The algorithm moves a pointer circularly through the buffers.
Decrementation: If the algorithm passes a buffer, it decreases its usage count.
Eviction: A page is evicted only if its usage count is 0 and it is not currently pinned.
New Pages: When a new page is loaded, it is given a low usage count (often 1 or 2), allowing it to prove its worth before being protected from eviction.

Double Buffering (PostgreSQL + OS):

PostgreSQL does not typically use “Direct I/O.” It relies on standard file system calls, meaning data is often cached twice: once in shared_buffers and again in the operating system’s page cache. While this might seem inefficient, it provides a safety net if data is evicted from the PostgreSQL cache but remains in the OS cache.

Important Performance Insights:

Shared vs. Local Buffers: The shared cache is for shared table and index data. Temporary tables are exceptions; they use a separate “local buffer” in the backend’s private memory.
Large Scans: Large sequential scans can “pollute” the cache, forcing out frequently used data.
PostgreSQL alleviates this by using small, 32-page “buffer rings” for large sequential scans, preventing them from washing out the entire cache.
Optimal Sizing: A common, effective starting point is setting shared_buffers to 25% of total system RAM, though this should be adjusted based on your specific workload.
EXPLAIN (ANALYZE, BUFFERS): To see the buffer cache in action during query execution, use EXPLAIN (ANALYZE, BUFFERS).

Observing Cache Behavior with EXPLAIN

Use EXPLAIN (ANALYZE, BUFFERS) to see buffer usage during query execution:

postgres=# explain (analyze, buffers) select * from customer ;
                                               QUERY PLAN                                               
--------------------------------------------------------------------------------------------------------
 Seq Scan on customer  (cost=0.00..1.34 rows=34 width=29) (actual time=2.346..3.827 rows=34.00 loops=1)
   Buffers: shared hit=1
 Planning:
   Buffers: shared hit=17
 Planning Time: 7.138 ms
 Execution Time: 9.330 ms
(6 rows)

This reveals:

Shared Hit: Number of pages found in the cache
Shared Read: Number of pages that had to be read from disk/OS.

Conclusion :

pg_buffercache provides valuable visibility into PostgreSQL’s shared buffer cache, turning memory usage from a black box into something you can observe and analyze. It helps DBAs understand which tables and indexes occupy memory, how efficiently the cache is used, and where write or I/O pressure originates.

While it does not replace other performance tools, pg_buffercache complements them by revealing what PostgreSQL is actually keeping in memory. When combined with knowledge of PostgreSQL’s buffer internals, it enables informed memory tuning decisions and more predictable database performance.

Exploring the depths of PostgreSQL’s Memory: A Practical Guide to pg_buffercache