PostgreSQL MVCC - OpenSourceDB

Introduction:

PostgreSQL is The Most powerful open-source relational database system that uses a sophisticated concurrency control mechanism called Multi-Version Concurrency Control (MVCC). MVCC is fundamental for maintaining data consistency and enabling concurrent transactions without locking issues. This blog post will delve into the intricacies of PostgreSQL MVCC, provide practical examples, and outline best practices to follow.

What is MVCC?
MVCC, or Multi-Version Concurrency Control, is a database management technique that provides a consistent view of the database to each transaction by maintaining multiple versions of data. This allows concurrent transactions to read and write data without interfering with each other, thus improving performance and maintaining data integrity.

How MVCC Works in PostgreSQL
PostgreSQL implements MVCC by keeping multiple versions of a data row. Each row version is tagged with transaction IDs that indicate when the row was created and when it was deleted (if applicable). This mechanism allows PostgreSQL to provide a snapshot of the database at the start of each transaction, ensuring a consistent view.

Key Concepts
Transaction ID (TXID): A unique identifier assigned to each transaction.
Snapshot: A view of the database at a particular point in time.
Visibility Rules: Determine which version of a row is visible to a transaction based on TXIDs.

Step1: create table

creating the table phone with the columns id and balance:

mydb=# create table phone (id int, balance money);
CREATE TABLE

insert a row into the table:

mydb=# insert into phone values (1,100);
INSERT 0 1

insert a row into the table:

mydb=# insert into phone values (2,200);
INSERT 0 1

We then add two more rows and check the values:

mydb=# insert into phone values (3,300),(4,400);
INSERT 0 2

Verify the update:

mydb=# select * from phone ;
 id | balance 
----+---------
  1 | $100.00
  2 | $200.00
  3 | $300.00
  4 | $400.00
(4 rows)

Step 2: Hidden columns

The * does return all the columns but hidden:

mydb=# select ctid,xmin,Xmax, *from phone;
 ctid  | xmin | xmax | id | balance 
-------+------+------+----+---------
 (0,1) |  742 |    0 |  1 | $100.00
 (0,2) |  743 |    0 |  2 | $200.00
 (0,3) |  744 |    0 |  3 | $300.00
 (0,4) |  744 |    0 |  4 | $400.00
(4 rows)

ctid – a row identifier within the page.
xmin – an ID of the transaction created the row.
xmax – an ID of the transaction deleted the row

It’s necessary for its MVCC engine. Postgres can store several versions of a single row at a time and those hidden columns help to control the visibility of the versions for running transactions/queries

Step 3: Begin a Transaction

Start a transaction and run a few experiments:

Begin the transaction:

mydb=# begin;
BEGIN

Get the current value of the second row:

select ctid, xmin, xmax, * from account where id = 2;
ctid  | xmin | xmax | id | balance
-------+------+------+----+---------
(0,2) |  743 |    0 |  2 | $200.00
(1 row)

Update the balance:

mydb=*#  update phone set balance = 450 where id=2;
UPDATE 1

Read the row back:

mydb=*# select  ctid, xmin, xmax, * from phone where id=2;
 ctid  | xmin | xmax | id | balance 
-------+------+------+----+---------
 (0,5) |  745 |    0 |  2 | $450.00
(1 row)

mydb=*# select txid_current_if_assigned();
 txid_current_if_assigned 
--------------------------
                      745
(1 row)

See that the xmin is equal to the ID of the current started transaction:

Step 4: Connect a database

Open another session with Postgres and read the value of the record that is being modified:

connect to database

[postgres@localhost ~]$ psql -p 5432 -d mydb -U postgres
psql (16.3)

Read the row:

mydb=# select  ctid, xmin, xmax, * from phone where id=2;
 ctid  | xmin | xmax | id | balance 
-------+------+------+----+---------
 (0,2) |  743 |  745 |  2 | $200.00
(1 row)

This version has different ctid, xmin and xmax when compared to the version that was created by the first transaction.

Now, in the first session, commit the transaction:

mydb=*# commit;
COMMIT

In the second session, read the row one more time:

mydb=# select  ctid, xmin, xmax, * from phone where id=2;
 ctid  | xmin | xmax | id | balance 
-------+------+------+----+---------
 (0,5) |  745 |    0 |  2 | $450.00
(1 row)

That new version of the row was committed and is visible to all future queries/transactions.

Best Practices for Using MVCC in PostgreSQL

1. Use Appropriate Isolation Levels

PostgreSQL supports different isolation levels: Read Uncommitted, Read Committed, Repeatable Read, and Serializable. Choose the appropriate level based on your application’s consistency and performance requirements.

Read Committed: Suitable for most applications, providing a balance between performance and consistency.
Repeatable Read: Use when you need to ensure that reads within a transaction are consistent.
Serializable: Provides the highest level of isolation but can impact performance due to increased locking.

2. Keep Transactions Short

Long-running transactions can lead to increased bloat and impact database performance. Aim to keep transactions as short as possible to reduce the number of old row versions and improve MVCC efficiency.

3. Regularly Vacuum the Database

Vacuuming helps to reclaim storage occupied by dead row versions and maintain MVCC performance. Schedule regular VACUUM operations, especially on frequently updated tables.

4. Monitor and Manage Bloat

Bloat occurs when the database accumulates too many dead row versions. Use tools like pg_stat_user_tables and pgstattuple to monitor bloat and take corrective actions.

5. Optimize Queries and Indexes

Efficient queries and proper indexing can minimize the impact on MVCC performance. Use EXPLAIN to analyze query plans and create indexes to speed up data retrieval.

Conclusion

PostgreSQL’s MVCC is a robust mechanism that allows for efficient and concurrent data access while maintaining consistency. By understanding how MVCC works and following best practices, you can optimize your PostgreSQL database for better performance and reliability. Whether you’re handling basic transactions or managing complex concurrent operations, MVCC ensures your data remains consistent and accessible.