Imagine that you were running a promotion for your blog or website, and it caught the attention of Sarah Connor, who immediately tweeted a link to your website on her channel. How would you feel about it? Would you feel elated and excited that your website has gone viral and could potentially attract thousands or millions of visitors? Or would this thought make you nervous wondering if your application and database would be able to handle all the visitors and the traffic that is pouring like rain in the Amazon
Well, if you truly are in such a situation, it is probably a good thing, and I promise we will do our part to alleviate your anxiety.
When it comes to scalability, application architects resort to different tools and techniques, and one of the most rudimentary, yet under-utilized techniques is Connection pooling.
In this post, we discuss what Connection Pooling is, and then talk about how we could implement Connection Pooling in a Postgres ecosystem using pgBouncer.
Ready to go on the ride? Strap on …
What is Connection Pooling?
Typically, when a user interacts with an application, the application, in turn, requests a connection to the database that serves as a conduit for all the exchanges that happen between the application/user and the database. This connection would be maintained as long as the user remains active, and the application automatically disconnects from the database when the user logs out or his session gets timed out.
Now, opening a database connection and closing it each time we need to pull some data out of the database can be a costly affair, and people have come up with the idea of a connection pool as a way to offset some of the costs and gain some performance benefit while at it.
What exactly is a connection pool then?
Well, a connection pool is a set of always-on, shared database connections that are recycled for use across multiple users. You could think of it like a bunch of hotlines – when you need to communicate, you just need to pick up the phone, and you would be instantly connected to the other side.
Since these are typically always-on connections, the overheads associated with establishing a connection to the database, as well as releasing the connection after use, are greatly minimized.
What is pgBouncer?
pgBouncer is an open-source, lightweight, single-binary connection pooler for PostgreSQL. It can pool connections to one or more databases (on possibly different servers) and serve clients over TCP and Unix domain sockets. pgBouncer either creates a new database connection for a client or reuses an existing connection for the same user and database. When the client disconnects, pgBouncer returns the connection to the pool for re-use.
Why do we need pgBouncer?
Here are some reasons why you may need pgBouncer:
- Improve performance: pgBouncer can help improve database performance by reducing the overhead associated with establishing and closing connections.
- Handle connection limits: pgBouncer can handle connection limits, which can be useful in situations where the database server has a limited number of connections available.
- Manage connections: pgBouncer allows you to manage connections to the database, such as setting up idle timeout thresholds or limiting the number of connections per user.
- Security: pgBouncer can also enhance database security by providing authentication and authorization mechanisms that can help prevent unauthorized access to the database.
How does pgBouncer work?
pgBouncer shares connections in one of three pool modes:
- Session pooling – When a client connects, a connection is assigned to it as long as it remains connected. When the client disconnects, the connection is placed back into the pool.
- Transaction pooling – A connection is assigned to a client for the duration of a transaction. When pgBouncer notices the transaction is done, the connection is placed back into the pool. This mode can be used only with applications that do not use features that depend upon a session.
- Statement pooling – Statement pooling is like transaction pooling, but multi-statement transactions are not allowed. This mode is intended to enforce auto-commit mode on the client and is targeted for PL/Proxy on PostgreSQL.
How to Install pgBouncer?
- pgBouncer can be installed just like any other Linux packages using apt or yum depending on the Linux distribution.
- Here’s the command we used to install pgBouncer on an Ubuntu 22.04 system running Postgres 15.0:
sudo apt-get install -y pgbouncer
How to configure pgBouncer?
- Create a configuration file: Next, you need to create a configuration file for pgBouncer. The configuration file is usually named
pgbouncer.ini
(located in the /etc/pgbouncer folder) and contains various settings such as the database server hostname, port, database name, username, and password. - Configure pgBouncer settings: After creating the configuration file, you need to configure pgBouncer settings. This includes setting the maximum number of connections, timeouts, and other parameters that control how pgBouncer operates.
Here is an example configuration file for pgBouncer:
[databases]
template1 = host=127.0.0.1 port=5432 dbname=template1
[pgbouncer]
listen_addr = *
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/user.txt
pool_mode = session
max_client_conn = 100
default_pool_size = 20
This configuration file specifies a single database (template1) with the hostname 127.0.0.1 and port 5432. It also sets the maximum number of client connections to 100 and the default pool size to 20. The auth_file setting points to a file containing the usernames and passwords allowed to connect to pgBouncer.
Reload pgBouncer:
We can use reload command to reload the configuration file without interrupting the active connections.
postgres@localhost:/etc/postgresql/15/main $ psql -p 6432 -U pgbouncer pgbouncer
pgbouncer=# reload;
Restart pgBouncer:
To stop pgBouncer:
pgbouncer -d /etc/pgbouncer/pgbouncer.ini -R -s
To start pgBouncer:
pgbouncer -d /etc/pgbouncer/pgbouncer.ini
When making changes that affect a large number of database users, it is often recommended to stop and start pgBouncer instead of restarting the Postgres database service:
Performance Benefits – Does pgBouncer really help?
As members of the tech community, we took it upon ourselves to validate if using a Connection Pooling application really helps to improve the database performance, and to what extent, and here is what we found:
(We ran a bare-bones pgBench test with 100 clients and 10000 transactions per client to simulate the load)
Connection Management without pgBouncer:
Given below output is the result of the without pgBouncer
postgres@ubuntu-vm:~$ pgbench -c 100 -j 2 -t 10000 osdb_pgb_test
pgbench (15.0 (Ubuntu 15.0-1.pgdg22.04+1))
starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 50
query mode: simple
number of clients: 100
number of threads: 2
maximum number of tries: 1
number of transactions per client: 10000
number of transactions actually processed: 500000/500000
number of failed transactions: 0 (0.000%)
latency average = 96.766 ms
initial connection time = 526.886 ms
tps = 516.713001 (without initial connection time)
Connection Management with pgBouncer:
Given output is the result of the with pgBouncer
postgres@ubuntu-vm:~$ pgbench -c 100 -j 2 -t 10000 osdb_pgb_test
pgbench (15.0 (Ubuntu 15.0-1.pgdg22.04+1))
starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 50
query mode: simple
number of clients: 100
number of threads: 2
maximum number of tries: 1
number of transactions per client: 10000
number of transactions actually processed: 500000/500000
number of failed transactions: 0 (0.000%)
latency average = 60.036 ms
initial connection time = 203.134 ms
tps = 832.838500 (without initial connection time)
As you can see from the above test, the initial connection time and average latency are both reduced by about 50%, and the tps increased by more than 50% in the case of pgBouncer which would be enough to validate our claims.
Conclusion:
pgBouncer is a lightweight open-source connection pooling tool for PostgreSQL that can improve the database performance and reduce the server load by optimizing database connections. It provides different pooling methods for different requirements and provides the users the flexibility to configure it as per their business needs.
Leave a Reply