Introduction:
For developers, DevOps engineers, QA analysts, and product managers alike, the moment an application goes live in production is the culmination of weeks (or months) of development, testing, and planning. But ask around, and you’ll hear war stories: broken features, last-minute rollbacks, angry users, and hotfixes deployed at 2 AM.
So, what separates a smooth go-live from a disastrous one?
Pre-production environments:
The unsung hero of modern database deployment, a well-configured pre-production environment mimics production as closely as possible, giving teams a final “dress rehearsal” before the main show.
In this series, we’re diving deep into:
- What a good pre-prod setup actually looks like
- Why most teams underutilize it
- The essential testing types you must do before release
- Checklists, tools, and common pitfalls
Whether you’re launching your first product or deploying at scale, this guide will give you practical, battle-tested insights to improve your go-live process.
Before moving into production, it is critical to validate your application and database stack in a pre-production (pre-prod) or UAT environment. A recent incident illustrates why: one of our customers ran into serious issues while validating their application’s performance in the pre-prod environment.
Inserts were failing with the following error:
“was aborted: ERROR: could not read block 326 in file “base/36895/46694”: read only 0 of 8192 bytes. Call getNextException to see other errors in the batch”
By catching these errors in the pre-prod environment, we were able to minimize costly downtime and avoid the rush to implement quick fixes.
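The file path in the error above follows PostgreSQL’s on-disk layout for the default tablespace, base/<database OID>/<relfilenode>, so it can be mapped back to the table or index it belongs to. The sketch below is only an illustration, not the procedure used in the incident: the helper names are hypothetical, and it assumes the relation lives in the database’s default tablespace (pg_filenode_relation accepts 0 for that case).

```python
import re

# Matches paths such as base/36895/46694 (default tablespace layout).
_PATH_RE = re.compile(r"base/(\d+)/(\d+)")


def parse_relation_path(error_text):
    """Return (database_oid, relfilenode) found in a 'could not read block' error."""
    match = _PATH_RE.search(error_text)
    if match is None:
        raise ValueError("no base/<dboid>/<relfilenode> path found")
    return int(match.group(1)), int(match.group(2))


def lookup_queries(error_text):
    """Build the SQL that maps the file path back to a database and a relation."""
    db_oid, filenode = parse_relation_path(error_text)
    return [
        # Run from any database to find which one owns the file.
        f"SELECT datname FROM pg_database WHERE oid = {db_oid};",
        # Run inside that database; 0 means the database's default tablespace.
        f"SELECT pg_filenode_relation(0, {filenode});",
    ]


if __name__ == "__main__":
    message = (
        'ERROR: could not read block 326 in file "base/36895/46694": '
        "read only 0 of 8192 bytes"
    )
    for query in lookup_queries(message):
        print(query)
```

Knowing whether the damaged file is a table or an index matters, because an index can usually be rebuilt while a damaged table file typically means restoring from backup.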
To get around these errors, I recommended a set of practices to ensure stability, scalability, and performance readiness.
By simulating production conditions in pre-prod, we can:
- Benchmark the servers using open-source load testing tools.
- Validate whether the system can handle high traffic and heavy workloads.
- Ensure scalability, maintainability, and reliability of both the infrastructure and applications.
- Tune PostgreSQL parameters, OS kernel settings, and authentication before go-live.
To address this issue, we performed steps such as:
- Stopping traffic to the production server.
- Running VACUUM FULL to reclaim space and remove bloat.
- Taking a full pg_basebackup for recovery testing.
- Validating OS kernel parameters, PostgreSQL logging parameters, memory, and connection configurations.
Why Pre-Prod Matters
A pre-production (pre-prod) environment is a near-identical copy of the production setup. It mirrors the:
- Architecture (same design and tiers)
- Hardware sizing (CPU, RAM, Storage)
- Configurations (OS, database, application stack)
This ensures that when changes are applied in production, the steps have already been tested and proven in pre-production.
Real-World Problem Statement:
In another scenario, we had successfully tested the production system with up to 500 users. However, when customer demand increased and the system reached 900 concurrent users, it could not handle the load. The database server experienced very high CPU usage, read queries spiked, and the application became unresponsive.
To stabilize the system, I suggested tuning the OS kernel configuration and database parameters. But this highlighted a major gap: the system had never been tested in pre-prod at this scale.
Based on this, I recommended building a pre-prod environment capable of simulating up to 3000 users, aligned with the customer’s future growth plans.
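One way to plan such a test is to step up from the last known-good level (500 users) to the target (3000 users) and record where the system first fails. The sketch below is a planning aid under that assumption; the function names and the number of steps are hypothetical, not part of the customer’s actual test plan.

```python
def ramp_levels(start, target, steps):
    """Return evenly spaced concurrency levels from start to target, inclusive."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    if target <= start:
        raise ValueError("target must be greater than start")
    stride = (target - start) / (steps - 1)
    return [round(start + i * stride) for i in range(steps)]


def first_failing_level(results):
    """Given {level: passed}, return the lowest level that failed, or None."""
    failed = sorted(level for level, passed in results.items() if not passed)
    return failed[0] if failed else None


if __name__ == "__main__":
    levels = ramp_levels(500, 3000, 6)
    print(levels)
    # Hypothetical outcome: everything above 500 users failed.
    outcome = {level: level <= 500 for level in levels}
    print(first_failing_level(outcome))
```

Stepping up gradually, instead of jumping straight to the target, shows roughly where the system starts to degrade, which helps decide what to tune first.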
Another Issue – Bloat and Data Errors:
During backup and restore tests in pre-prod, I also noticed table bloat. When inserting new data, some operations failed with table ID errors. Upon investigation, we traced the failures to index corruption on the bloated tables. Running a REINDEX on the affected tables resolved the issue, and subsequent inserts worked fine.
This reinforced why pre-prod testing is vital:
- It exposes hidden issues like bloat and index corruption before they affect production.
- It ensures that backup and restore processes are reliable.
- It validates that the system can scale safely to thousands of users without downtime.
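For the REINDEX fix described above, a small helper can generate the statements for a list of affected tables. This is a sketch only: the schema and table names are placeholders, and the statements would still be run manually (for example through psql) after review. REINDEX TABLE blocks writes to the table while it runs, so this fits a window where traffic is already stopped.

```python
def quote_ident(name):
    """Quote a SQL identifier by wrapping it in double quotes."""
    return '"' + name.replace('"', '""') + '"'


def reindex_statements(tables):
    """Build one REINDEX TABLE statement per (schema, table) pair."""
    return [
        f"REINDEX TABLE {quote_ident(schema)}.{quote_ident(table)};"
        for schema, table in tables
    ]


if __name__ == "__main__":
    # Hypothetical tables; replace with the ones that reported errors.
    affected = [("public", "orders"), ("public", "order_items")]
    for statement in reindex_statements(affected):
        print(statement)
```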
Step-by-Step Process:
Step 1: Stop Incoming Traffic
- Stop all traffic to the database server.
- Verify there are no active connections from applications.
- Bring down the Load Balancer in production to block incoming sessions.
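To verify that no application sessions remain, the pg_stat_activity view can be queried and the result filtered. The sketch below keeps the SQL as a string and does the filtering in plain Python, so it does not assume any particular database driver. The role names are hypothetical, and the backend_type column requires PostgreSQL 10 or later.

```python
# Query to run as a DBA or monitoring user; lists every client session
# except the one running the query itself.
ACTIVE_SESSIONS_SQL = """
SELECT pid, usename, application_name, state
FROM pg_stat_activity
WHERE backend_type = 'client backend'
  AND pid <> pg_backend_pid();
"""


def remaining_sessions(rows, ignored_users=("postgres",)):
    """Return sessions that still belong to application users.

    rows: iterable of dicts shaped like the query result above.
    ignored_users: hypothetical DBA/monitoring roles that may stay connected.
    """
    return [row for row in rows if row["usename"] not in ignored_users]


def traffic_is_stopped(rows, ignored_users=("postgres",)):
    """True when no application session is left on the server."""
    return not remaining_sessions(rows, ignored_users)


if __name__ == "__main__":
    sample = [
        {"pid": 101, "usename": "postgres", "application_name": "psql", "state": "idle"},
        {"pid": 202, "usename": "app_user", "application_name": "orders-api", "state": "active"},
    ]
    print(traffic_is_stopped(sample))
```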
Step 2: Clean Up Database Bloat
- Run VACUUM FULL across all tables to reclaim space.
- Validate whether the database bloat has been cleared successfully.
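One simple way to validate the cleanup is to capture table sizes before and after VACUUM FULL and compare them, since VACUUM FULL rewrites each table into a new, compact file. The sketch below assumes the two snapshots were collected with the query shown and loaded into dictionaries; the table names and sizes in the example are made up.

```python
# Run before and after VACUUM FULL and keep both result sets.
SIZE_QUERY = """
SELECT relid::regclass::text AS table_name,
       pg_total_relation_size(relid) AS total_bytes
FROM pg_stat_user_tables;
"""


def reclaimed_space(before, after):
    """Map table name -> (bytes reclaimed, percent of the original size).

    before/after: dicts of {table_name: total_bytes} from SIZE_QUERY.
    Tables missing from the 'after' snapshot are treated as unchanged.
    """
    report = {}
    for table, old_bytes in before.items():
        new_bytes = after.get(table, old_bytes)
        saved = old_bytes - new_bytes
        percent = 100.0 * saved / old_bytes if old_bytes else 0.0
        report[table] = (saved, round(percent, 1))
    return report


if __name__ == "__main__":
    before = {"orders": 8_000_000_000, "customers": 500_000_000}
    after = {"orders": 2_000_000_000, "customers": 500_000_000}
    for table, (saved, percent) in reclaimed_space(before, after).items():
        print(f"{table}: reclaimed {saved} bytes ({percent}%)")
```

A table that barely shrinks was probably not bloated to begin with; a large drop confirms that the space was dead rows or index bloat.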
Step 3: Backup and Restore
- After vacuuming, take a full backup of the server using pg_basebackup.
- Restore the backup on a standby server.
- Verify that all data files and schemas are intact.
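The backup step can be scripted so that the same options are used every time. The sketch below only builds the pg_basebackup argument list; the host, user, and directory values are placeholders, and the option names assume PostgreSQL 10 or later (where --wal-method replaced the older --xlog-method).

```python
def basebackup_command(host, user, target_dir, port=5432):
    """Build a pg_basebackup command that streams WAL alongside the data files."""
    return [
        "pg_basebackup",
        "--host", host,
        "--port", str(port),
        "--username", user,
        "--pgdata", target_dir,      # must be empty or not yet exist
        "--format", "plain",         # plain data directory, ready to start
        "--wal-method", "stream",    # include the WAL needed for consistency
        "--checkpoint", "fast",      # do not wait for the next scheduled checkpoint
        "--progress",
    ]


if __name__ == "__main__":
    # Hypothetical values; adjust to your environment.
    cmd = basebackup_command("primary.example.internal", "replicator", "/var/lib/pgsql/restore")
    print(" ".join(cmd))
    # To execute it for real:
    #   import subprocess
    #   subprocess.run(cmd, check=True)
```

On PostgreSQL 13 and later, pg_verifybackup can additionally check the copied files against the backup manifest before the restored server is started.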
Step 4: Perform Load Testing
- Use open-source load testing tools (such as JMeter, sysbench, or pgbench) to simulate production traffic.
- Validate the database server’s ability to handle peak load conditions.
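As an example with pgbench, the sketch below builds a command line for a given number of concurrent clients. It assumes the target database has already been initialized with pgbench -i, and the database name, duration, and thread count are placeholders. Each pgbench client holds one database connection, so max_connections (or a connection pooler in front of the database) has to allow for the client count being tested.

```python
def pgbench_command(dbname, clients, duration_s, jobs=4, select_only=False):
    """Build a pgbench command line for one load level."""
    if clients < 1:
        raise ValueError("clients must be at least 1")
    jobs = min(jobs, clients)  # pgbench cannot use more threads than clients
    cmd = [
        "pgbench",
        "--client", str(clients),
        "--jobs", str(jobs),
        "--time", str(duration_s),
        "--progress", "10",  # print a progress line every 10 seconds
    ]
    if select_only:
        cmd.append("--select-only")  # built-in read-only workload
    cmd.append(dbname)
    return cmd


if __name__ == "__main__":
    # Hypothetical database name and load level.
    print(" ".join(pgbench_command("appdb_preprod", clients=900, duration_s=600)))
```

Running the same command with select_only=True gives a read-heavy profile, which can be useful when the production problem showed up as a spike in read queries.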
Step 5: Verify OS and Kernel Configurations
- Ensure OS kernel parameters are correctly tuned.
- Validate network availability and open ports.
- Confirm database role configuration: Write DB (Primary) and Read DB (Replica).
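The kernel and role checks above can be turned into a repeatable script. The sketch below parses sysctl -a style output and compares it with an expected baseline, and it interprets the result of SELECT pg_is_in_recovery() to tell the primary from a replica. The expected values shown are placeholders for illustration, not tuning recommendations.

```python
# Returns true on a replica (standby) and false on the primary.
ROLE_QUERY = "SELECT pg_is_in_recovery();"


def parse_sysctl(text):
    """Parse 'key = value' lines (as printed by sysctl -a) into a dict."""
    settings = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def find_mismatches(actual, expected):
    """Return {key: (actual_value, expected_value)} for every differing setting."""
    return {
        key: (actual.get(key), wanted)
        for key, wanted in expected.items()
        if actual.get(key) != wanted
    }


def role_from_recovery_flag(in_recovery):
    """Translate the pg_is_in_recovery() result into a role name."""
    return "replica" if in_recovery else "primary"


if __name__ == "__main__":
    sample_output = "vm.swappiness = 60\nvm.overcommit_memory = 2\n"
    # Placeholder baseline; use the values agreed for your environment.
    baseline = {"vm.swappiness": "10", "vm.overcommit_memory": "2"}
    print(find_mismatches(parse_sysctl(sample_output), baseline))
    print(role_from_recovery_flag(False))
```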
Conclusion:
In this first part of the blog, we have covered the pre-production preparation process up to the OS and PostgreSQL-level validation. These steps establish the foundation for a stable production go-live.
In Part 2, we will continue with:
- Cleaning up load test data.
- Running VACUUM FULL again after testing.
- Database tier installation and configuration.
- Setting up DNS, SSL, schemas, parameters, and replication.
- Restoring production-like data for realistic load testing.
This structured approach ensures that by the time you reach production, your system is battle-tested, resilient, and optimized.
What’s your story?
Have you ever had a painful go-live experience… or a flawless one that you’re proud of? Maybe you’ve learned a trick or two the hard way?
Drop your story in the comments or tag someone who needs to read this. Let’s build a community of shared knowledge, and fewer production nightmares!
