Welcome to the “Indexing & Constraints” segment of PG18 Hacktober!
Over the next two days, we’re diving into powerful enhancements that boost the performance and flexibility of PostgreSQL’s indexing system. Today, we start with Parallelized CREATE INDEX for GIN indexes, which dramatically speeds up index creation on large datasets.
Tomorrow, we’ll explore another exciting feature: Skip scan on multicolumn B-tree indexes, a long-awaited optimization that enables PostgreSQL to use indexes more effectively when the leading column isn’t part of the filter condition.
Introduction
PostgreSQL’s GIN indexes (Inverted Indexes) have long been optimal for full-text search, JSONB queries, and array operations. But until now, building a GIN index, especially on large datasets was strictly single-threaded. That’s no longer the case.
With PostgreSQL 18, GIN joins the family of index types (like btree and hash) that support parallel index creation.
This seemingly small change can deliver good speedups for developers and DBAs working with large-scale search, JSON, or document stores inside Postgres.
Let’s say you’re indexing a documents table with millions of rows of searchable text:
CREATE INDEX idx_fts ON documents
USING gin(to_tsvector('english', content));
In PostgreSQL ≤ 17, the above would run in a single process, chewing through rows one by one, with random disk I/O as it builds the index.
In PostgreSQL 18, the engine can now spawn multiple workers to:
- Scan and sort tuples in parallel
- Insert into the GIN index in a cache-efficient, ordered manner
- Avoid redundant work and random buffer I/O
The result? Improvements in build times, and lower disk impact for large indexes.
How it works (under the hood)
The magic comes from a change in how the GIN build process is performed when parallel workers are enabled.
Pre-18 Behavior:
- Index is built incrementally.
- Tuples are read in table order.
- Each insertion is essentially random (from the index’s perspective).
- Pages are split frequently and inefficiently.
PostgreSQL 18 behavior:
- Tuples are sorted by their GIN keys before insertion.
- Insertion into the GIN structure is now mostly sequential:
- New keys are appended to the rightmost pages.
- This reduces random I/O and improves page packing.
This is conceptually similar to how btree indexes benefit from sorted builds, but tailored for GIN’s inverted structure.
When is parallelization used?
Use max_parallel_maintenance_workers setting
You can encourage parallelism using:
SET max_parallel_maintenance_workers = 4;
You can disable it explicitly via:
CREATE INDEX ... WITH (parallel_workers = 0);
Performance expectations
According to community benchmarks and source insights:
- Speedups of 2–4 times are common on large datasets with appropriate hardware.
- Performance scales roughly with the number of available CPU cores.
- Disk I/O is reduced significantly due to more sequential writes and fewer page splits.
Let’s test a table(with million records) with searchable text.
#With Parallelism enabled
postgres=# SET max_parallel_maintenance_workers = 4;
SET
postgres=# \timing
Timing is on.
postgres=# CREATE INDEX idx_fts_parallel ON documents
USING gin (to_tsvector('english', content));
CREATE INDEX
Time: 102569.562 ms (01:42.570)
#With Parallelism disabled
postgres=# ALTER TABLE documents SET (parallel_workers = 0);
ALTER TABLE
Time: 2.464 ms
postgres=# CREATE INDEX idx_fts_serial ON documents
USING gin (to_tsvector('english', content));
CREATE INDEX
Time: 185202.991 ms (03:05.203)
Testing on a table with searchable text demonstrates the new feature in PostgreSQL 18: parallelized GIN index creation. When building a GIN index with parallel workers enabled , completion time was approximately 1:42 mins. When parallelism was disabled at the table level , the same index build took about 3:05 mins. This shows a significant reduction in build time , parallel GIN index creation was nearly 45% faster when compared to the serial approach.
Important notes
Only applies to CREATE INDEX, not REINDEX or incremental builds (yet).
Parallelism may be disabled if:
- The table is too small.
- The planner cost estimate is too low.
- Your system is under heavy load or lacks available workers.
This feature does not change query-time behavior—only index creation time.
Coming up next
Tomorrow in PG18 Hacktober, we’ll talk about “Skip scan on multicolumn B-tree indexes” feature. Stay tuned!
