pg_search: Modern Full-Text Search Inside PostgreSQL: Part 1

pg_search is a PostgreSQL extension developed by ParadeDB that enhances native PostgreSQL full-text search capabilities by introducing a high-performance BM25-based index access method.

Unlike traditional PostgreSQL full-text search (FTS), which relies on tsvector, tsquery, and GIN/GiST indexes, pg_search provides:

A dedicated BM25 index access method
The @@@ search operator
Built-in relevance scoring
Snippet generation for highlighted results
Support for fuzzy matching and typo tolerance
Optimized execution of search combined with structured filtering

The extension operates entirely within PostgreSQL. It does not require:

An external search engine
A distributed search cluster
A synchronization pipeline
Data duplication
Eventual consistency handling

Because it integrates directly with PostgreSQL’s extensibility framework and query planner, pg_search enhances search capability while preserving architectural simplicity.

Motivation and Need

Limitations of Native PostgreSQL Full-Text Search

PostgreSQL’s built-in full-text search is mature and powerful. However, in production-scale systems, certain limitations become apparent.

Ranking Model Constraints

Native FTS uses the ts_rank function for scoring. While effective, it is primarily frequency-based, uses no corpus-wide statistics such as inverse document frequency, and lacks:

Advanced probabilistic relevance modeling
Strong document length normalization
Industry-standard BM25 scoring

As users come to expect search-engine-level relevance, these limitations become visible in how results are ordered.

Performance Considerations at Scale

When combining:

Full-text search
Structured filters (e.g., category, status, timestamps)
Sorting by relevance

Native FTS may require multiple index scans plus a separate ranking step. Computing ts_rank means reading the tsvector of every matching row, which can degrade performance on large datasets.

Modern Application Requirements

Contemporary systems require:

Intelligent relevance ordering
Fast query execution
Integrated search and filtering
Highlighted snippets
Real-time indexing

pg_search is designed to address these requirements within PostgreSQL itself.

Architectural Design

Custom BM25 Index Access Method

pg_search introduces a new index access method (USING bm25) that integrates with PostgreSQL’s planner.

The index:

Implements an inverted index structure
Maps terms to document identifiers efficiently
Supports indexing across multiple columns
Can function as a covering index for combined filtering and ranking
Allows ranking and filtering within a single optimized execution path

Because it is implemented as a native PostgreSQL extension, it participates directly in query planning and execution.
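
To make the inverted-index idea concrete, here is a minimal Python sketch of a structure that maps terms to document identifiers across several columns. It is only an illustration of the concept, not pg_search's actual implementation; the tokenize helper, the sample rows, and the column names are all hypothetical.

```python
from collections import defaultdict

def tokenize(text):
    """Lowercase and split on whitespace; a stand-in for a real analyzer."""
    return text.lower().split()

def build_inverted_index(docs):
    """Map each term to the sorted list of document ids that contain it."""
    postings = defaultdict(set)
    for doc_id, fields in docs.items():
        for text in fields.values():  # index every column of the row
            for term in tokenize(text):
                postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}

# Hypothetical rows keyed by id, each with two text columns.
docs = {
    1: {"title": "Running shoes", "description": "Lightweight shoes for road running"},
    2: {"title": "Trail boots", "description": "Waterproof boots for hiking"},
    3: {"title": "Hiking shoes", "description": "Durable shoes for trail hiking"},
}

index = build_inverted_index(docs)
print(index["shoes"])  # [1, 3]
print(index["trail"])  # [2, 3]
```

A lookup for a term is then a single dictionary access that returns the matching ids, instead of a scan over every row.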

BM25 Ranking Algorithm

BM25 (Best Matching 25) is an industry-standard probabilistic ranking algorithm widely used in modern search systems, including Elasticsearch.

BM25 scoring considers:

Term Frequency (TF): Frequency of a term within a document
Inverse Document Frequency (IDF): Rarity of the term across the corpus
Document Length Normalization: Adjustment to avoid bias toward longer documents

This model produces more intuitive and context-aware ranking than simple frequency-based approaches.
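
The three factors above can be sketched in a few lines of Python. This is a textbook Okapi BM25 scorer, not pg_search's code: it uses the non-negative IDF variant popularized by Lucene, and the parameters k1 = 1.2 and b = 0.75 are common defaults that are assumed here rather than taken from pg_search. The function name and sample documents are hypothetical.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score every document in `docs` against `query` with Okapi BM25."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / n_docs

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for d in tokenized:
        df.update(set(d))

    scores = []
    for d in tokenized:
        tf = Counter(d)  # term frequency within this document
        # Length normalization: above 1 for longer-than-average documents.
        norm = 1 - b + b * len(d) / avg_len
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # IDF: rarer terms across the corpus contribute more.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * norm)
        scores.append(score)
    return scores

docs = [
    "postgres search",
    "postgres search is one topic among the many topics covered in this long guide",
    "cooking recipes",
]
scores = bm25_scores("search", docs)
```

Both of the first two documents contain "search" exactly once, yet the short document scores higher because length normalization penalizes the long one. The third document does not contain the term and scores zero.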

Performance Characteristics

pg_search is designed to improve both ranking quality and execution efficiency.

Performance advantages include:

Efficient inverted index lookups
Reduced need for multiple index scans
Optimized ranking computation within the index
Single-pass execution for search and structured filtering
Scalability across large datasets

These characteristics make pg_search suitable for applications where search latency and result quality are critical.
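
As a rough illustration of what single-pass execution means, the Python sketch below applies a structured filter, computes a score, and maintains a bounded top-k heap in one loop. There is no separate filter step followed by a full sort. It is a conceptual sketch only: the rows, the search function, and the raw term-count score (a placeholder for BM25) are hypothetical and do not reflect pg_search internals.

```python
import heapq

# Hypothetical rows: (id, category, description)
rows = [
    (1, "footwear", "running shoes for road running"),
    (2, "footwear", "leather boots"),
    (3, "apparel", "running jacket"),
    (4, "footwear", "trail running shoes"),
]

def search(rows, term, category, limit=2):
    """Filter and rank in one pass, keeping only the top `limit` hits."""
    top = []  # min-heap of (score, id); the lowest score sits at top[0]
    for row_id, row_category, description in rows:
        if row_category != category:  # structured filter
            continue
        score = description.split().count(term)  # placeholder score, not BM25
        if score == 0:
            continue
        heapq.heappush(top, (score, row_id))
        if len(top) > limit:
            heapq.heappop(top)  # evict the current lowest score
    return [row_id for score, row_id in sorted(top, reverse=True)]

result = search(rows, "running", "footwear")
print(result)  # [1, 4]
```

Because the heap never holds more than `limit` entries, memory stays bounded regardless of how many rows match.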

Advantages

Improved Search Relevance: BM25 scoring provides higher-quality ranking.
Simplified Architecture: All functionality operates within PostgreSQL.
No External Infrastructure: Eliminates the need for separate search clusters.
Real-Time Index Updates: Changes are searchable immediately after commit.
SQL-Based Integration: Developers use familiar SQL syntax.
Reduced Operational Overhead: Fewer systems to deploy, monitor, and maintain.
Scalable Design: Optimized for performance on large datasets.

Typical Use Cases

pg_search is particularly suitable for:

Product catalogs with relevance ranking
Blog and content platforms
Documentation systems
Knowledge bases
Support ticket systems
Internal enterprise search tools

It is especially effective when search must remain tightly coupled with transactional consistency.

Limitations and Considerations

pg_search may not be ideal when:

Distributed, multi-node search across independent clusters is required
Extremely large horizontal scaling beyond a single PostgreSQL instance is necessary
Advanced semantic search or NLP models are required
Complex distributed analytics workloads dominate system requirements

In such scenarios, distributed search engines such as Elasticsearch may be more appropriate.

Conclusion

pg_search represents a significant advancement in PostgreSQL-based search capabilities. By introducing a BM25-powered index access method and integrating ranking with efficient filtering, it bridges the gap between native PostgreSQL full-text search and external search engines.

For applications requiring strong relevance ranking, operational simplicity, and high performance within a single database system, pg_search provides a robust and architecturally clean solution.

In Part 2, we will look at implementing pg_search in different scenarios.

See this in action at PGConf India 2026 – pg_search: Bringing Elasticsearch-Grade Search to PostgreSQL, presented by Mithun Chicklore Yogendra.