pg_search: Modern Full-Text Search Inside PostgreSQL, Part 1

pg_search is a PostgreSQL extension developed by ParadeDB that enhances native PostgreSQL full-text search capabilities by introducing a high-performance BM25-based index access method.

Unlike traditional PostgreSQL full-text search (FTS), which relies on tsvector, tsquery, and GIN/GiST indexes, pg_search provides:

  • A dedicated BM25 index access method
  • The @@@ search operator
  • Built-in relevance scoring
  • Snippet generation for highlighted results
  • Support for fuzzy matching and typo tolerance
  • Optimized execution of search combined with structured filtering

The extension operates entirely within PostgreSQL. It does not require:

  • An external search engine
  • A distributed search cluster
  • A synchronization pipeline
  • Data duplication
  • Eventual consistency handling

Because it integrates directly with PostgreSQL’s extensibility framework and query planner, pg_search enhances search capability while preserving architectural simplicity.

Motivation and Need

Limitations of Native PostgreSQL Full-Text Search

PostgreSQL’s built-in full-text search is mature and powerful. However, in production-scale systems, certain limitations become apparent.

Ranking Model Constraints

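Native FTS uses the ts_rank function for scoring. While effective, it is primarily frequency-based, does not use corpus-wide statistics such as how rare a term is, and lacks:

  • Advanced probabilistic relevance modeling
  • Strong document length normalization (only optional normalization flags are available)
  • Industry-standard BM25 scoring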

As user expectations evolve toward search-engine-level relevance quality, these limitations impact result ordering.

Performance Considerations at Scale

When a single query combines:

  • Full-text search
  • Structured filters (e.g., category, status, timestamps)
  • Sorting by relevance

native FTS may require multiple index interactions and additional ranking computation, which can degrade performance on large datasets.

Modern Application Requirements

Contemporary systems require:

  • Intelligent relevance ordering
  • Fast query execution
  • Integrated search and filtering
  • Highlighted snippets
  • Real-time indexing

pg_search is designed to address these requirements within PostgreSQL itself.

Architectural Design

Custom BM25 Index Access Method

pg_search introduces a new index access method (USING bm25) that integrates with PostgreSQL’s planner.

The index:

  • Implements an inverted index structure
  • Maps terms to document identifiers efficiently
  • Supports indexing across multiple columns
  • Can function as a covering index for combined filtering and ranking
  • Allows ranking and filtering within a single optimized execution path

Because it is implemented as a native PostgreSQL extension, it participates directly in query planning and execution.
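To make the idea of an inverted index concrete, here is a minimal Python sketch. It is a conceptual illustration only, not pg_search's actual storage format: the tokenizer, the `build_inverted_index` helper, and the sample rows are all assumptions made for this example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split on non-alphanumeric characters (deliberately naive)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(rows, columns):
    """Map (column, term) -> {row_id: term_frequency} across several columns."""
    index = defaultdict(dict)
    for row_id, row in rows.items():
        for column in columns:
            for term in tokenize(row[column]):
                postings = index[(column, term)]
                postings[row_id] = postings.get(row_id, 0) + 1
    return index


# Hypothetical sample data standing in for a table with two text columns.
rows = {
    1: {"title": "Running shoes", "description": "Lightweight shoes for road running"},
    2: {"title": "Trail boots", "description": "Waterproof boots for hiking"},
    3: {"title": "Hiking shoes", "description": "Durable shoes for trail hiking"},
}

index = build_inverted_index(rows, ["title", "description"])

# Looking up a term is a dictionary access rather than a scan over every row.
matching_ids = sorted(index[("description", "shoes")])
print(matching_ids)
```

The point of the structure is that a lookup goes from a term straight to the rows that contain it, and that one index can cover several columns at once.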

BM25 Ranking Algorithm

BM25 (Best Matching 25) is an industry-standard probabilistic ranking algorithm widely used in modern search systems, including Elasticsearch.

BM25 scoring considers:

  • Term Frequency (TF): Frequency of a term within a document
  • Inverse Document Frequency (IDF): Rarity of the term across the corpus
  • Document Length Normalization: Adjustment to avoid bias toward longer documents

This model generally produces more intuitive ranking than simple frequency-based approaches, because rare terms count for more and long documents are not favored simply for containing more words.
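The three factors above can be sketched in a few lines of Python. This is the textbook BM25 formula with a commonly used smoothed IDF, not code taken from pg_search; the function name `bm25_scores`, the parameter values k1 = 1.2 and b = 0.75 (widely used defaults), and the sample documents are assumptions for illustration.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document in `docs` against `query_terms` with BM25."""
    n_docs = len(docs)
    avg_len = sum(len(doc) for doc in docs) / n_docs
    term_counts = [Counter(doc) for doc in docs]
    scores = []
    for doc, counts in zip(docs, term_counts):
        score = 0.0
        for term in query_terms:
            tf = counts[term]  # term frequency in this document
            if tf == 0:
                continue
            # Inverse document frequency: rarer terms get a larger weight.
            doc_freq = sum(1 for c in term_counts if term in c)
            idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
            # Length normalization: documents longer than average are penalized.
            length_norm = 1 - b + b * len(doc) / avg_len
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores


# Hypothetical corpus: the first two documents each mention "search" once,
# but the second one is much longer.
docs = [
    "postgres search".split(),
    "postgres search is one of many topics covered in this long tutorial".split(),
    "cooking recipes".split(),
]

scores = bm25_scores(["search"], docs)
print(scores)
```

In this toy corpus the short document ranks above the long one even though both contain the query term exactly once, which is the effect of length normalization. The document that does not contain the term scores zero.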

Performance Characteristics

pg_search is designed to improve both ranking quality and execution efficiency.

Performance advantages include:

  • Efficient inverted index lookups
  • Reduced need for multiple index scans
  • Optimized ranking computation within the index
  • Single-pass execution for search and structured filtering
  • Scalability across large datasets

These characteristics make pg_search suitable for applications where search latency and result quality are critical.
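As a rough illustration of what "single-pass" means here, the Python sketch below walks one posting list once, applies a structured filter to each candidate, and keeps only the top results. It is a simplified mental model rather than a description of pg_search internals; the function `search_with_filter`, the precomputed scores, and the sample rows are hypothetical.

```python
import heapq


def search_with_filter(postings, rows, predicate, k=2):
    """Filter and rank candidates from one posting list in a single loop.

    postings:  {row_id: relevance_score} for the query term (scores are made up).
    predicate: structured filter applied to each candidate row.
    """
    candidates = (
        (score, row_id)
        for row_id, score in postings.items()
        if predicate(rows[row_id])
    )
    # nlargest consumes the generator once and keeps only the k best entries.
    return [row_id for _score, row_id in heapq.nlargest(k, candidates)]


# Hypothetical structured columns for four rows.
rows = {
    1: {"category": "footwear", "in_stock": True},
    2: {"category": "footwear", "in_stock": False},
    3: {"category": "apparel", "in_stock": True},
    4: {"category": "footwear", "in_stock": True},
}
postings = {1: 0.8, 2: 1.5, 3: 1.1, 4: 1.3}

top = search_with_filter(
    postings,
    rows,
    lambda row: row["category"] == "footwear" and row["in_stock"],
)
print(top)
```

The alternative would be to collect every text match first, then filter, then sort, which touches the candidate set several times instead of once.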

Advantages

  • Improved Search Relevance: BM25 scoring provides higher-quality ranking.
  • Simplified Architecture: All functionality operates within PostgreSQL.
  • No External Infrastructure: Eliminates the need for separate search clusters.
  • Real-Time Index Updates: Changes are searchable immediately after commit.
  • SQL-Based Integration: Developers use familiar SQL syntax.
  • Reduced Operational Overhead: Fewer systems to deploy, monitor, and maintain.
  • Scalable Design: Optimized for performance on large datasets.

Typical Use Cases

pg_search is particularly suitable for:

  • Product catalogs with relevance ranking
  • Blog and content platforms
  • Documentation systems
  • Knowledge bases
  • Support ticket systems
  • Internal enterprise search tools

It is especially effective when search must remain tightly coupled with transactional consistency.

Limitations and Considerations

pg_search may not be ideal when:

  • Distributed, multi-node search across independent clusters is required
  • Extremely large horizontal scaling beyond a single PostgreSQL instance is necessary
  • Advanced semantic search or NLP models are required
  • Complex distributed analytics workloads dominate system requirements

In such scenarios, distributed search engines such as Elasticsearch may be more appropriate.

Conclusion

pg_search represents a significant advancement in PostgreSQL-based search capabilities. By introducing a BM25-powered index access method and integrating ranking with efficient filtering, it bridges the gap between native PostgreSQL full-text search and external search engines.

For applications requiring strong relevance ranking, operational simplicity, and high performance within a single database system, pg_search provides a robust and architecturally clean solution.

In Part 2, we will look at implementing pg_search in different scenarios.

See this in action at PGConf India 2026: "pg_search: Bringing Elasticsearch-Grade Search to PostgreSQL", presented by Mithun Chicklore Yogendra.
