pg_search is a PostgreSQL extension developed by ParadeDB that enhances native PostgreSQL full-text search capabilities by introducing a high-performance BM25-based index access method.
Unlike traditional PostgreSQL full-text search (FTS), which relies on tsvector, tsquery, and GIN/GiST indexes, pg_search provides:
- A dedicated BM25 index access method
- The @@@ search operator
- Built-in relevance scoring
- Snippet generation for highlighted results
- Support for fuzzy matching and typo tolerance
- Optimized execution of search combined with structured filtering
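Typo tolerance is usually defined in terms of edit (Levenshtein) distance: a query term matches an indexed term if at most a small number of single-character insertions, deletions, or substitutions turn one into the other. The Python sketch below assumes that definition. `levenshtein` and `fuzzy_matches` are hypothetical helpers that illustrate the idea; they are not pg_search's implementation or API.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if the characters match)
            ))
        prev = curr
    return prev[-1]


def fuzzy_matches(term: str, vocabulary: list[str], max_distance: int = 1) -> list[str]:
    """Return indexed terms within max_distance edits of a (possibly misspelled) query term."""
    term = term.lower()
    return [t for t in vocabulary if levenshtein(term, t.lower()) <= max_distance]


print(fuzzy_matches("postgre", ["postgres", "postgis", "mysql"]))  # ['postgres']
```

A search engine would not compare the query against every term like this; the loop only shows which terms count as a match for a given distance bound.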
The extension operates entirely within PostgreSQL. It does not require:
- An external search engine
- A distributed search cluster
- A synchronization pipeline
- Data duplication
- Eventual consistency handling
Because it integrates directly with PostgreSQL's extensibility framework and query planner, pg_search enhances search capability while preserving architectural simplicity.
Motivation and Need
Limitations of Native PostgreSQL Full-Text Search
PostgreSQL's built-in full-text search is mature and powerful. However, in production-scale systems, certain limitations become apparent.
Ranking Model Constraints
Native FTS scores matches with ts_rank (or ts_rank_cd, which also considers how close the matching terms are to each other). While effective, these functions are primarily frequency-based and lack:
- Advanced probabilistic relevance modeling
- Strong document length normalization
- Industry-standard BM25 scoring
As users come to expect search-engine-quality relevance, these gaps show up directly as poorer result ordering.
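To make the length-normalization point concrete, here is a hypothetical Python stand-in that scores documents by raw query-term counts. It is deliberately not ts_rank's actual formula; it only mimics scoring that ignores document length, which is also ts_rank's default behavior (its optional normalization argument defaults to 0). A long document that mentions a term twice in passing outranks a short document that is entirely about it, until the count is divided by the document's length.

```python
from collections import Counter


def raw_frequency_score(doc: str, query: str) -> int:
    """Simplified stand-in scorer: total occurrences of the query terms.

    Not ts_rank's real formula; it only mimics scoring that ignores document length.
    """
    counts = Counter(doc.lower().split())
    return sum(counts[term] for term in set(query.lower().split()))


def length_normalized_score(doc: str, query: str) -> float:
    """Same count divided by document length: a crude form of length normalization."""
    n_tokens = len(doc.split())
    return raw_frequency_score(doc, query) / n_tokens if n_tokens else 0.0


short_doc = "postgres index tuning"
long_doc = " ".join(["postgres"] + ["filler"] * 100 + ["postgres"])

print(raw_frequency_score(short_doc, "postgres"), raw_frequency_score(long_doc, "postgres"))  # 1 2
print(length_normalized_score(short_doc, "postgres") > length_normalized_score(long_doc, "postgres"))  # True
```

Dividing by length is the bluntest possible fix; BM25 instead blends document length with the corpus average and caps how much repeated occurrences can add.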
Performance Considerations at Scale
When combining:
- Full-text search
- Structured filters (e.g., category, status, timestamps)
- Sorting by relevance
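Native FTS may then need to combine several indexes (for example, a GIN index for the text match and B-tree indexes for the filters) and compute ts_rank for every matching row before sorting, which can degrade performance on large datasets.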
Modern Application Requirements
Contemporary systems require:
- Intelligent relevance ordering
- Fast query execution
- Integrated search and filtering
- Highlighted snippets
- Real-time indexing
pg_search is designed to address these requirements within PostgreSQL itself.
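Highlighted snippets are typically produced by locating query-term matches in the stored text, cutting a window of text around the first match, and wrapping each match in markup. A hypothetical Python sketch of that idea follows; the function name, the default window size, and the `<b>` tags are assumptions, not pg_search's snippet function or its defaults.

```python
import re


def highlight_snippet(text: str, query: str, window: int = 30,
                      start_tag: str = "<b>", end_tag: str = "</b>") -> str:
    """Return a fragment around the first query-term match, with every match wrapped in tags."""
    terms = [re.escape(t) for t in query.split()]
    if not terms:
        return text[: 2 * window]
    # Whole-word, case-insensitive match on any of the query terms.
    pattern = re.compile(r"\b(?:" + "|".join(terms) + r")\b", re.IGNORECASE)
    first = pattern.search(text)
    if first is None:
        return text[: 2 * window]  # no match: fall back to the leading text
    start = max(0, first.start() - window)
    end = min(len(text), first.end() + window)
    fragment = text[start:end]
    return pattern.sub(lambda m: f"{start_tag}{m.group(0)}{end_tag}", fragment)


print(highlight_snippet("PostgreSQL supports BM25 ranking through pg_search.", "bm25", window=11))
```

Unlike production highlighters, this sketch does not snap the window to word boundaries, so a fragment may start or end mid-word.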
Architectural Design
Custom BM25 Index Access Method
pg_search introduces a new index access method (USING bm25) that integrates with PostgreSQL's planner.
The index:
- Implements an inverted index structure
- Maps terms to document identifiers efficiently
- Supports indexing across multiple columns
- Can function as a covering index for combined filtering and ranking
- Allows ranking and filtering within a single optimized execution path
Because it is implemented as a native PostgreSQL extension, it participates directly in query planning and execution.
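The core data structure can be illustrated with a deliberately small Python sketch: a hypothetical in-memory `InvertedIndex` that maps each (column, term) pair to the set of row identifiers containing it, so that one index covers several columns. It omits everything a real BM25 index needs at scale (term frequencies, positions, persistent storage, concurrent updates) and does not describe how pg_search lays out its index.

```python
from collections import defaultdict


class InvertedIndex:
    """Hypothetical in-memory sketch: postings keyed by (column, term) -> set of row ids."""

    def __init__(self) -> None:
        self.postings: defaultdict[tuple[str, str], set[int]] = defaultdict(set)

    def add(self, row_id: int, row: dict[str, str]) -> None:
        # Index every column of the row separately so queries can target a column.
        for column, text in row.items():
            for term in text.lower().split():
                self.postings[(column, term)].add(row_id)

    def lookup(self, column: str, term: str) -> set[int]:
        # .get() avoids creating empty postings for unknown terms; return a copy.
        return set(self.postings.get((column, term.lower()), ()))


idx = InvertedIndex()
idx.add(1, {"title": "Trail running shoes", "category": "footwear"})
idx.add(2, {"title": "Running jacket", "category": "apparel"})

# A condition on two columns becomes an intersection of posting sets.
print(idx.lookup("title", "running") & idx.lookup("category", "footwear"))  # {1}
```

Keeping both columns in the same structure is roughly what lets a text match and a filter on another indexed column be answered from one lookup path instead of two separate indexes.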
BM25 Ranking Algorithm
BM25 (Best Matching 25) is an industry-standard probabilistic ranking algorithm widely used in modern search systems, including Elasticsearch.
BM25 scoring considers:
- Term Frequency (TF): Frequency of a term within a document
- Inverse Document Frequency (IDF): Rarity of the term across the corpus
- Document Length Normalization: Adjustment to avoid bias toward longer documents
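For reference, the widely used Okapi BM25 formulation combines these three factors as follows (shown with the Lucene-style IDF, which stays positive even for very common terms). Here $f(q_i, D)$ is the frequency of query term $q_i$ in document $D$, $|D|$ is the document length in terms, $\mathrm{avgdl}$ is the average document length, $N$ is the number of documents, and $n(q_i)$ is the number of documents containing $q_i$. The parameters $k_1$ (term-frequency saturation) and $b$ (strength of length normalization) are commonly set to about 1.2 and 0.75; the exact variant and defaults pg_search uses should be checked in its documentation.

```latex
\operatorname{score}(D, Q) = \sum_{q_i \in Q} \operatorname{IDF}(q_i) \cdot
  \frac{f(q_i, D)\,(k_1 + 1)}{f(q_i, D) + k_1 \left(1 - b + b \cdot \frac{|D|}{\mathrm{avgdl}}\right)}

\operatorname{IDF}(q_i) = \ln\left(1 + \frac{N - n(q_i) + 0.5}{n(q_i) + 0.5}\right)
```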
This model produces more intuitive and context-aware ranking compared to simple frequency-based approaches.
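A minimal, self-contained Python sketch of BM25 scoring over an in-memory corpus makes the three factors concrete. It implements the standard Okapi BM25 formula with the Lucene-style IDF and assumes a naive lowercase/whitespace tokenizer and the common defaults k1 = 1.2, b = 0.75; these are illustrative choices, not pg_search's configuration, and the scores will not match pg_search's output.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Hypothetical minimal analyzer: lowercase and split on whitespace.
    return text.lower().split()


def bm25_scores(docs: list[str], query: str,
                k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Score every document in docs against query with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    if not tokenized:
        return []
    n_docs = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized) / n_docs

    # Document frequency: how many documents contain each term (feeds IDF).
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))

    query_terms = set(tokenize(query))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)  # term frequency within this document
        score = 0.0
        for term in query_terms:
            f = tf[term]
            if f == 0:
                continue
            # Lucene-style IDF: rarer terms weigh more, and the value stays positive.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization: longer-than-average documents are penalized.
            length_norm = k1 * (1 - b + b * len(toks) / avgdl)
            # TF saturation: each extra occurrence adds less than the previous one.
            score += idf * f * (k1 + 1) / (f + length_norm)
        scores.append(score)
    return scores


docs = [
    "bm25 ranking in postgres",
    "postgres backup guide",
    "cooking recipes",
]
print(bm25_scores(docs, "bm25 postgres"))
```

The first document ranks highest because it also contains the rarer term "bm25", the second gets a smaller score from the more common "postgres" alone, and the third scores 0.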
Performance Characteristics
pg_search is designed to improve both ranking quality and execution efficiency.
Performance advantages include:
- Efficient inverted index lookups
- Reduced need for multiple index scans
- Optimized ranking computation within the index
- Single-pass execution for search and structured filtering
- Scalability across large datasets
These characteristics make pg_search suitable for applications where search latency and result quality are critical.
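The single-pass idea can be illustrated with a small Python sketch: walk the text matches once, drop rows that fail the structured filter, score the survivors, and keep only the best k in a bounded min-heap, so no full sort of every match is needed. The function and its arguments are hypothetical; the sketch shows the general technique, not pg_search's executor.

```python
import heapq
from typing import Callable, Iterable


def single_pass_top_k(
    matches: Iterable[int],
    score: Callable[[int], float],
    keep: Callable[[int], bool],
    k: int,
) -> list[tuple[int, float]]:
    """Filter, score, and keep the top-k matches in one pass over the candidates."""
    if k <= 0:
        return []
    heap: list[tuple[float, int]] = []  # min-heap of (score, row_id); heap[0] is the weakest kept hit
    for row_id in matches:
        if not keep(row_id):  # structured filter, e.g. category = 'books'
            continue
        s = score(row_id)
        if len(heap) < k:
            heapq.heappush(heap, (s, row_id))
        elif s > heap[0][0]:
            heapq.heapreplace(heap, (s, row_id))  # evict the current weakest hit
    # Highest score first.
    return [(row_id, s) for s, row_id in sorted(heap, reverse=True)]


rows = {
    1: {"category": "books", "score": 2.0},
    2: {"category": "books", "score": 5.0},
    3: {"category": "toys", "score": 9.0},
    4: {"category": "books", "score": 1.0},
}
print(single_pass_top_k(rows, lambda r: rows[r]["score"],
                        lambda r: rows[r]["category"] == "books", k=2))
# [(2, 5.0), (1, 2.0)]
```

The heap never grows beyond k entries, so memory stays bounded no matter how many rows match the text query.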
Advantages
- Improved Search Relevance: BM25 scoring provides a higher-quality ranking.
- Simplified Architecture: All functionality operates within PostgreSQL.
- No External Infrastructure: Eliminates the need for separate search clusters.
- Real-Time Index Updates: Changes are searchable immediately after commit.
- SQL-Based Integration: Developers use familiar SQL syntax.
- Reduced Operational Overhead: Fewer systems to deploy, monitor, and maintain.
- Scalable Design: Optimized for performance on large datasets.
Typical Use Cases
pg_search is particularly suitable for:
- Product catalogs with relevance ranking
- Blog and content platforms
- Documentation systems
- Knowledge bases
- Support ticket systems
- Internal enterprise search tools
It is especially effective when search must remain tightly coupled with transactional consistency.
Limitations and Considerations
pg_search may not be ideal when:
- Distributed, multi-node search across independent clusters is required
- Extremely large horizontal scaling beyond a single PostgreSQL instance is necessary
- Advanced semantic search or NLP models are required
- Complex distributed analytics workloads dominate system requirements
In such scenarios, distributed search engines such as Elasticsearch may be more appropriate.
Conclusion
pg_search represents a significant advancement in PostgreSQL-based search capabilities. By introducing a BM25-powered index access method and integrating ranking with efficient filtering, it bridges the gap between native PostgreSQL full-text search and external search engines.
For applications requiring strong relevance ranking, operational simplicity, and high performance within a single database system, pg_search provides a robust and architecturally clean solution.
In Part 2, we look at implementing pg_search in different scenarios.
See this in action at PGConf India 2026: "pg_search: Bringing Elasticsearch-Grade Search to PostgreSQL", presented by Mithun Chicklore Yogendra.
