Summary
Use Postgres to create a robust search engine. Build a retrieval system with semantic, full-text, and fuzzy search to be used as a backbone in RAG pipelines.
I’ve replaced the Supabase function with just CTEs and a query, as well as prefixed the params with $. Here’s what it’ll look like:
Here we’re using Reciprocal Fusion (RRF) to merge the results. It ensures that items ranked high in multiple lists are given a high rank in the final list.
The new query uses the new pgtrgm.similaritythreshold operator to filter out text where similarity is larger than 0.3.
Tuning full text search results can be done by adding a body column to a document. We expect the title to be short and the body to be longer and include more details.
The title is given more weight than the body because it is a concise representation of the document's main topic. The body is given a weight of 0.2.
Length normalization: When you want to strongly normalize by document length. Unique word normalization is when you want more diverse vocabulary. Use the bitwise option to specify one or more behaviors.
Use a bi-directional encoder to retrieve the initial N results, then a cross-encoder to rank these against the search query. The bi-encoders are fast, making it great for searching a multitude of documents. The cross- Encoders is slower but more performant, making them great for reranking retrieved results.
Postgres’s reliance on TF-IDF for full-text search can struggle with very long documents and rare terms. Use the tools below to help you with your search.
