Summary
Use Postgres to create a robust search engine. Build a retrieval system with semantic, full-text, and fuzzy search to be used as a backbone in RAG pipelines.
I’ve replaced the Supabase function with just CTEs and a query, as well as prefixed the params with $. Here’s what it’ll look like:
Here we’re using Reciprocal Fusion (RRF) to merge the results. It ensures that items ranked high in multiple lists are given a high rank in the final list.
The new query uses the new pg_trgm.similarity_threshold operator to filter out text where similarity is larger than 0.3.
Tuning full text search results can be done by adding a body column to a document. We expect the title to be short and the body to be longer and include more details.
The title is given more weight than the body because it is a concise representation of the document's main topic. The body is given a weight of 0.2.
Length normalization: When you want to strongly normalize by document length. Unique word normalization is when you want more diverse vocabulary. Use the bitwise option to specify one or more behaviors.
Use a bi-directional encoder to retrieve the initial N results, then a cross-encoder to rank these against the search query. The bi-encoders are fast, making it great for searching a multitude of documents. The cross- Encoders is slower but more performant, making them great for reranking retrieved results.
Postgres’s reliance on TF-IDF for full-text search can struggle with very long documents and rare terms. Use the tools below to help you with your search.