Retrieve & Re-Rank

In Semantic Search we showed how to use a SparseEncoder to compute embeddings for queries, sentences, and paragraphs, and how to use these for semantic search. For complex search tasks, for example question answering retrieval, search can be significantly improved by using Retrieve & Re-Rank. A detailed explanation of this approach with dense embeddings produced by a Bi-Encoder is available here.

Overview

The Retrieve & Re-Rank approach consists of two stages:

  1. Retrieval Stage: Use fast but less accurate methods (SparseEncoder/bi-encoders) to retrieve a larger set of potentially relevant documents

  2. Re-Ranking Stage: Use more sophisticated but slower models (cross-encoders) to re-rank the retrieved documents for better precision

This approach combines the efficiency of first-stage retrieval with the accuracy of second-stage re-ranking.
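A minimal end-to-end sketch of the two stages, assuming the model names shown (any SparseEncoder and CrossEncoder checkpoints work); the corpus, query, and top-k cutoff are illustrative:

from sentence_transformers import CrossEncoder, SparseEncoder

# Stage 1: fast sparse retrieval over the whole corpus
retriever = SparseEncoder("naver/splade-cocondenser-ensembledistil")
corpus = [
    "Python is a programming language.",
    "The capital of France is Paris.",
    "Pandas is a data analysis library for Python.",
]
corpus_embeddings = retriever.encode_document(corpus)

query = "What is Python?"
query_embedding = retriever.encode_query([query])

# Dot-product scores between the query and every document
scores = retriever.similarity(query_embedding, corpus_embeddings)[0]
candidate_ids = scores.argsort(descending=True)[:2].tolist()

# Stage 2: slower but more accurate cross-encoder re-ranking
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L6-v2")
pairs = [(query, corpus[idx]) for idx in candidate_ids]
rerank_scores = reranker.predict(pairs)

for score, idx in sorted(zip(rerank_scores, candidate_ids), reverse=True):
    print(f"{score:.2f}  {corpus[idx]}")

Because the cross-encoder only scores the handful of retrieved candidates, its higher per-pair cost stays affordable even for large corpora.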

Comprehensive Evaluation: Hybrid Search Pipeline

File: hybrid_search.py

This script provides a complete evaluation pipeline comparing different retrieval and re-ranking approaches on a given dataset (NanoNFCorpus in this example). It includes:

  1. Sparse Retrieval using ibm-granite/granite-embedding-30m-sparse

  2. Dense Retrieval using multi-qa-MiniLM-L6-cos-v1

  3. Re-ranking both sparse and dense results with cross-encoder/ms-marco-MiniLM-L6-v2

  4. Hybrid Search using Reciprocal Rank Fusion via ReciprocalRankFusionEvaluator (a standalone RRF sketch follows below)

  5. Hybrid Re-ranking applying cross-encoder to fused results

Output: The script generates comprehensive metrics and saves results in the runs/ directory.
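The fusion step itself is simple to state: each document receives a score of 1 / (k + rank) from every ranking it appears in, so documents ranked highly by multiple retrievers rise to the top. The script uses ReciprocalRankFusionEvaluator for this; the following is only an illustrative standalone sketch, with made-up document IDs and the commonly used smoothing constant k=60:

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one.

    Each document scores 1 / (k + rank) per list it appears in;
    summing across lists rewards documents that several
    retrievers rank highly.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Example: fuse sparse and dense rankings for one query
sparse_ranking = ["doc3", "doc1", "doc7"]
dense_ranking = ["doc1", "doc9", "doc3"]
print(reciprocal_rank_fusion([sparse_ranking, dense_ranking]))
# doc1 and doc3 come first because both retrievers rank them well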

Evaluation Results

Example results from running the hybrid search evaluation on NanoNFCorpus:

================================================================================
EVALUATION SUMMARY
================================================================================
METHOD                            NDCG@10     MRR@10        MAP
--------------------------------------------------------------------------------
Sparse Retrieval                    32.10      47.27      28.29
Dense Retrieval                     27.35      41.59      22.79
Sparse + Reranking                  37.35      57.19      32.12
Dense + Reranking                   37.56      58.27      31.93
Hybrid RRF                          32.62      49.63      22.51
Hybrid RRF + Reranking              36.16      55.77      26.99
================================================================================

Key Observations:

  • Re-ranking consistently improves performance across all retrieval methods

  • Sparse retrieval alone already provides strong first-stage results

  • Both sparse and dense re-ranking achieve similar high performance

  • Hybrid RRF blends both retrievers' strengths, though on this dataset the re-ranked single-retriever pipelines still score slightly higher

Pre-trained Models

Sparse Encoder (Retrieval)

The SparseEncoder produces embeddings independently for your paragraphs and for your search queries. You can use it like this:

from sentence_transformers import SparseEncoder

# Load a pre-trained sparse encoder (SPLADE)
model = SparseEncoder("naver/splade-cocondenser-ensembledistil")

# Encode the corpus with the document-side method
docs = [
    "My first paragraph, which contains information.",
    "Python is a programming language.",
]
document_embeddings = model.encode_document(docs)

# Encode the search query with the query-side method
query = "What is Python?"
query_embedding = model.encode_query(query)
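The embeddings can then be compared to rank the documents for the query. A small follow-on sketch, assuming the snippet above has run; for sparse embeddings the default similarity is a dot product:

# Score the query against every document (dot product by default)
similarities = model.similarity(query_embedding, document_embeddings)
print(similarities)
# The second document should score highest, since it answers the query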

For pre-trained Sparse Encoder models, see: Pretrained Sparse-Encoders.

Cross-Encoders (Re-Ranker)

For pre-trained Cross Encoder models, see: MS MARCO Cross-Encoders
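A Cross-Encoder scores query-document pairs jointly rather than encoding each side separately, which is what makes it more accurate but too slow for first-stage retrieval. A minimal re-ranking sketch; the query and passages are illustrative:

from sentence_transformers import CrossEncoder

model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L6-v2")

query = "What is Python?"
passages = [
    "Python is a programming language.",
    "Pythons are large constricting snakes.",
]

# Score each (query, passage) pair jointly; higher means more relevant
scores = model.predict([(query, passage) for passage in passages])

# Or rank directly with the convenience method
results = model.rank(query, passages)
for hit in results:
    print(f"{hit['score']:.2f}  {passages[hit['corpus_id']]}")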