Retrieve & Re-Rank

In Semantic Search we showed how to use a SparseEncoder to compute embeddings for queries, sentences, and paragraphs, and how to use these for semantic search. For complex search tasks, for example question answering retrieval, search can be significantly improved by using Retrieve & Re-Rank. A detailed explanation of this approach with dense embeddings produced by a Bi-Encoder is available here.

Overview

The Retrieve & Re-Rank approach consists of two stages:

  1. Retrieval Stage: Use fast but less accurate methods (SparseEncoder/bi-encoders) to retrieve a larger set of potentially relevant documents

  2. Re-Ranking Stage: Use more sophisticated but slower models (cross-encoders) to re-rank the retrieved documents for better precision

This approach combines the efficiency of first-stage retrieval with the accuracy of second-stage re-ranking.
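A minimal end-to-end sketch of the two stages, assuming the model names shown (any SparseEncoder and CrossEncoder checkpoints work); the corpus, query, and top-k cutoff are illustrative:

from sentence_transformers import CrossEncoder, SparseEncoder

# Stage 1: fast sparse retrieval over the whole corpus
retriever = SparseEncoder("naver/splade-cocondenser-ensembledistil")
corpus = [
    "Python is a programming language.",
    "The capital of France is Paris.",
    "Pandas is a data analysis library for Python.",
]
corpus_embeddings = retriever.encode_document(corpus)

query = "What is Python?"
query_embedding = retriever.encode_query([query])

# Dot-product scores between the query and every document
scores = retriever.similarity(query_embedding, corpus_embeddings)[0]
candidate_ids = scores.argsort(descending=True)[:2].tolist()

# Stage 2: slower but more accurate cross-encoder re-ranking
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L6-v2")
pairs = [(query, corpus[idx]) for idx in candidate_ids]
rerank_scores = reranker.predict(pairs)

for score, idx in sorted(zip(rerank_scores, candidate_ids), reverse=True):
    print(f"{score:.2f}  {corpus[idx]}")

Because the cross-encoder only scores the handful of retrieved candidates, its higher per-pair cost stays affordable even for large corpora.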

Comprehensive Evaluation: Hybrid Search Pipeline

File: hybrid_search.py

This script provides a complete evaluation pipeline comparing different retrieval and re-ranking approaches on a given dataset (NanoNFCorpus in this example). It includes:

  1. Sparse Retrieval using ibm-granite/granite-embedding-30m-sparse

  2. Dense Retrieval using multi-qa-MiniLM-L6-cos-v1

  3. Re-ranking both sparse and dense results with cross-encoder/ms-marco-MiniLM-L6-v2

  4. Hybrid Search using Reciprocal Rank Fusion via ReciprocalRankFusionEvaluator (a standalone RRF sketch follows below)

  5. Hybrid Re-ranking applying cross-encoder to fused results

Output: The script generates comprehensive metrics and saves results in the runs/ directory.
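The fusion step itself is simple to state: each document receives a score of 1 / (k + rank) from every ranking it appears in, so documents ranked highly by multiple retrievers rise to the top. The script uses ReciprocalRankFusionEvaluator for this; the following is only an illustrative standalone sketch, with made-up document IDs and the commonly used smoothing constant k=60:

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one.

    Each document scores 1 / (k + rank) per list it appears in;
    summing across lists rewards documents that several
    retrievers rank highly.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Example: fuse sparse and dense rankings for one query
sparse_ranking = ["doc3", "doc1", "doc7"]
dense_ranking = ["doc1", "doc9", "doc3"]
print(reciprocal_rank_fusion([sparse_ranking, dense_ranking]))
# doc1 and doc3 come first because both retrievers rank them well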

Evaluation Results

Example results from running the hybrid search evaluation on NanoNFCorpus:

================================================================================
EVALUATION SUMMARY
================================================================================
METHOD                            NDCG@10     MRR@10        MAP
--------------------------------------------------------------------------------
Sparse Retrieval                    32.10      47.27      28.29
Dense Retrieval                     27.35      41.59      22.79
Sparse + Reranking                  37.35      57.19      32.12
Dense + Reranking                   37.56      58.27      31.93
Hybrid RRF                          32.62      49.63      22.51
Hybrid RRF + Reranking              36.16      55.77      26.99
================================================================================

Key Observations:

  • Re-ranking consistently improves performance across all retrieval methods

  • Sparse retrieval alone already provides strong first-stage results

  • Both sparse and dense re-ranking achieve similar high performance

  • Hybrid RRF blends both retrievers' strengths, though on this dataset the re-ranked single-retriever pipelines still score slightly higher

Pre-trained Models

Sparse Encoder (Retrieval)

The SparseEncoder produces embeddings independently for your paragraphs and for your search queries. You can use it like this:

from sentence_transformers import SparseEncoder

# Load a pre-trained sparse encoder (SPLADE)
model = SparseEncoder("naver/splade-cocondenser-ensembledistil")

# Encode the corpus with the document-side method
docs = [
    "My first paragraph, which contains information.",
    "Python is a programming language.",
]
document_embeddings = model.encode_document(docs)

# Encode the search query with the query-side method
query = "What is Python?"
query_embedding = model.encode_query(query)
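The embeddings can then be compared to rank the documents for the query. A small follow-on sketch, assuming the snippet above has run; for sparse embeddings the default similarity is a dot product:

# Score the query against every document (dot product by default)
similarities = model.similarity(query_embedding, document_embeddings)
print(similarities)
# The second document should score highest, since it answers the query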

For pre-trained Sparse Encoder models, see: Pretrained Sparse-Encoders.

Cross-Encoders (Re-Ranker)

For pre-trained Cross Encoder models, see: MS MARCO Cross-Encoders
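A Cross-Encoder scores query-document pairs jointly rather than encoding each side separately, which is what makes it more accurate but too slow for first-stage retrieval. A minimal re-ranking sketch; the query and passages are illustrative:

from sentence_transformers import CrossEncoder

model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L6-v2")

query = "What is Python?"
passages = [
    "Python is a programming language.",
    "Pythons are large constricting snakes.",
]

# Score each (query, passage) pair jointly; higher means more relevant
scores = model.predict([(query, passage) for passage in passages])

# Or rank directly with the convenience method
results = model.rank(query, passages)
for hit in results:
    print(f"{hit['score']:.2f}  {passages[hit['corpus_id']]}")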