Evaluation

CrossEncoder models have their own evaluation classes, which live in sentence_transformers.cross_encoder.evaluation.

CEBinaryAccuracyEvaluator

class sentence_transformers.cross_encoder.evaluation.CEBinaryAccuracyEvaluator(sentence_pairs: list[list[str]], labels: list[int], name: str = '', threshold: float = 0.5, write_csv: bool = True)[source]

This evaluator can be used with the CrossEncoder class.

It is designed for CrossEncoders with a single output. It measures the accuracy of the predicted class vs. the gold labels, using a fixed threshold to decide the label (0 vs. 1).

See CEBinaryClassificationEvaluator for an evaluator that automatically determines the optimal threshold.
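
A minimal usage sketch (the model name and data below are illustrative, not part of the API):

    from sentence_transformers import CrossEncoder
    from sentence_transformers.cross_encoder.evaluation import CEBinaryAccuracyEvaluator

    # Any CrossEncoder with a single output works here; this model is only an example.
    model = CrossEncoder("cross-encoder/quora-distilroberta-base")
    sentence_pairs = [
        ["How do I learn Python?", "What is the best way to learn Python?"],
        ["How do I learn Python?", "How do I bake bread?"],
    ]
    labels = [1, 0]  # 1 = duplicate pair, 0 = unrelated pair

    evaluator = CEBinaryAccuracyEvaluator(sentence_pairs, labels, name="quora-dev")
    accuracy = evaluator(model)  # fraction of pairs whose thresholded score matches the gold label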

CEBinaryClassificationEvaluator

class sentence_transformers.cross_encoder.evaluation.CEBinaryClassificationEvaluator(sentence_pairs: list[list[str]], labels: list[int], name: str = '', show_progress_bar: bool = False, write_csv: bool = True)[source]

This evaluator can be used with the CrossEncoder class. Given sentence pairs and binary labels (0 and 1), it computes the average precision and the best possible F1 score.
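
A minimal usage sketch (model name and data are illustrative); the call returns the average precision:

    from sentence_transformers import CrossEncoder
    from sentence_transformers.cross_encoder.evaluation import CEBinaryClassificationEvaluator

    model = CrossEncoder("cross-encoder/quora-distilroberta-base")  # example single-output model
    sentence_pairs = [
        ["How do I learn Python?", "What is the best way to learn Python?"],
        ["How do I learn Python?", "How do I bake bread?"],
    ]
    labels = [1, 0]

    evaluator = CEBinaryClassificationEvaluator(sentence_pairs, labels, name="quora-dev")
    average_precision = evaluator(model)  # further metrics (best F1, thresholds) go to the CSV when write_csv=True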

CECorrelationEvaluator

class sentence_transformers.cross_encoder.evaluation.CECorrelationEvaluator(sentence_pairs: list[list[str]], scores: list[float], name: str = '', write_csv: bool = True)[source]

This evaluator can be used with the CrossEncoder class. Given sentence pairs and continuous scores, it computes the Pearson and Spearman correlation between the predicted scores for the sentence pairs and the gold scores.
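
A minimal usage sketch (the model and the gold scores are illustrative):

    from sentence_transformers import CrossEncoder
    from sentence_transformers.cross_encoder.evaluation import CECorrelationEvaluator

    model = CrossEncoder("cross-encoder/stsb-distilroberta-base")  # example STS model
    sentence_pairs = [
        ["A plane is taking off.", "An air plane is taking off."],
        ["A man is playing a flute.", "A man is playing the piano."],
    ]
    scores = [0.95, 0.25]  # gold similarity scores, here scaled to [0, 1]

    evaluator = CECorrelationEvaluator(sentence_pairs, scores, name="sts-dev")
    spearman = evaluator(model)  # returns the Spearman correlation; Pearson is logged alongside it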

CEF1Evaluator

class sentence_transformers.cross_encoder.evaluation.CEF1Evaluator(sentence_pairs: list[list[str]], labels: list[int], *, batch_size: int = 32, show_progress_bar: bool = False, name: str = '', write_csv: bool = True)[source]

CrossEncoder F1 score based evaluator for binary and multiclass tasks.

The task type (binary or multiclass) is determined from the labels array. For binary tasks, the returned metric is the binary F1 score; for multiclass tasks, it is the macro F1 score.

Parameters
  • sentence_pairs (List[List[str]]) – A list of sentence pairs, where each pair is a list of two strings.

  • labels (List[int]) – A list of integer labels corresponding to each sentence pair.

  • batch_size (int, optional) – Batch size for prediction. Defaults to 32.

  • show_progress_bar (bool, optional) – Show a tqdm progress bar during prediction. Defaults to False.

  • name (str, optional) – An optional name for the CSV file with stored results. Defaults to an empty string.

  • write_csv (bool, optional) – Flag to determine if the data should be saved to a CSV file. Defaults to True.
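
A minimal usage sketch for the multiclass case (the NLI model and its label ids are illustrative assumptions):

    from sentence_transformers import CrossEncoder
    from sentence_transformers.cross_encoder.evaluation import CEF1Evaluator

    model = CrossEncoder("cross-encoder/nli-distilroberta-base")  # example 3-class model
    sentence_pairs = [
        ["A man is eating pizza.", "A man eats something."],
        ["A man is eating pizza.", "Nobody is eating."],
        ["A man is eating pizza.", "The man is hungry."],
    ]
    labels = [1, 0, 2]  # assumed mapping: 0=contradiction, 1=entailment, 2=neutral

    evaluator = CEF1Evaluator(sentence_pairs, labels, batch_size=32, name="nli-dev")
    f1 = evaluator(model)  # macro F1 here, since more than two classes are present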

CESoftmaxAccuracyEvaluator

class sentence_transformers.cross_encoder.evaluation.CESoftmaxAccuracyEvaluator(sentence_pairs: list[list[str]], labels: list[int], name: str = '', write_csv: bool = True)[source]

This evaluator can be used with the CrossEncoder class.

It is designed for CrossEncoders with 2 or more outputs. It measures the accuracy of the predicted class vs. the gold labels.
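
A minimal usage sketch (model and label ids are illustrative; accuracy is computed from the argmax over the output logits):

    from sentence_transformers import CrossEncoder
    from sentence_transformers.cross_encoder.evaluation import CESoftmaxAccuracyEvaluator

    model = CrossEncoder("cross-encoder/nli-distilroberta-base")  # example multi-output model
    sentence_pairs = [
        ["A man is eating pizza.", "A man eats something."],
        ["A man is eating pizza.", "Nobody is eating."],
    ]
    labels = [1, 0]  # gold class indices

    evaluator = CESoftmaxAccuracyEvaluator(sentence_pairs, labels, name="nli-dev")
    accuracy = evaluator(model)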

CERerankingEvaluator

class sentence_transformers.cross_encoder.evaluation.CERerankingEvaluator(samples, at_k: int = 10, name: str = '', write_csv: bool = True, mrr_at_k: int | None = None)[source]

This class evaluates a CrossEncoder model for the task of re-ranking.

Given a query and a list of documents, it computes the score for each [query, doc_i] pair and sorts the documents in decreasing order of score. MRR@10 and NDCG@10 are then computed to measure the quality of the ranking.

Parameters

  • samples (List[Dict[str, Union[str, List[str]]]]) – Must be a list where each element is of the form: {'query': '', 'positive': [], 'negative': []}. query is the search query, positive is a list of positive (relevant) documents, and negative is a list of negative (irrelevant) documents.
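
A minimal usage sketch (the reranker model and the documents are illustrative):

    from sentence_transformers import CrossEncoder
    from sentence_transformers.cross_encoder.evaluation import CERerankingEvaluator

    model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # example reranker
    samples = [
        {
            "query": "What is the capital of France?",
            "positive": ["Paris is the capital of France."],
            "negative": ["Berlin is the capital of Germany.", "France borders Spain."],
        },
    ]

    evaluator = CERerankingEvaluator(samples, at_k=10, name="rerank-dev")
    mrr_at_10 = evaluator(model)  # returns the mean MRR@at_k; NDCG@at_k is logged alongside it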