CrossEncoder
For an introduction to Cross-Encoders, see Cross-Encoders.
- class sentence_transformers.cross_encoder.model.CrossEncoder(model_name_or_path: str | None = None, *, modules: list[Module] | OrderedDict[str, Module] | None = None, device: str | None = None, prompts: dict[str, str] | None = None, default_prompt_name: str | None = None, cache_folder: str | None = None, trust_remote_code: bool = False, revision: str | None = None, local_files_only: bool = False, token: bool | str | None = None, model_kwargs: dict | None = None, processor_kwargs: dict | None = None, config_kwargs: dict | None = None, model_card_data: CrossEncoderModelCardData | None = None, backend: Literal['torch', 'onnx', 'openvino'] = 'torch', num_labels: int | None = None, max_length: int | None = None, activation_fn: Callable | None = None)[source]
Loads or creates a CrossEncoder model that takes a sentence pair as input and outputs a score or label.
A CrossEncoder does not produce sentence embeddings. Instead, it processes both sentences jointly through the transformer and outputs a score (regression) or class probabilities (classification). This makes it more accurate for pairwise tasks like reranking or semantic textual similarity, but it cannot pre-compute embeddings for individual sentences.
- Parameters:
model_name_or_path (str, optional) – If a filepath on disk, loads the model from that path. Otherwise, tries to download a pre-trained CrossEncoder model. If that fails, tries to construct a model from the Hugging Face Hub with that name. Defaults to None.
modules (list[nn.Module], optional) – A list of torch modules that are called sequentially. Can be used to create custom CrossEncoder models from scratch. Defaults to None.
device (str, optional) – Device (like "cuda", "cpu", "mps", "npu") that should be used for computation. If None, checks if a GPU can be used. Defaults to None.
prompts (dict[str, str], optional) – A dictionary with prompts for the model. The key is the prompt name, the value is the prompt text. The prompt text will be prepended before any text to encode. For example: {"query": "query: ", "passage": "passage: "}. If a model has saved prompts, you can override them by passing your own, or pass {"query": "", "document": ""} to disable them. Defaults to None.
default_prompt_name (str, optional) – The name of the prompt that should be used by default. If not set, no prompt will be applied. Defaults to None.
cache_folder (str, optional) – Path to store models. Can also be set by the SENTENCE_TRANSFORMERS_HOME environment variable. Defaults to None.
trust_remote_code (bool, optional) – Whether to allow for custom models defined on the Hub in their own modeling files. Only set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine. Defaults to False.
revision (str, optional) – The specific model version to use. It can be a branch name, a tag name, or a commit id, for a stored model on Hugging Face. Defaults to None.
local_files_only (bool, optional) – Whether to only look at local files (i.e., do not try to download the model). Defaults to False.
token (bool or str, optional) – Hugging Face authentication token to download private models. Defaults to None.
model_kwargs (dict[str, Any], optional) – Keyword arguments passed to the underlying Hugging Face Transformers model via AutoModel.from_pretrained. Particularly useful options include:
torch_dtype: Override the default torch.dtype and load the model under a specific dtype. Can be torch.float16, torch.bfloat16, torch.float32, or "auto" to use the dtype from the model’s config.json.
attn_implementation: The attention implementation to use, e.g. "eager", "sdpa", or "flash_attention_2". If you pip install kernels, then "flash_attention_2" should work without having to install flash_attn. It is frequently the fastest option. Defaults to "sdpa" when available (torch>=2.1.1).
device_map: Device map for model parallelism, e.g. "auto".
provider: For backend="onnx", the ONNX execution provider (e.g. "CUDAExecutionProvider").
file_name: For backend="onnx" or "openvino", the filename to load (e.g. for optimized or quantized models).
export: For backend="onnx" or "openvino", whether to export the model to the backend format. Also set automatically if the exported file doesn’t exist.
See the PreTrainedModel.from_pretrained documentation for more details. Defaults to None.
processor_kwargs (dict[str, Any], optional) – Keyword arguments passed to the Hugging Face Transformers processor/tokenizer via AutoProcessor.from_pretrained. See the AutoTokenizer.from_pretrained documentation for more details. Defaults to None.
config_kwargs (dict[str, Any], optional) – Keyword arguments passed to the Hugging Face Transformers config via AutoConfig.from_pretrained. See the AutoConfig.from_pretrained documentation for more details. For example, you can set classifier_dropout or num_labels via this parameter. Defaults to None.
model_card_data (CrossEncoderModelCardData, optional) – A model card data object that contains information about the model. Used to generate a model card when saving the model. If not set, a default model card data object is created. Defaults to None.
backend (str, optional) – The backend to use for inference. Can be "torch" (default), "onnx", or "openvino". Defaults to "torch".
num_labels (int, optional) – Number of labels of the classifier. If 1, the CrossEncoder is a regression model that outputs a continuous score. If > 1, it outputs several scores that can be soft-maxed to get probability scores for the different classes. Defaults to None.
max_length (int, optional) – Max length for input sequences. Longer sequences will be truncated. If None, the max length of the model will be used. Defaults to None.
activation_fn (Callable, optional) – Activation function applied on top of the model’s logits during predict(). If None, nn.Sigmoid() is used when num_labels=1, else nn.Identity(). Defaults to None.
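The default activation behavior described for activation_fn can be sketched without the library. The helper below is purely illustrative (not part of the API): a sigmoid squashes a single regression logit into a 0–1 score, while classification logits pass through unchanged.

```python
import math

def default_activation(logits: list[float], num_labels: int) -> list[float]:
    # Mirrors the documented default: nn.Sigmoid() when num_labels == 1,
    # nn.Identity() otherwise.
    if num_labels == 1:
        return [1.0 / (1.0 + math.exp(-x)) for x in logits]
    return logits

print(default_activation([0.0], num_labels=1))        # [0.5]
print(default_activation([2.0, -1.0], num_labels=2))  # [2.0, -1.0] (unchanged)
```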
Example
from sentence_transformers import CrossEncoder

# Load a pre-trained CrossEncoder model
model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L6-v2")

# Predict scores for sentence pairs
pairs = [
    ("How many people live in Berlin?", "Berlin had a population of 3,520,031 in 2019."),
    ("How many people live in Berlin?", "Berlin is well known for its museums."),
]
scores = model.predict(pairs)
print(scores)
# [8.607 1.133]

# Rank documents by relevance to a query
results = model.rank(
    "How many people live in Berlin?",
    ["Berlin had a population of 3,520,031 in 2019.", "Berlin is well known for its museums."],
)
print(results)
# [{'corpus_id': 0, 'score': 8.607317}, {'corpus_id': 1, 'score': 1.1329174}]
- active_adapters() list[str][source]
If you are not familiar with adapters and PEFT methods, we invite you to read more about them on the PEFT official documentation: https://huggingface.co/docs/peft
Gets the current active adapters of the model. In case of multi-adapter inference (combining multiple adapters for inference) returns the list of all active adapters so that users can deal with them accordingly.
For previous PEFT versions (that do not support multi-adapter inference), module.active_adapter will return a single string.
- add_adapter(*args, **kwargs) None[source]
Adds a fresh new adapter to the current model for training purposes. If no adapter name is passed, a default name is assigned to the adapter to follow the convention of PEFT library (in PEFT we use “default” as the default adapter name).
Requires peft as a backend to load the adapter weights and the underlying model to be compatible with PEFT.
- Parameters:
*args – Positional arguments to pass to the underlying AutoModel add_adapter function. More information can be found in the transformers documentation https://huggingface.co/docs/transformers/main/en/main_classes/peft#transformers.integrations.PeftAdapterMixin.add_adapter
**kwargs – Keyword arguments to pass to the underlying AutoModel add_adapter function. More information can be found in the transformers documentation https://huggingface.co/docs/transformers/main/en/main_classes/peft#transformers.integrations.PeftAdapterMixin.add_adapter
- bfloat16() T
Casts all floating point parameters and buffers to bfloat16 datatype.
Note
This method modifies the module in-place.
- Returns:
self
- Return type:
Module
- compile(*args, **kwargs)
Compile this Module’s forward using torch.compile().
This Module’s __call__ method is compiled and all arguments are passed as-is to torch.compile().
See torch.compile() for details on the arguments for this function.
- cpu() T
Moves all model parameters and buffers to the CPU.
Note
This method modifies the module in-place.
- Returns:
self
- Return type:
Module
- cuda(device: int | device | None = None) T
Moves all model parameters and buffers to the GPU.
This also makes associated parameters and buffers different objects. So it should be called before constructing the optimizer if the module will live on GPU while being optimized.
Note
This method modifies the module in-place.
- Parameters:
device (int, optional) – if specified, all parameters will be copied to that device
- Returns:
self
- Return type:
Module
- delete_adapter(*args, **kwargs) None[source]
If you are not familiar with adapters and PEFT methods, we invite you to read more about them on the PEFT official documentation: https://huggingface.co/docs/peft
Delete an adapter’s LoRA layers from the underlying model.
- Parameters:
*args – Positional arguments to pass to the underlying AutoModel delete_adapter function. More information can be found in the transformers documentation https://huggingface.co/docs/transformers/main/en/main_classes/peft#transformers.integrations.PeftAdapterMixin.delete_adapter
**kwargs – Keyword arguments to pass to the underlying AutoModel delete_adapter function. More information can be found in the transformers documentation https://huggingface.co/docs/transformers/main/en/main_classes/peft#transformers.integrations.PeftAdapterMixin.delete_adapter
- property device: device
Get torch.device from module, assuming that the whole module has one device. In case there are no PyTorch parameters, fall back to CPU.
- disable_adapters() None[source]
Disable all adapters that are attached to the model. This leads to inferring with the base model only.
- double() T
Casts all floating point parameters and buffers to double datatype.
Note
This method modifies the module in-place.
- Returns:
self
- Return type:
Module
- property dtype: dtype | None
The dtype of the module (assuming that all the module parameters have the same dtype).
- Type:
torch.dtype
- enable_adapters() None[source]
Enable adapters that are attached to the model. The model will use self.active_adapter().
- eval() T
Sets the module in evaluation mode.
This has any effect only on certain modules. See documentations of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g.
Dropout, BatchNorm, etc.
This is equivalent with self.train(False).
See Locally disabling gradient computation for a comparison between .eval() and several similar mechanisms that may be confused with it.
- Returns:
self
- Return type:
Module
- evaluate(evaluator: BaseEvaluator, output_path: str | None = None) dict[str, float] | float[source]
Evaluate the model based on an evaluator
- Parameters:
evaluator (BaseEvaluator) – The evaluator used to evaluate the model.
output_path (str, optional) – The path where the evaluator can write the results. Defaults to None.
- Returns:
The evaluation results.
- fit(train_dataloader: DataLoader, evaluator: BaseEvaluator | None = None, epochs: int = 1, loss_fct=None, activation_fct=Identity(), scheduler: str = 'WarmupLinear', warmup_steps: int = 10000, optimizer_class: type[Optimizer] = <class 'torch.optim.adamw.AdamW'>, optimizer_params: dict[str, object] = {'lr': 2e-05}, weight_decay: float = 0.01, evaluation_steps: int = 0, output_path: str | None = None, save_best_model: bool = True, max_grad_norm: float = 1, use_amp: bool = False, callback: Callable[[float, int, int], None] = None, show_progress_bar: bool = True) None[source]
Deprecated training method from before Sentence Transformers v4.0; it is recommended to use CrossEncoderTrainer instead. This method uses CrossEncoderTrainer behind the scenes, but does not provide as much flexibility as the Trainer itself.
This training approach uses a DataLoader and Loss function to train the model.
This method should produce equivalent results in v4.0 as before v4.0, but if you encounter any issues with your existing training scripts, then you may wish to use CrossEncoder.old_fit instead. That uses the old training method from before v4.0.
- Parameters:
train_dataloader – The DataLoader with InputExample instances
evaluator – An evaluator (sentence_transformers.cross_encoder.evaluation) evaluates the model performance during training on held-out dev data. It is used to determine the best model that is saved to disk.
epochs – Number of epochs for training
loss_fct – Which loss function to use for training. If None, will use BinaryCrossEntropy() if self.config.num_labels == 1 else CrossEntropyLoss(). Defaults to None.
activation_fct – Activation function applied on top of logits output of model.
scheduler – Learning rate scheduler. Available schedulers: constantlr, warmupconstant, warmuplinear, warmupcosine, warmupcosinewithhardrestarts
warmup_steps – Behavior depends on the scheduler. For WarmupLinear (default), the learning rate is increased from 0 up to the maximal learning rate. After these many training steps, the learning rate is decreased linearly back to zero.
optimizer_class – Optimizer
optimizer_params – Optimizer parameters
weight_decay – Weight decay for model parameters
evaluation_steps – If > 0, evaluate the model using evaluator after each number of training steps
output_path – Storage path for the model and evaluation files
save_best_model – If true, the best model (according to evaluator) is stored at output_path
max_grad_norm – Used for gradient normalization.
use_amp – Use Automatic Mixed Precision (AMP). Only for PyTorch >= 1.6.0
callback – Callback function that is invoked after each evaluation. It must accept the following three parameters in this order: score, epoch, steps
show_progress_bar – If True, output a tqdm progress bar
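The WarmupLinear behavior described for warmup_steps above can be sketched as a plain learning-rate multiplier. This is an illustrative re-implementation, not the library's actual scheduler code:

```python
def warmup_linear_multiplier(step: int, warmup_steps: int, total_steps: int) -> float:
    # Ramp the learning rate linearly from 0 up to its maximum over
    # `warmup_steps`, then decay it linearly back to 0 by `total_steps`.
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    return max(0.0, (total_steps - step) / max(1.0, total_steps - warmup_steps))

print(warmup_linear_multiplier(50, 100, 1000))    # 0.5 (halfway through warmup)
print(warmup_linear_multiplier(100, 100, 1000))   # 1.0 (peak learning rate)
print(warmup_linear_multiplier(1000, 100, 1000))  # 0.0 (end of training)
```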
- float() T
Casts all floating point parameters and buffers to float datatype.
Note
This method modifies the module in-place.
- Returns:
self
- Return type:
Module
- get_adapter_state_dict(*args, **kwargs) dict[source]
If you are not familiar with adapters and PEFT methods, we invite you to read more about them on the PEFT official documentation: https://huggingface.co/docs/peft
Gets the adapter state dict that should only contain the weights tensors of the specified adapter_name adapter. If no adapter_name is passed, the active adapter is used.
- Parameters:
*args – Positional arguments to pass to the underlying AutoModel get_adapter_state_dict function. More information can be found in the transformers documentation https://huggingface.co/docs/transformers/main/en/main_classes/peft#transformers.integrations.PeftAdapterMixin.get_adapter_state_dict
**kwargs – Keyword arguments to pass to the underlying AutoModel get_adapter_state_dict function. More information can be found in the transformers documentation https://huggingface.co/docs/transformers/main/en/main_classes/peft#transformers.integrations.PeftAdapterMixin.get_adapter_state_dict
- get_backend() Literal['torch', 'onnx', 'openvino'][source]
Return the backend used for inference, which can be one of “torch”, “onnx”, or “openvino”.
- Returns:
The backend used for inference.
- Return type:
str
- get_max_seq_length() int | None[source]
Deprecated: use the max_seq_length property instead.
Returns the maximal sequence length that the first module of the model accepts. Longer inputs will be truncated.
- Returns:
The maximal sequence length that the model accepts, or None if it is not defined.
- Return type:
Optional[int]
- get_model_kwargs() list[str][source]
Get the keyword arguments specific to this model for inference methods like encode or predict.
Example
>>> from sentence_transformers import SentenceTransformer, SparseEncoder
>>> SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2").get_model_kwargs()
[]
>>> SentenceTransformer("jinaai/jina-embeddings-v4", trust_remote_code=True).get_model_kwargs()
['task', 'truncate_dim']
>>> SparseEncoder("opensearch-project/opensearch-neural-sparse-encoding-doc-v3-distill").get_model_kwargs()
['task']
- Returns:
A list of keyword arguments for the forward pass.
- Return type:
list[str]
- gradient_checkpointing_enable(gradient_checkpointing_kwargs: dict[str, Any] | None = None) None[source]
Enable gradient checkpointing for the model.
- half() T
Casts all floating point parameters and buffers to half datatype.
Note
This method modifies the module in-place.
- Returns:
self
- Return type:
Module
- is_singular_input(inputs: tuple[str | Image | ndarray | Tensor | AudioDict | None | VideoDict | dict[Literal['text', 'image', 'audio', 'video'], str | Image | ndarray | Tensor | AudioDict | None | VideoDict], str | Image | ndarray | Tensor | AudioDict | None | VideoDict | dict[Literal['text', 'image', 'audio', 'video'], str | Image | ndarray | Tensor | AudioDict | None | VideoDict]] | list[str | Image | ndarray | Tensor | AudioDict | None | VideoDict | dict[Literal['text', 'image', 'audio', 'video'], str | Image | ndarray | Tensor | AudioDict | None | VideoDict]] | list[tuple[str | Image | ndarray | Tensor | AudioDict | None | VideoDict | dict[Literal['text', 'image', 'audio', 'video'], str | Image | ndarray | Tensor | AudioDict | None | VideoDict], str | Image | ndarray | Tensor | AudioDict | None | VideoDict | dict[Literal['text', 'image', 'audio', 'video'], str | Image | ndarray | Tensor | AudioDict | None | VideoDict]] | list[str | Image | ndarray | Tensor | AudioDict | None | VideoDict | dict[Literal['text', 'image', 'audio', 'video'], str | Image | ndarray | Tensor | AudioDict | None | VideoDict]]]) bool[source]
Check if the input represents a single example or a batch of examples.
- Parameters:
inputs – The input to check.
- Returns:
True if the input is a single example, False if it is a batch.
- Return type:
bool
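The single-vs-batch distinction that is_singular_input makes can be sketched for the text-only case. This is a simplified, hypothetical re-implementation; the real method also handles images, audio, video, and multimodal dicts:

```python
def is_singular_pair(inputs) -> bool:
    # Simplified, text-only heuristic: a single example is a (text, text)
    # tuple, while a list is treated as a batch of examples.
    return (
        isinstance(inputs, tuple)
        and len(inputs) == 2
        and all(isinstance(x, str) for x in inputs)
    )

print(is_singular_pair(("query", "document")))        # True  (one pair)
print(is_singular_pair([("q1", "d1"), ("q2", "d2")])) # False (a batch)
```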
- load_adapter(*args, **kwargs) None[source]
Load adapter weights from file or remote Hub folder. If you are not familiar with adapters and PEFT methods, we invite you to read more about them on the PEFT official documentation: https://huggingface.co/docs/peft
Requires peft as a backend to load the adapter weights and the underlying model to be compatible with PEFT.
- Parameters:
*args – Positional arguments to pass to the underlying AutoModel load_adapter function. More information can be found in the transformers documentation https://huggingface.co/docs/transformers/main/en/main_classes/peft#transformers.integrations.PeftAdapterMixin.load_adapter
**kwargs – Keyword arguments to pass to the underlying AutoModel load_adapter function. More information can be found in the transformers documentation https://huggingface.co/docs/transformers/main/en/main_classes/peft#transformers.integrations.PeftAdapterMixin.load_adapter
- property max_seq_length: int | None
Returns the maximal input sequence length for the model. Longer inputs will be truncated.
- Returns:
The maximal input sequence length, or None if not defined.
- Return type:
Optional[int]
- property modalities: list[Literal['text', 'image', 'audio', 'video', 'message'] | tuple[Literal['text', 'image', 'audio', 'video'], ...]]
Return the list of modalities supported by this model, e.g. ["text"] or ["text", "image", "message"].
- model_card_data_class[source]
alias of CrossEncoderModelCardData
- old_fit(train_dataloader: DataLoader, evaluator: BaseEvaluator | None = None, epochs: int = 1, loss_fct=None, activation_fct=Identity(), scheduler: str = 'WarmupLinear', warmup_steps: int = 10000, optimizer_class: type[Optimizer] = <class 'torch.optim.adamw.AdamW'>, optimizer_params: dict[str, object] = {'lr': 2e-05}, weight_decay: float = 0.01, evaluation_steps: int = 0, output_path: str | None = None, save_best_model: bool = True, max_grad_norm: float = 1, use_amp: bool = False, callback: Callable[[float, int, int], None] | None = None, show_progress_bar: bool = True) None[source]
Deprecated training method from before Sentence Transformers v4.0; it is recommended to use CrossEncoderTrainer instead. This method should only be used if you encounter issues with your existing training scripts after upgrading to v4.0.
This training approach uses a DataLoader and Loss function to train the model.
- Parameters:
train_dataloader – The DataLoader with InputExample instances
evaluator – An evaluator (sentence_transformers.cross_encoder.evaluation) evaluates the model performance during training on held-out dev data. It is used to determine the best model that is saved to disk.
epochs – Number of epochs for training
loss_fct – Which loss function to use for training. If None, will use BinaryCrossEntropy() if self.config.num_labels == 1 else CrossEntropyLoss(). Defaults to None.
activation_fct – Activation function applied on top of logits output of model.
scheduler – Learning rate scheduler. Available schedulers: constantlr, warmupconstant, warmuplinear, warmupcosine, warmupcosinewithhardrestarts
warmup_steps – Behavior depends on the scheduler. For WarmupLinear (default), the learning rate is increased from 0 up to the maximal learning rate. After these many training steps, the learning rate is decreased linearly back to zero.
optimizer_class – Optimizer
optimizer_params – Optimizer parameters
weight_decay – Weight decay for model parameters
evaluation_steps – If > 0, evaluate the model using evaluator after each number of training steps
output_path – Storage path for the model and evaluation files
save_best_model – If true, the best model (according to evaluator) is stored at output_path
max_grad_norm – Used for gradient normalization.
use_amp – Use Automatic Mixed Precision (AMP). Only for PyTorch >= 1.6.0
callback – Callback function that is invoked after each evaluation. It must accept the following three parameters in this order: score, epoch, steps
show_progress_bar – If True, output a tqdm progress bar
- predict(inputs: PairInput, prompt_name: str | None = None, prompt: str | None = None, batch_size: int = 32, show_progress_bar: bool | None = None, activation_fn: Callable | None = None, apply_softmax: bool | None = False, convert_to_numpy: Literal[False] = False, convert_to_tensor: Literal[False] = False, device: str | list[str | torch.device] | None = None, pool: dict[Literal['input', 'output', 'processes'], Any] | None = None, chunk_size: int | None = None, **kwargs) torch.Tensor[source]
- predict(inputs: list[PairInput] | PairInput, prompt_name: str | None = None, prompt: str | None = None, batch_size: int = 32, show_progress_bar: bool | None = None, activation_fn: Callable | None = None, apply_softmax: bool | None = False, convert_to_numpy: Literal[True] = True, convert_to_tensor: Literal[False] = False, device: str | list[str | torch.device] | None = None, pool: dict[Literal['input', 'output', 'processes'], Any] | None = None, chunk_size: int | None = None, **kwargs) np.ndarray
- predict(inputs: list[PairInput] | PairInput, prompt_name: str | None = None, prompt: str | None = None, batch_size: int = 32, show_progress_bar: bool | None = None, activation_fn: Callable | None = None, apply_softmax: bool | None = False, convert_to_numpy: bool = True, convert_to_tensor: Literal[True] = True, device: str | list[str | torch.device] | None = None, pool: dict[Literal['input', 'output', 'processes'], Any] | None = None, chunk_size: int | None = None, **kwargs) torch.Tensor
- predict(inputs: list[PairInput], prompt_name: str | None = None, prompt: str | None = None, batch_size: int = 32, show_progress_bar: bool | None = None, activation_fn: Callable | None = None, apply_softmax: bool | None = False, convert_to_numpy: Literal[False] = False, convert_to_tensor: Literal[False] = False, device: str | list[str | torch.device] | None = None, pool: dict[Literal['input', 'output', 'processes'], Any] | None = None, chunk_size: int | None = None, **kwargs) list[torch.Tensor]
Performs predictions with the CrossEncoder on the given input pairs.
Tip
Adjusting batch_size can significantly improve processing speed. The optimal value depends on your hardware, model size, precision, and input length. Benchmark a few batch sizes on a small subset of your data to find the best value.
- Parameters:
inputs (Union[List[PairInput], PairInput]) – A list of input pairs or one input pair, where each element can be a string, image, or multimodal dict.
prompt_name (Optional[str], optional) – The name of the prompt to use for encoding.
prompt (Optional[str], optional) – The prompt to use for encoding.
batch_size (int, optional) – Batch size for encoding. Defaults to 32.
show_progress_bar (bool, optional) – Output progress bar. Defaults to None.
activation_fn (callable, optional) – Activation function applied on the logits output of the CrossEncoder. If None, model.activation_fn will be used, which defaults to torch.nn.Sigmoid if num_labels=1, else torch.nn.Identity. Defaults to None.
apply_softmax (bool, optional) – If set to True and model.num_labels > 1, applies softmax on the logits output such that for each sample, the scores of each class sum to 1. Defaults to False.
convert_to_numpy (bool, optional) – Whether the output should be a list of numpy vectors. If False, output a list of PyTorch tensors. Defaults to True.
convert_to_tensor (bool, optional) – Whether the output should be one large tensor. Overwrites convert_to_numpy. Defaults to False.
device (Union[str, List[str]], optional) – Device(s) to use for computation. Can be a single device string (e.g., “cuda:0”, “cpu”) or a list of devices (e.g., [“cuda:0”, “cuda:1”]). If a list is provided, multiprocessing will be used automatically. Defaults to None.
pool (Dict[str, Any], optional) – A pool of workers created with start_multi_process_pool(). If provided, multiprocessing will be used. If None and device is a list, a pool will be created automatically. Defaults to None.
chunk_size (int, optional) – Size of chunks for multiprocessing. If None, a sensible default is calculated. Only used when pool is not None or device is a list. Defaults to None.
- Returns:
Predictions for the passed input pairs. The return type depends on the convert_to_numpy and convert_to_tensor parameters. If convert_to_tensor is True, the output will be a torch.Tensor. If convert_to_numpy is True, the output will be a numpy.ndarray. Otherwise, the output will be a list of torch.Tensor values.
- Return type:
Union[List[torch.Tensor], np.ndarray, torch.Tensor]
Examples
from sentence_transformers import CrossEncoder

model = CrossEncoder("cross-encoder/stsb-roberta-base")
sentences = [["I love cats", "Cats are amazing"], ["I prefer dogs", "Dogs are loyal"]]
model.predict(sentences)
# => array([0.6912767, 0.4303499], dtype=float32)

# Using multiprocessing with automatic pool
scores = model.predict(sentences, device=["cuda:0", "cuda:1"])

# Using multiprocessing with manual pool
pool = model.start_multi_process_pool()
scores = model.predict(sentences, pool=pool)
model.stop_multi_process_pool(pool)
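The apply_softmax option described above normalizes each sample's class logits so they sum to 1. The transformation itself can be sketched in plain Python; this is illustrative, not the library's actual code path:

```python
import math

def softmax(logits: list[float]) -> list[float]:
    # Subtract the max logit for numerical stability, exponentiate,
    # then normalize so the class scores sum to 1.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])  # [0.659, 0.242, 0.099]
print(round(sum(probs), 6))          # 1.0
```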
- preprocess(inputs: list[str | Image | ndarray | Tensor | AudioDict | None | VideoDict | MessageDict | list[MessageDict] | dict[Literal['text', 'image', 'audio', 'video'], str | Image | ndarray | Tensor | AudioDict | None | VideoDict] | tuple[str | Image | ndarray | Tensor | AudioDict | None | VideoDict | dict[Literal['text', 'image', 'audio', 'video'], str | Image | ndarray | Tensor | AudioDict | None | VideoDict], str | Image | ndarray | Tensor | AudioDict | None | VideoDict | dict[Literal['text', 'image', 'audio', 'video'], str | Image | ndarray | Tensor | AudioDict | None | VideoDict]] | list[str | Image | ndarray | Tensor | AudioDict | None | VideoDict | dict[Literal['text', 'image', 'audio', 'video'], str | Image | ndarray | Tensor | AudioDict | None | VideoDict]]], prompt: str | None = None, **kwargs) dict[str, Tensor | Any][source]
Preprocesses the inputs for the model.
- Parameters:
inputs (list[SingleInput | PairInput]) – A list of inputs to be preprocessed. Each input can be a string, dict, tuple, PIL Image, numpy array, torch Tensor, or other supported modality. If a single input is provided, it must be wrapped in a list.
prompt (str, optional) – A prompt string to prepend to text inputs. If the model supports the message modality, the prompt will be added as a system message to the input messages instead of being prepended to the text. Defaults to None.
- Returns:
A dictionary of tensors with the preprocessed inputs.
- Return type:
dict[str, Tensor | Any]
- property processor: Any
Property to get the processor that is used by this model
- push_to_hub(repo_id: str, token: str | None = None, private: bool | None = None, safe_serialization: bool = True, commit_message: str | None = None, local_model_path: str | None = None, exist_ok: bool = False, replace_model_card: bool = False, train_datasets: list[str] | None = None, revision: str | None = None, create_pr: bool = False) str[source]
Uploads all elements of this model to a HuggingFace Hub repository, creating it if it doesn’t exist.
- Parameters:
repo_id (str) – Repository name for your model in the Hub, including the user or organization.
token (str, optional) – An authentication token (See https://huggingface.co/settings/token)
private (bool, optional) – Set to True to host the model in a private repository.
safe_serialization (bool, optional) – If True, save the model using safetensors. If False, save the model the traditional PyTorch way.
commit_message (str, optional) – Message to commit while pushing.
local_model_path (str, optional) – Path of the model locally. If set, this file path will be uploaded. Otherwise, the current model will be uploaded
exist_ok (bool, optional) – If True, pushing to an existing repository is allowed. If False, only pushing to a new repository is possible.
replace_model_card (bool, optional) – If true, replace an existing model card in the hub with the automatically created model card. If false (default), keep the existing model card if one exists in the repository.
train_datasets (List[str], optional) – Datasets used to train the model. If set, the datasets will be added to the model card in the Hub.
revision (str, optional) – Branch to push the uploaded files to
create_pr (bool, optional) – If True, create a pull request instead of pushing directly to the main branch
- Returns:
The url of the commit of your model in the repository on the Hugging Face Hub.
- Return type:
str
- rank(query: str | Image | ndarray | Tensor | AudioDict | None | VideoDict | dict[Literal['text', 'image', 'audio', 'video'], str | Image | ndarray | Tensor | AudioDict | None | VideoDict], documents: list[str | Image | ndarray | Tensor | AudioDict | None | VideoDict | dict[Literal['text', 'image', 'audio', 'video'], str | Image | ndarray | Tensor | AudioDict | None | VideoDict]], top_k: int | None = None, return_documents: bool = False, prompt_name: str | None = None, prompt: str | None = None, batch_size: int = 32, show_progress_bar: bool | None = None, activation_fn: Callable | None = None, apply_softmax=False, convert_to_numpy: bool = True, convert_to_tensor: bool = False, device: str | list[str | device] | None = None, pool: dict[Literal['input', 'output', 'processes'], Any] | None = None, chunk_size: int | None = None) list[dict[Literal['corpus_id', 'score', 'text'], int | float | str]][source]
Performs ranking with the CrossEncoder on the given query and documents. Returns a sorted list with the document indices and scores.
Tip
Adjusting batch_size can significantly improve processing speed. The optimal value depends on your hardware, model size, precision, and input length. Benchmark a few batch sizes on a small subset of your data to find the best value.
- Parameters:
query (PairableInput) – A single query, e.g. a string, image, or multimodal dict.
documents (List[PairableInput]) – A list of documents, e.g. strings, images, or multimodal dicts.
top_k (Optional[int], optional) – Return the top-k documents. If None, all documents are returned. Defaults to None.
return_documents (bool, optional) – If True, also returns the documents. If False, only returns the indices and scores. Defaults to False.
prompt_name (Optional[str], optional) – The name of the prompt to use for encoding. Must be a key in the prompts dictionary passed to the model. Defaults to None.
prompt (Optional[str], optional) – The prompt text to use for encoding. Overrides prompt_name if both are set. Defaults to None.
batch_size (int, optional) – Batch size for encoding. Defaults to 32.
show_progress_bar (bool, optional) – Whether to show a progress bar during prediction. If None, it is determined from the logging level. Defaults to None.
activation_fn (Callable, optional) – Activation function applied to the logits output of the CrossEncoder. If None, nn.Sigmoid() is used when num_labels=1, otherwise nn.Identity(). Defaults to None.
convert_to_numpy (bool, optional) – Convert the output to a numpy matrix. Defaults to True.
apply_softmax (bool, optional) – If True and the logits output has more than one dimension, applies softmax over the logits. Defaults to False.
convert_to_tensor (bool, optional) – Convert the output to a tensor. Defaults to False.
device (Union[str, List[str]], optional) – Device(s) to use for computation. Can be a single device string (e.g., “cuda:0”, “cpu”) or a list of devices (e.g., [“cuda:0”, “cuda:1”]). If a list is provided, multiprocessing will be used automatically. Defaults to None.
pool (Dict[str, Any], optional) – A pool of workers created with
start_multi_process_pool(). If provided, multiprocessing will be used. If None and device is a list, a pool will be created automatically. Defaults to None.
chunk_size (int, optional) – Size of chunks for multiprocessing. If None, a sensible default is calculated. Only used when pool is not None or device is a list. Defaults to None.
- Returns:
A sorted list with the “corpus_id”, “score”, and optionally “text” of the documents.
- Return type:
List[Dict[Literal[“corpus_id”, “score”, “text”], Union[int, float, str]]]
Example
from sentence_transformers import CrossEncoder

model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L6-v2")

query = "Who wrote 'To Kill a Mockingbird'?"
documents = [
    "'To Kill a Mockingbird' is a novel by Harper Lee published in 1960. It was immediately successful, winning the Pulitzer Prize, and has become a classic of modern American literature.",
    "The novel 'Moby-Dick' was written by Herman Melville and first published in 1851. It is considered a masterpiece of American literature and deals with complex themes of obsession, revenge, and the conflict between good and evil.",
    "Harper Lee, an American novelist widely known for her novel 'To Kill a Mockingbird', was born in 1926 in Monroeville, Alabama. She received the Pulitzer Prize for Fiction in 1961.",
    "Jane Austen was an English novelist known primarily for her six major novels, which interpret, critique and comment upon the British landed gentry at the end of the 18th century.",
    "The 'Harry Potter' series, which consists of seven fantasy novels written by British author J.K. Rowling, is among the most popular and critically acclaimed books of the modern era.",
    "'The Great Gatsby', a novel written by American author F. Scott Fitzgerald, was published in 1925. The story is set in the Jazz Age and follows the life of millionaire Jay Gatsby and his pursuit of Daisy Buchanan.",
]

model.rank(query, documents, return_documents=True)
[{'corpus_id': 0, 'score': 10.67858, 'text': "'To Kill a Mockingbird' is a novel by Harper Lee published in 1960. It was immediately successful, winning the Pulitzer Prize, and has become a classic of modern American literature."},
 {'corpus_id': 2, 'score': 9.761677, 'text': "Harper Lee, an American novelist widely known for her novel 'To Kill a Mockingbird', was born in 1926 in Monroeville, Alabama. She received the Pulitzer Prize for Fiction in 1961."},
 {'corpus_id': 1, 'score': -3.3099542, 'text': "The novel 'Moby-Dick' was written by Herman Melville and first published in 1851. It is considered a masterpiece of American literature and deals with complex themes of obsession, revenge, and the conflict between good and evil."},
 {'corpus_id': 5, 'score': -4.8989105, 'text': "'The Great Gatsby', a novel written by American author F. Scott Fitzgerald, was published in 1925. The story is set in the Jazz Age and follows the life of millionaire Jay Gatsby and his pursuit of Daisy Buchanan."},
 {'corpus_id': 4, 'score': -5.082967, 'text': "The 'Harry Potter' series, which consists of seven fantasy novels written by British author J.K. Rowling, is among the most popular and critically acclaimed books of the modern era."}]
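Because rank() returns plain dictionaries sorted by descending score, the output can be consumed with ordinary dict access. The sketch below works on a small hand-written result list that mimics the structure shown above, rather than on a live model:

```python
# Illustrative post-processing of rank() output. `results` mimics the
# structure returned above; it does not come from a live model.
results = [
    {"corpus_id": 0, "score": 10.67858, "text": "'To Kill a Mockingbird' is a novel by Harper Lee..."},
    {"corpus_id": 2, "score": 9.761677, "text": "Harper Lee, an American novelist..."},
    {"corpus_id": 1, "score": -3.3099542, "text": "The novel 'Moby-Dick' was written by Herman Melville..."},
]

# Entries are already sorted by descending score, so the best hit is first.
best = results[0]

# Collect the corpus indices of the two highest-scoring documents.
top_ids = [hit["corpus_id"] for hit in results[:2]]

print(best["corpus_id"], top_ids)  # -> 0 [0, 2]
```

The "text" key is only present when return_documents=True was passed.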
- save_pretrained(path: str, model_name: str | None = None, create_model_card: bool = True, train_datasets: list[str] | None = None, safe_serialization: bool = True) None[source]
Saves a model and its configuration files to a directory, so that it can be loaded again.
- Parameters:
path (str) – Path on disk where the model will be saved.
model_name (str, optional) – Optional model name.
create_model_card (bool, optional) – If True, create a README.md with basic information about this model.
train_datasets (List[str], optional) – Optional list with the names of the datasets used to train the model.
safe_serialization (bool, optional) – If True, save the model using safetensors. If False, save the model the traditional (but unsafe) PyTorch way.
- save_to_hub(repo_id: str, organization: str | None = None, token: str | None = None, private: bool | None = None, safe_serialization: bool = True, commit_message: str = 'Add new model.', local_model_path: str | None = None, exist_ok: bool = False, replace_model_card: bool = False, train_datasets: list[str] | None = None) str[source]
DEPRECATED, use push_to_hub instead.
Uploads all elements of this model to a new HuggingFace Hub repository.
- Parameters:
repo_id (str) – Repository name for your model in the Hub, including the user or organization.
token (str, optional) – An authentication token (See https://huggingface.co/settings/token)
private (bool, optional) – Set to True to host the model in a private repository.
safe_serialization (bool, optional) – If True, save the model using safetensors. If False, save the model the traditional PyTorch way.
commit_message (str, optional) – Message to commit while pushing.
local_model_path (str, optional) – Path of the model locally. If set, this file path will be uploaded. Otherwise, the current model will be uploaded
exist_ok (bool, optional) – If True, pushing to an existing repository is allowed. If False, only pushing to a new repository is possible.
replace_model_card (bool, optional) – If true, replace an existing model card in the hub with the automatically created model card
train_datasets (List[str], optional) – Datasets used to train the model. If set, the datasets will be added to the model card in the Hub.
- Returns:
The url of the commit of your model in the repository on the Hugging Face Hub.
- Return type:
str
- set_adapter(*args, **kwargs) None[source]
Sets a specific adapter by forcing the model to use that adapter and disabling the other adapters.
- Parameters:
*args – Positional arguments to pass to the underlying AutoModel set_adapter function. More information can be found in the transformers documentation https://huggingface.co/docs/transformers/main/en/main_classes/peft#transformers.integrations.PeftAdapterMixin.set_adapter
**kwargs – Keyword arguments to pass to the underlying AutoModel set_adapter function. More information can be found in the transformers documentation https://huggingface.co/docs/transformers/main/en/main_classes/peft#transformers.integrations.PeftAdapterMixin.set_adapter
- start_multi_process_pool(target_devices: list[str] | None = None) dict[Literal['input', 'output', 'processes'], Any][source]
Starts a multi-process pool to infer with several independent processes.
This method is recommended if you want to predict on multiple GPUs or CPUs. It is advised to start only one process per GPU. This method works together with predict and stop_multi_process_pool.
- Parameters:
target_devices (List[str], optional) – PyTorch target devices, e.g. [“cuda:0”, “cuda:1”, …], [“npu:0”, “npu:1”, …], or [“cpu”, “cpu”, “cpu”, “cpu”]. If target_devices is None and CUDA/NPU is available, then all available CUDA/NPU devices will be used. If target_devices is None and CUDA/NPU is not available, then 4 CPU devices will be used.
- Returns:
A dictionary with the target processes, an input queue, and an output queue.
- Return type:
Dict[str, Any]
- static stop_multi_process_pool(pool: dict[Literal['input', 'output', 'processes'], Any]) None[source]
Stops all processes started with start_multi_process_pool.
- Parameters:
pool (Dict[str, object]) – A dictionary containing the input queue, output queue, and process list.
- Returns:
None
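Since a pool that is never stopped leaves worker processes running, the start/stop pair pairs naturally with a context manager that guarantees cleanup even when prediction raises. This is a sketch of that pattern, demonstrated with a stub object in place of a real CrossEncoder so it runs without loading a model; the helper name multi_process_pool is an invention for illustration:

```python
from contextlib import contextmanager


@contextmanager
def multi_process_pool(model, target_devices=None):
    """Yield a multiprocess pool and always stop it afterwards."""
    pool = model.start_multi_process_pool(target_devices)
    try:
        yield pool
    finally:
        model.stop_multi_process_pool(pool)


# Stub standing in for a CrossEncoder, so the pattern can run anywhere.
class StubModel:
    def start_multi_process_pool(self, target_devices=None):
        return {"input": None, "output": None, "processes": []}

    @staticmethod
    def stop_multi_process_pool(pool):
        pool["stopped"] = True


model = StubModel()
with multi_process_pool(model) as pool:
    pass  # model.predict(sentences, pool=pool) would go here

assert pool["stopped"]  # the pool was shut down on exit
```

With a real CrossEncoder, the body of the with block would call predict(..., pool=pool).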
- supports(modality: Literal['text', 'image', 'audio', 'video', 'message'] | tuple[Literal['text', 'image', 'audio', 'video'], ...]) bool[source]
Check if the model supports the given modality.
A modality is supported if:
It is directly listed in modalities (including tuple modalities that are explicitly listed), or
It is a tuple of modalities (e.g. ("image", "text")) where each part is individually supported and the model also supports the "message" format, which is used to combine multiple modalities into a single input.
- Parameters:
modality – A single modality string (e.g. "text", "image") or a tuple of modality strings (e.g. ("image", "text")).
- Returns:
Whether the model supports the given modality.
- Return type:
bool
Example:
>>> from sentence_transformers import SentenceTransformer
>>> model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
>>> model.supports("text")
True
>>> model.supports("image")
False
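The support rule above can be sketched as a plain function over a set of declared modalities. This is an illustrative re-implementation of the rule as described, not the library's actual code:

```python
def supports(declared, modality):
    """Sketch of the modality-support rule described above.

    `declared` is the set of modalities a model lists, e.g.
    {"text", "message"} or {"text", ("image", "text")}.
    """
    # Directly listed, including tuple modalities listed explicitly.
    if modality in declared:
        return True
    # A tuple is supported if each part is supported and the model
    # also supports the "message" format used to combine them.
    if isinstance(modality, tuple):
        return "message" in declared and all(part in declared for part in modality)
    return False


print(supports({"text"}, "text"))                                  # True
print(supports({"text"}, "image"))                                 # False
print(supports({"text", "image", "message"}, ("image", "text")))   # True
print(supports({"text", "image"}, ("image", "text")))              # False: no "message"
```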
- to(*args, **kwargs)
Moves and/or casts the parameters and buffers.
This can be called as
- to(device=None, dtype=None, non_blocking=False)
- to(dtype, non_blocking=False)
- to(tensor, non_blocking=False)
- to(memory_format=torch.channels_last)
Its signature is similar to torch.Tensor.to(), but only accepts floating point or complex dtypes. In addition, this method will only cast the floating point or complex parameters and buffers to dtype (if given). The integral parameters and buffers will be moved to device, if that is given, but with dtypes unchanged. When non_blocking is set, it tries to convert/move asynchronously with respect to the host if possible, e.g., moving CPU Tensors with pinned memory to CUDA devices.
See below for examples.
Note
This method modifies the module in-place.
- Parameters:
device (torch.device) – the desired device of the parameters and buffers in this module
dtype (torch.dtype) – the desired floating point or complex dtype of the parameters and buffers in this module
tensor (torch.Tensor) – Tensor whose dtype and device are the desired dtype and device for all parameters and buffers in this module
memory_format (torch.memory_format) – the desired memory format for 4D parameters and buffers in this module (keyword only argument)
- Returns:
self
- Return type:
Module
Examples:
>>> # xdoctest: +IGNORE_WANT("non-deterministic")
>>> linear = nn.Linear(2, 2)
>>> linear.weight
Parameter containing:
tensor([[ 0.1913, -0.3420],
        [-0.5113, -0.2325]])
>>> linear.to(torch.double)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1913, -0.3420],
        [-0.5113, -0.2325]], dtype=torch.float64)
>>> # xdoctest: +REQUIRES(env:TORCH_DOCTEST_CUDA1)
>>> gpu1 = torch.device("cuda:1")
>>> linear.to(gpu1, dtype=torch.half, non_blocking=True)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1914, -0.3420],
        [-0.5112, -0.2324]], dtype=torch.float16, device='cuda:1')
>>> cpu = torch.device("cpu")
>>> linear.to(cpu)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1914, -0.3420],
        [-0.5112, -0.2324]], dtype=torch.float16)
>>> linear = nn.Linear(2, 2, bias=None).to(torch.cdouble)
>>> linear.weight
Parameter containing:
tensor([[ 0.3741+0.j,  0.2382+0.j],
        [ 0.5593+0.j, -0.4443+0.j]], dtype=torch.complex128)
>>> linear(torch.ones(3, 2, dtype=torch.cdouble))
tensor([[0.6122+0.j, 0.1150+0.j],
        [0.6122+0.j, 0.1150+0.j],
        [0.6122+0.j, 0.1150+0.j]], dtype=torch.complex128)
- tokenize(texts: list[str] | list[dict] | list[tuple[str, str]], **kwargs) dict[str, Tensor][source]
Deprecated: tokenize is deprecated. Use preprocess instead.
- property tokenizer: Any
Property to get the tokenizer that is used by this model
- train(mode: bool = True) T
Sets the module in training mode.
This has any effect only on certain modules. See the documentation of particular modules for details of their behavior in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.
- Parameters:
mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.
- Returns:
self
- Return type:
Module
- property transformers_model: PreTrainedModel | None
Property to get the underlying transformers PreTrainedModel instance, if it exists. Note that it’s possible for a model to have multiple underlying transformers models, but this property will return the first one it finds in the module hierarchy.
Note
This property can also return e.g. ORTModelForFeatureExtraction or OVModelForFeatureExtraction instances from the optimum-intel and optimum-onnx libraries, if the model is loaded using backend="onnx" or backend="openvino".
- Returns:
The underlying transformers model or None if not found.
- Return type:
PreTrainedModel or None
CrossEncoderModelCardData
- class sentence_transformers.cross_encoder.model_card.CrossEncoderModelCardData(language: str | list[str] | None = <factory>, license: str | None = None, model_name: str | None = None, model_id: str | None = None, train_datasets: list[dict[str, str]] = <factory>, eval_datasets: list[dict[str, str]] = <factory>, task_name: str | None = None, tags: list[str] = <factory>, local_files_only: bool = False, generate_widget_examples: bool = True)[source]
A dataclass storing data used in the model card.
- Parameters:
language (Optional[Union[str, List[str]]]) – The model language, either a string or a list, e.g. “en” or [“en”, “de”, “nl”]
license (Optional[str]) – The license of the model, e.g. “apache-2.0”, “mit”, or “cc-by-nc-sa-4.0”
model_name (Optional[str]) – The pretty name of the model, e.g. “CrossEncoder based on answerdotai/ModernBERT-base”.
model_id (Optional[str]) – The model ID when pushing the model to the Hub, e.g. “tomaarsen/ce-mpnet-base-ms-marco”.
train_datasets (List[Dict[str, str]]) – A list of the names and/or Hugging Face dataset IDs of the training datasets. e.g. [{“name”: “SNLI”, “id”: “stanfordnlp/snli”}, {“name”: “MultiNLI”, “id”: “nyu-mll/multi_nli”}, {“name”: “STSB”}]
eval_datasets (List[Dict[str, str]]) – A list of the names and/or Hugging Face dataset IDs of the evaluation datasets. e.g. [{“name”: “SNLI”, “id”: “stanfordnlp/snli”}, {“id”: “mteb/stsbenchmark-sts”}]
task_name (str) – The human-readable task the model is trained on, e.g. “semantic search and paraphrase mining”.
tags (Optional[List[str]]) – A list of tags for the model, e.g. [“sentence-transformers”, “cross-encoder”].
local_files_only (bool) – If True, don’t attempt to find dataset or base model information on the Hub. Defaults to False.
Tip
Install codecarbon to automatically track carbon emissions and include them in your model cards.
Example:
>>> model = CrossEncoder(
...     "microsoft/mpnet-base",
...     model_card_data=CrossEncoderModelCardData(
...         model_id="tomaarsen/ce-mpnet-base-allnli",
...         train_datasets=[{"name": "SNLI", "id": "stanfordnlp/snli"}, {"name": "MultiNLI", "id": "nyu-mll/multi_nli"}],
...         eval_datasets=[{"name": "SNLI", "id": "stanfordnlp/snli"}, {"name": "MultiNLI", "id": "nyu-mll/multi_nli"}],
...         license="apache-2.0",
...         language="en",
...     ),
... )