Getting Started

  • Installation
    • Install with uv
    • Install with pip
    • Install with Conda
    • Install from Source
    • Editable Install
    • Install PyTorch with CUDA support
  • Quickstart
    • Sentence Transformer
    • Cross Encoder
    • Sparse Encoder
    • Next Steps
  • Migration Guide
    • Migrating from v5.x to v5.4+
      • Updated import paths
      • Renamed methods and parameters
      • CrossEncoder.max_length property renamed to max_seq_length
      • Trainer tokenizer parameter renamed to processing_class
      • tokenizer_kwargs renamed to processor_kwargs
      • CrossEncoder API changes
      • Removed tags parameter from push_to_hub
      • Default pooling for CausalLM models
      • Changes for custom module and loss authors
    • Migrating from v4.x to v5.x
      • Migration for model.encode
      • Migration for Asym to Router
      • Migration of advanced usage
    • Migrating from v3.x to v4.x
      • Migration for parameters on CrossEncoder initialization and methods
      • Migration for specific parameters from CrossEncoder.fit
      • Migration for CrossEncoder evaluators
    • Migrating from v2.x to v3.x
      • Migration for specific parameters from SentenceTransformer.fit
      • Migration for custom Datasets and DataLoaders used in SentenceTransformer.fit

Sentence Transformer

  • Usage
    • Computing Embeddings
      • Initializing a Sentence Transformer Model
      • Calculating Embeddings
      • Prompt Templates
      • Input Sequence Length
      • Multi-Process / Multi-GPU Encoding
    • Semantic Textual Similarity
      • Similarity Calculation
    • Semantic Search
      • Background
      • Symmetric vs. Asymmetric Semantic Search
      • Manual Implementation
      • Optimized Implementation
      • Speed Optimization
      • Elasticsearch
      • OpenSearch
      • Approximate Nearest Neighbor
      • Retrieve & Re-Rank
      • Examples
    • Retrieve & Re-Rank
      • Retrieve & Re-Rank Pipeline
      • Retrieval: Bi-Encoder
      • Re-Ranker: Cross-Encoder
      • Example Scripts
      • Pre-trained Bi-Encoders (Retrieval)
      • Pre-trained Cross-Encoders (Re-Ranker)
    • Clustering
      • k-Means
      • Agglomerative Clustering
      • Fast Clustering
      • Topic Modeling
    • Paraphrase Mining
      • paraphrase_mining()
    • Translated Sentence Mining
      • Margin Based Mining
      • Examples
    • Image Search
      • Installation
      • Usage
      • Examples
    • Embedding Quantization
      • Binary Quantization
      • Scalar (int8) Quantization
      • Additional extensions
      • Demo
      • Try it yourself
    • Creating Custom Models
      • Modular Architecture
      • Sentence Transformer Model from a Transformers Model
      • Advanced: Custom Modules
    • Evaluation with MTEB
      • Installation
      • Evaluation
      • Additional Arguments
      • Results Handling
      • Leaderboard Submission
    • Speeding up Inference
      • PyTorch
      • ONNX
      • OpenVINO
      • Benchmarks
  • Pretrained Models
    • Original Models
    • Semantic Search Models
      • Multi-QA Models
      • MSMARCO Passage Models
    • Multilingual Models
      • Semantic Similarity Models
      • Bitext Mining
    • Multimodal Models
      • Image & Text Models
      • Audio & Video Models
    • INSTRUCTOR models
    • Scientific Similarity Models
  • Training Overview
    • Why Finetune?
    • Training Components
    • Model
    • Dataset
      • Dataset Format
      • Multimodal Datasets
    • Loss Function
    • Training Arguments
    • Evaluator
    • Trainer
      • Callbacks
    • Multi-Dataset Training
    • Deprecated Training
    • Best Base Embedding Models
    • Comparisons with CrossEncoder Training
  • Dataset Overview
    • Multimodal Datasets
      • Accepted column types
      • Cross-modal dataset example
      • Automatic preprocessing
    • Datasets on the Hugging Face Hub
    • Pre-existing Datasets
  • Loss Overview
    • Loss Table
    • Loss modifiers
    • Regularization
    • Distillation
    • Commonly used Loss Functions
    • Custom Loss Functions
  • Training Examples
    • Semantic Textual Similarity
      • Training data
      • Loss Function
    • Natural Language Inference
      • Data
      • SoftmaxLoss
      • MultipleNegativesRankingLoss
    • Paraphrase Data
      • Pre-Trained Models
    • Quora Duplicate Questions
      • Training
      • MultipleNegativesRankingLoss
      • Pretrained Models
    • MS MARCO
      • Bi-Encoder
    • Matryoshka Embeddings
      • Use Cases
      • Results
      • Training
      • Inference
      • Code Examples
    • Adaptive Layers
      • Use Cases
      • Results
      • Training
      • Inference
      • Code Examples
    • Multilingual Models
      • Extend your own models
      • Training
      • Datasets
      • Sources for Training Data
      • Evaluation
      • Available Pre-trained Models
      • Usage
      • Performance
      • Citation
    • Model Distillation
      • Knowledge Distillation
      • Speed - Performance Trade-Off
      • Dimensionality Reduction
      • Quantization
    • Augmented SBERT
      • Motivation
      • Extend to your own datasets
      • Methodology
      • Scenario 1: Limited or small annotated datasets (few labeled sentence-pairs)
      • Scenario 2: No annotated datasets (only unlabeled sentence-pairs)
      • Training
      • Citation
    • Training with Prompts
      • What are Prompts?
      • Why would we train with Prompts?
      • How do we train with Prompts?
    • Training with PEFT Adapters
      • Compatibility Methods
      • Adding a New Adapter
      • Loading a Pretrained Adapter
      • Training Script
    • Training with Unsloth
      • Examples in this repository
      • Unsloth Colab notebooks
      • Fine-tuning via FastSentenceTransformer
      • Inference and deployment
      • Benchmarks
    • Multimodal Training
      • Supported Input Types
      • Training
      • References
    • Unsupervised Learning
      • TSDAE
      • SimCSE
      • CT
      • CT (In-Batch Negative Sampling)
      • Masked Language Model (MLM)
      • GenQ
      • GPL
      • Performance Comparison
    • Domain Adaptation
      • Domain Adaptation vs. Unsupervised Learning
      • Adaptive Pre-Training
      • GPL: Generative Pseudo-Labeling
    • Hyperparameter Optimization
      • HPO Components
      • Putting It All Together
      • Example Scripts
    • Distributed Training
      • Comparison
      • FSDP

Cross Encoder

  • Usage
    • Cross-Encoder vs. Bi-Encoder
      • Cross-Encoder vs. Bi-Encoder
      • When to use Cross- / Bi-Encoders?
      • Cross-Encoders Usage
      • Combining Bi- and Cross-Encoders
      • Training Cross-Encoders
    • Retrieve & Re-Rank
      • Retrieve & Re-Rank Pipeline
      • Retrieval: Bi-Encoder
      • Re-Ranker: Cross-Encoder
      • Example Scripts
      • Pre-trained Bi-Encoders (Retrieval)
      • Pre-trained Cross-Encoders (Re-Ranker)
    • Creating Custom Models
      • Modular Architecture
      • Advanced: Custom Modules
    • Speeding up Inference
      • PyTorch
      • ONNX
      • OpenVINO
      • Benchmarks
  • Pretrained Models
    • MS MARCO
    • SQuAD (QNLI)
    • STSbenchmark
    • Quora Duplicate Questions
    • NLI
    • Multimodal Rerankers
    • Community Models
  • Training Overview
    • Why Finetune?
    • Training Components
    • Model
    • Dataset
      • Dataset Format
      • Multimodal Datasets
      • Hard Negatives Mining
    • Loss Function
    • Training Arguments
    • Evaluator
    • Trainer
      • Callbacks
    • Multi-Dataset Training
    • Training Tips
    • Deprecated Training
    • Comparisons with SentenceTransformer Training
  • Dataset Overview
    • Multimodal Datasets
      • Accepted column types
      • Cross-modal dataset example
      • Automatic preprocessing
    • Datasets on the Hugging Face Hub
    • Pre-existing Datasets
  • Loss Overview
    • Loss Table
    • Distillation
    • Commonly used Loss Functions
    • Custom Loss Functions
  • Training Examples
    • Semantic Textual Similarity
      • Training data
      • Loss Function
      • Inference
    • Natural Language Inference
      • Data
      • CrossEntropyLoss
      • Inference
    • Quora Duplicate Questions
      • Training
      • Inference
    • MS MARCO
      • Cross Encoder
      • Training Scripts
      • Inference
    • Rerankers
      • BinaryCrossEntropyLoss
      • CachedMultipleNegativesRankingLoss
      • Inference
    • Model Distillation
      • Cross Encoder Knowledge Distillation
      • Inference
    • Multimodal Training
      • Transformer (Any-to-Any) + LogitScore
      • Transformer (Feature Extraction) + Pooling + Dense
      • Comparing the Two Approaches
      • Other Module Chains
      • References
    • Distributed Training
      • Comparison
      • FSDP

Sparse Encoder

  • Usage
    • Computing Sparse Embeddings
      • Prompts
    • Semantic Textual Similarity
    • Semantic Search
    • Retrieve & Re-Rank
    • Sparse Encoder Evaluation
    • Speeding up Inference
  • Pretrained Models
    • Core SPLADE Models
    • Inference-Free SPLADE Models
    • Model Collections
  • Training Overview
    • Why Finetune?
    • Training Components
    • Model
    • Dataset
      • Dataset Format
    • Loss Function
    • Training Arguments
    • Evaluator
    • Trainer
      • Callbacks
    • Multi-Dataset Training
    • Training Tips
  • Dataset Overview
    • Multimodal Datasets
      • Accepted column types
      • Cross-modal dataset example
      • Automatic preprocessing
    • Datasets on the Hugging Face Hub
    • Pre-existing Datasets
  • Loss Overview
    • Sparse-Specific Loss Functions
      • SPLADE Loss
      • CSR Loss
    • Loss Table
    • Distillation
    • Commonly used Loss Functions
    • Custom Loss Functions
  • Training Examples
    • Model Distillation
      • MarginMSE
    • MS MARCO
      • SparseMultipleNegativesRankingLoss
    • Semantic Textual Similarity
      • Training data
      • Loss Function
    • Natural Language Inference
      • Data
      • SpladeLoss
    • Quora Duplicate Questions
      • Training
    • Information Retrieval
      • SparseMultipleNegativesRankingLoss (MNRL)
      • Inference & Evaluation
    • Distributed Training
      • Comparison
      • FSDP

Package Reference

  • Sentence Transformer
    • SentenceTransformer
      • SentenceTransformer
      • SentenceTransformerModelCardData
    • Trainer
      • SentenceTransformerTrainer
    • Training Arguments
      • SentenceTransformerTrainingArguments
    • Losses
      • BatchAllTripletLoss
      • BatchHardSoftMarginTripletLoss
      • BatchHardTripletLoss
      • BatchSemiHardTripletLoss
      • ContrastiveLoss
      • OnlineContrastiveLoss
      • ContrastiveTensionLoss
      • ContrastiveTensionLossInBatchNegatives
      • CoSENTLoss
      • AnglELoss
      • CosineSimilarityLoss
      • DenoisingAutoEncoderLoss
      • GISTEmbedLoss
      • CachedGISTEmbedLoss
      • GlobalOrthogonalRegularizationLoss
      • MSELoss
      • MarginMSELoss
      • MatryoshkaLoss
      • Matryoshka2dLoss
      • AdaptiveLayerLoss
      • MegaBatchMarginLoss
      • MultipleNegativesRankingLoss
      • CachedMultipleNegativesRankingLoss
      • MultipleNegativesSymmetricRankingLoss
      • CachedMultipleNegativesSymmetricRankingLoss
      • SoftmaxLoss
      • TripletLoss
      • DistillKLDivLoss
    • Evaluation
      • BinaryClassificationEvaluator
      • EmbeddingSimilarityEvaluator
      • InformationRetrievalEvaluator
      • NanoBEIREvaluator
      • MSEEvaluator
      • ParaphraseMiningEvaluator
      • RerankingEvaluator
      • TranslationEvaluator
      • TripletEvaluator
    • Datasets
      • ParallelSentencesDataset
      • SentenceLabelDataset
      • DenoisingAutoEncoderDataset
      • NoDuplicatesDataLoader
    • Modules
      • Main Modules
      • Further Modules
  • Cross Encoder
    • CrossEncoder
      • CrossEncoder
      • CrossEncoderModelCardData
    • Trainer
      • CrossEncoderTrainer
    • Training Arguments
      • CrossEncoderTrainingArguments
    • Losses
      • BinaryCrossEntropyLoss
      • CrossEntropyLoss
      • LambdaLoss
      • ListMLELoss
      • PListMLELoss
      • ListNetLoss
      • MultipleNegativesRankingLoss
      • CachedMultipleNegativesRankingLoss
      • MSELoss
      • MarginMSELoss
      • RankNetLoss
    • Evaluation
      • CrossEncoderRerankingEvaluator
      • CrossEncoderNanoBEIREvaluator
      • CrossEncoderClassificationEvaluator
      • CrossEncoderCorrelationEvaluator
    • Modules
      • LogitScore
  • Sparse Encoder
    • SparseEncoder
      • SparseEncoder
      • SparseEncoderModelCardData
    • Trainer
      • SparseEncoderTrainer
    • Training Arguments
      • SparseEncoderTrainingArguments
    • Losses
      • SpladeLoss
      • CachedSpladeLoss
      • FlopsLoss
      • CSRLoss
      • CSRReconstructionLoss
      • SparseMultipleNegativesRankingLoss
      • SparseMarginMSELoss
      • SparseDistillKLDivLoss
      • SparseTripletLoss
      • SparseCosineSimilarityLoss
      • SparseCoSENTLoss
      • SparseAnglELoss
      • SparseMSELoss
    • Evaluation
      • SparseInformationRetrievalEvaluator
      • SparseNanoBEIREvaluator
      • SparseEmbeddingSimilarityEvaluator
      • SparseBinaryClassificationEvaluator
      • SparseTripletEvaluator
      • SparseRerankingEvaluator
      • SparseTranslationEvaluator
      • SparseMSEEvaluator
      • ReciprocalRankFusionEvaluator
    • Modules
      • SPLADE Pooling
      • SparseAutoEncoder
      • SparseStaticEmbedding
    • Callbacks
      • SpladeRegularizerWeightSchedulerCallback
    • Search Engines
      • semantic_search_elasticsearch()
      • semantic_search_opensearch()
      • semantic_search_qdrant()
      • semantic_search_seismic()
  • Base
    • Model
      • BaseModel
      • BaseModelCardData
    • Trainer
      • BaseTrainer
    • Training Arguments
      • BaseTrainingArguments
    • Samplers
      • BatchSamplers
      • MultiDatasetBatchSamplers
    • Modules
      • Common Modules
      • Base Modules
    • Evaluation
      • BaseEvaluator
      • SequentialEvaluator
  • Utility Functions
    • distributed
      • all_gather()
      • all_gather_with_grad()
    • environment
      • check_package_availability()
      • get_device_name()
      • is_accelerate_available()
      • is_datasets_available()
      • is_training_available()
      • suggest_extra_on_exception()
    • file_io
      • disabled_tqdm
      • http_get()
      • is_sentence_transformer_model()
      • load_dir_path()
      • load_file_path()
    • hard_negatives
      • mine_hard_negatives()
    • misc
      • disable_datasets_caching()
      • disable_logging()
      • fullname()
      • import_from_string()
    • quantization
      • quantize_embeddings()
      • semantic_search_faiss()
      • semantic_search_usearch()
    • retrieval
      • community_detection()
      • information_retrieval()
      • paraphrase_mining()
      • paraphrase_mining_embeddings()
      • semantic_search()
    • similarity
      • SimilarityFunction
        • SimilarityFunction.possible_values()
        • SimilarityFunction.to_similarity_fn()
        • SimilarityFunction.to_similarity_pairwise_fn()
      • cos_sim()
      • dot_score()
      • euclidean_sim()
      • manhattan_sim()
      • pairwise_angle_sim()
      • pairwise_cos_sim()
      • pairwise_dot_score()
      • pairwise_euclidean_sim()
      • pairwise_manhattan_sim()
      • pytorch_cos_sim()
    • tensor
      • batch_to_device()
      • compute_count_vector()
      • normalize_embeddings()
      • select_max_active_dims()
      • truncate_embeddings()