Datasets

This folder contains example datasets that can be used for training and evaluating sentence embedding methods.

To download these datasets, run:

python get_data.py

It will download the datasets and unzip them into this directory.
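For readers who want to see what such a download-and-unzip step could look like, here is a minimal sketch using only the Python standard library. It is not the contents of get_data.py: the function names `download_and_unzip` and `unzip` are hypothetical, and the archive URL is left as a parameter because the actual download locations are not given here.

```python
import urllib.request
import zipfile
from pathlib import Path


def unzip(archive_path, target_dir="."):
    """Extract a zip archive into target_dir and return the member names."""
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target_dir)
        return archive.namelist()


def download_and_unzip(url, target_dir="."):
    """Download a zip archive from url (caller-supplied) and extract it."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    # Name the local file after the last path segment of the URL.
    archive_path = target / url.rsplit("/", 1)[-1]
    urllib.request.urlretrieve(url, str(archive_path))
    return unzip(archive_path, target)
```

Calling `download_and_unzip(some_url, ".")` would place the extracted files in the current directory, which matches the behaviour described above.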

AllNLI Dataset

The AllNLI dataset is the concatenation of the SNLI dataset (https://nlp.stanford.edu/projects/snli/) and the MultiNLI dataset (https://www.nyu.edu/projects/bowman/multinli/).
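As an illustration of what "concatenation" means here, the sketch below joins two small in-memory lists of NLI examples. The `NLIExample` type, the `concatenate` helper, and the toy sentences are all invented for this example; the on-disk format of the downloaded files is not described here and may differ. The three labels (entailment, neutral, contradiction) are the ones used by both SNLI and MultiNLI.

```python
from typing import List, NamedTuple


class NLIExample(NamedTuple):
    """Hypothetical in-memory representation of one NLI sentence pair."""
    premise: str
    hypothesis: str
    label: str  # "entailment", "neutral", or "contradiction"


def concatenate(snli: List[NLIExample], multinli: List[NLIExample]) -> List[NLIExample]:
    """Return a new list with all SNLI examples followed by all MultiNLI examples."""
    return list(snli) + list(multinli)


# Toy data, invented for illustration only.
snli = [NLIExample("A man is playing a guitar.", "A person is making music.", "entailment")]
multinli = [NLIExample("The report was published in May.", "No report was ever published.", "contradiction")]

all_nli = concatenate(snli, multinli)
print(len(all_nli))
```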

STS Benchmark

The STS Benchmark (http://ixa2.si.ehu.eus/stswiki) contains sentence pairs with human-annotated gold scores for their similarity.
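One common way to use gold similarity scores for evaluation is to compute the Spearman rank correlation between them and the similarities predicted by an embedding model. The sketch below implements this in plain Python as an assumption about how such an evaluation could be done, not as the method used by any script in this folder. The helper names and the toy scores are invented; the gold values are chosen to lie in the 0 to 5 range that STS annotations use.

```python
import math


def average_ranks(values):
    """Return 1-based ranks of values, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if std_x == 0 or std_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (std_x * std_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy scores, invented for illustration only.
gold = [5.0, 3.2, 0.4, 1.0]        # human similarity scores
predicted = [0.9, 0.6, 0.1, 0.3]   # e.g. cosine similarities from a model
print(spearman(gold, predicted))
```

Because Spearman correlation depends only on the ordering of the scores, the predicted similarities do not need to be on the same scale as the gold scores.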
