Datasets

This folder contains example datasets that can be used for training and evaluating sentence embedding methods.

To download these datasets, run:

python get_data.py

It will download the datasets and unzip them into this directory.
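
For readers who want to see what a download-and-unzip step can look like, here is a minimal sketch using only the Python standard library. It is not the contents of get_data.py; the function name `download_and_unzip` and the example URL are hypothetical and only illustrate the idea.

```python
import urllib.request
import zipfile
from pathlib import Path


def download_and_unzip(url, target_dir="."):
    """Download a zip archive from `url` and extract it into `target_dir`.

    Returns the list of member names found in the archive.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)

    # Store the archive under its own file name, e.g. "AllNLI.zip".
    archive_path = target / Path(url).name
    urllib.request.urlretrieve(url, archive_path)

    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target)
        members = archive.namelist()

    # Remove the archive once its contents have been extracted.
    archive_path.unlink()
    return members


# Hypothetical usage; replace the placeholder URL with a real archive location:
# download_and_unzip("https://example.com/datasets/AllNLI.zip", ".")
```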

AllNLI Dataset

The AllNLI dataset is the concatenation of the SNLI dataset (https://nlp.stanford.edu/projects/snli/) and the MultiNLI dataset (https://www.nyu.edu/projects/bowman/multinli/).
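
To illustrate what "concatenation" means here, the sketch below merges two lists of NLI examples into one. The `NLIExample` type, the `concatenate_datasets` helper, and the toy sentences are assumptions made for this example; they do not describe the on-disk format of the downloaded files.

```python
from typing import List, NamedTuple


class NLIExample(NamedTuple):
    premise: str
    hypothesis: str
    label: str  # "entailment", "neutral", or "contradiction"


def concatenate_datasets(*datasets: List[NLIExample]) -> List[NLIExample]:
    """Return one list holding the examples of all datasets, in order."""
    merged: List[NLIExample] = []
    for dataset in datasets:
        merged.extend(dataset)
    return merged


# Toy stand-ins for SNLI and MultiNLI (hypothetical examples).
snli = [NLIExample("A man is eating.", "A person eats.", "entailment")]
multinli = [NLIExample("She left early.", "She stayed late.", "contradiction")]

all_nli = concatenate_datasets(snli, multinli)
```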

STS Benchmark

The STS Benchmark (http://ixa2.si.ehu.eus/stswiki) contains sentence pairs with human-annotated gold scores for their semantic similarity.
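
One common way to evaluate sentence embeddings on data of this kind is to compute the cosine similarity of each sentence pair and report the Spearman rank correlation with the gold scores. The sketch below shows that computation in plain Python. The helper names, the toy embeddings, and the toy gold scores are assumptions for illustration, not part of this repository.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Toy data (hypothetical): one embedding pair per sentence pair,
# plus a gold similarity score for each pair.
embedding_pairs = [
    ([1.0, 0.0], [0.9, 0.1]),
    ([1.0, 0.0], [0.0, 1.0]),
    ([1.0, 0.0], [1.0, 1.0]),
]
gold_scores = [4.8, 0.5, 2.5]

predicted = [cosine(u, v) for u, v in embedding_pairs]
score = spearman(predicted, gold_scores)
```

Rank correlation only compares the ordering of the pairs, so the predicted similarities do not need to be on the same scale as the gold scores.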