Datasets
This folder contains example datasets that can be used for training and evaluating sentence embedding methods.
To download these datasets, run:
python get_data.py
The script downloads the datasets and unzips them into this directory.
AllNLI Dataset
The AllNLI dataset is the concatenation of the SNLI dataset (https://nlp.stanford.edu/projects/snli/) and the MultiNLI dataset (https://www.nyu.edu/projects/bowman/multinli/).
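As a rough illustration of what "concatenation" means here, the sketch below joins two lists of NLI examples into one. The (premise, hypothesis, label) tuple layout, the `concatenate_nli` helper, and the toy sentences are all assumptions made for this example; they do not describe the on-disk format of the downloaded files or how the dataset was actually built.

```python
from typing import List, Tuple

# Hypothetical in-memory layout: (premise, hypothesis, label).
# This is an assumption for illustration, not the file format of AllNLI.
NLIExample = Tuple[str, str, str]


def concatenate_nli(*datasets: List[NLIExample]) -> List[NLIExample]:
    """Return one list holding the examples of every input dataset, in order."""
    combined: List[NLIExample] = []
    for dataset in datasets:
        combined.extend(dataset)
    return combined


# Made-up toy examples standing in for the two corpora.
snli_toy = [
    ("A man is playing a guitar.", "A person is making music.", "entailment"),
]
multinli_toy = [
    ("The store closes at nine.", "The store never closes.", "contradiction"),
    ("She bought a new laptop.", "She bought it online.", "neutral"),
]

all_nli_toy = concatenate_nli(snli_toy, multinli_toy)
print(len(all_nli_toy))
```

The combined list keeps the examples of the first dataset ahead of those of the second; shuffling, if wanted, would be a separate step.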
STS Benchmark
The STS Benchmark (http://ixa2.si.ehu.eus/stswiki) contains sentence pairs, each annotated with a human-assigned gold score for their semantic similarity.
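One way to picture a scored sentence pair is the small sketch below, which also rescales the gold score to the range 0 to 1, a common preprocessing step when the score is used as a training target. The `ScoredPair` class, the `normalize` helper, and the example sentences are hypothetical names and data introduced for illustration. The sketch assumes gold scores on the benchmark's 0 to 5 scale and makes no claim about the layout of the downloaded files.

```python
from dataclasses import dataclass


@dataclass
class ScoredPair:
    """A sentence pair with a similarity score (hypothetical container)."""
    sentence1: str
    sentence2: str
    score: float


MAX_GOLD_SCORE = 5.0  # assumed upper bound of the gold score scale


def normalize(pair: ScoredPair, max_score: float = MAX_GOLD_SCORE) -> ScoredPair:
    """Return a copy of the pair with its score rescaled to [0, 1]."""
    return ScoredPair(pair.sentence1, pair.sentence2, pair.score / max_score)


# Made-up example pair, not taken from the benchmark.
raw_pair = ScoredPair("A dog runs in the park.", "A dog is running outside.", 4.0)
scaled_pair = normalize(raw_pair)
print(scaled_pair.score)
```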