research data . Dataset . 2020 . Embargo end date: 19 Jun 2020

COSTRA 1.1: A Dataset of Complex Sentence Transformations and Comparisons

Barančíková, Petra; Bojar, Ondřej
Open Access
  • Published: 15 Jun 2020
  • Publisher: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Costra 1.1 is a new dataset for testing geometric properties of sentence embedding spaces. In particular, it concentrates on examining how well sentence embeddings capture complex phenomena such as paraphrases, tense, or generalization. The dataset is a direct expansion of Costra 1.0, extended with more sentences and sentence comparisons.
Funded by
EC | Bergamot
Browser-based Multilingual Translation
  • Funder: European Commission (EC)
  • Project Code: 825303
  • Funding stream: H2020 | RIA
Digital Humanities and Cultural Heritage