Software · 2020 · Embargo end date: 17 Jun 2020

CroSloEngual BERT

Ulčar, Matej; Robnik-Šikonja, Marko
Open Access
  • Published: 16 Jun 2020
  • Publisher: Faculty of Computer and Information Science, University of Ljubljana
Abstract
CroSloEngual BERT is a trilingual BERT (Bidirectional Encoder Representations from Transformers) model trained on Croatian, Slovenian, and English data. It is a state-of-the-art tool that represents words/tokens as contextually dependent word embeddings and can be used for various NLP classification tasks by fine-tuning the model end-to-end. CroSloEngual BERT is distributed as neural network weights and configuration files in PyTorch format (i.e., to be used with the PyTorch library). Changes in version 1.1: fixed the vocab.txt file, as the previous version contained an error that caused very poor results during fine-tuning and/or evaluation.
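
Because the model ships as standard PyTorch BERT weights and configuration files, it can be loaded with a library such as Hugging Face Transformers. Below is a minimal sketch of extracting contextual token embeddings in all three covered languages; the model identifier "EMBEDDIA/crosloengual-bert" and the use of the transformers library are assumptions, not part of this record, so substitute a local path to the downloaded weights if needed.

    # A minimal sketch, assuming the weights are reachable via the Hugging Face
    # hub under "EMBEDDIA/crosloengual-bert" (an assumption; a local directory
    # containing the PyTorch weights and config files works the same way).
    import torch
    from transformers import AutoTokenizer, AutoModel

    model_name = "EMBEDDIA/crosloengual-bert"  # assumed identifier
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()

    # One sentence per covered language: Croatian, Slovenian, English.
    sentences = [
        "Zagreb je glavni grad Hrvatske.",
        "Ljubljana je glavno mesto Slovenije.",
        "London is the capital of England.",
    ]

    with torch.no_grad():
        for sentence in sentences:
            inputs = tokenizer(sentence, return_tensors="pt")
            outputs = model(**inputs)
            # last_hidden_state holds one contextual embedding per input token.
            embeddings = outputs.last_hidden_state.squeeze(0)
            print(sentence, embeddings.shape)  # (num_tokens, hidden_size)

For downstream classification tasks, the same checkpoint can instead be loaded with a task-specific head (e.g. a sequence-classification model class) and fine-tuned end-to-end, as described in the abstract.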
Funded by
EC | EMBEDDIA: Cross-Lingual Embeddings for Less-Represented Languages in European News Media
  • Funder: European Commission (EC)
  • Project Code: 825153
  • Funding stream: H2020 | RIA
Communities
Digital Humanities and Cultural Heritage