research data . Dataset . 2018

Spanish 3B words Word2Vec Embeddings

Aitor Almeida; Aritz Bilbao
Open Access Spanish
  • Published: 06 Sep 2018
  • Publisher: Zenodo
Abstract
Ready-to-use gensim Word2Vec embedding models for the Spanish language. The models were trained with a window of +/- 5 words, discarding words that occur fewer than 5 times and producing a 400-dimensional vector for each word. The training text was collected from news, Wikipedia, the Spanish BOE, web crawling and open literary sources. It totals 3,257,329,900 words and 18,852,481,207 characters.

Two types of models are provided: full Gensim models (complete_model.zip) and KeyedVectors (keyed_vectors.zip). The differences between them are described at the following URL: https://radimrehurek.com/g...
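To make the training settings in the abstract concrete, here is a minimal pure-Python sketch of what a minimum count of 5 and a window of +/- 5 words mean for the training pairs. It is an illustration under stated assumptions, not the authors' pipeline: the function names `build_vocab` and `context_pairs` are hypothetical, and the window is kept fixed, whereas word2vec implementations typically also shrink the window at random for each target word.

```python
from collections import Counter


def build_vocab(sentences, min_count=5):
    """Keep only the words that occur at least `min_count` times in the corpus."""
    counts = Counter(word for sentence in sentences for word in sentence)
    return {word for word, n in counts.items() if n >= min_count}


def context_pairs(sentence, vocab, window=5):
    """Yield (target, context) pairs using a symmetric window of +/- `window` words.

    Assumption for this sketch: rare words are dropped before the window is
    applied, so the window spans the surviving words only.
    """
    kept = [w for w in sentence if w in vocab]
    for i, target in enumerate(kept):
        start = max(0, i - window)
        stop = min(len(kept), i + window + 1)
        for j in range(start, stop):
            if j != i:
                yield target, kept[j]


# Toy corpus (hypothetical data): "perro" and "ladra" occur only once each,
# so they fall below the minimum count and are discarded.
corpus = [["el", "gato", "come", "pescado"]] * 5 + [["el", "perro", "ladra"]]
vocab = build_vocab(corpus, min_count=5)
pairs = list(context_pairs(corpus[0], vocab, window=5))
```

With the toy corpus above, `vocab` contains only the four words of the repeated sentence, and every surviving word in a sentence is paired with every other one because the sentence is shorter than the window.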
Subjects
free text keywords: word2vec, word embeddings, natural language processing, gensim, nlp, spanish
Communities
Digital Humanities and Cultural Heritage
Download from
Zenodo
Dataset . 2018
Provider: Datacite
Zenodo
Dataset . 2018
Provider: Zenodo
Zenodo
Dataset . 2018
Provider: Datacite
Zenodo
Dataset . 2018
Provider: Datacite