Publication · Conference object · 2021

Decentralized Word2Vec Using Gossip Learning

Alkathiri, Abdul Aziz; Giaretta, Lodovico; Girdzijauskas, Sarunas; Sahlgren, Magnus
Open Access
English
Published: 02 Jun 2021
Publisher: Zenodo
Country: Sweden
Abstract

Advanced NLP models require huge amounts of data from various domains to produce high-quality representations. It is therefore useful for a few large public and private organizations to join their corpora during training. However, factors such as legislation and users' emphasis on data privacy may prevent centralized orchestration and data sharing among these organizations. For this specific scenario, we therefore investigate how gossip learning, a massively parallel, data-private, decentralized protocol, compares to a shared-dataset solution. We find that applying Word2Vec in a gossip learning framework is viable. Without any tuning, the results are comparable to a traditional centralized setting, with a reduction in ground-truth similarity scores of as little as 4.3%. Furthermore, the results are up to 54.8% better than those of independent local training.
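
In gossip learning as it is commonly described, each node keeps its own copy of the model, periodically pushes it to a randomly chosen peer, and the receiver merges the incoming model with its own before continuing training on its private data. Raw data never leaves a node. The sketch below illustrates that loop in Python under explicitly stated assumptions. It is not the authors' implementation. The averaging merge and the push-to-random-peer schedule are one common variant, not necessarily the one used in the paper. The model is a hypothetical toy, a 2-dimensional parameter vector fitted by squared-error SGD, standing in for the Word2Vec embedding matrices. Under this assumption, the same averaging would apply element-wise to those matrices. The names `local_step`, `merge`, `simulate` and `mean_abs_error`, and the synthetic per-node data, are illustrative only.

```python
import random
from typing import List, Sequence

Vector = List[float]


def local_step(model: Vector, data: Sequence[Sequence[float]], lr: float,
               rng: random.Random) -> Vector:
    """One SGD step on the squared error between the model and one random local sample."""
    sample = rng.choice(data)
    return [m - lr * (m - x) for m, x in zip(model, sample)]


def merge(a: Vector, b: Vector) -> Vector:
    """Element-wise average of two models (an assumed, common gossip-learning merge rule)."""
    return [(x + y) / 2 for x, y in zip(a, b)]


def simulate(node_data: Sequence[Sequence[Sequence[float]]], rounds: int, lr: float,
             gossip: bool, seed: int = 0) -> List[Vector]:
    """Train one model per node, either independently or with push-style gossip."""
    rng = random.Random(seed)
    n, dim = len(node_data), len(node_data[0][0])
    models = [[0.0] * dim for _ in range(n)]
    for _ in range(rounds):
        for i in range(n):
            if gossip:
                # Node i pushes its model to a random peer j; j merges it, then trains
                # on its own private data. Only model parameters travel, never samples.
                j = rng.choice([k for k in range(n) if k != i])
                models[j] = local_step(merge(models[j], models[i]), node_data[j], lr, rng)
            else:
                # Independent local training: no communication at all.
                models[i] = local_step(models[i], node_data[i], lr, rng)
    return models


def mean_abs_error(models: List[Vector], target: Vector) -> float:
    """Mean absolute deviation of all node models from a target parameter vector."""
    total = sum(abs(m - t) for model in models for m, t in zip(model, target))
    return total / (len(models) * len(target))


if __name__ == "__main__":
    # Hypothetical non-IID split: node k only sees samples centered on (k, 2k).
    node_data = [[(k + d, 2 * k - d) for d in (-0.5, 0.0, 0.5)] for k in range(10)]
    all_samples = [s for data in node_data for s in data]
    # Optimum of the pooled ("shared-dataset") squared-error objective: the global mean.
    target = [sum(c) / len(all_samples) for c in zip(*all_samples)]
    for gossip in (False, True):
        models = simulate(node_data, rounds=500, lr=0.05, gossip=gossip)
        print("gossip" if gossip else "local ", round(mean_abs_error(models, target), 3))
```

Running the script prints the mean absolute error of each setting against the optimum of the pooled data. Because each hypothetical node only sees samples centered on its own point, independent training settles on local optima, while the gossiped models are drawn toward the pooled optimum without any node sharing its samples. These toy numbers say nothing about the magnitudes reported in the abstract.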

Subjects

gossip learning, natural language processing, decentralized machine learning, word2vec, Computer Sciences

Funded by
RAIS: Real-time Analytics for the Internet of Sports
  • Funder: European Commission (EC)
  • Project Code: 813162
  • Funding stream: H2020 | MSCA-ITN-ETN
Related research communities
Digital Humanities and Cultural Heritage