research data . Dataset . 2021 . Embargo end date: 24 May 2021

Latvian Delfi article archive (in Latvian and Russian) 1.0

Pollak, Senja; Purver, Matthew; Shekhar, Ravi; Freienthal, Linda; Kuulmets, Hele-Andra; Krustok, Ivar;
Open Access
  • Published: 19 Apr 2021
  • Publisher: Ekspress Meedia Group
Abstract
This dataset is an archive of articles from the Delfi news site covering 2015 to 2019, containing over 180,000 articles (c. 50% in Latvian and 50% in Russian). Keywords for articles are included.
There are 5 JSON files:
  • lv_2015.json contains 42 001 articles from the year 2015
  • lv_2016_.json contains 40 342 articles from the year 2016
  • lv_2017_.json contains 37 256 articles from the year 2017
  • lv_2018_.json contains 31 732 articles from the year 2018
  • lv_2019_.json contains 29 070 articles from the year 2019
In sum: 180 401 articles
Description of the dataset
Each JSON file is a list of dictionaries, i.e. each article is represented as a dictionary. Each dictio...
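The list-of-dictionaries layout described above can be read with Python's standard json module. The following is a minimal sketch, not part of the dataset's own tooling: the helper names load_articles and count_by_file are hypothetical, and UTF-8 encoding is assumed. Because the field description is truncated here, the sketch makes no assumption about the keys inside each article dictionary.

```python
import json
from pathlib import Path


def load_articles(path):
    """Load one yearly archive file and return its list of article dictionaries.

    Assumes the file is UTF-8 encoded JSON whose top level is a list of
    dictionaries, as the abstract describes.
    """
    with Path(path).open(encoding="utf-8") as handle:
        articles = json.load(handle)
    # Fail early if the file does not match the described layout.
    if not isinstance(articles, list) or not all(isinstance(a, dict) for a in articles):
        raise ValueError(f"{path}: expected a JSON list of dictionaries")
    return articles


def count_by_file(paths):
    """Map each file name to the number of articles it contains."""
    return {Path(p).name: len(load_articles(p)) for p in paths}
```

Calling count_by_file on the five downloaded files and summing the values gives a quick check against the per-year counts and the total of 180 401 articles stated in the abstract.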
Funded by
EC | EMBEDDIA
Project
EMBEDDIA
Cross-Lingual Embeddings for Less-Represented Languages in European News Media
  • Funder: European Commission (EC)
  • Project Code: 825153
  • Funding stream: H2020 | RIA
Communities
Digital Humanities and Cultural Heritage