research data . Dataset . 2021 . Embargo end date: 04 Mar 2021

CORD-19 Software Mentions

Wade, Alex D.; Williams, Ivana
  • Published: 01 Jan 2021
  • Publisher: Dryad
We developed a machine learning model that extracts mentions of software from scientific articles. The model was trained and evaluated on the SoftCite dataset, then applied to the CORD-19 collection of full-text coronavirus-related research papers. This dataset comprises the model's output together with the relevant metadata for each scientific article. Data are derived from the CORD-19 dataset provided by AllenAI, release version 2021-02-08 (changelog cord-19_2021-02-08.tar.gz 7.4GB c5446fea 29f69de2), downloaded from AWS on 08-Feb-2021.
free text keywords: Research Software, Natural Language Processing, scholarly impact, semi-supervised machine learning, Scholarly communication, Coronavirus (COVID-19), FOS: Computer and information sciences
Digital Humanities and Cultural Heritage
Provider: Datacite