We have developed a machine learning model to extract mentions of software from scientific articles. The SoftCite dataset was used to train and evaluate the model. This model has been applied to the CORD-19 collection of full-text coronavirus-related research papers. This dataset comprises the output of this model and each scientific article's relevant metadata. Data are derived from the CORD-19 dataset provided by AllenAI, release version 2021-02-08 (changelog cord-19_2021-02-08.tar.gz 7.4GB c5446fea 29f69de2) downloaded from AWS on 08-Feb-2021.
free text keywords: Research Software, Natural Language Processing, scholarly impact, semi-supervised machine learning, Scholarly communication, Coronavirus (COVID-19), FOS: Computer and information sciences