Publication · Conference object · Preprint · Article · 2018

FEVER: a Large-scale Dataset for Fact Extraction and VERification

James Thorne; Andreas Vlachos; Christos Christodoulopoulos; Arpit Mittal
Open Access
Published: 14 Mar 2018
Publisher: Association for Computational Linguistics
Country: United Kingdom
In this paper we introduce a new publicly available dataset for verification against textual sources, FEVER: Fact Extraction and VERification. It consists of 185,445 claims generated by altering sentences extracted from Wikipedia and subsequently verified without knowledge of the sentence they were derived from. Annotators classified the claims as Supported, Refuted or NotEnoughInfo, reaching an inter-annotator agreement of 0.6841 Fleiss $\kappa$. For the first two classes, the annotators also recorded the sentence(s) forming the necessary evidence for their judgment. To characterize the challenge posed by the dataset, we develop a pipeline approach and compare it to suitably designed oracles. The best accuracy we achieve on labeling a claim when the correct evidence must also be retrieved is 31.87%, while if we ignore the evidence we achieve 50.91%. Thus we believe that FEVER is a challenging testbed that will help stimulate progress on claim verification against textual sources.
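The abstract reports two quantities that are easier to read with a concrete computation: Fleiss' $\kappa$ (Fleiss, 1971) for inter-annotator agreement, and the gap between label-only accuracy and accuracy that also requires correct evidence. The following is a minimal Python sketch under stated assumptions, not the authors' released scorer. `fleiss_kappa` follows the standard formula. `evidence_aware_accuracy` assumes a scoring rule along the lines of the later FEVER shared task: a Supported or Refuted claim counts only if the predicted label is right and the predicted sentences contain at least one complete gold evidence set, while a NotEnoughInfo claim needs only the right label; any cap on the number of predicted sentences is not modeled. The function names, the (page title, sentence index) evidence representation and the toy data are hypothetical.

```python
from typing import FrozenSet, Sequence, Tuple

# Hypothetical evidence representation: (Wikipedia page title, sentence index).
Sentence = Tuple[str, int]
EvidenceSet = FrozenSet[Sentence]


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa from a subjects-by-categories matrix of rater counts.

    counts[i][j] is how many raters put subject i into category j. Every
    subject must be rated by the same number of raters (at least two).
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("need the same number (>= 2) of raters per subject")
    n_categories = len(counts[0])
    # p_j: share of all assignments that went to category j.
    p_j = [sum(row[j] for row in counts) / (n_subjects * n_raters)
           for j in range(n_categories)]
    # P_i: fraction of rater pairs that agree on subject i.
    p_i = [sum(c * (c - 1) for c in row) / (n_raters * (n_raters - 1))
           for row in counts]
    p_bar = sum(p_i) / n_subjects     # observed agreement
    p_e = sum(p * p for p in p_j)     # agreement expected by chance
    return (p_bar - p_e) / (1 - p_e)  # undefined if p_e == 1


def label_accuracy(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Accuracy that ignores evidence entirely."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def evidence_aware_accuracy(
    gold: Sequence[str],
    gold_evidence: Sequence[Sequence[EvidenceSet]],
    pred: Sequence[str],
    pred_evidence: Sequence[EvidenceSet],
) -> float:
    """A claim counts only if its label is right and, unless it is
    NotEnoughInfo, the predicted sentences cover a complete gold evidence set."""
    correct = 0
    for g, g_sets, p, p_ev in zip(gold, gold_evidence, pred, pred_evidence):
        if g != p:
            continue
        # Subset test: every sentence of some gold set must be predicted.
        if g == "NotEnoughInfo" or any(s <= p_ev for s in g_sets):
            correct += 1
    return correct / len(gold)


# Toy data (hypothetical): all three labels are right, but the Refuted claim
# misses one sentence of its two-sentence gold evidence set.
gold = ["Supported", "Refuted", "NotEnoughInfo"]
gold_evidence = [
    [frozenset({("Page_A", 0)})],
    [frozenset({("Page_B", 2), ("Page_C", 1)})],
    [],
]
pred = ["Supported", "Refuted", "NotEnoughInfo"]
pred_evidence = [
    frozenset({("Page_A", 0), ("Page_A", 3)}),
    frozenset({("Page_B", 2)}),
    frozenset(),
]

label_acc = label_accuracy(gold, pred)  # 3 of 3 labels correct
evidence_acc = evidence_aware_accuracy(gold, gold_evidence, pred, pred_evidence)
kappa = fleiss_kappa([[2, 1], [1, 2]])  # 2 subjects, 3 raters, 2 categories
print(label_acc, evidence_acc, kappa)
```

The toy run shows why the evidence-aware figure can only be lower than or equal to the label-only figure: requiring evidence adds a condition on top of the label check, which mirrors the 31.87% versus 50.91% gap described above.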
Comment: Updated version of NAACL 2018 paper. Data is released on
Subjects by Vocabulary

Microsoft Academic Graph classification: Natural language processing; Scale (ratio); Computer science; Pipeline (software); Fact extraction; Artificial intelligence; Sentence


Computer Science - Computation and Language (cs.CL); FOS: Computer and information sciences

Funded by
Scalable Understanding of Multilingual Media
  • Funder: European Commission (EC)
  • Project Code: 688139
  • Funding stream: H2020 | RIA
Related to Research communities
Digital Humanities and Cultural Heritage