research product . program source code . 2021

X-SRL Dataset and mBERT Word Aligner

Daza, Angel (Leibniz Institute for the German Language / Department of Computational Linguistics, Heidelberg University)
  • Published: 01 Jan 2021
  • Publisher: heiDATA
Abstract
This code implements a method for automatically aligning words in parallel sentences using pre-trained multilingual BERT (mBERT) embeddings. The alignments can be used to project source-side annotations (for example, labeled English sentences) onto the target side (for example, a German translation of the sentence) by transferring each label to the best-aligned target word. The newly labeled data can then be used to train state-of-the-art multilingual models and improve their performance, especially for lower-resource languages.
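The projection idea described in the abstract can be illustrated with a minimal sketch. It is not the released implementation: it assumes each word already has an embedding vector (small hand-written toy vectors stand in for mBERT embeddings here), and it assumes a greedy rule that aligns each source word to the target word with the highest cosine similarity. The function names `cosine`, `align`, and `project_labels` are hypothetical and chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def align(src_vecs, tgt_vecs):
    """For each source word, return the index of the most similar target word.

    Assumption: greedy argmax over cosine similarity; the released code
    may use a different alignment strategy.
    """
    return [
        max(range(len(tgt_vecs)), key=lambda j: cosine(s, tgt_vecs[j]))
        for s in src_vecs
    ]


def project_labels(src_labels, alignment, tgt_len, empty="O"):
    """Copy every non-empty source label onto its aligned target position."""
    tgt_labels = [empty] * tgt_len
    for i, label in enumerate(src_labels):
        if label != empty:
            tgt_labels[alignment[i]] = label
    return tgt_labels


# Toy 3-dimensional vectors standing in for mBERT word embeddings
# (hypothetical values, chosen so that translations are close to each other).
src_words = ["She", "reads", "books"]
src_labels = ["A0", "V", "A1"]  # illustrative role labels
tgt_words = ["Bücher", "liest", "sie"]  # reordered German translation

src_vecs = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.1], [0.1, 0.0, 1.0]]
tgt_vecs = [[0.0, 0.1, 0.9], [0.1, 0.9, 0.0], [0.9, 0.2, 0.0]]

alignment = align(src_vecs, tgt_vecs)
tgt_labels = project_labels(src_labels, alignment, len(tgt_words))

for word, label in zip(tgt_words, tgt_labels):
    print(word, label)
```

In this sketch the target word order differs from the source, so the labels follow the alignment rather than the position. With real data, the toy vectors would be replaced by contextual embeddings from a multilingual BERT model, and several source words competing for the same target word would need an explicit tie-breaking policy.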
Subjects
ACM Computing Classification System: ComputingMethodologies_DOCUMENTANDTEXTPROCESSING
free text keywords: Arts and Humanities, Computer and Information Science, SRL, Semantic Role Labeling, annotation projection, multilingual BERT, multilingual semantic role labeling, word alignment, Humanities
Communities
Digital Humanities and Cultural Heritage
Download from
B2FIND
Provider: B2FIND