publication . Conference object . Article . Preprint . 2018

Document-Level Neural Machine Translation with Hierarchical Attention Networks

Lesly Miculicich; Dhananjay Ram; Nikolaos Pappas; James Henderson
Open Access
  • Published: 14 Dec 2018
Abstract
Neural Machine Translation (NMT) can be improved by including document-level contextual information. For this purpose, we propose a hierarchical attention model to capture the context in a structured and dynamic manner. The model is integrated in the original NMT architecture as another level of abstraction, conditioning on the NMT model's own previous hidden states. Experiments show that hierarchical attention significantly improves the BLEU score over a strong NMT baseline with the state-of-the-art in context-aware methods, and that both the encoder and decoder benefit from context in complementary ways.
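The two-level attention the abstract describes can be sketched as follows. This is only an illustrative toy under stated assumptions, not the authors' implementation: the function names (`attend`, `hierarchical_context`), the use of plain scaled dot-product attention, and the omission of the learned projections a trained model would have are all choices made here for brevity. A query (standing in for the current hidden state) first attends over the word-level hidden states of each previous sentence, and then attends over the resulting per-sentence summaries.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attend(query, vectors):
    """Scaled dot-product attention of one query over a list of vectors.

    Returns the weighted sum of the vectors and the attention weights.
    """
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, v) / scale for v in vectors])
    dim = len(vectors[0])
    summary = [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]
    return summary, weights


def hierarchical_context(query, previous_sentences):
    """Hypothetical two-level attention over previous sentences.

    previous_sentences: list of sentences, each a list of hidden-state vectors.
    """
    # Word level: summarize each previous sentence with respect to the query.
    sentence_summaries = [attend(query, states)[0] for states in previous_sentences]
    # Sentence level: attend over the per-sentence summaries.
    return attend(query, sentence_summaries)


# Toy usage: two previous sentences with 2-dimensional hidden states.
query = [1.0, 0.0]
previous_sentences = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.5]],
]
context, sentence_weights = hierarchical_context(query, previous_sentences)
```

Here `context` is a single vector summarizing the document context for the query, and `sentence_weights` shows how much each previous sentence contributed.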
Subjects
free text keywords: Computer Science - Computation and Language, Architecture, Attention model, Contextual information, Computer science, Encoder, Natural language processing, Document level, Abstraction, Artificial intelligence, Machine translation
Funded by
EC| SUMMA
Project
SUMMA
Scalable Understanding of Multilingual Media
  • Funder: European Commission (EC)
  • Project Code: 688139
  • Funding stream: H2020 | RIA
Validated by funder
Communities
Digital Humanities and Cultural Heritage
Download from
OpenAIRE
Preprint . 2018
Provider: OpenAIRE
Zenodo
Conference object . 2018
Provider: Datacite
48 references, page 1 of 4

Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In Proceedings of the International Conference on Learning Representations, San Diego, USA.

Rachel Bawden, Rico Sennrich, Alexandra Birch, and Barry Haddow. 2018. Evaluating discourse phenomena in neural machine translation. In Proceedings of the 16th Annual Conference of North American Chapter of the Association for Computational Linguistics: Human Language Technologies, New Orleans, USA. Association for Computational Linguistics.

Steven Bird. 2006. NLTK: The natural language toolkit. In Proceedings of the COLING/ACL 2006 Interactive Presentation Sessions, pages 69-72. Association for Computational Linguistics.

Ondřej Bojar, Christian Buck, Chris Callison-Burch, Christian Federmann, Barry Haddow, Philipp Koehn, Christof Monz, Matt Post, Radu Soricut, and Lucia Specia. 2013. Findings of the 2013 Workshop on Statistical Machine Translation. In Proceedings of the Eighth Workshop on Statistical Machine Translation, pages 1-44, Sofia, Bulgaria. Association for Computational Linguistics.

Leo Born, Mohsen Mesgar, and Michael Strube. 2017. Using a graph-based coherence model in document-level machine translation. In Proceedings of the Third Workshop on Discourse in Machine Translation, pages 26-35, Copenhagen, Denmark. Association for Computational Linguistics.

Mauro Cettolo, Christian Girardi, and Marcello Federico. 2012. WIT3: Web inventory of transcribed and translated talks. In Proceedings of the 16th Conference of the European Association for Machine Translation (EAMT), pages 261-268, Trento, Italy.

Mauro Cettolo, Jan Niehues, Sebastian Stüker, Luisa Bentivogli, Roldano Cattoni, and Marcello Federico. 2015. The IWSLT 2015 evaluation campaign. In Proceedings of the International Workshop on Spoken Language Translation.

Peter W Foltz, Walter Kintsch, and Thomas K Landauer. 1998. The measurement of textual coherence with latent semantic analysis. Discourse processes, 25(2-3):285-307.

Zhengxian Gong, Min Zhang, and Guodong Zhou. 2011. Cache-based document-level statistical machine translation. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pages 909-919, Edinburgh, Scotland, UK. Association for Computational Linguistics.

Christian Hardmeier. 2012. Discourse in statistical machine translation. A survey and a case study. Discours. Revue de linguistique, psycholinguistique et informatique. A journal of linguistics, psycholinguistics and computational linguistics, (11).

Christian Hardmeier, Sara Stymne, Jörg Tiedemann, and Joakim Nivre. 2013. Docent: A document-level decoder for phrase-based statistical machine translation. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pages 193-198, Sofia, Bulgaria. Association for Computational Linguistics.

Sebastien Jean, Stanislas Lauly, Orhan Firat, and Kyunghyun Cho. 2017. Does neural machine translation benefit from larger context? arXiv preprint arXiv:1704.05135.

Guillaume Klein, Yoon Kim, Yuntian Deng, Jean Senellart, and Alexander M. Rush. 2017. OpenNMT: Open-source toolkit for neural machine translation. In Proc. ACL.

Philipp Koehn. 2004. Statistical significance tests for machine translation evaluation. In Proceedings of EMNLP 2004, pages 388-395, Barcelona, Spain. Association for Computational Linguistics.

Philipp Koehn, Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, Chris Dyer, Ondrej Bojar, Alexandra Constantin, and Evan Herbst. 2007. Moses: Open source toolkit for statistical machine translation. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions, pages 177-180, Prague, Czech Republic. Association for Computational Linguistics.
