publication . Preprint . Conference object . Contribution for newspaper or weekly magazine . 2016

The AMU-UEDIN Submission to the WMT16 News Translation Task: Attention-based NMT Models as Feature Functions in Phrase-based SMT

Marcin Junczys-Dowmunt; Tomasz Dwojak; Rico Sennrich;
Open Access English
  • Published: 16 May 2016
  • Publisher: Association for Computational Linguistics
  • Country: United Kingdom
Abstract
This paper describes the AMU-UEDIN submissions to the WMT 2016 shared task on news translation. We explore methods of decode-time integration of attention-based neural translation models with phrase-based statistical machine translation. Efficient batch-algorithms for GPU-querying are proposed and implemented. For English-Russian, our system stays behind the state-of-the-art pure neural models in terms of BLEU. Among restricted systems, manual evaluation places it in the first cluster tied with the pure neural model. For the Russian-English task, our submission achieves the top BLEU result, outperforming the best pure neural system by 1.1 BLEU points and...
Subjects
free text keywords: Computer Science - Computation and Language, Machine translation software usability, Speech recognition, Phrase, Evaluation of machine translation, Computer science, Neural system, Natural language processing, computer.software_genre, computer, Machine translation, Artificial intelligence, business.industry, business, BLEU
Funded by
EC| SUMMA
Project
SUMMA
Scalable Understanding of Multilingual Media
  • Funder: European Commission (EC)
  • Project Code: 688139
  • Funding stream: H2020 | RIA
EC| TraMOOC
Project
TraMOOC
Translation for Massive Open Online Courses
  • Funder: European Commission (EC)
  • Project Code: 644333
  • Funding stream: H2020 | IA
Communities
Digital Humanities and Cultural Heritage

6 The neural network lore seems to suggest that this should not work, as neural networks are non-linear models. We only found one paper with evidence to the contrary: Utans (1996).

[Alkhouli et al.2015] Tamer Alkhouli, Felix Rietig, and Hermann Ney. 2015. Investigations on phrase-based decoding with recurrent neural network language and translation models. In Proceedings of the Tenth Workshop on Statistical Machine Translation, pages 294-303, Lisbon, Portugal, September. Association for Computational Linguistics.

[Bahdanau et al.2015] Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural Machine Translation by Jointly Learning to Align and Translate. In Proceedings of the International Conference on Learning Representations (ICLR).

[Cherry and Foster2012] Colin Cherry and George Foster. 2012. Batch tuning strategies for statistical machine translation. In Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT '12, pages 427-436, Stroudsburg, PA, USA. Association for Computational Linguistics.

[Durrani et al.2013] Nadir Durrani, Alexander Fraser, Helmut Schmid, Hieu Hoang, and Philipp Koehn. 2013. Can Markov models over minimal translation units help phrase-based SMT? In ACL, pages 399-405. The Association for Computer Linguistics.

[Durrani et al.2014] Nadir Durrani, Hassan Sajjad, Hieu Hoang, and Philipp Koehn. 2014. Integrating an unsupervised transliteration model into statistical machine translation. In Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2014, April 26-30, 2014, Gothenburg, Sweden, pages 148-153.

[Heafield et al.2013] Kenneth Heafield, Ivan Pouzyrevsky, Jonathan H. Clark, and Philipp Koehn. 2013. Scalable modified Kneser-Ney language model estimation. In Proceedings of the 51st Annual Meeting of the ACL, pages 690-696.

[Koehn et al.2007] Philipp Koehn, Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, Chris Dyer, Ondřej Bojar, Alexandra Constantin, and Evan Herbst. 2007. Moses: Open source toolkit for statistical machine translation. In Proceedings of the 45th Annual Meeting of the ACL, pages 177-180. ACL.

[Koehn2010] Philipp Koehn. 2010. Statistical Machine Translation. Cambridge University Press, New York, NY, USA, 1st edition.

[Mikolov et al.2013] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. CoRR, abs/1301.3781.

[Pascanu et al.2013] Razvan Pascanu, Tomas Mikolov, and Yoshua Bengio. 2013. On the difficulty of training recurrent neural networks. In Proceedings of the 30th International Conference on Machine Learning, ICML 2013, pages 1310-1318, Atlanta, GA, USA.

[Sennrich et al.2015a] Rico Sennrich, Barry Haddow, and Alexandra Birch. 2015a. Improving Neural Machine Translation Models with Monolingual Data. ArXiv e-prints, November.

[Sennrich et al.2015b] Rico Sennrich, Barry Haddow, and Alexandra Birch. 2015b. Neural Machine Translation of Rare Words with Subword Units. CoRR, abs/1508.07909.

[Sennrich et al.2016] Rico Sennrich, Barry Haddow, and Alexandra Birch. 2016. Edinburgh Neural Machine Translation Systems for WMT 16. In Proc. of the Conference on Machine Translation (WMT), Berlin, Germany.

[Utans1996] Joachim Utans. 1996. Weight averaging for neural networks and local resampling schemes. In Proc. AAAI-96 Workshop on Integrating Multiple Learned Models, pages 133-138. AAAI Press.

[Zeiler2012] Matthew D. Zeiler. 2012. ADADELTA: An Adaptive Learning Rate Method. CoRR, abs/1212.5701.
