Preprint · 2019

Adaptation of Deep Bidirectional Multilingual Transformers for Russian Language

Kuratov, Yuri; Arkhipov, Mikhail
Open Access · English
  • Published: 17 May 2019
Abstract
The paper introduces methods of adaptation of multilingual masked language models for a specific language. Pre-trained bidirectional language models show state-of-the-art performance on a wide range of tasks, including reading comprehension, natural language inference, and sentiment analysis. There are currently two alternative approaches to training such models: monolingual and multilingual. While language-specific models show superior performance, multilingual models allow transfer from one language to another and can solve tasks for several languages simultaneously. This work shows that transfer learning from a multilingual model to a monolingual model results in significant growth of performance on tasks such as reading comprehension, paraphrase detection, and sentiment analysis. Furthermore, multilingual initialization of a monolingual model substantially reduces training time. Pre-trained models for the Russian language are open sourced.
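A minimal sketch of the kind of transfer the abstract describes: a masked language model is initialized from a multilingual BERT checkpoint and then continues pre-training on Russian text. The checkpoint name, the toy sentence, and the single training step below are illustrative assumptions, not the authors' exact recipe.

# Illustrative sketch (assumed setup, not the paper's exact procedure):
# warm-start from multilingual BERT, then run one masked-LM step on Russian text.
import torch
from transformers import BertForMaskedLM, BertTokenizerFast

# The multilingual checkpoint supplies the initial weights for the new model.
tokenizer = BertTokenizerFast.from_pretrained("bert-base-multilingual-cased")
model = BertForMaskedLM.from_pretrained("bert-base-multilingual-cased")

# Toy Russian sentence; real adaptation would stream a large Russian corpus.
inputs = tokenizer("Москва - столица России.", return_tensors="pt")
labels = inputs["input_ids"].clone()

# Standard BERT objective: hide a token and predict it. A single fixed
# position is masked here for determinism; BERT masks 15% at random.
pos = 3
inputs["input_ids"][0, pos] = tokenizer.mask_token_id
labels[0, :pos] = -100       # -100 marks positions ignored by the loss
labels[0, pos + 1:] = -100

outputs = model(**inputs, labels=labels)
outputs.loss.backward()      # gradients flow from the multilingual init

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
optimizer.step()             # one step of continued pre-training on Russian

Warm-starting from the multilingual weights, rather than training from random initialization, is what the abstract credits with both the performance gains on Russian tasks and the substantially reduced training time.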
Subjects
ACM Computing Classification System: Information Systems / Information Storage and Retrieval
Free-text keywords: Computer Science - Computation and Language