Publication · Conference object · Preprint · 2020

Shape of Synth to Come: Why We Should Use Synthetic Data for English Surface Realization

Henry Elder; Robert Burke; Alexander O'Connor; Jennifer Foster
Open Access
  • Published: 29 Jul 2020
  • Publisher: Association for Computational Linguistics
Abstract
The Surface Realization Shared Tasks of 2018 and 2019 were Natural Language Generation shared tasks with the goal of exploring approaches to surface realization from Universal-Dependency-like trees to surface strings for several languages. In the 2018 shared task there was very little difference in the absolute performance of systems trained with and without additional, synthetically created data, and a new rule prohibiting the use of synthetic data was introduced for the 2019 shared task. Contrary to the findings of the 2018 shared task, we show, in experiments on the English 2018 dataset, that the use of synthetic data can have a substantial positive effect - ...
Subjects
Free-text keywords: Computer Science - Computation and Language, Artificial intelligence, Synthetic data, Natural language generation, Natural language processing, Computer science
Funded by
SFI | ADAPT: Centre for Digital Content Platform Research
Project
  • Funder: Science Foundation Ireland (SFI)
  • Project Code: 13/RC/2106
  • Funding stream: SFI Research Centres
Communities
Digital Humanities and Cultural Heritage