publication . Preprint . Conference object . Contribution for newspaper or weekly magazine . 2016

The MGB-2 Challenge: Arabic Multi-Dialect Broadcast Media Recognition

Ahmed Ali; Peter Bell; James Glass; Yacine Messaoui; Hamdy Mubarak; Steve Renals; Yifan Zhang
Open Access English
  • Published: 19 Sep 2016
  • Publisher: Institute of Electrical and Electronics Engineers (IEEE)
  • Country: United Kingdom
Abstract
This paper describes the Arabic Multi-Genre Broadcast (MGB-2) Challenge for SLT-2016. Unlike last year's English MGB Challenge, which focused on recognition of diverse TV genres, this year's challenge emphasises handling dialect diversity in Arabic speech. The audio data comes from 19 distinct programmes broadcast on the Aljazeera Arabic TV channel between March 2005 and December 2015. Programmes are split into three groups: conversations, interviews, and reports. A total of 1,200 hours of audio has been released with lightly supervised transcriptions for acoustic modelling. For language modelling, we made available over 110M words crawled from the Aljazeera Arabic website, Aljazeera.net, covering a 10-year period, 2000-2011. Two lexicons have been provided, one phoneme-based and one grapheme-based. Finally, two tasks were proposed for this year's challenge: standard speech transcription and word alignment. This paper describes the task data and evaluation process used in the MGB challenge, and summarises the results obtained.
Fields of Science and Technology classification (FOS)
02 engineering and technology, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, 020206 networking & telecommunications
Subjects
free text keywords: Computer Science - Computation and Language, Metadata, Emphasis (typography), Speech transcription, Process (engineering), Arabic, Grapheme, Task (project management), Computer science, Natural language processing, Artificial intelligence, Transcription (linguistics)
Communities
  • Digital Humanities and Cultural Heritage
Funded by
EC| SUMMA
Project
SUMMA
Scalable Understanding of Multilingual Media
  • Funder: European Commission (EC)
  • Project Code: 688139
  • Funding stream: H2020 | RIA
Validated by funder
