Research data · Dataset · 2014 · Embargo end date: 27 Mar 2014

Urdu Monolingual Corpus

Jawaid, Bushra; Kamran, Amir; Bojar, Ondřej
Open Access
  • Published: 22 Mar 2014
  • Publisher: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
We release a sizeable monolingual Urdu corpus automatically tagged with part-of-speech tags. We extend the work of Jawaid and Bojar (2012), who run three different taggers and then apply a voting scheme to choose among the tags suggested by each tagger. We run this ensemble on a large monolingual corpus and release both the plain and the tagged versions of the corpus.
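To make the idea of voting among taggers concrete, the sketch below combines the outputs of three taggers by simple per-token majority vote. This is an illustrative assumption, not the exact scheme of Jawaid and Bojar (2012): the tie-breaking rule (a hypothetical `priority` order naming the most trusted tagger first) and the example tags are invented for this sketch.

```python
from collections import Counter
from typing import Sequence


def vote(tags: Sequence[str], priority: Sequence[int]) -> str:
    """Pick one tag for a single token from the taggers' suggestions.

    The most frequent tag wins. Ties are broken by the hypothetical
    `priority` order: indices of taggers, most trusted first.
    """
    counts = Counter(tags)
    best = max(counts.values())
    candidates = {tag for tag, count in counts.items() if count == best}
    for i in priority:
        if tags[i] in candidates:
            return tags[i]
    raise ValueError("priority must list every tagger index")


def ensemble_tag(outputs: Sequence[Sequence[str]],
                 priority: Sequence[int] = (0, 1, 2)) -> list[str]:
    """Combine tagger outputs; outputs[k][j] is tagger k's tag for token j."""
    if len({len(o) for o in outputs}) != 1:
        raise ValueError("all taggers must tag the same token sequence")
    # zip(*outputs) yields, for each token, the tuple of tags from all taggers.
    return [vote(column, priority) for column in zip(*outputs)]


if __name__ == "__main__":
    # Illustrative tags only; not necessarily the tagset used in the corpus.
    tagger_a = ["NN", "VB", "PSP"]
    tagger_b = ["NN", "NN", "PSP"]
    tagger_c = ["JJ", "VB", "CC"]
    print(ensemble_tag([tagger_a, tagger_b, tagger_c]))  # ['NN', 'VB', 'PSP']
```

With three taggers, any tag proposed by at least two of them wins outright, so the tie-breaking order only matters when all three disagree.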
Funded by
Moses Open Source Evaluation and Support Co-ordination for OutReach and Exploitation
  • Funder: European Commission (EC)
  • Project Code: 288487
  • Funding stream: FP7 | SP1 | ICT
Digital Humanities and Cultural Heritage