  • Open Access
    Authors: Pavel Pecina; Antonio Toral; Vassilis Papavassiliou; Prokopis Prokopidis; Aleš Tamchyna; Andy Way; Josef van Genabith
    Country: Czech Republic
    Project: SFI | CSET CNGL: Next Generatio... (07/CE/I1142), EC | ABU-MATRAN (324414), EC | KHRESMOI (257528), EC | PANACEA (248064)

    In this paper, we tackle the problem of domain adaptation of statistical machine translation (SMT) by exploiting domain-specific data acquired by domain-focused crawling of text from the World Wide Web. We design and empirically evaluate a procedure for the automatic acquisition of monolingual and parallel text and its exploitation for system training, tuning, and testing in a phrase-based SMT framework. We present a strategy for using such resources depending on their availability and quantity, supported by the results of a large-scale evaluation covering two domains (environment and labour legislation), two language pairs (English-French and English-Greek), and both translation directions (into and from English). In general, machine translation systems trained and tuned on a general domain perform poorly on specific domains. We show that such systems can be adapted successfully by retuning model parameters on small amounts of parallel in-domain data, and that they may be further improved by using additional monolingual and parallel training data to adapt the language and translation models. The average observed improvement is substantial: 15.30 BLEU points absolute.
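
The evaluation grid described in the abstract (two domains, two language pairs, two directions) and the notion of an average absolute BLEU improvement can be sketched as follows. This is only an illustration: the function names `evaluation_settings` and `mean_absolute_gain` are hypothetical, the unweighted mean is an assumption about how the average was taken, and no scores from the paper are reproduced.

```python
from itertools import product
from statistics import mean

# Evaluation grid as listed in the abstract.
DOMAINS = ("environment", "labour legislation")
LANGUAGE_PAIRS = (("English", "French"), ("English", "Greek"))
DIRECTIONS = ("from English", "into English")


def evaluation_settings():
    """Enumerate every (domain, source language, target language) setting."""
    settings = []
    for domain, (english, other), direction in product(
        DOMAINS, LANGUAGE_PAIRS, DIRECTIONS
    ):
        if direction == "from English":
            source, target = english, other
        else:
            source, target = other, english
        settings.append((domain, source, target))
    return settings


def mean_absolute_gain(baseline, adapted):
    """Unweighted mean of (adapted - baseline) BLEU across settings.

    Both arguments map a setting to a BLEU score and must share the same
    keys. Assumption: the average is a plain mean over settings.
    """
    if not baseline:
        raise ValueError("at least one setting is required")
    if baseline.keys() != adapted.keys():
        raise ValueError("baseline and adapted must cover the same settings")
    return mean(adapted[key] - baseline[key] for key in baseline)
```

The grid yields eight settings. To use `mean_absolute_gain`, supply one baseline and one adapted BLEU score per setting; the per-setting values are not given in this record, so none are filled in here.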