Other research product > Other ORP type, 2016, Czech Republic, English
Publisher: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Funding: EC | QTLEAP
Authors: Mírovský, Jiří; Mikulová, Marie; Nedoluzhko, Anna; Novák, Michal; Cinková, Silvie

We present an extended version of the Prague Czech-English Dependency Treebank 2.0 (PCEDT 2.0). It includes all coreference annotation (both the original annotation from PCEDT 2.0 and the new one) and improved cross-lingual alignment of coreferential expressions. The corpus, released as PCEDT 2.0 Coref, is publicly available.
Data source: Biblio at Institute of Formal and Applied Linguistics (Other ORP type, 2016)

Other research product > Other ORP type, 2017, Czech Republic
Funding: EC | HimL
Authors: Rosa, Rudolf; Zeman, Daniel; Mareček, David; Žabokrtský, Zdeněk

Trained models for UDPipe used to produce our final submission to the VarDial 2017 CLP shared task (https://bitbucket.org/hy-crossNLP/vardial2017). The SK model was trained on CS data, the HR model on SL data, and the NO model on a concatenation of DA and SV data. The scripts and commands used to create the models are part of a separate submission (http://hdl.handle.net/11234/1-1970).

The models were trained with UDPipe version 3e65d69 from 3rd Jan 2017, obtained from https://github.com/ufal/udpipe; their functionality with newer or older versions of UDPipe is not guaranteed.

We list here the Bash command sequences that can be used to reproduce our results submitted to VarDial 2017. The input files must be in CoNLL-U format. The models only use the form, UPOS, and Universal Features fields (SK only uses the form). You must have UDPipe installed. The feats2FEAT.py script, which prunes the universal features, is bundled with this submission.

SK -- tag and parse with the model:
    udpipe --tag --parse sk-translex.v2.norm.feats07.w2v.trainonpred.udpipe sk-ud-predPoS-test.conllu

A slightly better after-deadline model (sk-translex.v2.norm.Case-feats07.w2v.trainonpred.udpipe), which we mention in the accompanying paper, is also included. It is applied in the same way:
    udpipe --tag --parse sk-translex.v2.norm.Case-feats07.w2v.trainonpred.udpipe sk-ud-predPoS-test.conllu

HR -- prune the Features to keep only Case and parse with the model:
    python3 feats2FEAT.py Case tmp udpipe --tag no-translex.v2.norm.tgttagupos.srctagfeats.Case.w2v.udpipe no-ud-predPoS-test.conllu | cut -f5- | paste tmp - | sed 's/^\t$//' | udpipe --parse no-translex.v2.norm.tgttagupos.srctagfeats.Case.w2v.udpipe
Data source: Biblio at Institute of Formal and Applied Linguistics (Other ORP type, 2017)

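The SK invocation quoted in the record above can be wrapped in a small helper that checks its prerequisites before running. This is only a sketch: the function name run_udpipe_model is hypothetical and not part of the submission, and the only UDPipe options it uses are the --tag and --parse flags quoted in the record.

```shell
# Hypothetical helper (not part of the submission): tag and parse a
# CoNLL-U file with one of the released models, checking inputs first.
run_udpipe_model() {
    local model="$1" input="$2"

    # UDPipe must be installed, as the record states.
    if ! command -v udpipe >/dev/null 2>&1; then
        echo "udpipe not found" >&2
        return 127
    fi

    # Both the model file and the input file must be readable.
    if [ ! -r "$model" ]; then
        echo "model not readable: $model" >&2
        return 1
    fi
    if [ ! -r "$input" ]; then
        echo "input not readable: $input" >&2
        return 1
    fi

    # Same invocation as the SK command in the record.
    udpipe --tag --parse "$model" "$input"
}
```

With the released files in the current directory, it would be called as run_udpipe_model sk-translex.v2.norm.feats07.w2v.trainonpred.udpipe sk-ud-predPoS-test.conllu, and the parsed output goes to standard output just as with the plain command.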
Other research product > Other ORP type, 2021, Czech Republic
Publisher: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Funding: EC | Bergamot
Authors: Novák, Michal; Jon, Josef

Marian NMT model for Catalan to Occitan translation. It is a multi-task model that also produces a phonemic transcription of the Catalan source. The model was submitted to the WMT'21 Shared Task on Multilingual Low-Resource Translation for Indo-European Languages as a CUNI-Contrastive system for Catalan to Occitan.
Data sources: LINDAT/CLARIAH-CZ re...; Biblio at Institute of Formal and Applied Linguistics (Other ORP type, 2021)

Other research product > Other ORP type, 2011, Netherlands, English
Publisher: META-NET
Funding: EC | T4ME NET
Authors: Odijk, J.E.J.M.; Overkoepelend onderzoeksprogramma UiL-OTS; LS OZ Taal en spraaktechnologie

Other research product > Other ORP type, 2014, Netherlands, English
Publisher: University of Copenhagen
Funding: EC | DASISH
Authors: Hogenaar, A.Th.; Witkamp, P.; Bruijne, M.C. de; Wijnant, Arnaud; Kvamme, Trond; Kvalheim, Vigdis; Recker, Astrid; Fihn, Johan; Berglund, Torbjörn; Jerlehag, Birger; Müller Gjesdal, Anje; Parra, Carla; Dione, Bamba; De Smedt, Koenraad; Engelhardt, Claudia; Ludwig, Jens; Lenkiewicz, Przemyslaw

This report was produced in the context of the project Data Service Infrastructure for the Social Sciences and Humanities (DASISH), work package 4.3, Convergence of Data Services. The goal has been to allow the selection and promotion of high-quality deposit services for researchers in the Social Sciences and Humanities (SSH) and to make suggestions for service improvements.

Other research product > Other ORP type, 2015, Czech Republic
Funding: EC | QTLEAP, EC | FAUST
Authors: Rosa, Rudolf

Depfix is an open-source system for automatic post-editing of phrase-based machine translation outputs. Depfix employs a range of natural language processing tools to obtain analyses of the input sentences, and uses a set of rules to correct common or serious errors in machine translation outputs. Depfix is currently implemented only for the English-to-Czech translation direction.
Data sources: LINDAT/CLARIAH-CZ re...; Biblio at Institute of Formal and Applied Linguistics (Other ORP type, 2015); Biblio at Institute of Formal and Applied Linguistics (Other ORP type, 2014)

Other research product > Other ORP type, 2011, Czech Republic, Czech
Publisher: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Funding: EC | EUROMATRIXPLUS, EC | FAUST
Authors: Fučíková, Eva; Urešová, Zdeňka; Šindlerová, Jana; Cinková, Silvie; Hajič, Jan; Pajas, Petr; Hajičová, Eva; Žabokrtský, Zdeněk; Semecký, Jiří; Sgall, Petr; Štěpánek, Jan; Popelka, Jan; Mikulová, Marie; Toman, Josef; Panevová, Jarmila

Texts

The Prague Czech-English Dependency Treebank 2.0 (PCEDT 2.0) is a major update of the Prague Czech-English Dependency Treebank 1.0 (LDC2004T25). It is a manually parsed Czech-English parallel corpus sized over 1.2 million running words in almost 50,000 sentences for each part.

Data

The English part contains the entire Penn Treebank - Wall Street Journal Section (LDC99T42). The Czech part consists of Czech translations of all of the Penn Treebank-WSJ texts. The corpus is 1:1 sentence-aligned. An additional automatic alignment on the node level (different for each annotation layer) is part of this release, too. The original Penn Treebank-like file structure (25 sections, each containing up to one hundred files) has been preserved. Only those PTB documents which have both POS and structural annotation (total of 2312 documents) have been translated to Czech and made part of this release.

Each language part is enhanced with a comprehensive manual linguistic annotation in the PDT 2.0 style (LDC2006T01, Prague Dependency Treebank 2.0). The main features of this annotation style are:
- dependency structure of the content words and coordinating and similar structures (function words are attached as their attribute values)
- semantic labeling of content words and types of coordinating structures
- argument structure, including an argument structure ("valency") lexicon for both languages
- ellipsis and anaphora resolution.

This annotation style is called tectogrammatical annotation and it constitutes the tectogrammatical layer in the corpus. For more details see below and documentation.

Annotation of the Czech part

Sentences of the Czech translation were automatically morphologically annotated and parsed into surface-syntax dependency trees in the PDT 2.0 annotation style. This annotation style is sometimes called analytical annotation; it constitutes the analytical layer of the corpus. The manual tectogrammatical (deep-syntax) annotation was built as a separate layer above the automatic analytical (surface-syntax) parse. A sample of 2,000 sentences was manually annotated on the analytical layer.

Annotation of the English part

The resulting manual tectogrammatical annotation was built above an automatic transformation of the original phrase-structure annotation of the Penn Treebank into surface dependency (analytical) representations, using the following additional linguistic information from other sources:
- PropBank (LDC2004T14)
- VerbNet
- NomBank (LDC2008T23)
- flat noun phrase structures (by courtesy of D. Vadas and J.R. Curran)

For each sentence, the original Penn Treebank phrase structure trees are preserved in this corpus together with their links to the analytical and tectogrammatical annotation.
Data source: Biblio at Institute of Formal and Applied Linguistics (Other ORP type, 2011)

Other research product > Other ORP type, 2017, Czech Republic, Czech
Publisher: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Funding: EC | KConnect, EC | KHRESMOI
Authors: Libovický, Jindřich; Pecina, Pavel; Urešová, Zdeňka; Hlaváčová, Jaroslava; Tamchyna, Aleš; Hajič, Jan; Dušek, Ondřej

This package contains data sets for development (Section dev) and testing (Section test) of machine translation of sentences from summaries of medical articles between Czech, English, French, German, Hungarian, Polish, Spanish and Swedish. Version 2.0 extends the previous version by adding Hungarian, Polish, Spanish, and Swedish translations.
Data sources: LINDAT/CLARIAH-CZ re...; Biblio at Institute of Formal and Applied Linguistics (Other ORP type, 2017)

Other research product > Other ORP type, 2017, Czech Republic, Czech
Publisher: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Funding: EC | HimL
Authors: Rosa, Rudolf; Žabokrtský, Zdeněk; Zeman, Daniel; Mareček, David

Tools and scripts used to create the cross-lingual parsing models submitted to VarDial 2017 shared task (https://bitbucket.org/hy-crossNLP/vardial2017), as described in the linked paper. The trained UDPipe models themselves are published in a separate submission (https://lindat.mff.cuni.cz/repository/xmlui/handle/11234/1-1971).

For each source (SS, e.g. sl) and target (TT, e.g. hr) language, you need to add the following into this directory:
- treebanks (Universal Dependencies v1.4):
    SS-ud-train.conllu
    TT-ud-predPoS-dev.conllu
- parallel data (OpenSubtitles from Opus):
    OpenSubtitles2016.SS-TT.SS
    OpenSubtitles2016.SS-TT.TT
  !!! If they are originally called ...TT-SS... instead of ...SS-TT..., you need to symlink them (or move, or copy) !!!
- target tagging model:
    TT.tagger.udpipe

All of these can be obtained from https://bitbucket.org/hy-crossNLP/vardial2017

You also need to have:
- Bash
- Perl 5
- Python 3
- word2vec (https://code.google.com/archive/p/word2vec/); we used rev 41 from 15th Sep 2014
- udpipe (https://github.com/ufal/udpipe); we used commit 3e65d69 from 3rd Jan 2017
- Treex (https://github.com/ufal/treex); we used commit d27ee8a from 21st Dec 2016

The most basic setup is the sl-hr one (train_sl-hr.sh):
- normalization of deprels
- 1:1 word-alignment of parallel data with Monolingual Greedy Aligner
- simple word-by-word translation of source treebank
- pre-training of target word embeddings
- simplification of morpho feats (use only Case)
- and finally, training and evaluating the parser

Both da+sv-no (train_ds-no.sh) and cs-sk (train_cs-sk.sh) add some cross-tagging, which seems to be useful only in specific cases (see paper for details). Moreover, cs-sk also adds more morpho features, selecting those that seem to be very often shared in parallel data.

The whole pipeline takes tens of hours to run, and uses several GB of RAM, so make sure to use a powerful computer.
Data sources: LINDAT/CLARIAH-CZ re...; Biblio at Institute of Formal and Applied Linguistics (Other ORP type, 2017)

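The note in the record above about parallel data that arrives named ...TT-SS... instead of ...SS-TT... could be handled with a few lines of shell. The sketch below is an assumption about how one might do it, not a script from the submission: the function name link_parallel_data is hypothetical, and it relies only on the OpenSubtitles2016.SS-TT.SS and OpenSubtitles2016.SS-TT.TT naming pattern quoted in the record.

```shell
# Hypothetical helper (not part of the submission): make sure the parallel
# data files exist under the SS-TT naming the record asks for, symlinking
# from the TT-SS naming when only that one is present.
link_parallel_data() {
    local ss="$1" tt="$2" lang want have

    for lang in "$ss" "$tt"; do
        want="OpenSubtitles2016.${ss}-${tt}.${lang}"
        have="OpenSubtitles2016.${tt}-${ss}.${lang}"

        if [ -e "$want" ]; then
            # Already named as expected; nothing to do.
            continue
        elif [ -e "$have" ]; then
            # Reversed language-pair order: create the expected name as a symlink.
            ln -s "$have" "$want"
        else
            echo "missing parallel data for ${lang}: ${want}" >&2
            return 1
        fi
    done
}
```

Run from the directory that holds the data, for example as link_parallel_data sl hr. Symlinking rather than moving leaves the downloaded files untouched, which matches the first of the three options the record offers.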
Other research product > Other ORP type, 2016, Portugal, English
Publisher: European Language Resources Association
Funding: EC | QTLEAP
Authors: Otegi, Arantxa; Aranberri, Nora; Branco, António; Hajic, Jan; Neale, Steven; Osenova, Petya; Pereira, Rita; Popel, Martin; Silva, João; Simov, Kiril; Agirre, Eneko
handle: 10451/33107

This work presents parallel corpora automatically annotated with several NLP tools, including lemma and part-of-speech tagging, named-entity recognition and classification, named-entity disambiguation, word-sense disambiguation, and coreference. The corpora comprise both the well-known Europarl corpus and a domain-specific question-answer troubleshooting corpus in the IT domain. English is common to all parallel corpora, with translations in five languages, namely Basque, Bulgarian, Czech, Portuguese and Spanish. We describe the annotated corpora and the tools used for annotation, as well as annotation statistics for each language. These new resources are freely available and will help research on semantic processing for machine translation and cross-lingual transfer.
Data source: Universidade de Lisboa: Repositório.UL (Other ORP type, 2016)
Views: 88; downloads: 28

For further information contact us at helpdesk@openaire.eu
Loading
apps Other research productkeyboard_double_arrow_right Other ORP type 2016 Czech Republic EnglishCharles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL) EC | QTLEAPMírovský, Jiří; Mikulová, Marie; Nedoluzhko, Anna; Novák, Michal; Cinková, Silvie;We present an extended version of the Prague Czech-English Dependency Treebank 2.0 (PCEDT 2.0). It includes all annotation of coreference (the original one from PCEDT 2.0 as well as the new one) and improved cross-lingual alignment of coreferential expressions. The corpus released as PCEDT 2.0 Coref is publicly available.
Biblio at Institute ... arrow_drop_down Biblio at Institute of Formal and Applied LinguisticsOther ORP type . 2016Data sources: Biblio at Institute of Formal and Applied Linguisticsadd ClaimPlease grant OpenAIRE to access and update your ORCID works.This Research product is the result of merged Research products in OpenAIRE.
Other research product . Other ORP type . 2017 . Czech Republic
EC | HimL
Authors: Rosa, Rudolf; Zeman, Daniel; Mareček, David; Žabokrtský, Zdeněk

Trained models for UDPipe used to produce our final submission to the VarDial 2017 CLP shared task (https://bitbucket.org/hy-crossNLP/vardial2017). The SK model was trained on CS data, the HR model on SL data, and the SV model on a concatenation of DA and NO data. The scripts and commands used to create the models are part of a separate submission (http://hdl.handle.net/11234/1-1970). The models were trained with UDPipe version 3e65d69 from 3 January 2017, obtained from https://github.com/ufal/udpipe; their functionality with newer or older versions of UDPipe is not guaranteed.

We list here the Bash command sequences that can be used to reproduce our results submitted to VarDial 2017. The input files must be in CoNLL-U format. The models use only the form, UPOS, and Universal Features fields (SK uses only the form). You must have UDPipe installed. The feats2FEAT.py script, which prunes the universal features, is bundled with this submission.

SK -- tag and parse with the model:

    udpipe --tag --parse sk-translex.v2.norm.feats07.w2v.trainonpred.udpipe sk-ud-predPoS-test.conllu

A slightly better after-deadline model (sk-translex.v2.norm.Case-feats07.w2v.trainonpred.udpipe), which we mention in the accompanying paper, is also included. It is applied in the same way:

    udpipe --tag --parse sk-translex.v2.norm.Case-feats07.w2v.trainonpred.udpipe sk-ud-predPoS-test.conllu

HR -- prune the Features to keep only Case and parse with the model:

    python3 feats2FEAT.py Case tmp udpipe --tag no-translex.v2.norm.tgttagupos.srctagfeats.Case.w2v.udpipe no-ud-predPoS-test.conllu | cut -f5- | paste tmp - | sed 's/^\t$//' | udpipe --parse no-translex.v2.norm.tgttagupos.srctagfeats.Case.w2v.udpipe
Data sources: Biblio at Institute of Formal and Applied Linguistics (Other ORP type . 2017)

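The record above says that the universal features are pruned (for HR, down to Case only) before parsing. The bundled feats2FEAT.py script is not reproduced in this listing, so the following is only a minimal sketch of what such pruning could look like. It assumes standard 10-column CoNLL-U input with features in the sixth column; the function name prune_feats and the sample sentence are hypothetical and are not taken from the submission.

```python
def prune_feats(conllu_text, keep):
    """Keep only the named features in the FEATS column (6th field) of CoNLL-U text.

    Illustrative only: this is NOT the feats2FEAT.py script from the submission.
    """
    keep = set(keep)
    out = []
    for line in conllu_text.split("\n"):
        # Comment lines and blank sentence separators pass through unchanged.
        if not line or line.startswith("#"):
            out.append(line)
            continue
        cols = line.split("\t")
        # Leave anything that is not a 10-column token line untouched.
        if len(cols) != 10:
            out.append(line)
            continue
        feats = cols[5]
        if feats != "_":
            kept = [f for f in feats.split("|") if f.split("=", 1)[0] in keep]
            # CoNLL-U uses "_" for an empty feature set.
            cols[5] = "|".join(kept) if kept else "_"
        out.append("\t".join(cols))
    return "\n".join(out)


# Hypothetical two-token sentence used only to exercise the function.
sample = (
    "# text = Psi štěkají.\n"
    "1\tPsi\tpes\tNOUN\t_\tAnimacy=Anim|Case=Nom|Gender=Masc|Number=Plur\t2\tnsubj\t_\t_\n"
    "2\tštěkají\tštěkat\tVERB\t_\tMood=Ind|Number=Plur|Person=3\t0\troot\t_\t_\n"
    "\n"
)
pruned = prune_feats(sample, ["Case"])
print(pruned)
```

Tokens that carry none of the kept features end up with "_" in the features column, which keeps the output valid CoNLL-U for a downstream tagger or parser.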
Other research product . Other ORP type . 2021 . Czech Republic
Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
EC | Bergamot
Authors: Novák, Michal; Jon, Josef

Marian NMT model for Catalan to Occitan translation. It is a multi-task model that also produces a phonemic transcription of the Catalan source. The model was submitted to the WMT'21 Shared Task on Multilingual Low-Resource Translation for Indo-European Languages as the CUNI-Contrastive system for Catalan to Occitan.
Data sources: LINDAT/CLARIAH-CZ re...; Biblio at Institute of Formal and Applied Linguistics (Other ORP type . 2021)

Other research product . Other ORP type . 2011 . Netherlands . English
META-NET
EC | T4ME NET
Authors: Odijk, J.E.J.M.; Overkoepelend onderzoeksprogramma UiL-OTS; LS OZ Taal en spraaktechnologie

Other research product . Other ORP type . 2014 . Netherlands . English
University of Copenhagen
EC | DASISH
Authors: Hogenaar, A.Th.; Witkamp, P.; Bruijne, M.C. de; Wijnant, Arnaud; Kvamme, Trond; Kvalheim, Vigdis; Recker, Astrid; Fihn, Johan; Berglund, Torbjörn; Jerlehag, Birger; Müller Gjesdal, Anje; Parra, Carla; Dione, Bamba; De Smedt, Koenraad; Engelhardt, Claudia; Ludwig, Jens; Lenkiewicz, Przemyslaw

This report was produced in the context of the project Data Service Infrastructure for the Social Sciences and Humanities (DASISH), work package 4.3, Convergence of Data Services. The goal has been to allow the selection and promotion of high-quality deposit services for researchers in the Social Sciences and Humanities (SSH) and to make suggestions for service improvements.

Other research product . Other ORP type . 2015 . Czech Republic
EC | QTLEAP, EC | FAUST
Authors: Rosa, Rudolf

Depfix is an open-source system for automatic post-editing of phrase-based machine translation outputs. Depfix employs a range of natural language processing tools to obtain analyses of the input sentences, and uses a set of rules to correct common or serious errors in machine translation outputs. Depfix is currently implemented only for the English-to-Czech translation direction.
Data sources: LINDAT/CLARIAH-CZ re...; Biblio at Institute of Formal and Applied Linguistics (Other ORP type . 2015); Biblio at Institute of Formal and Applied Linguistics (Other ORP type . 2014)

Other research product . Other ORP type . 2011 . Czech Republic . Czech
Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
EC | EUROMATRIXPLUS, EC | FAUST
Authors: Fučíková, Eva; Urešová, Zdeňka; Šindlerová, Jana; Cinková, Silvie; Hajič, Jan; Pajas, Petr; Hajičová, Eva; Žabokrtský, Zdeněk; Semecký, Jiří; Sgall, Petr; Štěpánek, Jan; Popelka, Jan; Mikulová, Marie; Toman, Josef; Panevová, Jarmila

Texts
The Prague Czech-English Dependency Treebank 2.0 (PCEDT 2.0) is a major update of the Prague Czech-English Dependency Treebank 1.0 (LDC2004T25). It is a manually parsed Czech-English parallel corpus of over 1.2 million running words in almost 50,000 sentences for each part.

Data
The English part contains the entire Penn Treebank - Wall Street Journal Section (LDC99T42). The Czech part consists of Czech translations of all of the Penn Treebank-WSJ texts. The corpus is 1:1 sentence-aligned. An additional automatic alignment on the node level (different for each annotation layer) is also part of this release. The original Penn Treebank-like file structure (25 sections, each containing up to one hundred files) has been preserved. Only those PTB documents which have both POS and structural annotation (a total of 2312 documents) have been translated to Czech and made part of this release.

Each language part is enhanced with a comprehensive manual linguistic annotation in the PDT 2.0 style (LDC2006T01, Prague Dependency Treebank 2.0). The main features of this annotation style are:
- dependency structure of the content words and coordinating and similar structures (function words are attached as their attribute values)
- semantic labeling of content words and types of coordinating structures
- argument structure, including an argument structure ("valency") lexicon for both languages
- ellipsis and anaphora resolution

This annotation style is called tectogrammatical annotation and it constitutes the tectogrammatical layer in the corpus. For more details see below and the documentation.

Annotation of the Czech part
Sentences of the Czech translation were automatically morphologically annotated and parsed into surface-syntax dependency trees in the PDT 2.0 annotation style. This annotation style is sometimes called analytical annotation; it constitutes the analytical layer of the corpus. The manual tectogrammatical (deep-syntax) annotation was built as a separate layer above the automatic analytical (surface-syntax) parse. A sample of 2,000 sentences was manually annotated on the analytical layer.

Annotation of the English part
The resulting manual tectogrammatical annotation was built above an automatic transformation of the original phrase-structure annotation of the Penn Treebank into surface dependency (analytical) representations, using the following additional linguistic information from other sources:
- PropBank (LDC2004T14)
- VerbNet
- NomBank (LDC2008T23)
- flat noun phrase structures (by courtesy of D. Vadas and J.R. Curran)

For each sentence, the original Penn Treebank phrase structure trees are preserved in this corpus together with their links to the analytical and tectogrammatical annotation.
Data sources: Biblio at Institute of Formal and Applied Linguistics (Other ORP type . 2011)