The following results are related to Digital Humanities and Cultural Heritage.
22 Research products, page 1 of 3

  • Digital Humanities and Cultural Heritage
  • Open Access
  • Netherlands Organisation for Scientific Research (NWO)
  • CLARIN

  • Open Access
    Authors: 
    van Bavel, B.J.P.; Curtis, Daniel; Hannaford, Matthew; Moatsos, M.; Roosen, Joris; Soens, Tim; LS Transities v. economie en samenleving; OGKG - Sociaal-economische geschiedenis; LS Economische Geschiedenis;
    Publisher: Wiley
    Countries: Netherlands, Belgium
    Project: NWO | CLARIAH Common Lab Resear... (11759), EC | COORDINATINGFORLIFE (339647)

    Recent advances in paleoclimatology and the growing digital availability of large historical datasets on human activity have created new opportunities to investigate long‐term interactions between climate and society. However, noncritical use of historical datasets can create pitfalls, resulting in misleading findings that may become entrenched as accepted knowledge. We demonstrate pitfalls in the content, use and interpretation of historical datasets in research into climate and society interaction through a systematic review of recent studies on the link between climate and (a) conflict incidence, (b) plague outbreaks and (c) agricultural productivity changes. We propose three sets of interventions to overcome these pitfalls, which involve a more critical and multidisciplinary collection and construction of historical datasets, increased specificity and transparency about uncertainty or biases, and replacing inductive with deductive approaches to causality. This will improve the validity and robustness of interpretations on the long‐term relationship between climate and society. This article is categorized under: Climate, History, Society, Culture > Disciplinary Perspectives. Recent literature investigating long‐term interactions between climate and society increasingly utilizes historical big data. Too often this is done without applying historical criticism, which may lead to misguided narratives. We propose a set of interventions to avoid this and optimize the use of historical datasets.

  • Open Access English
    Authors: 
    Kleijn, S.; LS taalbeheersing van het Nederlands; UiL OTS L&C;
    Publisher: LOT
    Country: Netherlands
    Project: NWO | LIN: A validated reading ... (8956)

    Is my text comprehensible for my audience? It is a question publishers, organizations and governments struggle with and it is a question that readability formulae claim to solve. With a press of a button the readability of a text is assessed and users know whether texts are suited for their intended readers. Because the need for objective measures of readability has only increased, readability formulae have retained their overall popularity, despite a steady stream of criticism. Fortunately, developments in computational linguistics have opened up new possibilities to improve the old readability formulae. In her dissertation, Suzanne Kleijn combined current language technology with insights from readability research and discourse processing in an attempt to build an empirically validated readability tool for Dutch secondary school readers. As a result, the findings are relevant both to the field of discourse processing and to practitioners aiming for readability improvement. Kleijn investigated the relationship between linguistic features and two aspects of readability: comprehension and processing ease. Comprehension was measured using a specially developed cloze procedure (‘The HyTeC-cloze’) and processing ease was measured using eye-movement registration. In her design, Kleijn combined experimental and correlational work in order to disentangle causal effects of linguistic features on readability from correlational relationships. That is, readability differences between texts and differences between stylistic variants of the same text were studied at the same time. In three separate experiments only the lexical complexity, the syntactic complexity or the number of coherence markers within texts was changed to see how these factors affect readability.
While reducing a text’s lexical complexity or syntactic complexity improved text comprehension (as measured with the developed cloze tests) and increased processing ease (as measured with eye-movement registration), coherence markers showed mixed results. Adding contrastive connectives (e.g., maar ‘but’) or causal connectives (e.g., dus ‘so’) had a positive effect on comprehension of their immediate context, but inserting additive connectives (e.g., daarnaast ‘furthermore’) had a negative effect on comprehension. Taken together, the three experimental studies tested 2900+ Dutch adolescents and provided comprehension data for 60 texts (in two versions). These data were used to build a multilevel model to predict readability of Dutch texts for Dutch adolescents. Linguistic features were automatically extracted from the texts using the text analysis tool T-Scan and added to the model. The final model (‘U-Read’) included five factors: Word frequency of content words (without names and corrected for compounds), Content words per clause, Concrete nouns, Maximum syntactic dependency length and Adjectival past participles. Together these features explained 23% of the observed variance in comprehension scores, which is a 20% improvement compared to predictors in two popular Dutch readability formulae, the Flesch-Douma and CLIB-formula. Although these features were found to be good predictors of readability, they do not necessarily have a strong causal effect on comprehension. This is because readability formulae are based on differences between texts. For example, texts containing difficult words tend to discuss difficult or unfamiliar topics. Replacing difficult words in such a text with more familiar words will reduce the text’s stylistic complexity, but not the complexity associated with its topic and content. As a result, the effects of changing a linguistic feature are relatively small compared to the differences predicted on the basis of between-text differences.
Kleijn’s findings thus provide a realistic (and sobering) view of the importance of linguistic features and their potential for reducing the difficulty level of a given text.

  • Publication . Part of book or chapter of book . 2017
    Open Access English
    Authors: 
    Bloothooft, G.; Onland, D.; Kunst, J.P.; Odijk, Jan; van Hessen, Arjan;
    Publisher: Ubiquity Press Limited
    Country: Netherlands
    Project: NWO | CLARIN-NL (2300154268)

    Flexible software has been developed for the interactive mapping of socio-cultural phenomena in the Netherlands on the web. The possibilities of such software are demonstrated for the mapping of migration in the Netherlands across four generations. Both the origin and dispersion of the population can be explored at the geographic levels of municipality, region, dialect area and province.

  • Open Access English
    Authors: 
    Stemle, Egon W.; Wigham, Ciara R.;
    Publisher: HAL CCSD
    Country: France
    Project: NWO | The impact of computer-me... (10959)

    This volume presents the proceedings of the 5th edition of the annual conference series on CMC and Social Media Corpora for the Humanities (cmc-corpora2017). This conference series is dedicated to the collection, annotation, processing, and exploitation of corpora of computer-mediated communication (CMC) and social media for research in the humanities. The annual event brings together language-centered research on CMC and social media in linguistics, philologies, communication sciences, media and social sciences with research questions from the fields of corpus and computational linguistics, language technology, text technology, and machine learning. The 5th Conference on CMC and Social Media Corpora for the Humanities was held at Eurac Research on October 4th and 5th in Bolzano, Italy. This volume contains extended abstracts of the invited talks, papers, and extended abstracts of posters presented at the event. The conference attracted 26 valid submissions. Each submission was reviewed by at least two members of the scientific committee. This committee decided to accept 16 papers and 8 posters, of which 14 papers and 3 posters were presented at the conference. The programme also includes three invited talks: two keynote talks by Aivars Glaznieks (Eurac Research, Italy) and A. Seza Doğruöz (Independent researcher) and an invited talk on the Common Language Resources and Technology Infrastructure (CLARIN) given by Darja Fišer, the CLARIN ERIC Director of User Involvement.

  • Open Access
    Authors: 
    Verheijen, L.; Spooren, W.P.M.S.; Stemle, E.W.; Wigham, C.R.;
    Country: Netherlands
    Project: NWO | The impact of computer-me... (10959)

  • Publication . Part of book or chapter of book . 2017
    Open Access English
    Authors: 
    Windhouwer, Menzo; Dimitriadis, A.; Akerman, Vesa; Odijk, Jan; van Hessen, Arjan; LS Psycholinguistiek; UiL OTS LLI;
    Country: Netherlands
    Project: NWO | Typological Database Syst... (1800114557)

    The Typological Database System (TDS), which provides integrated access to a dozen independently created typological databases, was launched in 2007. Due to the pace of change in web technologies, the original software has for some time been edging toward obsolescence. CLARIN-NL granted funding to the TDS Curator project to migrate this valuable resource to a more durable platform, archiving the data and converting its interface to a true web service architecture that can continue to provide interactive access to the data. This chapter describes the architecture of the new system, and the Integrated Data and Documentation Format (IDDF) on which it is based.

  • Publication . Article . Preprint . 2015 . Embargo End Date: 01 Jan 2015
    Open Access
    Authors: 
    Chuklin, Aleksandr; de Rijke, Maarten;
    Publisher: arXiv
    Project: NWO | SPuDisc: Searching Public... (2300176811), NWO | Modeling and Learning fro... (8686), EC | VOX-POL (312827), EC | LIMOSINE (288024), NWO | Digging archaeology data:... (25409), NWO | Semantic Search in E-Disc... (7999)

    Currently, the quality of a search engine is often determined using so-called topical relevance, i.e., the match between the user intent (expressed as a query) and the content of the document. In this work we want to draw attention to two aspects of retrieval system performance affected by the presentation of results: result attractiveness ("perceived relevance") and immediate usefulness of the snippets ("snippet relevance"). Perceived relevance may influence discoverability of good topical documents and seemingly better rankings may in fact be less useful to the user if good-looking snippets lead to irrelevant documents or vice-versa. And result items on a search engine result page (SERP) with high snippet relevance may add towards the total utility gained by the user even without the need to click those items. We start by motivating the need to collect different aspects of relevance (topical, perceived and snippet relevances) and how these aspects can improve evaluation measures. We then discuss possible ways to collect these relevance aspects using crowdsourcing and the challenges arising from that. Comment: SIGIR 2014 Workshop on Gathering Efficient Assessments of Relevance

  • Open Access English
    Authors: 
    David Graus; David van Dijk; Manos Tsagkias; Wouter Weerkamp; Maarten de Rijke;
    Publisher: ACM
    Country: Netherlands
    Project: NWO | Modeling and Learning fro... (8686), NWO | SPuDisc: Searching Public... (2300176811), EC | VOX-POL (312827), EC | LIMOSINE (288024), NWO | Digging archaeology data:... (25409), NWO | Semantic Search in E-Disc... (7999)

    We address the task of recipient recommendation for emailing in enterprises. We propose an intuitive and elegant way of modeling the task of recipient recommendation, which uses both the communication graph (i.e., who are most closely connected to the sender) and the content of the email. Additionally, the model can incorporate evidence as prior probabilities. Experiments on two enterprise email collections show that our model achieves very high scores, and that it outperforms two variants that use either the communication graph or the content in isolation.

  • Publication . Conference object . 2014
    Open Access English
    Authors: 
    Aleksandr Chuklin; Ke Zhou; Anne Schuth; Floor Sietsma; Maarten de Rijke;
    Country: Netherlands
    Project: NWO | Modeling and Learning fro... (8686), NWO | SPuDisc: Searching Public... (2300176811), EC | VOX-POL (312827), EC | LIMOSINE (288024), NWO | Digging archaeology data:... (25409), NWO | Semantic Search in E-Disc... (7999)

    Modeling user behavior on a search engine result page is important for understanding the users and supporting simulation experiments. As result pages become more complex, click models evolve as well in order to capture additional aspects of user behavior in response to new forms of result presentation. We propose a method for evaluating the intuitiveness of vertical-aware click models, namely the ability of a click model to capture key aspects of aggregated result pages, such as vertical selection, item selection, result presentation and vertical diversity. This method allows us to isolate model components and therefore gives a multi-faceted view on a model's performance. We argue that our method can be used in conjunction with traditional click model evaluation metrics such as log-likelihood or perplexity. In order to demonstrate the power of our method in situations where result pages can contain more than one type of vertical (e.g., Image and News), we extend the previously studied Federated Click Model such that it models user clicks on such pages. Our evaluation method yields non-trivial yet interpretable conclusions about the intuitiveness of click models, highlighting their strengths and weaknesses.

  • Open Access English
    Authors: 
    Reinanda, R.; de Rijke, M.; Tsujii, J.; Hajic, J.;
    Publisher: Association for Computational Linguistics
    Country: Netherlands
    Project: NWO | SPuDisc: Searching Public... (2300176811), NWO | Modeling and Learning fro... (8686), EC | VOX-POL (312827), EC | LIMOSINE (288024), NWO | Digging archaeology data:... (25409), NWO | Semantic Search in E-Disc... (7999)

    Temporal evidence classification, i.e., finding associations between temporal expressions and relations expressed in text, is an important part of temporal relation extraction. To capture the variations found in this setting, we employ a distant supervision approach, modeling the task as multi-class text classification. There are two main challenges with distant supervision: (1) noise generated by incorrect heuristic labeling, and (2) distribution mismatch between the target and distant supervision examples. We are particularly interested in addressing the second problem and propose a sampling approach to handle the distribution mismatch. Our prior-informed distant supervision approach improves over basic distant supervision and outperforms a purely supervised approach when evaluated on TAC-KBP data, both on classification and end-to-end metrics.
