Advanced search in Research products
The following results are related to Digital Humanities and Cultural Heritage. Are you interested to view more results? Visit OpenAIRE - Explore.
13 Research products, page 1 of 2

  • Digital Humanities and Cultural Heritage
  • European Commission
  • Netherlands Organisation for Scientific Research (NWO)
  • LIMOSINE
  • NL
  • CLARIN

  • Publication . Article . Preprint . 2015 . Embargo End Date: 01 Jan 2015
    Open Access
    Authors: 
    Chuklin, Aleksandr; de Rijke, Maarten;
    Publisher: arXiv
    Project: NWO | SPuDisc: Searching Public... (2300176811), NWO | Modeling and Learning fro... (8686), EC | VOX-POL (312827), EC | LIMOSINE (288024), NWO | Digging archaeology data:... (25409), NWO | Semantic Search in E-Disc... (7999)

    Currently, the quality of a search engine is often determined using so-called topical relevance, i.e., the match between the user intent (expressed as a query) and the content of the document. In this work we want to draw attention to two aspects of retrieval system performance affected by the presentation of results: result attractiveness ("perceived relevance") and immediate usefulness of the snippets ("snippet relevance"). Perceived relevance may influence discoverability of good topical documents, and seemingly better rankings may in fact be less useful to the user if good-looking snippets lead to irrelevant documents or vice versa. Moreover, result items on a search engine result page (SERP) with high snippet relevance may add to the total utility gained by the user even without the need to click those items. We start by motivating the need to collect different aspects of relevance (topical, perceived and snippet relevance) and how these aspects can improve evaluation measures. We then discuss possible ways to collect these relevance aspects using crowdsourcing and the challenges arising from that. Comment: SIGIR 2014 Workshop on Gathering Efficient Assessments of Relevance

  • Open Access English
    Authors: 
    David Graus; David van Dijk; Manos Tsagkias; Wouter Weerkamp; Maarten de Rijke;
    Publisher: ACM
    Country: Netherlands
    Project: NWO | Modeling and Learning fro... (8686), NWO | SPuDisc: Searching Public... (2300176811), EC | VOX-POL (312827), EC | LIMOSINE (288024), NWO | Digging archaeology data:... (25409), NWO | Semantic Search in E-Disc... (7999)

    We address the task of recipient recommendation for emailing in enterprises. We propose an intuitive and elegant way of modeling the task of recipient recommendation, which uses both the communication graph (i.e., who are most closely connected to the sender) and the content of the email. Additionally, the model can incorporate evidence as prior probabilities. Experiments on two enterprise email collections show that our model achieves very high scores, and that it outperforms two variants that use either the communication graph or the content in isolation.

  • Publication . Conference object . 2014
    Open Access English
    Authors: 
    Aleksandr Chuklin; Ke Zhou; Anne Schuth; Floor Sietsma; Maarten de Rijke;
    Country: Netherlands
    Project: NWO | Modeling and Learning fro... (8686), NWO | SPuDisc: Searching Public... (2300176811), EC | VOX-POL (312827), EC | LIMOSINE (288024), NWO | Digging archaeology data:... (25409), NWO | Semantic Search in E-Disc... (7999)

    Modeling user behavior on a search engine result page is important for understanding the users and supporting simulation experiments. As result pages become more complex, click models evolve as well in order to capture additional aspects of user behavior in response to new forms of result presentation. We propose a method for evaluating the intuitiveness of vertical-aware click models, namely the ability of a click model to capture key aspects of aggregated result pages, such as vertical selection, item selection, result presentation and vertical diversity. This method allows us to isolate model components and therefore gives a multi-faceted view on a model's performance. We argue that our method can be used in conjunction with traditional click model evaluation metrics such as log-likelihood or perplexity. In order to demonstrate the power of our method in situations where result pages can contain more than one type of vertical (e.g., Image and News), we extend the previously studied Federated Click Model such that it models user clicks on such pages. Our evaluation method yields non-trivial yet interpretable conclusions about the intuitiveness of click models, highlighting their strengths and weaknesses.
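The abstract above mentions perplexity as a traditional click model metric. As a rough illustration only, the sketch below computes the commonly used per-rank click perplexity from observed clicks and predicted click probabilities. The function name `perplexity` and the clamping constant `eps` are assumptions made for this example and are not taken from the paper.

```python
import math


def perplexity(clicks, probs, eps=1e-12):
    """Click perplexity at one rank.

    clicks: observed clicks (0 or 1), one per session.
    probs:  the model's predicted click probability for each session.
    eps:    hypothetical clamp that keeps log2 away from 0.
    """
    if len(clicks) != len(probs) or not clicks:
        raise ValueError("clicks and probs must be non-empty and equally long")
    total = 0.0
    for c, q in zip(clicks, probs):
        q = min(max(q, eps), 1.0 - eps)
        # Log-likelihood (base 2) of the observed outcome under the model.
        total += math.log2(q) if c else math.log2(1.0 - q)
    # 1.0 is a perfect prediction; 2.0 matches a coin flip.
    return 2.0 ** (-total / len(clicks))
```

Lower values are better: a model that always predicts 0.5 scores exactly 2, while a model whose predictions track the observed clicks approaches 1.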

  • Open Access English
    Authors: 
    Reinanda, R.; de Rijke, M.; Tsujii, J.; Hajic, J.;
    Publisher: Association for Computational Linguistics
    Country: Netherlands
    Project: NWO | SPuDisc: Searching Public... (2300176811), NWO | Modeling and Learning fro... (8686), EC | VOX-POL (312827), EC | LIMOSINE (288024), NWO | Digging archaeology data:... (25409), NWO | Semantic Search in E-Disc... (7999)

    Temporal evidence classification, i.e., finding associations between temporal expressions and relations expressed in text, is an important part of temporal relation extraction. To capture the variations found in this setting, we employ a distant supervision approach, modeling the task as multi-class text classification. There are two main challenges with distant supervision: (1) noise generated by incorrect heuristic labeling, and (2) distribution mismatch between the target and distant supervision examples. We are particularly interested in addressing the second problem and propose a sampling approach to handle the distribution mismatch. Our prior-informed distant supervision approach improves over basic distant supervision and outperforms a purely supervised approach when evaluated on TAC-KBP data, both on classification and end-to-end metrics.

  • Publication . Conference object . Part of book or chapter of book . 2014
    Open Access
    Authors: 
    Graus, D.; Tsagkias, M.; Buitinck, L.; de Rijke, M.; Kenter, T.; de Vries, A.P.; Zhai, C.X.; de Jong, F.; Radinsky, K.; +1 more
    Publisher: Springer International Publishing
    Country: Netherlands
    Project: NWO | SPuDisc: Searching Public... (2300176811), NWO | Modeling and Learning fro... (8686), EC | LIMOSINE (288024), NWO | Building Rich Links to En... (2300153702)

    The manual curation of knowledge bases is a bottleneck in fast-paced domains where new concepts constantly emerge. Identification of nascent concepts is important for improving early entity linking, content interpretation, and recommendation of new content in real-time applications. We present an unsupervised method for generating pseudo-ground truth for training a named entity recognizer to specifically identify entities that will become concepts in a knowledge base in the setting of social streams. We show that our method is able to deal with missing labels, justifying the use of pseudo-ground truth generation in this task. Finally, we show how our method significantly outperforms a lexical-matching baseline, by leveraging strategies for sampling pseudo-ground truth based on entity confidence scores and textual quality of input documents.

  • Open Access English
    Authors: 
    Graus, D.; Peetz, M.-H.; Odijk, D.; de Rooij, O.; de Rijke, M.; d'Aquin, M.; Dietze, S.; Drachsler, H.; Guy, M.; Herder, E.;
    Publisher: CEUR-WS
    Country: Netherlands
    Project: NWO | Modeling and Learning fro... (8686), NWO | SPuDisc: Searching Public... (2300176811), NWO | Building Rich Links to En... (2300153702), EC | LIMOSINE (288024), NWO | Semantic Search in E-Disc... (7999)

    In this paper we present yourHistory: a Facebook application that aims to generate a tailor-made, personalized timeline of historic events, by matching a semantically enriched Facebook profile to a pool of candidate historic events extracted from DBPedia. Two aspects are central to our application: (i) semantic linking technologies backed by rich open web knowledge bases for generating semantically enriched user profiles, and (ii) semantic relatedness metrics for ranking historic events to user profiles. This paper describes the development of a Facebook application that aims to be engaging for users, whilst at the same time being a source for data that can be applied to evaluating or improving the application. We describe our Wikipedia-based semantic relatedness metric for event ranking, but also the restrictions and constraints concerning privacy-sensitive and ethical matters, around data storage and user consent. Finally, we reflect on how this type of user data can be applied for evaluating or improving both the semantic linking and event ranking methods in future work.

  • Publication . Conference object . 2014
    Open Access English
    Authors: 
    Gârbacea, C.; Tsagkias, M.; de Rijke, M.; Schaub, T.; Friedrich, G.; O'Sullivan, B.;
    Publisher: IOS Press
    Country: Netherlands
    Project: NWO | Modeling and Learning fro... (8686), EC | VOX-POL (312827), EC | LIMOSINE (288024), NWO | SPuDisc: Searching Public... (2300176811), NWO | Semantic Search in E-Disc... (7999)

    We address the task of detecting the reputation polarity of social media updates, that is, deciding whether the content of an update has positive or negative implications for the reputation of a given entity. Typical approaches to this task include sentiment lexicons and linguistic features. However, they fall short in the social media domain because of its unedited and noisy nature, and, more importantly, because reputation polarity is not only encoded in sentiment-bearing words but it is also embedded in other word usage. To this end, automatic methods for extracting discriminative features for reputation polarity detection can play a role. We propose a data-driven, supervised approach for extracting textual features, which we use to train a reputation polarity classifier. Experiments on the RepLab 2013 collection show that our model outperforms the state-of-the-art method based on sentiment analysis by 20% accuracy.

  • Open Access English
    Authors: 
    Huijnen, Pim; Laan, Fons; de Rijke, Maarten; Pieters, Toine; Nadamoto, A; Jatowt, A; Wierzbicki, A; Leidner, JL; Sub History and Philosophy of Science; Sub Pharmacoepidemiology; +3 more
    Country: Netherlands
    Project: NWO | SPuDisc: Searching Public... (2300176811), NWO | Modeling and Learning fro... (8686), NWO | Semantic Search in E-Disc... (7999), EC | LIMOSINE (288024), NWO | Building Rich Links to En... (2300153702)

    Comparative historical research on the intensity, diversity and fluidity of public discourses has been severely hampered by the extraordinary task of manually gathering and processing large sets of opinionated data in news media in different countries. At most 50,000 documents have been systematically studied in a single comparative historical project in the subject area of heredity and eugenics. Digital techniques, like the text mining tools WAHSP and BILAND we have developed in two successive demonstrator projects, are able to perform advanced forms of multi-lingual text-mining in much larger data sets of newspapers. We describe the development and use of WAHSP and BILAND to support historical discourse analysis in large digitized news media corpora. Furthermore, we argue how text mining techniques overcome the problem of traditional historical research that only documents explicitly referring to eugenics issues and debates can be incorporated. Our tools are able to provide information on ideas and notions about heredity, genetics and eugenics that circulate in discourses that are not directly related to eugenics (e.g., sport, education and economics).

  • Publication . Article . Preprint . 2013 . Embargo End Date: 01 Jan 2013
    Open Access
    Authors: 
    Zoghi, Masrour; Whiteson, Shimon; Munos, Remi; de Rijke, Maarten;
    Publisher: arXiv
    Country: Netherlands
    Project: NWO | Modeling and Learning fro... (8686), EC | COMPLACS (270327), NWO | Building Rich Links to En... (2300153702), EC | LIMOSINE (288024), NWO | Digging archaeology data:... (25409), NWO | SPuDisc: Searching Public... (2300176811), NWO | Semantic Search in E-Disc... (7999)

    This paper proposes a new method for the K-armed dueling bandit problem, a variation on the regular K-armed bandit problem that offers only relative feedback about pairs of arms. Our approach extends the Upper Confidence Bound algorithm to the relative setting by using estimates of the pairwise probabilities to select a promising arm and applying Upper Confidence Bound with the winner as a benchmark. We prove a finite-time regret bound of order O(log t). In addition, our empirical results using real data from an information retrieval application show that it greatly outperforms the state of the art. Comment: 13 pages, 6 figures
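To make the selection step described in the abstract above more concrete, here is a minimal, simplified sketch of a relative upper-confidence-bound rule for dueling bandits. It is not the authors' exact algorithm: the function names, the default `alpha=0.51`, the tie-breaking, and the choice to exclude the champion from the challenger pool are all assumptions made for illustration.

```python
import math
import random


def ucb_matrix(wins, t, alpha=0.51):
    """Optimistic estimates u[i][j] of the probability that arm i beats arm j."""
    k = len(wins)
    u = [[0.5] * k for _ in range(k)]
    for i in range(k):
        for j in range(k):
            if i == j:
                continue
            n = wins[i][j] + wins[j][i]
            if n == 0:
                u[i][j] = 1.0  # never compared: fully optimistic
            else:
                u[i][j] = wins[i][j] / n + math.sqrt(alpha * math.log(t) / n)
    return u


def select_pair(wins, t, alpha=0.51, rng=random):
    """Pick a promising arm (champion) and the arm most likely to beat it."""
    k = len(wins)
    u = ucb_matrix(wins, t, alpha)
    # Candidates: arms that could still beat every other arm.
    candidates = [i for i in range(k) if all(u[i][j] >= 0.5 for j in range(k))]
    champion = rng.choice(candidates) if candidates else rng.randrange(k)
    # Challenger: highest optimistic chance of beating the champion.
    challenger = max((j for j in range(k) if j != champion),
                     key=lambda j: u[j][champion])
    return champion, challenger


def run(pref, horizon, seed=0):
    """Simulate duels; pref[i][j] is the assumed true P(arm i beats arm j)."""
    rng = random.Random(seed)
    k = len(pref)
    wins = [[0] * k for _ in range(k)]
    for t in range(1, horizon + 1):
        c, d = select_pair(wins, t, rng=rng)
        if rng.random() < pref[c][d]:
            wins[c][d] += 1
        else:
            wins[d][c] += 1
    return wins
```

The only feedback the loop uses is which arm won each duel, which matches the relative-feedback setting; the simulated preference matrix `pref` is a stand-in for real comparison data.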

  • Publication . Conference object . 2013
    Open Access English
    Authors: 
    Kenter, T.; Graus, D.; Meij, E.; de Rijke, M.;
    Publisher: Microsoft Research
    Country: Netherlands
    Project: NWO | SPuDisc: Searching Public... (2300176811), NWO | Modeling and Learning fro... (8686), NWO | Building Rich Links to En... (2300153702), EC | LIMOSINE (288024), NWO | Semantic Search in E-Disc... (7999), EC | PROMISE (258191)

    Document filtering over time is applied in tasks such as tracking topics in online news or social media. We consider it a classification task, where topics of interest correspond to classes, and the feature space consists of the words associated with each class. In streaming settings the set of words associated with a concept may change. In this paper we employ a multinomial Naive Bayes classifier and perform periodic feature selection to adapt to evolving topics. We propose two ways of employing Pearson's χ² test for feature selection and demonstrate their benefit on the TREC KBA 2012 data set. By incorporating a time-dependent function in our equations for χ² we provide an elegant method for applying different weighting and windowing schemes. Experiments show improvements of our approach over a non-adaptive baseline, in realistic settings with limited amounts of training data.
