
Time-Aware Chi-squared for Document Filtering over Time
Document filtering over time is applied in tasks such as tracking topics in online news or social media. We treat it as a classification task in which the topics of interest correspond to classes and the feature space consists of the words associated with each class. In streaming settings, the set of words associated with a topic may change over time. In this paper we employ a multinomial Naive Bayes classifier and perform periodic feature selection to adapt to evolving topics. We propose two ways of employing Pearson's χ² test for feature selection and demonstrate their benefit on the TREC KBA 2012 data set. By incorporating a time-dependent function into our χ² equations, we obtain a uniform way to apply different weighting and windowing schemes. Experiments show that our approach improves over a non-adaptive baseline in a realistic setting with limited training data.
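The abstract does not spell out the paper's equations, so the following is only a minimal sketch under stated assumptions. It uses the standard 2x2 contingency-table form of χ² for a (term, class) pair, but lets each document contribute a weight w(now − t) instead of a count of 1. A sliding window (weight 1 inside the window, 0 outside) and exponential decay are two example choices of that time-dependent function. The document representation and the names `time_aware_chi2`, `select_features`, `sliding_window`, and `exponential_decay` are hypothetical and are not claimed to be the authors' formulation.

```python
from typing import Callable, List, Sequence, Set, Tuple

# Hypothetical document representation: (timestamp, terms, class labels).
Doc = Tuple[float, Set[str], Set[str]]
Weight = Callable[[float], float]  # maps a document's age to its weight


def sliding_window(width: float) -> Weight:
    """Full weight for documents younger than `width`, zero otherwise."""
    return lambda age: 1.0 if age < width else 0.0


def exponential_decay(half_life: float) -> Weight:
    """Weight that halves every `half_life` time units."""
    return lambda age: 0.5 ** (age / half_life)


def time_aware_chi2(docs: Sequence[Doc], term: str, cls: str,
                    now: float, weight: Weight) -> float:
    """Chi-squared statistic for (term, cls) over a time-weighted 2x2 table."""
    a = b = c = d = 0.0  # weighted counts: term & cls, term only, cls only, neither
    for timestamp, terms, labels in docs:
        w = weight(now - timestamp)
        has_term, in_cls = term in terms, cls in labels
        if has_term and in_cls:
            a += w
        elif has_term:
            b += w
        elif in_cls:
            c += w
        else:
            d += w
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0.0:
        return 0.0  # some marginal total is zero, so the statistic is undefined
    return n * (a * d - c * b) ** 2 / denom


def select_features(docs: Sequence[Doc], cls: str, now: float,
                    weight: Weight, k: int) -> List[str]:
    """Return the k terms with the highest time-aware chi-squared for `cls`."""
    vocab = set().union(*(terms for _, terms, _ in docs))
    scored = [(time_aware_chi2(docs, t, cls, now, weight), t) for t in vocab]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # ties broken alphabetically
    return [term for _, term in scored[:k]]


if __name__ == "__main__":
    # Toy stream: the vocabulary of topic "X" drifts from "football" to "soccer".
    docs = [
        (0.0, {"football"}, {"X"}),
        (1.0, {"football"}, {"X"}),
        (2.0, {"weather"}, set()),
        (8.0, {"soccer"}, {"X"}),
        (9.0, {"soccer", "weather"}, {"X"}),
        (9.5, {"weather"}, set()),
        (9.8, {"weather"}, set()),
    ]
    print(select_features(docs, "X", now=10.0, weight=sliding_window(3.0), k=1))
    # ['soccer']
```

Re-running `select_features` at the end of each period with the current `now` gives one simple form of periodic feature selection. Because χ² measures dependence in either direction, a term that is strongly associated with the *absence* of a class can also score highly; whether to keep such terms is a design choice this sketch leaves open.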

- Funder: Netherlands Organisation for Scientific Research (NWO)
- Project Code: 640.004.802
- Funder: European Commission (EC)
- Project Code: 258191
- Funding stream: FP7 | SP1 | ICT
- Funder: Netherlands Organisation for Scientific Research (NWO)
- Project Code: 727.011.005
- Funder: European Commission (EC)
- Project Code: 288024
- Funding stream: FP7 | SP1 | ICT