1,431 Research products, page 1 of 144
- Research data . 2016 . Embargo End Date: 23 Jul 2020 . Open Access . English
Authors: Nakao, Hisashi; Tamura, Kohei; Arimatsu, Yui; Nakagawa, Tomomi; Matsumoto, Naoko; Matsugi, Takehiko
Publisher: Dryad
Whether man is predisposed to lethal violence, ranging from homicide to warfare, and how that may have impacted human evolution, are among the most controversial topics of debate on human evolution. Although recent studies on the evolution of warfare have been based on various archaeological and ethnographic data, they have reported mixed results: it is unclear whether or not warfare among prehistoric hunter-gatherers was common enough to be a component of human nature and a selective pressure for the evolution of human behaviour. This paper reports the mortality attributable to violence, and the spatio-temporal pattern of violence thus shown, among ancient hunter-gatherers using skeletal evidence in prehistoric Japan (the Jomon period: 13 000 cal BC to 800 cal BC). Our results suggest that mortality due to violence was low and spatio-temporally highly restricted in the Jomon period, which implies that violence, including warfare, was not common in prehistoric Japan. ESM for Violence in Japanese prehistory: definition of and sources of data for injured individuals in the Jomon period, and detailed information on all sites where skeletal remains have been recovered (supplement_corrected_final.docx).
- Research data . 2016 . Open Access . German
Authors: Hagmann, Dominik; Langendorf, Alarich; Steininger, Andreas
Publisher: Zenodo
A 3D model, created in 2015, of the interior of the so-called witch tower at the south-eastern corner of the outer fortifications of Ulmerfeld Castle. The model was made using 3D photogrammetry (image-based modeling) and mast aerial photography.
- Research data . 2018 . Open Access
Authors: Bonnie, Rick
Publisher: Zenodo
This open dataset lists, describes, and provides a relevant bibliography for all known archaeological sites in Galilee with evidence for stone vessels. It forms part of the dataset used in the monograph Being Jewish in Galilee, 100–200 CE: An Archaeological Study (Brepols). The dataset is available in both PDF and CSV formats. The PDF file provides a detailed description of, and bibliography for, the evidence of stone vessels at each archaeological site. The CSV file contains the raw data, which can be easily imported into spreadsheets and databases.
- Research data . 2021 . Open Access
Authors: Luis Naranjo-Zeledón
Publisher: Zenodo
These files contain the calculations of phonological proximity and validations with users carried out on a subset of the Costa Rican Sign Language (LESCO, for its acronym in Spanish). The signs corresponding to the alphabet have been used.
- Open Access
Authors: Mengual, Ximo; Bot, Sander; Chkhartishvili, Tinatin; Reimann, André; Thormann, Jana; von der Mark, Laura
Publisher: Zenodo
Data type: molecular data
- Research data . 2020 . Open Access . English
Authors: Giovanni Spitale
Publisher: Zenodo
The COVID-19 pandemic generated (and keeps generating) a huge corpus of news articles, easily retrievable in Factiva with very targeted queries. This dataset, generated with an ad-hoc parser and NLP pipeline, analyzes the frequency of lemmas and named entities in news articles (in German, French, Italian and English) regarding Switzerland and COVID-19. The analysis of large bodies of grey literature via text mining and computational linguistics is an increasingly common approach to understanding the large-scale trends of specific topics. We used Factiva, a news monitoring and search engine developed and owned by Dow Jones, to gather and download all the news articles published between January 2020 and May 2021 on COVID-19 and Switzerland. Due to Factiva's copyright policy, it is not possible to share the original dataset with the exports of the articles' text; however, we can share the results of our work on the corpus. All the information needed to reproduce the results is provided.

Factiva allows a very granular definition of queries and, moreover, has access to the full text of articles published by the major media outlets of the world. The query was defined as follows (each clause is followed by its explanation):

- ((coronavirus or Wuhan virus or corvid19 or corvid 19 or covid19 or covid 19 or ncov or novel coronavirus or sars) and (atleast3 coronavirus or atleast3 wuhan or atleast3 corvid* or atleast3 covid* or atleast3 ncov or atleast3 novel or atleast3 corona*)): keywords for COVID-19, which must appear at least 3 times in the text
- and ns=(gsars or gout): subject is "novel coronaviruses" or "outbreaks and epidemics" and "general news"
- and la=X: language is X (DE, FR, IT, EN)
- and rst=tmnb: restrict to TMNB (major news and business publications)
- and wc>300: at least 300 words
- and date from 20191001 to 20212005: date interval
- and re=SWITZ: region is Switzerland

It is important to specify some details that characterize the query. The query is not limited to articles published by Swiss media; it covers all articles regarding Switzerland. The reason is simple: a Swiss user googling for "Schweiz Coronavirus" or for "Coronavirus Ticino" can easily find and read articles published by foreign media outlets (namely, German or Italian) on that topic. If the objective is capturing and describing the information trends to which people are exposed, this approach makes much more sense than limiting the analysis to articles published by Swiss media. Factiva's field "NS" is a descriptor for the content of the article. "gsars" is defined in Factiva's documentation as "All news on Severe Acute Respiratory Syndrome", and "gout" as "The widespread occurrence of an infectious disease affecting many people or animals in a given population at the same time"; however, the way these descriptors are assigned to articles is not specified in the documentation. Finally, the query was restricted to major news and business publications of at least 300 words; duplicate checking is performed by Factiva. Given the incredibly large number of articles published on COVID-19, this (absolutely arbitrary) restriction allows retrieving a corpus that is both meaningful and manageable.

metadata.xlsx contains information about the articles retrieved (strategy, amount).

This work is part of the PubliCo research project, supported by the Swiss National Science Foundation (SNF), project no. 31CA30_195905.
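To illustrate how the query above can be assembled for each of the four languages, here is a minimal Python sketch. It is not the authors' released pipeline; the build_query helper and the loop over language codes are assumptions for illustration only, and the clauses are copied verbatim from the record.

```python
# Minimal sketch (not the authors' actual code) assembling the Factiva
# query described above, once per target language.

KEYWORDS = (
    "((coronavirus or Wuhan virus or corvid19 or corvid 19 or covid19 or "
    "covid 19 or ncov or novel coronavirus or sars) and (atleast3 coronavirus "
    "or atleast3 wuhan or atleast3 corvid* or atleast3 covid* or atleast3 ncov "
    "or atleast3 novel or atleast3 corona*))"
)

def build_query(language: str) -> str:
    """Combine the keyword clause with the fixed filters from the record."""
    return (
        f"{KEYWORDS} "
        f"and ns=(gsars or gout) "               # subject descriptors
        f"and la={language} "                    # language code: DE, FR, IT or EN
        f"and rst=tmnb "                         # major news and business publications
        f"and wc>300 "                           # at least 300 words
        f"and date from 20191001 to 20212005 "   # date interval as given in the record
        f"and re=SWITZ"                          # region: Switzerland
    )

for lang in ("DE", "FR", "IT", "EN"):
    print(build_query(lang))
```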
- Research data . 2015 . Open Access
In 2014-2015, Caddo vessels from the Tuck Carpenter (41CP5) collection were scanned at the Center for Regional Heritage Research. These scans were generated for use in a study of 3D geometric morphometrics and for public outreach. Many thanks to the Caddo Nation of Oklahoma and the Anthropology and Archaeology Laboratory for the requisite permissions and access.
- Research data . 2021 . Open Access
Authors: Barman, Raphaël; Ehrmann, Maud; Clematide, Simon; Ares Oliveira, Sofia
Publisher: Zenodo
Country: Switzerland
This record contains the datasets and models used and produced for the work reported in the paper "Combining Visual and Textual Features for Semantic Segmentation of Historical Newspapers". Please cite this paper if you are using the models/datasets or find it relevant to your research:

@article{barman_combining_2020,
  title = {{Combining Visual and Textual Features for Semantic Segmentation of Historical Newspapers}},
  author = {Raphaël Barman and Maud Ehrmann and Simon Clematide and Sofia Ares Oliveira and Frédéric Kaplan},
  journal = {Journal of Data Mining \& Digital Humanities},
  volume = {HistoInformatics},
  DOI = {10.5281/zenodo.4065271},
  year = {2021},
  url = {https://jdmdh.episciences.org/7097},
}

Please note that this record contains data under different licenses.

1. DATA

Annotations (JSON files): one file per newspaper, containing region annotations (label and coordinates) in VIA format. The following licenses apply:
- luxwort.json: these annotations are under a CC0 1.0 license. Please refer to the rights statement specified for each image in the file.
- GDL.json, IMP.json and JDG.json: these annotations are under a CC BY-SA 4.0 license.

Image files: the archive images.zip contains the Swiss titles' image files (GDL, IMP, JDG) used for the experiments described in the paper. These images are under copyright (property of the journal Le Temps and of ArcInfo) and can be used for academic research or educational purposes only. Redistribution, publication or commercial use are not permitted. These terms of use are similar to the following rights statement: http://rightsstatements.org/vocab/InC-EDU/1.0/

2. MODELS

Some of the best models are released under a CC BY-SA 4.0 license (they are also available as assets of the current GitHub release):
- JDG_flair-FT: trained on JDG using French Flair and FastText embeddings. It can predict the four classes presented in the paper (Serial, Weather, Death notice and Stocks).
- Luxwort_obituary_flair-bpemb: trained on Luxwort using multilingual Flair and byte-pair embeddings. It can predict the Death notice class.
- Luxwort_obituary_flair-FT_indomain: trained on Luxwort using in-domain Flair and FastText embeddings (trained on Luxwort data). It can also predict the Death notice class.

These models can be used to predict probabilities on new images using the same code as in the original dhSegment repository. One needs to adjust three parameters of the predict function: 1) embeddings_path (the path to the embeddings list), 2) embeddings_map_path (the path to the compressed embedding map), and 3) embeddings_dim (the size of the embeddings). Please refer to the paper for further information or contact us.

3. CODE: https://github.com/dhlab-epfl/dhSegment-text

4. ACKNOWLEDGEMENTS

We warmly thank the journal Le Temps (owner of La Gazette de Lausanne and the Journal de Genève) and the group ArcInfo (owner of L'Impartial) for agreeing to share the related datasets for academic purposes. We also thank the National Library of Luxembourg for its support with all steps related to the Luxemburger Wort annotation release. This work was realized in the context of the impresso - Media Monitoring of the Past project and supported by the Swiss National Science Foundation under grant CR-SII5_173719.

5. CONTACT

Maud Ehrmann (EPFL-DHLAB)
Simon Clematide (UZH)
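The record names the three embedding-related parameters but not the full call signature, so the following is only a hypothetical sketch of how they might be wired into a prediction call. The import path, function name, and remaining arguments are assumptions; the real interface is defined in the dhSegment-text repository linked above.

```python
# Hypothetical sketch only: the three embeddings_* parameter names come from
# the record above; the module, function and other arguments are assumed.
from dh_segment_text import predict  # assumed import; see the repository

probabilities = predict(
    image="newspaper_page.jpg",                    # page image to segment (assumed argument)
    model_dir="JDG_flair-FT/",                     # one of the released models
    embeddings_path="fr-fasttext-embeddings.txt",  # 1) path to the embeddings list
    embeddings_map_path="embeddings_map.npz",      # 2) path to the compressed embedding map
    embeddings_dim=300,                            # 3) size of the embeddings (value assumed)
)
```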
- Research data . 2023 . Open Access . English
Authors: OSDG; UNDP IICPSD SDG AI Lab; PPMI
Publisher: Zenodo
The OSDG Community Dataset (OSDG-CD) is a public dataset of thousands of text excerpts which were validated, with respect to the Sustainable Development Goals (SDGs), by approximately 1,000 OSDG Community Platform (OSDG-CP) citizen scientists from over 110 countries.

Dataset Information

In support of the global effort to achieve the Sustainable Development Goals (SDGs), OSDG is releasing a series of SDG-labelled text datasets. The OSDG Community Dataset (OSDG-CD) is the direct result of the work of more than 1,000 volunteers from over 110 countries who have contributed to our understanding of SDGs via the OSDG Community Platform (OSDG-CP). The dataset contains tens of thousands of text excerpts (henceforth: texts) which were validated by the Community volunteers with respect to SDGs. The data can be used to derive insights into the nature of SDGs using either ontology-based or machine learning approaches. 📘 The file contains 40,067 text excerpts and a total of 277,524 assigned labels.

Source Data

The dataset consists of paragraph-length text excerpts derived from publicly available documents, including reports, policy documents and publication abstracts. A significant number of documents (more than 3,000) originate from UN-related sources such as SDG-Pathfinder and SDG Library. These sources often contain documents that already have SDG labels associated with them. Each text consists of 3 to 6 sentences and is about 90 words on average.

Methodology

All the texts are evaluated by volunteers on the OSDG-CP. The platform is an ambitious attempt to bring together researchers, subject-matter experts and SDG advocates from all around the world to create a large and accurate source of textual information on the SDGs. The Community volunteers use the platform to participate in labelling exercises where they validate each text's relevance to SDGs based on their background knowledge. In each exercise, the volunteer is shown a text together with an SDG label associated with it (this usually comes from the source) and is asked to either accept or reject the suggested label. There are 3 types of exercises:

- Introductory exercise: all volunteers start with this mandatory exercise, which consists of 10 pre-selected texts; each volunteer must complete it before they can access the 2 other exercise types. Upon completion, the volunteer reviews the exercise by comparing their answers with those of the rest of the Community, using aggregated statistics we provide, i.e. the share of those who accepted and rejected the suggested SDG label for each of the 10 texts. This helps the volunteer get a feel for the platform.
- SDG-specific exercises, where the volunteer validates texts with respect to a single SDG, e.g. SDG 1 No Poverty.
- All SDGs exercise, where the volunteer validates a random sequence of texts, each of which can have any SDG as its associated label.

After finishing the introductory exercise, the volunteer is free to select either SDG-specific or All SDGs exercises. Each exercise, regardless of its type, consists of 100 texts, although the volunteer can finish an exercise early; all progress is saved and recorded. Once an exercise is finished, the volunteer can either label more texts or exit the platform. To ensure quality, each text is validated by up to 9 different volunteers, and all texts included in the public release of the data have been validated by at least 3 different volunteers.
It is worth keeping in mind that all exercises present the volunteers with a binary decision problem, i.e. either accept or reject a suggested label. The volunteers are never asked to select one or more SDGs that a certain text might relate to; the rationale behind this set-up is that asking a volunteer to choose from 17 SDGs would be extremely inefficient. Currently, all texts are validated against only one associated SDG label.

Columns:
- doi: Digital Object Identifier of the original document
- text_id: unique text identifier
- text: text excerpt from the document
- sdg: the SDG the text is validated against
- labels_negative: the number of volunteers who rejected the suggested SDG label
- labels_positive: the number of volunteers who accepted the suggested SDG label
- agreement: agreement score computed as \(agreement = \frac{|labels_{positive} - labels_{negative}|}{labels_{positive} + labels_{negative}}\)

Further Information

To learn more about the project, please visit the OSDG website and the official GitHub page. Do not hesitate to share your outputs with us, be it a research paper, a machine learning model, a blog post, or just an interesting observation. All queries can be directed to community@osdg.ai.

The CSV file uses UTF-8 character encoding. For easy access in MS Excel, open the file using Data → From Text/CSV and split the data into columns using a TAB delimiter; programmatic access works the same way, as in the sketch below.
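A minimal sketch, assuming pandas is available and the release file is named osdg-community-data.csv (the actual file name may differ), that loads the tab-delimited CSV and recomputes the agreement score from the formula above:

```python
# Minimal sketch: load the OSDG-CD release (tab-delimited, UTF-8) and
# recompute the agreement score from its definition. File name is assumed.
import pandas as pd

df = pd.read_csv("osdg-community-data.csv", sep="\t", encoding="utf-8")

# agreement = |labels_positive - labels_negative| / (labels_positive + labels_negative)
recomputed = (
    (df["labels_positive"] - df["labels_negative"]).abs()
    / (df["labels_positive"] + df["labels_negative"])
)

# Example use: keep only texts whose suggested SDG label was accepted
# with high inter-volunteer agreement (threshold is an arbitrary choice).
confident = df[(recomputed >= 0.6) & (df["labels_positive"] > df["labels_negative"])]
print(confident[["text_id", "sdg", "agreement"]].head())
```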
- Research data . 2021 . Open Access
Authors: Jamyang Dakpa; Tashi Dhondup; Yeshi Jigme Gangne; Garrett, Edward; Meelen, Marieke; Sonam Wangyal
Publisher: Zenodo
This is a small hand-annotated partial treebank of Modern Tibetan, primarily in CoNLL-U format. Some texts were POS-tagged by machine, and then dependency relations between verbs and their arguments were added by hand. Other texts include only dependency relations and relevant POS-tags. A number of the texts have English translations which have been manually aligned to the Tibetan text. This work was created as part of the AHRC-funded project Lexicography in Motion (PI Ulrich Pagel, 2017-2021). Funded by the UK's Arts and Humanities Research Council (grant code: AH/P004644/1)
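Since the treebank is primarily in CoNLL-U format, it can be read with standard tooling. Here is a minimal sketch using the third-party Python conllu package; the file name is an assumption, and the field keys follow the standard CoNLL-U column names:

```python
# Minimal sketch: iterate over sentences in a CoNLL-U file with the
# third-party 'conllu' package. The file name below is an assumption.
from conllu import parse_incr

with open("tibetan_treebank.conllu", "r", encoding="utf-8") as f:
    for sentence in parse_incr(f):
        for token in sentence:
            # Print each token with its POS tag and dependency relation;
            # per the record, some texts carry only a subset of these fields.
            print(token["form"], token["upos"], token["deprel"])
```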