Advanced search in Research products
The following results are related to Digital Humanities and Cultural Heritage. Are you interested in viewing more results? Visit OpenAIRE - Explore.
981 Research products, page 1 of 99

  • Digital Humanities and Cultural Heritage
  • Research data
  • Research software
  • Other research products
  • 2018-2022
  • Dataset
  • ZENODO

  • Open Access English
    Authors: 
    Julien A. Raemy;
    Publisher: Zenodo

    Anonymised version of the survey results on alternative aggregation mechanisms. The online survey was conducted through Google Forms and was available from April 20 to May 8, 2020 as part of a master’s thesis in Information Science. The main objective of this survey was to gauge the awareness, interest, and use of technologies other than OAI-PMH for (meta)data aggregation. The main target audiences were the data providers and aggregators of the Europeana network, although the survey was also open to other organisations and individuals working in the cultural heritage field. Another goal of the survey was to identify possible pilot experiments that Europeana could conduct with interested organisations.

  • Open Access English
    Authors: 
    Spitale, Giovanni;
    Publisher: Zenodo

    The COVID-19 pandemic generated (and keeps generating) a huge corpus of news articles that is easily retrievable in Factiva with very targeted queries. This dataset, generated with an ad hoc parser and NLP pipeline, analyzes the frequency of lemmas and named entities in news articles (in German, French, Italian and English) regarding Switzerland and COVID-19. The analysis of large bodies of grey literature via text mining and computational linguistics is an increasingly frequent approach to understanding large-scale trends in specific topics. We used Factiva, a news monitoring and search engine developed and owned by Dow Jones, to gather and download all the news articles published between January 2020 and May 2021 on COVID-19 and Switzerland. Due to Factiva's copyright policy, it is not possible to share the original dataset with the exports of the articles' text; however, we can share the results of our work on the corpus. All the information needed to reproduce the results is provided.
    Factiva allows a very granular definition of queries and, moreover, has access to full-text articles published by the major media outlets of the world. The query was defined as follows (syntax first, explanation after the colon):
    - ((coronavirus or Wuhan virus or corvid19 or corvid 19 or covid19 or covid 19 or ncov or novel coronavirus or sars) and (atleast3 coronavirus or atleast3 wuhan or atleast3 corvid* or atleast3 covid* or atleast3 ncov or atleast3 novel or atleast3 corona*)): keywords for COVID-19; must appear at least 3 times in the text
    - and ns=(gsars or gout): subject is “novel coronaviruses” or “outbreaks and epidemics” and “general news”
    - and la=X: language is X (DE, FR, IT, EN)
    - and rst=tmnb: restrict to TMNB (major news and business publications)
    - and wc>300: at least 300 words
    - and date from 20191001 to 20212005: date interval
    - and re=SWITZ: region is Switzerland
    It is important to specify some details that characterize the query. The query is not limited to articles published by Swiss media, but covers articles regarding Switzerland. The reason is simple: a Swiss user googling for “Schweiz Coronavirus” or for “Coronavirus Ticino” can easily find and read articles published by foreign media outlets (namely, German or Italian ones) on that topic. If the objective is capturing and describing the information trends to which people are exposed, this approach makes much more sense than limiting the analysis to articles published by Swiss media. Factiva’s field “NS” is a descriptor for the content of the article. “gsars” is defined in Factiva’s documentation as “All news on Severe Acute Respiratory Syndrome”, and “gout” as “The widespread occurrence of an infectious disease affecting many people or animals in a given population at the same time”; however, the way these descriptors are assigned to articles is not specified in the documentation. Finally, the query was restricted to major news and business publications of at least 300 words; duplicate checking is performed by Factiva. Given the incredibly large number of articles published on COVID-19, this (absolutely arbitrary) restriction allows retrieving a corpus that is both meaningful and manageable. metadata.xlsx contains information about the articles retrieved (strategy, amount). This work is part of the PubliCo research project, supported by the Swiss National Science Foundation (SNF), project no. 31CA30_195905.
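
    The record describes the output as lemma and named-entity frequencies computed with an ad hoc NLP pipeline, but the pipeline itself is not distributed with the dataset. As a minimal sketch of the kind of processing involved, assuming spaCy and its de_core_news_sm model for the German subset (our assumptions, not the authors' actual code):

        # Minimal sketch (assumption): count lemma and named-entity frequencies
        # in a list of article texts with spaCy; not the authors' actual pipeline.
        from collections import Counter
        import spacy

        nlp = spacy.load("de_core_news_sm")  # assumed model for the German articles

        def lemma_and_entity_counts(texts):
            lemma_counts, entity_counts = Counter(), Counter()
            for doc in nlp.pipe(texts):
                lemma_counts.update(
                    tok.lemma_.lower() for tok in doc if tok.is_alpha and not tok.is_stop
                )
                entity_counts.update(ent.text for ent in doc.ents)
            return lemma_counts, entity_counts

        lemmas, entities = lemma_and_entity_counts(["Die Schweiz meldet neue COVID-19-Fälle."])
        print(lemmas.most_common(10), entities.most_common(10))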

  • Research data . 2021
    Open Access
    Authors: 
    Evdokia Saiti; Theoharis Theoharis;
    Publisher: Zenodo
    Project: EC | CHANGE (813789)

    This is the ECHO dataset presented in 'Cross-time registration of 3D point clouds' (https://www.sciencedirect.com/science/article/pii/S0097849321001357). The dataset consists of eroded cultural heritage objects, transformed randomly in 3D space. It is separated into two datasets, datasetCulture and datasetShape. Each dataset contains the Source object, the Target objects (the transformed and eroded objects), and the ground-truth transformation.
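
    The record pairs each Source object with transformed/eroded Target objects and a ground-truth transformation, but does not spell out file formats here. A minimal sketch of how such a ground-truth rigid transform could be applied and a registration estimate scored, assuming points as an N x 3 array and the transform as a 4 x 4 homogeneous matrix (our assumptions):

        # Sketch (assumptions: N x 3 point array, 4 x 4 homogeneous transform matrices).
        import numpy as np

        def apply_transform(points, T):
            """Apply a 4x4 homogeneous transform to an (N, 3) point cloud."""
            homogeneous = np.hstack([points, np.ones((points.shape[0], 1))])
            return (homogeneous @ T.T)[:, :3]

        def registration_rmse(source, T_estimated, T_ground_truth):
            """RMSE between the source transformed by the estimate and by the ground truth."""
            diff = apply_transform(source, T_estimated) - apply_transform(source, T_ground_truth)
            return float(np.sqrt((diff ** 2).sum(axis=1).mean()))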

  • Open Access
    Authors: 
    Kilpatrick, Shelby Kerrin; Gibbs, Jason; Mikulas, Martin M.; Spichiger, Sven-Erik; Ostiguy, Nancy; Biddinger, David J.; Lopez-Uribe, Margarita M.;
    Publisher: Zenodo

    Pennsylvanian Andrenidae specimen records from BugGuide

  • Open Access German
    Authors: 
    Sander, Marie; Havemann, Frank;
    Publisher: Zenodo

    We present a digitised version of a manuscript by Hans Havemann (1887–1985) in which he describes his journey to Italy in 1907. As a philosophy student, he hiked together with the painter and inventor Ernst Neumann-Neander (1871–1954) from Genoa via Pisa and Siena to Perugia and wrote this literary text about it, which, however, he was unable to publish. As his great-granddaughter and grandson, we retraced the two men's route and transcribed the manuscript for this purpose. The digitised version is released in three stages: images (PDF, DIN A4) of the cover sheet and the 198 manuscript pages, compressed in the file scan.zip; a line-faithful transcript as an ASCII file (encoding: ISO Latin 1) that serves as the LaTeX source file, Transkript-Havemann-Italien.tex; and a PDF of the print master for a book publication (DIN A5), Havemann-Italienische-Reise-Doemitz-Druckvorlage.pdf. In the meantime, we have published the text as a book (see reference) and, together with Petra and Reinhold Kraft, are preparing an exhibition in their gallery in Dömitz showing pictures that E. Neumann-Neander made during the 1907 journey alongside passages from H. Havemann's travel account (cf. the file Handzettel-Italienische-Reisen-v5.pdf). The afterword of the book documents the rules according to which we transcribed the text, along with editorial and historical notes and personal recollections. The appendix contains three transcripts: 1. a letter from Hans Havemann to Martin Buber (dated 19 October 1911), 2. a note by Ernst Neumann-Neander in Siena (18 April 1907), and 3. a diary entry by Ernst Neumann-Neander (13 March 1950). All three documents, as well as two colour sketches that Neumann made in Genoa and Siena in 1907, are deposited as image files (Havemann-Buber-Okt.1911.pdf, Neumann-Notiz-Siena.tiff, Neumann1950.pdf, Neumann1907Genua.tiff, Neumann1907Siena.tiff). Also deposited are the LaTeX source file for the print master (Havemann-Italienische-Reise-Doemitz-Druckvorl.tex), the file for the transcription usepackage created by Paul Berthold (italien190x.sty), the BibTeX file used (Havemann.bib), and the photograph of Hans Havemann printed in the book (Hans-Havemann-1905-1906.jpg). Reference: Havemann, Hans (2022). Italienische Reise. Edited by Marie Sander and Frank Havemann. Dömitz: Atelier Kraft.

  • Open Access
    Authors: 
    Hellrich, Johannes; Buechel, Sven; Hahn, Udo;
    Publisher: Zenodo

    Models for diachronic lexical semantics used by the Jena Semantic Explorer (JeSemE) website described in our COLING 2018 paper "JeSemE: A Website for Exploring Diachronic Changes in Word Meaning and Emotion". They are also described and applied in Johannes Hellrich's Ph.D. thesis "Word Embeddings: Reliability & Semantic Change", whose author was funded by the Deutsche Forschungsgemeinschaft (DFG) within the graduate school "The Romantic Model" (GRK 2041/1). There is one ZIP file per corpus, each containing several CSV files:
    - CHI.csv with χ2 word association values (structure: word-id, word-id, time, value)
    - EMBEDDING.csv with SVD-PPMI word embeddings (aligned; structure: word-id, time, values)
    - EMOTION.csv with VAD word emotion values (structure: word-id, time, values)
    - FREQUENCY.csv with relative word frequency values (structure: word-id, time, value)
    - PPMI.csv with PPMI word association values (structure: word-id, word-id, time, value)
    - SIMILARITY.csv with word-embedding-derived word similarity values (structure: word-id, word-id, time, value)
    - WORDIDS.csv mapping words to their corpus-specific IDs
    The corpora are:
    - coha: Corpus of Historical American English
    - dta: Deutsches Textarchiv 'German Text Archive'
    - google_fiction: Google Books N-Gram corpus, English fiction subcorpus
    - google_german: Google Books N-Gram corpus, German subcorpus
    - rsc: Royal Society Corpus
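
    As a quick illustration of how these tables could be combined, the sketch below joins WORDIDS.csv onto SIMILARITY.csv to track the similarity of a word pair across time slices. It assumes headerless CSV files with the column layouts listed above and a (word, word-id) order in WORDIDS.csv; these are our assumptions, so adjust the names and the header option to the actual files:

        # Sketch (assumptions: headerless CSVs, column layouts as listed in the record).
        import pandas as pd

        word_ids = pd.read_csv("WORDIDS.csv", header=None, names=["word", "word_id"])
        similarity = pd.read_csv("SIMILARITY.csv", header=None,
                                 names=["word_id_1", "word_id_2", "time", "value"])

        id_of = dict(zip(word_ids["word"], word_ids["word_id"]))

        def similarity_over_time(word_a, word_b):
            """Return the similarity values of a word pair across time slices."""
            a, b = id_of[word_a], id_of[word_b]
            mask = ((similarity["word_id_1"] == a) & (similarity["word_id_2"] == b)) | \
                   ((similarity["word_id_1"] == b) & (similarity["word_id_2"] == a))
            return similarity[mask].sort_values("time")[["time", "value"]]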

  • Research data . 2020
    Open Access English
    Authors: 
    Christie, Heather; Piscitelli, Matthew; De Los Ríos Farfán, Gabriela; Opitz, Rachel; Simon, Katie; Menzer, Jeremy; Williford, Carl;
    Publisher: Zenodo

    Caballete is a Late Archaic (3000 - 1800 BC) site located in a valley off the north bank of the Fortaleza River, approximately 8 km from the Pacific coast. It consists of multiple lesser mounds and a single larger mound. One of the mounds sits as an outlier to the southeast, while the other five form a U-shaped pattern on the terrace. The outlier is believed to be the earliest attempt at mound construction on the site. Three of the other mounds are associated with sunken circular plazas, one of which is outlined by huancas, or large standing stones. The aim of this project was to identify buried architecture for subsequent, targeted excavation at a Late Archaic ceremonial site in Peru. Two of the mounds (B and C) were surveyed using GPR, photogrammetry and magnetometry (Mound B only), with surveys occurring on the top and flank of Mound B. A low mound referred to as the "Center Area" and the sunken circular plaza associated with the huancas and Mound A (the principal mound of Caballete) were also surveyed with GPR, magnetometry and photogrammetry. Finally, a campsite area (20 x 40 m) located to the northwest of the site's U-shaped configuration of features was surveyed using magnetometry; it has the characteristics of typical campsite areas and, due to the ephemeral nature of campsites and the general lack of permanent structures, was not surveyed with GPR or photogrammetry. This upload contains the raw data from the ground-penetrating radar survey conducted during the 2015 field season at Caballete. SPARC is currently supported by NSF Award #1822110. Funding for this fieldwork was provided in part by a grant from the National Geographic Society.

  • Open Access
    Authors: 
    Naranjo-Zeledón, Luis;
    Publisher: Zenodo

    These files contain the calculations of phonological proximity and validations with users carried out on a subset of the Costa Rican Sign Language (LESCO, for its acronym in Spanish). The signs corresponding to the alphabet have been used.

  • Open Access English

    The OSDG Community Dataset (OSDG-CD) is a public dataset of thousands of text excerpts, validated by approximately 1,000 OSDG Community Platform (OSDG-CP) citizen scientists from over 110 countries with respect to the Sustainable Development Goals (SDGs).
    Dataset Information. In support of the global effort to achieve the Sustainable Development Goals (SDGs), OSDG is releasing a series of SDG-labelled text datasets. The OSDG Community Dataset (OSDG-CD) is the direct result of the work of more than 1,000 volunteers from over 110 countries who have contributed to our understanding of SDGs via the OSDG Community Platform (OSDG-CP). The dataset contains tens of thousands of text excerpts (henceforth: texts) which were validated by the Community volunteers with respect to SDGs. The data can be used to derive insights into the nature of SDGs using either ontology-based or machine learning approaches. 📘 The file contains 37,575 text excerpts and a total of 237,076 assigned labels. ⚠️ IMPORTANT: for the first time, the OSDG-CD includes 5,105 texts validated with respect to SDG 16. To accelerate data collection during part of this quarter, volunteers were only able to assess this particular SDG. The efforts for the remaining 15 SDGs are temporarily paused but should resume in time for the upcoming release (scheduled for January 2023).
    Source Data. The dataset consists of paragraph-length text excerpts derived from publicly available documents, including reports, policy documents and publication abstracts. A significant number of documents (more than 3,000) originate from UN-related sources such as SDG-Pathfinder and SDG Library; these sources often contain documents that already have SDG labels associated with them. Each text comprises 3 to 6 sentences and is about 90 words on average.
    Methodology. All the texts are evaluated by volunteers on the OSDG-CP. The platform is an ambitious attempt to bring together researchers, subject-matter experts and SDG advocates from all around the world to create a large and accurate source of textual information on the SDGs. The Community volunteers use the platform to participate in labelling exercises where they validate each text's relevance to SDGs based on their background knowledge. In each exercise, the volunteer is shown a text together with an SDG label associated with it (this usually comes from the source) and is asked to either accept or reject the suggested label. There are 3 types of exercises:
    - Introductory exercise: all volunteers start with this mandatory exercise, which consists of 10 pre-selected texts. Each volunteer must complete it before they can access the 2 other exercise types. Upon completion, the volunteer reviews the exercise by comparing their answers with those of the rest of the Community using aggregated statistics we provide, i.e., the share of those who accepted and rejected the suggested SDG label for each of the 10 texts. This helps the volunteer get a feel for the platform.
    - SDG-specific exercises, where the volunteer validates texts with respect to a single SDG, e.g., SDG 1 No Poverty.
    - All SDGs exercise, where the volunteer validates a random sequence of texts in which each text can have any SDG as its associated label.
    After finishing the introductory exercise, the volunteer is free to select either SDG-specific or All SDGs exercises. Each exercise, regardless of its type, consists of 100 texts; the volunteer can also finish an exercise early, and all progress is still saved and recorded. Once the exercise is finished, the volunteer can either label more texts or exit the platform. To ensure quality, each text is validated by up to 9 different volunteers, and all texts included in the public release of the data have been validated by at least 3 different volunteers. It is worth keeping in mind that all exercises present the volunteers with a binary decision problem, i.e., either accept or reject a suggested label. The volunteers are never asked to select one or more SDGs that a certain text might relate to; the rationale behind this set-up is that asking a volunteer to select from 17 SDGs is extremely inefficient. Currently, all texts are validated against only one associated SDG label.
    Columns:
    - doi: Digital Object Identifier of the original document
    - text_id: unique text identifier
    - text: text excerpt from the document
    - sdg: the SDG the text is validated against
    - labels_negative: the number of volunteers who rejected the suggested SDG label
    - labels_positive: the number of volunteers who accepted the suggested SDG label
    - agreement: agreement score based on the formula \(agreement = \frac{|labels_{positive} - labels_{negative}|}{labels_{positive} + labels_{negative}}\)
    Further Information. To learn more about the project, please visit the OSDG website and the official GitHub page. Do not hesitate to share your outputs with us, be it a research paper, a machine learning model, a blog post, or just an interesting observation. All queries can be directed to community@osdg.ai. The CSV file uses UTF-8 character encoding; for easy access in MS Excel, open the file using Data → From Text/CSV and split the data into columns using a TAB delimiter.
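
    Since the record spells out the column layout, the tab delimiter, and the agreement formula, a small sketch of loading the file and recomputing the agreement column may be useful; the file name "osdg-community-dataset.csv" below is a placeholder, not the actual file name in the record:

        # Sketch: load the tab-delimited, UTF-8 OSDG-CD file and recompute agreement.
        # "osdg-community-dataset.csv" is a placeholder file name (assumption).
        import pandas as pd

        df = pd.read_csv("osdg-community-dataset.csv", sep="\t", encoding="utf-8")

        # agreement = |labels_positive - labels_negative| / (labels_positive + labels_negative)
        recomputed = (df["labels_positive"] - df["labels_negative"]).abs() / (
            df["labels_positive"] + df["labels_negative"]
        )
        print((recomputed - df["agreement"]).abs().max())  # should be close to 0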
