Advanced search in Research products
The following results are related to Digital Humanities and Cultural Heritage. Are you interested in viewing more results? Visit OpenAIRE - Explore.
2,135 Research products, page 1 of 214

  • Digital Humanities and Cultural Heritage
  • Research data
  • Research software
  • Other research products
  • 2018-2022
  • ZENODO

  • Research data . Audiovisual . 2022
    Open Access German
    Authors: 
    van Oorschot, Frederike;
    Publisher: Zenodo

    The lecture unfolds hermeneutical and epistemological questions raised by the evolving digital research in the humanities (Digital Humanities, DH). Its focus is a sketch of a philosophy of science oriented toward the DH, guided by the following questions: What does the narrative of a "new" science mean? Who is the subject of the DH? Where do new epistemic logics emerge? What is the research-policy setting of the DH? And how does all of this relate to the self-understanding of the "classical" humanities? In doing so, the lecture aims at a "digital hermeneutics" in the humanities.

  • Open Access Polish
    Authors: 
    Aleksandra Kubiak-Schneider; Aleksandra Sulikowska;
    Publisher: Zenodo

    A short biographical note for the Digital National Museum in Warsaw, on the jubilee of its 160 years of existence.

  • Open Access English
    Authors: 
    Dhrangadhariya, Anjani; Müller, Henning;
    Publisher: Dryad

    This upload contains four main zip files.

    ds_cto_dict.zip: contains the four distant-supervision dictionaries (P: participant.txt; I: intervention.txt, intervetion_syn.txt; O: outcome.txt) generated from clinicaltrials.gov using the methodology described in Distant-CTO (https://aclanthology.org/2022.bionlp-1.34/). These dictionaries were used to create distant-supervision labelling functions, as described in the "Labelling sources" subsection of the Methodology. The data was derived from https://clinicaltrials.gov/.

    handcrafted_dictionaries.zip: contains three files. 1) gender_sexuality.txt: a list of possible genders and sexual orientations found across the web; the list needs to be more comprehensive. 2) endpoints_dict.txt: contains outcome names and the names of questionnaires used to measure outcomes, assembled from PROM questionnaires and PROMs. 3) comparator_dict: a list of idiosyncratic comparator terms (e.g., sham, saline, placebo) compiled from the literature search; the list needs to be more comprehensive.

    test_ebm_correctedlabels.tsv: EBM-PICO is a widely used dataset with PICO annotations at two levels: span-level (coarse-grained) and entity-level (fine-grained). Span-level annotations encompass the full information about each class, while entity-level annotations cover the more fine-grained information, with PICO classes further divided into subclasses. For example, the coarse-grained Participant span is further divided into participant age, gender, condition and sample size of the randomised controlled trial. The dataset comes pre-divided into a training set (n = 4,933) annotated through crowd-sourcing and an expert-annotated gold test set (n = 191) for evaluation. The EBM-PICO annotation guidelines caution about variable annotation quality. Abaho et al. developed a framework to post-hoc correct EBM-PICO outcome annotation inconsistencies, and Lee et al. studied annotation span disagreements, suggesting variability across annotators. Low annotation quality in the training set is excusable, but errors in the test set can lead to faulty evaluation of downstream ML methods. We evaluate 1% of the EBM-PICO training-set tokens to gauge possible reasons for the fine-grained labelling errors and use this exercise to conduct an error-focused PICO re-annotation of the EBM-PICO gold test set. The file test_ebm_correctedlabels.tsv contains the error-corrected EBM-PICO gold test set, which can be used as a complementary evaluation set alongside the EBM-PICO test set.

    error_analysis.zip: contains three .tsv files, one per PICO class, identifying possible errors in about 1% (about 12,962 tokens) of the EBM-PICO training set.

    Objective: PICO (Participants, Interventions, Comparators, Outcomes) analysis is vital but time-consuming for conducting systematic reviews (SRs). Supervised machine learning can help fully automate it, but a lack of large annotated corpora limits the quality of automated PICO recognition systems. The largest currently available PICO corpus is manually annotated, an approach that is often too expensive for the scientific community to apply. Depending on the specific SR question, the PICO criteria are extended to PICOC (C: Context), PICOT (T: Timeframe), and PIBOSO (B: Background, S: Study design, O: Other), meaning that static hand-labelled corpora need to undergo costly re-annotation as downstream requirements change. We aim to test the feasibility of designing a weak-supervision system that extracts these entities without hand-labelled data.

    Methodology: We decompose PICO spans into their constituent entities and re-purpose multiple medical and non-medical ontologies and expert-generated rules to obtain multiple noisy labels for these entities. The labels obtained from these sources are then aggregated using simple majority voting and generative modelling approaches. The resulting programmatic labels are used as weak signals to train a weakly supervised discriminative model, and performance changes are observed. We also explore mistakes in the currently available PICO corpus that could have led to inaccurate evaluation of several automation methods.

    Results: We present Weak-PICO, a weakly supervised PICO entity recognition approach using medical and non-medical ontologies, dictionaries and expert-generated rules. Our approach does not use hand-labelled data.

    Conclusion: Weak supervision using Weak-PICO for PICO entity recognition has encouraging results, and the approach can potentially extend readily to more clinical entities.

    All the datasets can be opened using text editors or Google Sheets. The .zip files can be opened using the Archive Utility on macOS and the unzip functionality on Linux. (All Windows and Apple operating systems support ZIP files without additional third-party software.)
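    The "simple majority voting" aggregation step mentioned in the methodology can be illustrated with a minimal sketch. The labelling-function votes below are hypothetical, and the tie/abstain handling is one common convention, not necessarily the paper's exact scheme:

    ```python
    from collections import Counter

    ABSTAIN = -1  # a labelling function may decline to vote on a token

    def majority_vote(label_matrix):
        """Aggregate noisy token labels from several labelling functions.

        label_matrix: one row per token; each row holds the votes of
        every labelling function (ABSTAIN = no vote). Returns one
        aggregated label per token (ABSTAIN on ties or no votes).
        """
        aggregated = []
        for votes in label_matrix:
            counts = Counter(v for v in votes if v != ABSTAIN)
            if not counts:
                aggregated.append(ABSTAIN)  # every source abstained
                continue
            top = counts.most_common(2)
            if len(top) > 1 and top[0][1] == top[1][1]:
                aggregated.append(ABSTAIN)  # tie between sources
            else:
                aggregated.append(top[0][0])
        return aggregated

    # Hypothetical votes from three labelling functions for four tokens
    # (0 = not-an-entity, 1 = Participant entity).
    votes = [
        [1, 1, ABSTAIN],   # two sources agree on 1
        [0, 1, ABSTAIN],   # tie: abstain
        [ABSTAIN] * 3,     # no votes: abstain
        [0, 0, 1],         # majority: 0
    ]
    print(majority_vote(votes))  # → [1, -1, -1, 0]
    ```

    The generative-modelling alternative mentioned in the abstract would instead learn per-source accuracies and weight the votes accordingly.
    
    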

  • Restricted English
    Authors: 
    GEN;
    Publisher: Zenodo

    This repository was created to share materials (e.g., sample data, trained models, and demo files) for our work. The demo files allow users to run our models on their own data or on the sample data we provide. The repository includes the following four components: 1) a code demonstration of review-text preprocessing (ReviewPreprocess.zip); 2) the lexicon and a code demonstration of using it to generate input for the two lexicon-based classification models (LexiconModels.zip); 3) the trained Doc2Vec model and a code demonstration of obtaining Doc2Vec embeddings with it (Doc2VecEmbeddings.zip); and 4) the trained base-learner classification models (M2, M3, M4), optimized weights for the ensemble model E2, and the trained ensemble model E3, together with a code demonstration of classifying reviews using our proposed models (ClassificationModels.zip). The data used for building these models can be requested from the Global Emancipation Network for approved uses established in a data use agreement. This work was funded by the National Science Foundation under award #1936331.
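    As a sketch of how optimized ensemble weights like those shipped for E2 are typically applied, the snippet below averages base-learner class probabilities with per-model weights. The weights and probabilities are hypothetical, not the repository's actual values or code:

    ```python
    def weighted_ensemble(probabilities, weights):
        """Combine per-class probabilities from several base learners
        by a weighted average and pick the highest-scoring class."""
        n_classes = len(probabilities[0])
        total = sum(weights)
        combined = [
            sum(w * p[c] for w, p in zip(weights, probabilities)) / total
            for c in range(n_classes)
        ]
        return combined.index(max(combined)), combined

    # Hypothetical two-class outputs of three base learners (M2, M3, M4)
    # for one review, and hypothetical optimized ensemble weights.
    probs = [[0.8, 0.2], [0.4, 0.6], [0.7, 0.3]]
    weights = [0.5, 0.2, 0.3]
    label, scores = weighted_ensemble(probs, weights)
    print(label)  # → 0
    ```
    
    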

  • Open Access
    Authors: 
    Leonardo Santiago Benitez Pereira;
    Publisher: Zenodo

    Collection of 300 support tickets manually labeled for semantic similarity, obtained from an IT support company in the Florianópolis (Brazil) region. Each ticket is represented by an unstructured text field typed by the user who opened the call. The labeling was performed in 2022 by three IT support professionals. The corpus contains tickets in many languages, mainly English, German, Portuguese and Spanish. All Personally Identifiable Information (PII) and sensitive information were removed and substituted by a tag indicating the original content; for instance, the sentence "this text was written by Leonardo" becomes "this text was written by [NAME]". The removal was performed in three steps: first, the automated machine-learning-based tool AWS Comprehend PII Removal was used; then, a sequence of custom regular expressions was applied; last, the entire corpus was manually verified.
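    The second, regex-based step of such a pipeline can be sketched as follows. The patterns and tags below are illustrative stand-ins, not the dataset's actual expression set:

    ```python
    import re

    # Hypothetical patterns for the custom regular-expression pass;
    # each match is replaced by a tag naming the removed content kind.
    PII_PATTERNS = [
        (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "[EMAIL]"),
        (re.compile(r"\+?\d[\d\s().-]{7,}\d"), "[PHONE]"),
        (re.compile(r"\bhttps?://\S+"), "[URL]"),
    ]

    def scrub(text):
        """Replace PII-like substrings with placeholder tags."""
        for pattern, tag in PII_PATTERNS:
            text = pattern.sub(tag, text)
        return text

    ticket = "Contact me at jane.doe@example.com or +55 48 9999-0000."
    print(scrub(ticket))  # → "Contact me at [EMAIL] or [PHONE]."
    ```

    Ordering matters: broad patterns such as the phone rule run after narrower ones so they cannot swallow text another tag should claim, which is also why a manual verification pass closes the pipeline.
    
    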

  • Research software . 2022
    Open Access English
    Authors: 
    Mähr, Moritz;
    Publisher: Zenodo

    Full Changelog: https://github.com/maehr/the-corpus-as-a-network/commits/v0.1.0-alpha If you use this dataset, please cite it using the metadata from this file.

  • Research software . 2022
    Open Access

    ASReview Insights is an extension to ASReview LAB that adds tools for plotting and extracting the statistical results of several performance metrics. The extension is especially useful in combination with the simulation functionality of ASReview LAB.

  • Open Access
    Authors: 
    Jan Moens; Koen De Groote;
    Publisher: Zenodo

    Appendix to 'Moens J. & De Groote K. 2022: Ieper - De Meersen. Deel 2. De studie van het leer', a research report by the agency Onroerend Erfgoed: the complete inventory of the leather finds (as an .xlsx file).

  • Open Access English
    Authors: 
    Sarker, Abeed;
    Publisher: Zenodo

    This dataset accompanies the article "Can accurate demographic information about people who use prescription medications non-medically be derived from Twitter?", submitted to PNAS. See the README.txt file for more details.

  • Open Access
    Authors: 
    Helling, Patrick; Borges, Rebekka; Gius, Evelyn;
    Publisher: Zenodo

    This backup contains the GitHub repository of the DHd annual conference 2022 (DHd-Jahreskonferenz 2022). It includes all published conference papers in TEI-XML and PDF format, all published posters in PDF format, and the metadata for all contributions in XML and CSV format.
