Advanced search in Research products
The following results are related to Digital Humanities and Cultural Heritage. To view more results, visit OpenAIRE - Explore.
186 Research products, page 1 of 19

Filters:
  • Digital Humanities and Cultural Heritage
  • Research software
  • Software
  • English

Sorted by: Date (most recent), 10 results per page
  • Restricted English
    Authors: 
    Ravinder, Rohitha; Castro, Leyla Jael; Rebholz-Schuhmann, Dietrich;
    Publisher: Zenodo

This release accompanies a thesis that explores how information about protein functions can be captured in embeddings and used to improve protein function annotations. The underlying hypothesis is that any pair of proteins with high sequence similarity will also share a similar biological function, and that this similarity will be reflected in the corresponding protein embeddings. The comparison and evaluation are carried out using two text-driven embedding approaches: Word2doc2Vec and Hybrid-Word2doc2Vec.
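The hypothesis — that high sequence similarity should show up as high embedding similarity — can be illustrated with a plain cosine-similarity check. This is a minimal sketch with invented toy vectors; the actual Word2doc2Vec and Hybrid-Word2doc2Vec embeddings are not reproduced here.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy embeddings for two hypothetical proteins with similar sequences
# and one unrelated protein (illustrative values only).
emb_p1 = [0.9, 0.1, 0.3]
emb_p2 = [0.85, 0.15, 0.25]
emb_p3 = [-0.2, 0.9, -0.4]

sim_related = cosine_similarity(emb_p1, emb_p2)
sim_unrelated = cosine_similarity(emb_p1, emb_p3)
```

Under the hypothesis, `sim_related` should exceed `sim_unrelated` whenever the first two proteins share a function.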

  • Open Access English
    Authors: 
    Valdestilhas, Andre; Hanke, Thomas;
    Publisher: Zenodo

Materials science experiments involve complex data that are often highly heterogeneous and difficult to reproduce. This was observed, for example, in a previous study on harnessing lightweight design potentials via the Materials Data Space [3], in which data from materials science and engineering experiments were generated using linked open data principles [1,2], e.g., the Resource Description Framework (RDF) as the standard model for data interchange on the Web. However, querying such data requires detailed knowledge of the SPARQL query language, which domain experts in materials science often lack. With this work, we aim to develop NaturalMSEQueries, an approach that lets materials science domain experts query the data with natural-language expressions (e.g., in English) instead of SPARQL. This should significantly improve the usability of Semantic Web approaches in materials science and lower the adoption threshold for domain experts. We plan to evaluate our approach with varying amounts of data from different sources, and to compare against synthetic data to assess the quality of our implementation.
    References:
    [1] T. Berners-Lee, J. Hendler, O. Lassila. Scientific American, 2001, 284, 34–43.
    [2] RDF specification. 2023. Available at: https://www.w3.org/RDF/
    [3] Huschka M, Dlugosch M, Friedmann V, Trelles EG, Hoschke K, Klotz UE, Patil S, Preußner J, Schweizer C, Tiberto D. The “AluTrace” Use Case: Harnessing Lightweight Design Potentials via the Materials Data Space®.
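The core idea of replacing SPARQL with natural-language input can be sketched as template matching. This is a minimal illustration only; the template strings, predicate names, and `to_sparql` helper below are invented for the sketch, not part of NaturalMSEQueries.

```python
# Map a few natural-language question templates to SPARQL query strings.
# Both the questions and the :Experiment / :performedBy vocabulary are
# hypothetical examples.
TEMPLATES = {
    "list all experiments":
        "SELECT ?exp WHERE { ?exp a :Experiment }",
    "who performed {exp}":
        "SELECT ?person WHERE { :{exp} :performedBy ?person }",
}

def to_sparql(question: str) -> str:
    """Return the SPARQL query for a matching template, filling in slots."""
    q = question.strip().lower().rstrip("?")
    for template, sparql in TEMPLATES.items():
        if "{exp}" in template:
            prefix = template.split("{exp}")[0]
            if q.startswith(prefix):
                exp = q[len(prefix):].strip()
                return sparql.replace("{exp}", exp)
        elif q == template:
            return sparql
    raise ValueError(f"No template matches: {question!r}")
```

A real system would add grammar- or model-based parsing on top, but the slot-filling step is the same in spirit.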

  • Research software . 2023
    Open Access English
    Authors: 
    Sherratt, Tim;
    Publisher: Zenodo

Current version: v1.0.0
    Examples of exploring the different data sources that are aggregated into Trove from institutional contributors. For more information, see the Trove contributors section of the GLAM Workbench.
    Notebook topics:
      • Create a flat list of organisations contributing metadata to Trove – converts the nested data available from the /contributor endpoint into a single flat list of contributors
      • Get the number of records from each contributor by zone and format – gets the number of records contributed by each organisation, aggregated by zone and format
    Data: see this repository for weekly harvests using code from the above notebooks. See the GLAM Workbench for more details.
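The flattening step described in the first notebook can be sketched as a simple recursion over nested records. This is a hypothetical sketch; the field names `name` and `children` are assumptions for illustration, not the documented schema of the Trove /contributor endpoint.

```python
def flatten_contributors(records):
    """Recursively flatten nested contributor records into one flat list."""
    flat = []
    for record in records:
        flat.append(record["name"])
        flat.extend(flatten_contributors(record.get("children", [])))
    return flat

# Invented example data shaped like a nested contributor hierarchy.
nested = [
    {"name": "State Library A",
     "children": [{"name": "Regional Branch A1", "children": []}]},
    {"name": "University Library B"},
]
```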

  • Research software . 2023
    Open Access English
    Authors: 
    Richard-Trémeau Emma;
    Publisher: Zenodo
    Project: EC | MaltaPot (795633)

This is the first release of two LaTeX scripts used to create detailed catalogues of archaeological artefacts, here pottery sherds, including photographs and descriptions. Together, the two examples produce one catalogue entry on one page. The scripts can be used with any LaTeX editor, and the resulting entry can be previewed in the two PDF files.

  • Research software . 2023
    Open Access English
    Authors: 
    Dermentzi, Maria;
    Publisher: Zenodo

This experimental span categorization model was designed solely for research purposes and was trained to detect names of people, organizations, concentration camps, ghettos, and dates in Holocaust-related texts. Please note that this model may produce false positive and false negative errors. The model was trained using spaCy (v3.5.0 - https://spacy.io/) on a very small corpus (196 documents split into train/test datasets) consisting of annotated early Holocaust testimonies (https://early-testimony.ehri-project.eu/) and diplomatic reports (https://diplomatic-reports.ehri-project.eu/). In addition to the 'spancat' pipeline component, which predicts which labels should be assigned to different spans of text, the processing pipeline also includes a series of rule-based components that compare and match the input text against entries in controlled vocabularies created by EHRI. These components are: 'ehri-camps', 'ehri-ghettos', 'ehri-personalities', 'ehri-corporate-bodies', 'ehri-terms', 'ehri-camps-fuzzy', 'ehri-ghettos-fuzzy', 'ehri-personalities-fuzzy', 'ehri-corporate-bodies-fuzzy', 'ehri-terms-fuzzy'. The components whose names include 'fuzzy' support fuzzy matching, meaning they can identify matches even if the input text contains minor variations such as small spelling mistakes; a Levenshtein edit distance of one is used. Users may activate or disable any components based on the needs of their projects. Please keep in mind that while this model may be useful for research purposes, it should not be relied upon for accurate results in production environments, and a human expert should carefully evaluate any output.
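The distance-one fuzzy matching described above can be sketched with a plain Levenshtein implementation. This is illustrative only — it is not the code of the EHRI pipeline components, and the vocabulary entries are invented.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def fuzzy_match(term, vocabulary, max_distance=1):
    """Return vocabulary entries within the given edit distance of term."""
    return [v for v in vocabulary if levenshtein(term, v) <= max_distance]
```

With `max_distance=1`, a single-character typo still matches the controlled-vocabulary entry, which is exactly what an edit distance of one buys.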

  • Restricted English
    Authors: 
    Aiken, William; Mvula, Paul K.; Branco, Paula; Jourdan, Guy-Vincent; Sabetzadeh, Mehrdad; Viktor, Herna;
    Publisher: Zenodo

    This Zenodo repository hosts the source code of our proposed approach for detecting self-admitted technical debt (SATD) comments in both intra- and cross-project scenarios. The repository includes the source code as it existed at the time of publication of our conference paper. In addition, it contains the manually annotated comments from 20 Java projects, as reported by da S. Maldonado et al. [1] and Guo et al. [2]. We intend to provide future updates to this GitHub repository with additional features and improvements.
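As a rough illustration of the task — not the approach evaluated in the paper — SATD candidates are often flagged with keyword patterns drawn from the SATD literature, such as:

```python
import re

# Common self-admitted-technical-debt markers; a naive keyword baseline,
# not the classifiers studied in the repository.
SATD_PATTERNS = re.compile(r"\b(todo|fixme|hack|workaround|temporary)\b",
                           re.IGNORECASE)

def is_satd_candidate(comment: str) -> bool:
    """Flag a source-code comment as a possible SATD comment."""
    return bool(SATD_PATTERNS.search(comment))
```

Learned classifiers like those in the paper aim to beat exactly this kind of keyword baseline on annotated comments.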

  • Research software . 2023
    Open Access English
    Authors: 
    Sherratt, Tim;
    Publisher: Zenodo

Current version: v1.0.0
    Experiments and examples relating to Trove's 'Diaries, letters, and archives' zone. See the Trove unpublished section of the GLAM Workbench for more details.
    Notebook topics:
      • Finding unpublished works that might be entering the public domain on 1 January 2019
      • Exploring unpublished works that might be entering the public domain on 1 January 2019
      • Find urls of digitised finding aids – harvest urls of digitised NLA finding aids
      • Collect information about digitised finding aids – work through the list of urls, extracting additional information for each finding aid
      • Convert a HTML finding aid to JSON – scrapes a HTML finding aid to extract basic details and hierarchical structure
    Datasets:
      • Unpublished works that might be entering the public domain on 1 January 2019 – download CSV file (1.8 MB)
      • finding-aids.csv – list of urls for NLA digitised finding aids
      • finding-aids-totals.csv – summary information describing NLA digitised finding aids
    Cite as: see the GLAM Workbench or Zenodo for up-to-date citation details. This repository is part of the GLAM Workbench.
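The "Convert a HTML finding aid to JSON" step can be sketched with the standard-library HTML parser. This is a hypothetical sketch: the markup and the `title`/`items` field names are invented for illustration, not the NLA's actual finding-aid layout.

```python
from html.parser import HTMLParser
import json

class FindingAidParser(HTMLParser):
    """Pull a title (<h1>) and item list (<li>) out of a toy finding aid."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.in_item = False
        self.data = {"title": "", "items": []}

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self.in_title = True
        elif tag == "li":
            self.in_item = True

    def handle_endtag(self, tag):
        if tag == "h1":
            self.in_title = False
        elif tag == "li":
            self.in_item = False

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self.in_title:
            self.data["title"] = text
        elif self.in_item:
            self.data["items"].append(text)

# Invented sample markup standing in for a digitised finding aid page.
html_doc = ("<html><body><h1>Papers of Jane Doe</h1>"
            "<ul><li>Series 1. Correspondence</li>"
            "<li>Series 2. Diaries</li></ul></body></html>")
parser = FindingAidParser()
parser.feed(html_doc)
result = json.dumps(parser.data)
```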

  • Open Access English
    Authors: 
    Felipe Coelho Argolo;
    Publisher: Zenodo

SemanticTrajectories.jl is a package built upon DynamicalSystems.jl and Embeddings.jl, designed to analyze semantic coherence in text. A popular approach in the literature is to calculate coherence between consecutive words ("first-order coherence") or between two words separated by a third ("second-order coherence"), and then evaluate properties such as the average, minimum, and maximum coherence. SemanticTrajectories.jl leverages the Recurrence Quantification Analysis (RQA) capabilities of DynamicalSystems.jl to analyze semantic coherence across the entire trajectory. Funded by Wellcome Trust Grant 223139/Z/21/Z.
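The first- and second-order coherence measures described above can be written down directly. This is a Python sketch of the definitions only; the package itself is Julia and uses RQA over whole trajectories rather than this simple pairwise form.

```python
import math

def cos(a, b):
    """Cosine similarity between two word vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

def first_order_coherence(vectors):
    """Cosine similarity between each pair of consecutive word vectors."""
    return [cos(vectors[i], vectors[i + 1]) for i in range(len(vectors) - 1)]

def second_order_coherence(vectors):
    """Cosine similarity between word vectors two positions apart."""
    return [cos(vectors[i], vectors[i + 2]) for i in range(len(vectors) - 2)]
```

From these per-position lists one then takes the average, minimum, and maximum, as the abstract describes.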

  • Open Access English
    Authors: 
    Indika, Amila; Washington, Peter Y.; Peruma, Anthony;
    Publisher: Zenodo

This is the code and dataset accompanying the study "Performance Comparison of Binary Machine Learning Classifiers in Identifying Code Comment Types: An Exploratory Study," accepted for publication at the 2023 International Workshop on Natural Language-based Software Engineering. The abstract of the study follows: Code comments are vital to source code as they help developers with program comprehension tasks. Written in natural language (usually English), code comments convey a variety of information, which is grouped into specific categories. In this study, we construct 19 binary machine learning classifiers for code comment categories belonging to three different programming languages. We present a comparison of performance scores for different types of machine learning classifiers and show that the Linear SVC classifier achieves the highest average F1 score, 0.5474.
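The reported F1 score is the harmonic mean of precision and recall. As a reminder of the metric — this is not code from the study — it can be computed from confusion counts:

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = harmonic mean of precision and recall, from confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

An average F1 of 0.5474 across 19 binary classifiers signals that comment-category identification remains a hard problem.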

  • Open Access English
    Authors: 
    Plutniak, Sébastien; Araujo, Renata; Giardino, Sara;
    Publisher: Zenodo

    An R 'Shiny' application for the visualisation, interactive exploration, and web communication of archaeological excavation data. It includes interactive 3D and 2D visualisations, generation of cross sections and maps of the remains, basic spatial analysis methods (convex hull, regression surfaces, 2D kernel density estimation), and excavation timeline visualisation. 'archeoViz' can be used locally or deployed on a server, either with interactive input of data or with a static data set.
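One of the listed spatial methods, the convex hull, can be sketched in a few lines. This is a Python illustration of the general algorithm (Andrew's monotone chain); archeoViz itself is written in R and this is not its code.

```python
def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Z-component of the cross product (OA x OB); > 0 means a left turn.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:                      # build lower hull left to right
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):            # build upper hull right to left
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]     # endpoints are shared, drop duplicates
```

Applied to excavated-find coordinates, the hull outlines the spatial extent of a set of remains.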
