Advanced search in Research products
The following results are related to Digital Humanities and Cultural Heritage. Are you interested to view more results? Visit OpenAIRE - Explore.
1,462 Research products, page 1 of 147

  • Digital Humanities and Cultural Heritage
  • Research data
  • Research software
  • Other research products
  • Dataset
  • ZENODO

  • Open Access English
    Authors: 
    Rosson, David; Mäkelä, Eetu; Vaara, Ville; Mahadevan, Ananth; Ryan, Yann; Tolonen, Mikko;
    Publisher: Zenodo

    A sample dataset to accompany an article on text reuse. The data is a subset of a larger corpus of text reuses detected in all of Eighteenth Century Collections Online (ECCO) and Early English Books Online (EEBO), and includes all detected text reuse instances connected with works of William Shakespeare. See the accompanying README.md for details on files. The data is available under the Attribution 4.0 International (CC BY 4.0) license, and is free to use with proper citation to the article. We thank the Finnish IT Center for Science (CSC), EuroHPC and LUMI for providing computing infrastructure. The HPC-HD project is funded by the Academy of Finland under Grants 1333716 and 1347706.

  • Open Access
    Authors: 
    Marivate, Vukosi; Njini, Daniel; Madodonga, Andani; Lastrucci, Richard; Dzingirai, Isheanesu;
    Publisher: Zenodo

    ## About dataset

    The dataset contains editions from the South African government magazine Vuk'uzenzele. Data was scraped from PDFs that have been placed in the [data/raw](data/raw/) folder. The PDFs were obtained from the [Vuk'uzenzele website](https://www.vukuzenzele.gov.za/).

    The datasets contain government magazine editions in 11 languages, namely:

    | Language   | Code  | Language   | Code  |
    |------------|-------|------------|-------|
    | English    | (eng) | Sepedi     | (sep) |
    | Afrikaans  | (afr) | Setswana   | (tsn) |
    | isiNdebele | (nbl) | Siswati    | (ssw) |
    | isiXhosa   | (xho) | Tshivenda  | (ven) |
    | isiZulu    | (zul) | Xitsonga   | (tso) |
    | Sesotho    | (nso) |            |       |

    The dataset is present in several forms on the repo. Generally the dataset is split by edition, e.g. `2020-01-ed1`.

    The data directory is broken down as follows:

    ```
    ./data
    ├── interim # I am not really sure - looks like interim in regards to processed.
    ├── processed # The data from scraping the raw pdfs
    ├── sentence_align_output # The output (csv) of the sentence alignment with LASER language encoders
    └── simple_align_output # The output (csv) of a simple one to one sentence alignment
    ```

  • Open Access
    Authors: 
    Tijdens, Kea;
    Publisher: Zenodo
    Project: EC | SSHOC (823782), EC | SERISS (654221)

    Occupation is a key variable in socio-economic research, used in a wide variety of studies, but its measurement is a major challenge. National stocks of job titles are large, with tens of thousands of titles; they are unstructured, with vague boundaries between job titles; and they have no fixed list but instead many entries and exits over time. Measuring occupations in a multi-country survey is an even larger challenge, because occupations with the same tasks have to be coded similarly across countries. Most surveys use an open-ended survey question to measure occupations, which requires time-consuming and expensive office coding. Alternatively, web surveys and CAPI surveys allow the use of a look-up database of occupational titles. The Surveycodings team and WageIndicator Foundation provide a multilingual database of coded and translated occupational titles that allows survey respondents to self-identify their occupational titles, thereby tackling the challenge for multi-country surveys of classifying job titles into the ISCO-08 classification of occupations and doing so consistently across countries. The database is gradually being extended with more occupational titles and more languages. The current version, as of 20230202, holds 55 languages for at most 4,000 titles, though some languages have only half of the titles translated, among other reasons because the occupations do not exist in the country in question or because no translations were available. Details about this and related databases, as well as related publications, can be found at https://www.surveycodings.org/articles/codings/occupation.

  • Open Access English
    Authors: 
    Shoaib Sufi;
    Publisher: Zenodo
    Project: UKRI | The UK Software Sustainab... (EP/S021779/1)

    This is the data collected from the SSI survey of digital/software requirements run for the AHRC in 2021. Personally Identifiable Information (names, email addresses) has been removed, as has other information (job role, institution), to minimise the chance of deductive disclosure.

  • Research data . 2023
    Open Access English
    Authors: 
    Kubiak-Schneider, Aleksandra;
    Publisher: Zenodo

    This is a tessera, an object made of clay, representing and naming the goddess Atargatis. It is special because we do not have many references from Palmyra concerning this deity. She was worshipped in her own temple, but it has never been discovered. This file presents an extract from the database prepared in Nodegoat and a mapping of the object. Furthermore, it provides a new and enhanced reading, which indicates the attribution of this object to the goddess Atargatis. Reference: Ingholt, Starcky, Seyrig, Caquot, Recueil de Tesseres de Palmyre, 1955, nr 162.

  • Research data . 2023
    Open Access English
    Authors: 
    Wang, Josiah;
    Publisher: Zenodo

    This dataset contains images and textual descriptions for ten categories (species) of butterflies. More specifically, it contains:

      • Images for ten butterfly categories
      • Segmentation masks for each image
      • Textual descriptions for each butterfly category

    The image dataset comprises 832 images in total, with the distribution ranging from 55 to 100 images per category. Images were collected from Google Images by querying with the scientific (Latin) name of the species, for example "Danaus plexippus", and manually filtered for those depicting the butterfly of interest. The textual descriptions for each butterfly category were obtained from the eNature online nature guide back in 2008 (website no longer available).

    Please refer to our paper for a more detailed description of the dataset: Josiah Wang, Katja Markert, Mark Everingham (2009). Learning Models for Object Recognition from Natural Language Descriptions. In Proceedings of the 20th British Machine Vision Conference (BMVC2009), September 2009. Also see the video recording of the oral presentation.

  • Open Access English
    Authors: 
    Jiménez Rios, Alejandro; Plevris, Vagelis; Nogal, Maria;
    Publisher: Zenodo

    This database contains all the bibliographic information about the 8673 records found after applying the search strategy used for the Digital Twin Anomaly Detection Decision-Making for Bridge Management Systematic Review. The strategy consisted of using seven initial keywords and similar terms of interest (namely: bridge and bridges, etc.):

      • Bridge
      • Digital twin
      • Bridge information modelling
      • Finite elements
      • Bridge health monitoring
      • Anomaly detection algorithm
      • Cultural heritage

    Six initial queries were built by combining the first keyword with each of the others:

      • #1: bridge* AND "digital twin*"
      • #2: bridge* AND (BrIM OR "bridge information model*")
      • #3: bridge* AND (FEM OR FEA OR "finite element method*" OR "finite element analy*")
      • #4: bridge* AND ("bridge health monitoring" OR "structural health monitoring")
      • #5: bridge* AND (ADA OR "anomaly detection algorithm*")
      • #6: bridge* AND ("cultural heritage" OR "monument* bridge*" OR "old bridge*" OR "ancient bridge*" OR "historic* bridge*")

    As a first screening step, these 6 initial searches were combined pairwise to obtain relevant works containing at least three of the main keywords of interest:

      • #1 AND #2
      • #1 AND #3
      • #1 AND #4
      • #1 AND #5
      • #1 AND #6
      • #2 AND #3
      • #2 AND #4
      • #2 AND #5
      • #2 AND #6
      • #3 AND #4
      • #3 AND #5
      • #3 AND #6
      • #4 AND #5
      • #4 AND #6
      • #5 AND #6

    All records found in Scopus were downloaded in both .ris and .csv format and are included in this database. The search was conducted on 10/12/2022. Note: Searches 10, 14, 17 and 21 did not return any records.

    This project has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No 101066739.
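The pairwise combination step described in the entry above can be sketched in a few lines of Python. This is only an illustration, not the authors' tooling: the function name `pairwise_searches` is hypothetical, and the numbering of the combined searches as 7 through 21 (continuing after the six initial queries) is an assumption inferred from the note that "Searches 10, 14, 17 and 21" returned no records.

```python
from itertools import combinations


def pairwise_searches(n_initial):
    """Map each combined search number to the pair of initial queries it joins.

    Assumption: combined searches are numbered consecutively after the
    initial queries, in the order the pairs are listed in the description.
    """
    pairs = combinations(range(1, n_initial + 1), 2)
    return {n_initial + k: pair for k, pair in enumerate(pairs, start=1)}


searches = pairwise_searches(6)
for number, (i, j) in searches.items():
    print(f"Search {number}: #{i} AND #{j}")
```

Under that numbering assumption, six initial queries yield 15 pairwise searches, giving 21 searches in total, which is consistent with the highest search number mentioned in the note.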

  • Restricted
    Authors: 
    Kuhlke, Olaf; Zachwieja, Alexandra; Verstraete, Emma;
    Publisher: Zenodo

    Full data set of archival materials from the Otto Geist collection of the University of Alaska Fairbanks Archives. Includes historical mining maps, reports on mining deposits, census data, ecosystem data, economic data and cultural practice information. The geographical focus of the data is the Nome region of Alaska, St. Lawrence Island, Point Hope and Savoonga. Contains approximately 200 archival documents with ~1000 pages of text, plus maps and photographs.

  • Open Access English
    Authors: 
    Niccolucci, Franco; Markhoff, Béatrice; Theodoridou, Maria; Felicetti, Achille; Hermon, Sorin;
    Publisher: Zenodo
    Project: EC | ARIADNEplus (823914)

    The file contains the data used in the case studies of the paper "The Heritage Digital Twin: a bicycle made for two. The integration of digital methodologies into cultural heritage research" published on ORE.

  • Open Access English
    Authors: 
    Sisodia Yogendra;
    Publisher: Zenodo

    Annual Reports Assessment Dataset

    This dataset will help investors, merchant bankers, credit rating agencies, and the community of equity research analysts explore annual reports in a more automated way, saving them time. It comprises the following sub-datasets:

    a) PDFs and corresponding OCR text of 100 Indian annual reports. These 100 annual reports are for the 100 largest companies listed on the Bombay Stock Exchange. The total number of words in the OCRed text is 12.25 million.

    b) A few examples of sentences with corresponding classes. The author defined 16 topics widely used in the investment community as classes:

      • Accounting Standards
      • Accounting for Revenue Recognition
      • Corporate Social Responsibility
      • Credit Ratings
      • Diversity Equity and Inclusion
      • Electronic Voting
      • Environment and Sustainability
      • Hedging Strategy
      • Intellectual Property Infringement Risk
      • Litigation Risk
      • Order Book
      • Related Party Transaction
      • Remuneration
      • Research and Development
      • Talent Management
      • Whistle Blower Policy

    These classes should help generate ideas and investment decisions, as well as identify red flags and early warning signs of trouble when everything appears to be proceeding smoothly.

    About the data:

      • "scrips.json" is a JSON file with the names of companies
      • "SC_CODE" is the BSE Scrip Id
      • "SC_NAME" is the listed company's name
      • "NET_TURNOV" is the turnover on the day of consideration
      • "source_pdf" is a folder containing both the PDFs and the OCR output from Tesseract
      • "raw_pdf.zip" contains the raw PDFs and can be used to try another OCR engine
      • "ocr.zip" contains a JSON file (annual_report_content.json) with the OCR text for each PDF
      • "annual_report_content.json" is an array of 100 elements, each with two keys, "file_name" and "content"
      • "classif_data_rank_freezed.json" is used for evaluation of results and contains each "sentence" and its corresponding "class"

    The author released this dataset for analyzing annual reports, along with a few class labels with examples that equity research analysts can use for the same purpose. In the future, the author wants to expand the number of class labels and examples.
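As a rough illustration of the structure described for "annual_report_content.json" (an array whose elements carry the keys "file_name" and "content"), the following Python sketch counts words per report. The helper name `word_counts` and the two-element sample are hypothetical stand-ins; the real file is said to hold 100 elements, and its actual text is not reproduced here.

```python
import json


def word_counts(reports):
    """Map each report's file_name to the number of whitespace-separated words in its OCR text."""
    return {r["file_name"]: len(r["content"].split()) for r in reports}


# Hypothetical stand-in with the same shape as annual_report_content.json.
sample_json = """
[
  {"file_name": "a.pdf", "content": "Revenue grew this year"},
  {"file_name": "b.pdf", "content": "Litigation risk remains low"}
]
"""

reports = json.loads(sample_json)
counts = word_counts(reports)
print(counts)                # per-report word counts
print(sum(counts.values()))  # total words across all reports
```

With the real file, `reports` would presumably come from `json.load` on "annual_report_content.json" after extracting "ocr.zip"; whitespace splitting is only one possible word-counting convention and may not reproduce the 12.25 million figure exactly.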
