Anonymized responses to the ARIADNEplus questionnaire, which gathered information for the aggregation of metadata about archaeological resources to be included in the ARIADNEplus Knowledge Base and portal (https://portal.ariadne-infrastructure.eu/). The CSV file includes only the plain responses as provided by 31 archaeological content providers up to 18 October 2021. The Excel file also includes two additional sheets in which the responses about the formats and the aggregation update schedule have been normalised. The responses are discussed in deliverable D12.4 "Final report on data integration", currently under preparation.
The study was undertaken on eleven flashed glass samples provided by LambertsGlas®, each consisting of a colorless base glass covered by layers of different colors and thicknesses. This dataset consists of images of the samples; Laser-induced Breakdown Spectroscopy (LIBS) spectra; Laser-induced Fluorescence (LIF) spectra; Optical Microscopy (OM) images; UV-Vis-IR spectra; Field Emission Scanning Electron Microscopy (FESEM) images; and the assignment of the Energy-dispersive X-ray (EDS) analysis. This information allows characterizing the composition of both sides of the glasses and identifying the chromophores responsible for the flashed glass coloration. Images are presented in JPG format. All spectra are presented in CSV format, in a single page. Descriptions of the samples, the experimental conditions under which the spectra were taken, and the names of the column values are included at the top of each page. For LIBS, 1 file per sample with the elemental composition of the flashed glasses is included. Each file is composed of 2 columns (wavelength and intensity). For LIF, 1 file per sample with the analysis of the fluorescent species of each flashed glass is included. Each file is composed of 2 columns (wavelength and intensity). For UV-Vis-IR spectroscopy, 1 file per sample of glass chromophores is included, for the colored side only. Each file is composed of 2 columns (wavelength and intensity). For FESEM-EDS, 2 files per sample are included. In the first group, "PHOTOS", 1 cross-section image per sample is included. In the second group, "EDS", 1 file per sample with the assignment of the main elements is included. Each file is composed of 3 columns (the main elements, and the results for the base glass and for the colored layer in weight percentage, respectively). This dataset is subject to a Creative Commons Attribution 4.0 International (CC BY 4.0) License. There are 5 files, corresponding to the techniques employed for the analysis of the eleven different samples.
The file titled "PHOTOS" contains: Fig. 1_Flashedglasses_Photo; Fig. 2_OM_Photo. The file titled “LIBS” contains: LIBS_Black-Baseglass; LIBS_Black-Coloredlayer; LIBS_Blue1-Baseglass; LIBS_Blue1-Coloredlayer; LIBS_Blue2-Baseglass; LIBS_Blue2-Coloredlayer; LIBS_Blue3-Baseglass; LIBS_Blue3-Coloredlayer; LIBS_Brown1-Baseglass; LIBS_Brown1-Coloredlayer; LIBS_Brown2-Baseglass; LIBS_Brown2-Coloredlayer; LIBS_Green1-Baseglass; LIBS_Green1-Coloredlayer; LIBS_Green2-Baseglass; LIBS_Green2-Coloredlayer; LIBS_Green3-Baseglass; LIBS_Green3-Coloredlayer; LIBS_Pink1-Baseglass; LIBS_Pink1-Coloredlayer; LIBS_Pink2-Baseglass; LIBS_Pink2-Coloredlayer. The file titled “LIF” contains: LIF_Black-Baseglass; LIF_Black-Coloredlayer; LIF_Blue1-Baseglass; LIF_Blue1-Coloredlayer; LIF_Blue2-Baseglass; LIF_Blue2-Coloredlayer; LIF_Blue3-Baseglass; LIF_Blue3-Coloredlayer; LIF_Brown1-Baseglass; LIF_Brown1-Coloredlayer; LIF_Brown2-Baseglass; LIF_Brown2-Coloredlayer; LIF_Green1-Baseglass; LIF_Green1-Coloredlayer; LIF_Green2-Baseglass; LIF_Green2-Coloredlayer; LIF_Green3-Baseglass; LIF_Green3-Coloredlayer; LIF_Pink1-Baseglass; LIF_Pink1-Coloredlayer; LIF_Pink2-Baseglass; LIF_Pink2-Coloredlayer. For “FESEM-EDS” there are two files inside. One, titled "EDS", contains the documents: EDS_Black; EDS_Blue1; EDS_Blue2; EDS_Blue3; EDS_Brown1; EDS_Brown2; EDS_Green1; EDS_Green2; EDS_Green3; EDS_Pink1; EDS_Pink2. The other, titled "PHOTOS", contains: FESEM_Black; FESEM_Blue1; FESEM_Blue2; FESEM_Blue3; FESEM_Brown1; FESEM_Brown2; FESEM_Green1; FESEM_Green2; FESEM_Green3; FESEM_Pink1; FESEM_Pink2. This is the experimental dataset used in the paper Appl. Sci., 12(11), 5760 (2022) (https://www.mdpi.com/2076-3417/12/11/5760). Flashed glasses are composed of a base glass and a thin colored layer and have been used since medieval times in stained glass windows. Their study can be challenging because of their complex composition and multilayer structure.
In the present work, a set of optical and spectroscopic techniques has been used for the characterization of a representative set of flashed glasses commonly used in the manufacture of stained glass windows. The structural and chemical composition of the pieces was investigated by optical microscopy, field emission scanning electron microscopy-energy dispersive X-ray spectrometry (FESEM-EDS), UV-Vis-IR spectroscopy, laser-induced breakdown spectroscopy (LIBS), and laser-induced fluorescence (LIF). Optical microscopy and FESEM-EDS allowed the determination of the thicknesses of the colored layers, while LIBS, EDS, UV-Vis-IR, and LIF spectroscopies served for elemental, molecular, and chromophore characterization of the base glasses and colored layers. Results obtained using the micro-invasive LIBS technique were compared with those retrieved by the cross-sectional technique FESEM-EDS, which requires sampling, and showed significant consistency and agreement. In addition, LIBS results revealed the presence of additional elements in the composition of flashed glasses that could not be detected by FESEM-EDS. The combination of UV-Vis-IR and LIF results allowed precise chemical identification of the chromophores responsible for the flashed glass coloration. This research has been funded by the Spanish State Research Agency (AEI) through project PID2019-104124RB-I00/AEI/10.13039/501100011033, by the Fundación General CSIC (ComFuturo Programme), by project TOP Heritage-CM (S2018/NMT-4372) from the Community of Madrid, and by the H2020 European project IPERION HS (Integrated Platform for the European Research Infrastructure ON Heritage Science, GA 871034). Peer reviewed
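As a reading aid for the two-column spectrum files described above (wavelength and intensity, with descriptive text at the top of each page), the following minimal Python sketch shows one way such a file might be loaded. It rests on assumptions not stated in the record: that the delimiter is a comma, semicolon, or tab; that numbers use a decimal point; and that every line which does not parse as exactly two numbers is descriptive header text. The function name `read_spectrum` and the sample lines are hypothetical.

```python
import re
from typing import Iterable, List, Tuple


def read_spectrum(lines: Iterable[str]) -> Tuple[List[float], List[float]]:
    """Collect (wavelength, intensity) pairs, skipping non-numeric header lines.

    Assumption: fields are separated by ',', ';' or a tab, and numbers use a
    decimal point. Any line that is not exactly two numbers is treated as
    descriptive text and ignored.
    """
    wavelengths: List[float] = []
    intensities: List[float] = []
    for line in lines:
        parts = [p.strip() for p in re.split(r"[;,\t]", line.strip()) if p.strip()]
        if len(parts) != 2:
            continue  # blank line or free-text description
        try:
            wavelength, intensity = float(parts[0]), float(parts[1])
        except ValueError:
            continue  # e.g. the column-name row
        wavelengths.append(wavelength)
        intensities.append(intensity)
    return wavelengths, intensities


# Invented sample content; the real header wording, units and values may differ.
SAMPLE_LINES = [
    "Sample: Blue1, colored layer",
    "Wavelength (nm),Intensity (a.u.)",
    "200.0,12.5",
    "200.5,13.0",
]

if __name__ == "__main__":
    wl, inten = read_spectrum(SAMPLE_LINES)
    print(list(zip(wl, inten)))
```

With a real file, the same function could be called as `read_spectrum(open(path, encoding="utf-8"))`. The header layout at the top of each file should be checked before relying on the skip-by-parsing heuristic.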
The project provides the digital edition of the libretti staged for the election of the Council of the Elders in the Republic of Lucca. The celebration, known as funzione delle Tasche, was repeated every three years from 1636 to 1797. The present edition collects the works from 1636 to 1705 in order to analyze changes and recurring motifs throughout the 17th century in a republican context.
ELEXIS-WSD is a parallel sense-annotated corpus in which content words (nouns, adjectives, verbs, and adverbs) have been assigned senses. Version 1.0 contains sentences for 10 languages: Bulgarian, Danish, English, Spanish, Estonian, Hungarian, Italian, Dutch, Portuguese, and Slovene. The corpus was compiled by automatically extracting a set of sentences from WikiMatrix (Schwenk et al., 2019), a large open-access collection of parallel sentences derived from Wikipedia, using an automatic approach based on multilingual sentence embeddings. The sentences were manually validated according to specific formal, lexical and semantic criteria (e.g. by removing incorrect punctuation, morphological errors, notes in square brackets and etymological information typically provided in Wikipedia pages). To obtain satisfactory semantic coverage, sentences with fewer than 5 words and fewer than 2 polysemous words were filtered out. Subsequently, in order to obtain datasets in the other nine target languages, for each selected sentence in English, the corresponding WikiMatrix translation into each of the other languages was retrieved. If no translation was available, the English sentence was translated manually. The resulting corpus comprises 2,024 sentences for each language. The sentences were tokenized, lemmatized, and tagged with POS tags using UDPipe v2.6 (https://lindat.mff.cuni.cz/services/udpipe/). Senses were annotated using LexTag (https://elexis.babelscape.com/): each content word (noun, verb, adjective, and adverb) was assigned a sense from among the available senses in the sense inventory selected for the language (see below) or in BabelNet. Sense inventories were also updated with new senses during annotation.
List of sense inventories:
BG: Dictionary of Bulgarian
DA: DanNet – The Danish WordNet
EN: Open English WordNet
ES: Spanish Wiktionary
ET: The EKI Combined Dictionary of Estonian
HU: The Explanatory Dictionary of the Hungarian Language
IT: PSC + Italian WordNet
NL: Open Dutch WordNet
PT: Portuguese Academy Dictionary (DACL)
SL: Digital Dictionary Database of Slovene
The corpus is available in a CONLL-like tab-separated format. In order, the columns contain the token ID, its form, its lemma, its UPOS-tag, its whitespace information (whether the token is followed by a whitespace or not), the ID of the sense assigned to the token, and the index of the multiword expression (if the token is part of an annotated multiword expression). Each language has a separate sense inventory containing all the senses (and their definitions) used for annotation in the corpus. Not all the senses from the sense inventory are necessarily included in the corpus annotations: for instance, all occurrences of the English noun "bank" in the corpus might be annotated with the sense of "financial institution", but the sense inventory also contains the sense "edge of a river" as well as all other possible senses to disambiguate between. For more information, please refer to 00README.txt.
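The seven-column layout described above can be read with a few lines of Python. The sketch below is illustrative only and makes assumptions that should be checked against 00README.txt: that sentences are separated by blank lines, that lines starting with "#" are comments, and that "_" marks an empty sense or multiword-expression field. The names `Token`, `read_sentences` and `annotated_tokens`, as well as the sample rows and their sense IDs, are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class Token:
    token_id: str
    form: str
    lemma: str
    upos: str
    whitespace: str  # whitespace information, kept as the raw string
    sense_id: str    # assumed to be "_" when no sense is assigned
    mwe_index: str   # assumed to be "_" when the token is not in an MWE


def read_sentences(lines: Iterable[str]) -> Iterator[List[Token]]:
    """Yield one list of tokens per sentence (blank-line separated, assumed)."""
    sentence: List[Token] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            continue  # assumed comment/metadata line
        fields = line.split("\t")
        if len(fields) != 7:
            raise ValueError(f"expected 7 columns, got {len(fields)}: {line!r}")
        sentence.append(Token(*fields))
    if sentence:
        yield sentence


def annotated_tokens(sentence: List[Token]) -> List[Token]:
    """Return only the tokens that carry a sense ID."""
    return [tok for tok in sentence if tok.sense_id != "_"]


# Invented rows for illustration; real whitespace values and sense IDs differ.
SAMPLE = (
    "# hypothetical comment line\n"
    "1\tThe\tthe\tDET\t1\t_\t_\n"
    "2\tbank\tbank\tNOUN\t1\tdemo:bank.1\t_\n"
    "3\tclosed\tclose\tVERB\t0\tdemo:close.2\t_\n"
    "4\t.\t.\tPUNCT\t1\t_\t_\n"
    "\n"
    "1\tNew\tNew\tPROPN\t1\tdemo:new_york.1\t1\n"
    "2\tYork\tYork\tPROPN\t0\tdemo:new_york.1\t1\n"
)

if __name__ == "__main__":
    for sent in read_sentences(SAMPLE.splitlines()):
        print([(tok.form, tok.sense_id) for tok in annotated_tokens(sent)])
```

Keeping every field as a string avoids guessing at the encoding of the whitespace and multiword-expression columns; they can be converted once the actual conventions are confirmed.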
MALDI-TOF-MS spectra of collagen extracted from modern reference and archaeological bone samples, used to develop markers for Zooarchaeology by Mass Spectrometry (ZooMS) that distinguish between Equus species. For each sample, separate digestions were performed with trypsin and with chymotrypsin. Information about the species of the samples can be found in the 'sample metadata.csv' file. Information on the extraction and digestion protocol can be found in the associated manuscript. The sequence data contains alignments of the proteins COL1A1 and COL1A2 for available Equus collagen protein sequences. More information on these files can be found in the manuscript associated with this dataset.
SILKNOW Multimodal Cultural Heritage Dataset. Includes text descriptions, images, labels, and predictions made by individual modality classifiers. The data resulted from an export of the SILKNOW Knowledge Graph. See: https://zenodo.org/record/5743090 Repository with code using this dataset available at: https://github.com/silknow/multimodal_cultural_heritage
In 2018 the IPERION-CH Grounds Database was presented to examine how the data produced through the scientific examination of historic painting preparation or grounds samples from multiple institutions could be combined in a flexible digital form, exploring the presentation of interrelated high-resolution images, text, complex metadata and procedural documentation. The original main user interface is live, though password-protected at this time. Work within the SSHOC project aimed to reformat the data to create a more FAIR data-set: in addition to mapping it to a standard ontology to increase Interoperability, it has also been made available in the form of open linkable data combined with a SPARQL end-point. A draft version of this live data presentation can be found online. This is a draft data-set and further work is planned to debug and improve its semantic structure. This deposit contains the CIDOC-CRM mapped data formatted in XML and an example model diagram representing some of the key relationships covered in the data-set. Live access to this data, with documentation and worked examples, can be found at: https://rdf.ng-london.org.uk/sshoc
In 2007 the Raphael Research Resource project began to examine how complex conservation, scientific and art historical research could be combined in a flexible digital form, exploring the presentation of interrelated high-resolution images and text, along with how the data could be stored in relation to an event-driven ontology in the form of RDF triples. The original main user interface is still live. In 2021/21, as part of the SSHOC Project, the raw data stored within the system was mapped to the CIDOC CRM using a custom set of Python scripts (https://doi.org/10.5281/zenodo.6461654). The SSHOC work aimed to make this data more FAIR: in addition to mapping it to a standard ontology to increase Interoperability, it has also been made available in the form of open linkable data combined with a SPARQL end-point. This live data presentation can be found online. This deposit contains the CIDOC-CRM mapped data formatted in XML and an example model diagram representing some of the key relationships covered in the data-set. Live access to this data, with documentation and worked examples, can be found at: https://rdf.ng-london.org.uk/sshoc
Publisher: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Project: EC | Bergamot (825303)
Document-level test suite for the evaluation of gender translation consistency. Our document-level test set consists of selected English documents from the WMT21 newstest annotated with gender information. Czech unannotated references are also added for convenience. We semi-automatically annotated person names and pronouns to identify the gender of these elements as well as coreferences. Our proposed annotation consists of three elements: (1) an ID, (2) an element class, and (3) gender. The ID identifies a person's name and its occurrences (name and pronouns). The element class identifies whether the tag refers to a name or a pronoun. Finally, the gender information defines whether the element is masculine or feminine. We applied a series of NLP techniques to automatically identify person names and coreferences. This initial process resulted in a set of 45 documents to be manually annotated. We then started a manual annotation of these documents to make sure they are correctly tagged. See README.md for more details.
Publisher: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Project: EC | Bergamot (825303)
CorefUD is a collection of previously existing datasets annotated with coreference, which we converted into a common annotation scheme. In total, CorefUD in its current version 1.0 consists of 17 datasets for 11 languages. The datasets are enriched with automatic morphological and syntactic annotations that are fully compliant with the standards of the Universal Dependencies project. All the datasets are stored in the CoNLL-U format, with coreference- and bridging-specific information captured by attribute-value pairs located in the MISC column. The collection is divided into a public edition and a non-public (ÚFAL-internal) edition. The publicly available edition is distributed via LINDAT-CLARIAH-CZ and contains 13 datasets for 10 languages (1 dataset for Catalan, 2 for Czech, 2 for English, 1 for French, 2 for German, 1 for Hungarian, 1 for Lithuanian, 1 for Polish, 1 for Russian, and 1 for Spanish), excluding the test data. The non-public edition is available internally to ÚFAL members and contains 4 additional datasets for 2 languages (1 dataset for Dutch, and 3 for English), which we are not allowed to distribute due to their original license limitations. It also contains the test data portions for all datasets. When using any of the harmonized datasets, please get acquainted with its license (placed in the same directory as the data) and cite the original data resource too. Version 1.0 consists of the same corpora and languages as the previous version 0.2; however, the English GUM dataset has been updated to a newer and larger version, and in the Czech/English PCEDT dataset, the train-dev-test split has been changed to be compatible with OntoNotes. Nevertheless, the main change is in the file format (the MISC attributes have a new form and interpretation).
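To illustrate how attribute-value pairs in the MISC column can be accessed, here is a minimal Python sketch based on the general CoNLL-U conventions: ten tab-separated columns per token line, MISC as the tenth, "|" between pairs, "=" between attribute and value, and "_" for an empty field. It does not reproduce the CorefUD-specific attribute names or their interpretation; `DemoCoref` in the example line is a placeholder, and the function names are hypothetical.

```python
from typing import Dict


def parse_misc(misc: str) -> Dict[str, str]:
    """Split a CoNLL-U MISC field into an attribute -> value dictionary."""
    if not misc or misc == "_":
        return {}
    attrs: Dict[str, str] = {}
    for item in misc.split("|"):
        # Split on the first "=" only, so values may themselves contain "=".
        key, _, value = item.partition("=")
        attrs[key] = value
    return attrs


def misc_of_token_line(line: str) -> Dict[str, str]:
    """Return the parsed MISC attributes of one CoNLL-U token line."""
    columns = line.rstrip("\n").split("\t")
    if len(columns) != 10:
        raise ValueError(f"expected 10 columns, got {len(columns)}")
    return parse_misc(columns[9])


# Illustrative token line; "DemoCoref=x1" is a placeholder, not a CorefUD attribute.
EXAMPLE_LINE = "1\tMary\tMary\tPROPN\t_\t_\t2\tnsubj\t_\tSpaceAfter=No|DemoCoref=x1"

if __name__ == "__main__":
    print(misc_of_token_line(EXAMPLE_LINE))
```

For real use, the attribute names and value syntax should be taken from the documentation shipped with the collection, since the record notes that the MISC attributes changed form and interpretation in version 1.0.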