The following results are related to Digital Humanities and Cultural Heritage. Are you interested to view more results? Visit OpenAIRE - Explore.
208 Research products, page 1 of 21

  • Digital Humanities and Cultural Heritage
  • Research data
  • DRYAD

  • Open Access English
    Authors: 
    Lundy, Jasmine;
    Publisher: Dryad

    From the 9th to 14th centuries AD, Sicily experienced a series of rapid and quite radical changes in political regime, but the impact of these regime changes on the lives of the people that experienced them remains largely elusive within the historical narrative. We use a multi-faceted lipid residue approach to give direct chemical evidence of the use of 248 everyday domestic ceramic containers from Islamic and post-Islamic contexts in western Sicily to aid our understanding of daily habits throughout this period of political change. A range of commodities was successfully identified, including animal fats, vegetable products, fruit products (potentially including wine), and plant resins. The study highlights the complexity of residues in Early Medieval Mediterranean society, as in many cases mixtures of commodities were observed, reflecting sequential cooking events and/or the complex mixtures reflective of medieval recipes. However, overall there were no clear changes in the composition of the residues following the imposition of Norman control over the island and through subsequent periods, despite some differences between urban centres and rural sites. This lends support to the idea that post-Islamic populations largely flourished and benefited from the agricultural systems, resources, and recipes left by their predecessors. This dataset comprises data files produced by Gas Chromatography-Mass Spectrometry (GC-MS) of lipids extracted using an acid extraction method from pottery sherds from 9th-14th century contexts in Sicily. These data are linked to the published journal article, where the methods of extraction, the context of the pottery, and the interpretation of the data are fully described. Each file corresponds to the sample name as recorded in S1 data and contains a usable .cdf file. The acquisition method for all files is given in TEXT format.

  • Open Access
    Authors: 
    Gravis, David; Roy, Nicolas; Ruffini-Ronzani, Nicolas; Houssiau, Laurent; Felten, Alexandre; Tumanov, Nikolay; Deparis, Olivier;
    Publisher: Zenodo

    XRD diffractograms, ToF-SIMS MS and ATR-FTIR spectrometry spectra, recorded on inks on historical parchments (pigments, inked areas). ToF-SIMS and ATR-FTIR spectra from non-inked areas of the parchments. See the read-me file for a complete description of the files and structure, and the main manuscript for the methodology. The PCA algorithm code is also provided. Article abstract: Book production by medieval scriptoria has gained growing interest in recent studies. In this context, identifying ink compositions and parchment animal species from illuminated manuscripts is of great importance. Here, we introduce time-of-flight secondary ion mass spectrometry (ToF-SIMS) as a non-invasive tool to identify both inks and animal skins in manuscripts, at the same time. For this purpose, both positive and negative ion spectra in inked and non-inked areas were recorded. Chemical compositions of pigments (decoration) or black inks (text) were determined by searching for characteristic ion mass peaks. Animal skins were identified by data processing of raw ToF-SIMS spectra using Principal Component Analysis (PCA). In illuminated manuscripts from 15th c. to 16th c., malachite (green), azurite (blue), cinnabar (red) inorganic pigments, as well as iron-gall black ink, were identified. Carbon black and indigo (blue) organic pigments were also identified. Animal skins were identified in modern parchments of known animal species by a two-step PCA procedure. We believe the proposed method will find extensive application in material studies of medieval manuscripts, as it is non-invasive, highly sensitive and able to identify both inks and animal skins at the same time, even from traces of pigments and tiny scanned areas. The in-house PCA algorithm requires Python. ToF-SIMS raw data require SurfaceLab software. ATR-FTIR raw data (.0) can be read with free-licence software (Fityk). XRD diffractograms are directly exported in .txt from .xyz files. Analytical and data processing methods can be found in the manuscript.
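    To illustrate the kind of PCA step this record refers to, here is a minimal pure-Python sketch that extracts the first principal component from a small matrix of spectra. It is not the authors' in-house code, which ships with the dataset. The toy intensities, the function names, and the use of power iteration are all assumptions made for illustration.

```python
import math

def center_columns(rows):
    """Subtract each column's mean so every variable has zero mean."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def covariance(rows):
    """Sample covariance matrix of already-centred data."""
    n, p = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(p)]
            for i in range(p)]

def first_component(cov, iters=500):
    """Leading eigenvector of cov via power iteration (unit length)."""
    p = len(cov)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def scores(rows, component):
    """Project each centred sample onto the component."""
    return [sum(x * c for x, c in zip(row, component)) for row in rows]

# Hypothetical toy "spectra": rows are samples, columns are peak intensities.
spectra = [
    [10.0, 2.0, 1.0],
    [11.0, 2.1, 0.9],
    [3.0, 8.0, 1.1],
    [2.5, 8.4, 1.0],
]
centered = center_columns(spectra)
pc1 = first_component(covariance(centered))
pc1_scores = scores(centered, pc1)
print(pc1_scores)
```

    In this toy example the first two samples and the last two fall on opposite sides of the first component, which is the kind of grouping a PCA-based species identification would look for.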

  • Open Access
    Authors: 
    Oh, Inez; Schindler, Suzanne; Ghoshal, Nupur; Lai, Albert; Payne, Philip; Gupta, Aditi;
    Publisher: Dryad

    Objectives There is much interest in utilizing clinical data for developing prediction models for Alzheimer disease (AD) risk, progression, and outcomes. Existing studies have mostly utilized curated research registries, image analysis, and structured Electronic Health Record (EHR) data. However, much critical information resides in relatively inaccessible unstructured clinical notes within the EHR. Materials and Methods We developed a natural language processing (NLP)-based pipeline to extract AD-related clinical phenotypes, documenting strategies for success and assessing the utility of mining unstructured clinical notes. We evaluated the pipeline against gold-standard manual annotations performed by two clinical dementia experts for AD-related clinical phenotypes including medical comorbidities, biomarkers, neurobehavioral test scores, behavioral indicators of cognitive decline, family history, and neuroimaging findings. Results Documentation rates for each phenotype varied in the structured versus unstructured EHR. Inter-annotator agreement was high (Cohen’s kappa = 0.72–1) and positively correlated with the NLP-based phenotype extraction pipeline’s performance (average F1-score = 0.65-0.99) for each phenotype. Discussion We developed an automated NLP-based pipeline to extract informative phenotypes that may improve the performance of eventual machine-learning predictive models for AD. In the process, we examined documentation practices for each phenotype relevant to the care of AD patients and identified factors for success. Conclusion Success of our NLP-based phenotype extraction pipeline depended on domain-specific knowledge and focus on a specific clinical domain instead of maximizing generalizability. We developed a natural language processing (NLP)-based pipeline which contains independent NLP modules that target the extraction of ten clinical phenotypes relevant to Alzheimer disease dementia progression. 
The pipeline was trained on unstructured clinical notes originating from Allscripts TouchWorks associated with AD dementia patient office visits that occurred between June 1, 2013, and May 31, 2018, extracted from the Washington University in St. Louis Research Data Core (RDC), a repository of patient clinical data from BJC HealthCare and Washington University Physicians. The targeted phenotypes included neurobehavioral test scores (Clinical Dementia Rating and Mini-Mental State Exam) and their corresponding test dates, comorbidities (hypertension and depression), neuroimaging findings (presence of atrophy or infarct), behavioral indicators of dementia (repeating and misplacing), biomarker levels (total and phosphorylated tau protein levels), and family history (whether there was a family history of dementia, and if yes, which family member(s)). The clinical notes extracted from the EHR were in rich text format (RTF) contained within tab-delimited files (TXT) alongside metadata such as the patient medical record number, author, and date authored. These were preprocessed before being analyzed by the NLP-based phenotype extraction pipeline. This entailed converting the TXT files to comma-separated files (CSV), accounting for additional tab, quote, and newline characters present, and stripping the RTF formatting. Data preprocessing steps were performed using the Python Pandas and striprtf (version 0.0.10) packages. Linguamatics I2E query files (*.i2qy) and Enterprise Architect Simulation Library (EASL) code for each NLP module can be found on the Linguamatics Community webpage (https://community.linguamatics.com/queries), accessible with the creation of a free account. Linguamatics I2E software is required to open the query files (*.i2qy) directly, but the logic underlying the NLP modules can be understood by referencing the EASL code.
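    The preprocessing described above (tab-delimited TXT to CSV, with RTF formatting stripped) can be sketched roughly as follows. This is an illustrative sketch using only the standard library, not the authors' Pandas/striprtf code. The column names, the sample record, and `strip_rtf_naive` (a crude stand-in that handles only simple control words) are hypothetical.

```python
import csv
import io
import re

def strip_rtf_naive(rtf):
    """Crude RTF-to-text stand-in (assumption; not the striprtf package)."""
    text = re.sub(r"\\par(?![a-z]) ?", "\n", rtf)   # paragraph marks become newlines
    text = re.sub(r"\\[a-z]+-?\d* ?", "", text)     # drop remaining control words
    text = re.sub(r"[{}]", "", text)                # drop group braces
    return text.strip()

def tsv_notes_to_csv(tsv_text, note_column="note_rtf"):
    """Convert tab-delimited notes to CSV, stripping RTF from one column."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames,
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row[note_column] = strip_rtf_naive(row[note_column])
        writer.writerow(row)   # the csv module quotes embedded newlines and commas
    return out.getvalue()

# Made-up record; the column names are hypothetical.
sample = (
    "mrn\tauthor\tdate\tnote_rtf\n"
    "123\tDr. A\t2015-03-02\t{\\rtf1\\ansi MMSE score 24\\par CDR 0.5}\n"
)
csv_text = tsv_notes_to_csv(sample)
print(csv_text)
```

    Writing through the `csv` module rather than joining strings by hand is what takes care of the stray quote and newline characters mentioned in the description.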

  • Open Access English
    Authors: 
    Dhrangadhariya, Anjani; Müller, Henning;
    Publisher: Dryad

    This upload contains four main zip files. ds_cto_dict.zip: This zip file contains the four distant supervision dictionaries (P: participant.txt, I = intervention.txt, intervetion_syn.txt, O: outcome.txt) generated from clinicaltrials.gov using the Methodology described in Distant-CTO (https://aclanthology.org/2022.bionlp-1.34/). These dictionaries were used to create distant supervision labelling functions as described in the Labelling sources subsection of the Methodology. The data was derived from https://clinicaltrials.gov/ handcrafted_dictionaries.zip: This zip folder contains three files 1) gender_sexuality.txt: a list of possible genders and sexual orientations found across the web. The list needs to be more comprehensive. 2) endpoints_dict.txt: contains outcome names and the names of questionnaires used to measure outcomes assembled from PROM questionnaires and PROMs. and 3) comparator_dict: contains a list of idiosyncratic comparator terms like a sham, saline, placebo, etc., compiled from the literature search. The list needs to be more comprehensive. test_ebm_correctedlabels.tsv: EBM-PICO is a widely used dataset with PICO annotations at two levels: span-level or coarse-grained and entity-level or fine-grained. Span-level annotations encompass the full information about each class. Entity-level annotations cover the more fine-grained information at the entity level, with PICO classes further divided into fine-grained subclasses. For example, the coarse-grained Participant span is further divided into participant age, gender, condition and sample size in the randomised controlled trial. This dataset comes pre-divided into a training set (n=4,933) annotated through crowd-sourcing and an expert annotated gold test set (n=191) for evaluation. The EBM-PICO annotation guidelines caution about variable annotation quality. Abaho et al. developed a framework to post-hoc correct EBM-PICO outcomes annotation inconsistencies. Lee et al. 
studied annotation span disagreements suggesting variability across the annotators. Low annotation quality in the training dataset is excusable, but the errors in the test set can lead to faulty evaluation of the downstream ML methods. We evaluate 1% of the EBM-PICO training set tokens to gauge the possible reasons for the fine-grained labelling errors and use this exercise to conduct an error-focused PICO re-annotation for the EBM-PICO gold test set. The file 'test_ebm_correctedlabels.tsv' contains the error-corrected EBM-PICO gold test set. This dataset could be used as a complementary evaluation set alongside the EBM-PICO test set. error_analysis.zip: This .zip file contains three .tsv files for each PICO class to identify possible errors in about 1% (about 12,962 tokens) of the EBM-PICO training set. Objective: PICO (Participants, Interventions, Comparators, Outcomes) analysis is vital but time-consuming for conducting systematic reviews (SRs). Supervised machine learning can help fully automate it, but a lack of large annotated corpora limits the quality of automated PICO recognition systems. The largest currently available PICO corpus is manually annotated, which is an approach that is often too expensive for the scientific community to apply. Depending on the specific SR question, PICO criteria are extended to PICOC (C-Context), PICOT (T-timeframe), and PIBOSO (B-Background, S-Study design, O-Other), meaning the static hand-labelled corpora need to undergo costly re-annotation as per the downstream requirements. We aim to test the feasibility of designing a weak supervision system to extract these entities without hand-labelled data. Methodology: We decompose PICO spans into their constituent entities and re-purpose multiple medical and non-medical ontologies and expert-generated rules to obtain multiple noisy labels for these entities. These labels obtained using several sources are then aggregated using simple majority voting and generative modelling approaches. 
The resulting programmatic labels are used as weak signals to train a weakly-supervised discriminative model and observe performance changes. We explore mistakes in the currently available PICO corpus that could have led to inaccurate evaluation of several automation methods. Results: We present Weak-PICO, a weakly-supervised PICO entity recognition approach using medical and non-medical ontologies, dictionaries and expert-generated rules. Our approach does not use hand-labelled data. Conclusion: Weak supervision using weak-PICO for PICO entity recognition has encouraging results, and the approach can potentially extend to more clinical entities readily. All the datasets could be opened using text editors or Google sheets. The .zip files in the dataset can be opened using the archive utility on Mac OS and unzip functionality in Linux. (All Windows and Apple operating systems support the use of ZIP files without additional third-party software)
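    The simple majority-voting aggregation mentioned in the methodology can be illustrated with a small sketch. The labelling functions, the mini-dictionary, and the token list below are hypothetical and far simpler than the ontology-based sources described for Weak-PICO. The generative-modelling alternative is not shown.

```python
from collections import Counter

ABSTAIN = None  # a labelling function may decline to vote

# Hypothetical mini-dictionary standing in for a distant-supervision source.
INTERVENTION_TERMS = {"aspirin", "placebo"}
STOPWORDS = {"or", "and", "the", "of"}

def lf_dictionary(token):
    """Vote 1 (intervention) when the token is in the dictionary."""
    return 1 if token.lower() in INTERVENTION_TERMS else ABSTAIN

def lf_stopword(token):
    """Vote 0 (not an intervention) for stopwords."""
    return 0 if token.lower() in STOPWORDS else ABSTAIN

def lf_short_token(token):
    """Vote 0 for very short tokens (a deliberately naive rule)."""
    return 0 if len(token) <= 3 else ABSTAIN

def majority_vote(votes):
    """Most common non-abstain label; ABSTAIN on no votes or a tie."""
    counts = Counter(v for v in votes if v is not ABSTAIN)
    if not counts:
        return ABSTAIN
    ranked = counts.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]

labelling_functions = [lf_dictionary, lf_stopword, lf_short_token]
tokens = ["patients", "received", "aspirin", "or", "placebo"]
weak_labels = [majority_vote([lf(t) for lf in labelling_functions])
               for t in tokens]
print(list(zip(tokens, weak_labels)))
```

    Tokens on which every source abstains stay unlabelled. A downstream discriminative model would then be trained only on the tokens that received a weak label.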

  • Open Access
    Authors: 
    Chang, Charlotte;
    Publisher: Zenodo

    Social media platforms, such as Twitter, are an increasingly important source of information and are forums for discourse within and between interest groups. Research highlights how social media communities have amplified movements such as the Arab Spring, #MeToo, and Black Lives Matter. But environmental digital discourse remains underexplored. In the present article, we apply automated text analysis to 200,000 Twitter users in several countries following leading environmental nongovernmental organizations. Some issues such as public action to decarbonize society or species conservation were discussed more intensely than agriculture or marine conservation. Our results illustrate where environmental discourse diverges and converges on Twitter across countries, states, and characteristics, such as political ideology. Using the coterminous United States as a case study, we observed that the prominence of issues varies across states and, in some cases, covaries with political ideology across counties. Our findings show paths forward to characterizing environmental priorities across many issues at unprecedented scale and extent. In this repository, we provide data and code to reproduce the results in the main text of this manuscript. Twitter data querying must comply with Twitter's Terms of Service. Researchers should obtain human ethics approval from their Institutional Review Board before scraping Twitter data. These data were scraped from Twitter. Please consult the README for more information.

  • Open Access
    Authors: 
    Wilks, Stefania; Louderback, Lisbeth;
    Publisher: Zenodo

    Starch Extraction and Identification All samples were processed for starch following standard lab procedures (see Louderback et al., 2015). Large ground stone tools were surface sampled for ~5 min with a sonicating toothbrush and DH2O; smaller tools were sampled after sonicating in a DH2O bath for 3 min. Each sample was rinsed through an Endecott mesh sieve (125 µm) with DH2O and sample liquid <125 µm was retained in a 50 mL test tube. Sample extract >125 µm was discarded. The sample liquid (presumably containing starch) was centrifuged for 3 minutes at 3,000 rpm. The supernatant was decanted and the sample pellet was transferred to a sterile 15 mL tube with DH2O, mixed with a vortex, centrifuged for 3 minutes at 3,000 rpm, and decanted. Samples were resuspended with ~7 mL of heavy liquid (lithium heteropolytungstate; specific gravity 2.2), vortex-mixed, and centrifuged for 15 minutes at 1,000 rpm. Heavy liquid separates the lighter organic material, including starch granules, from the heavier content. The lighter organics were collected using a pipette and transferred to a new, sterile 15 mL tube. Residual heavy liquid was rinsed from the organic matter twice, each time with ~10 mL of DH2O, mixed with a vortex, and centrifuged for 3 minutes at 3,000 rpm. The samples were decanted and rinsed with ~7 mL of acetone, vortex-mixed, and centrifuged for 3 minutes at 3,000 rpm. After decanting one final time, processed samples were left to dry overnight before mounting on glass slides for microscopy observation. 
Starch granules were measured and described based on a set of established criteria, including maximum length through the hilum (µm), hilum position, two-dimensional shape, clarity of the extinction cross, and the presence or absence of surface features such as fissures and pressure facets (Brown and Louderback 2020; Holst et al., 2007; ICSN, 2011; Joyce et al 2021; Louderback and Pavlik, 2017; Musaubach et al., 2013; Piperno et al., 2004, 2009; Reichert, 1913; Torrence and Barton, 2016). These criteria were recorded as absent (0) or present (1) and expressed as a percentage of the occurrence. Slides were scanned with a transmitted brightfield microscope using polarizing filters and Nomarski optics (Zeiss Axioskop Imager M1, Zeiss International, Göttingen, Germany). Observations were obtained using randomly generated X and Y coordinates on the microscope stage. All starch granules observed within each field of view were measured and described. Images and measurements at 400X were captured under polarized light (POL) with a digital camera (Zeiss AxioCam MRc5) using Zen Core 3.1 imaging and measurement software. The presence of surface features was imaged and recorded in differential interference contrast (DIC) micrographs. Identification and Quantification of Treated Starch The relative proportions and arrangements of amylose and amylopectin affect both granule morphology and functionality (Vamadevan and Bertoft, 2015). They can also cause various mechanical and chemical changes, such as granule swelling (gelatinization), pasting, and loss of birefringence in response to different food processing methods (Cai et al., 2014; Crowther, 2012; Di Poala et al., 2003; Gong et al., 2011; Mason, 2009; Wang et al., 2014). Exposing starch granules to heat can cause morphological damage such that granules may be difficult to identify. 
Granules exposed to high temperatures have been shown to enlarge in size (swell) and gelatinize, losing their birefringence cross, amylose layers, and surface characteristics before dispersing away entirely (Crowther, 2012; Singh et al., 2002; Vamadevan and Bertoft, 2015). Congo Red dye (empirical formula C32H22N6O6S2Na2) reacts with granules when amylose layers are broken down (usually due to cooking or some other form of damage), staining them orange to vivid red. Undamaged starch granules (with intact amylose layers), however, are hydrophobic and, therefore, do not react with Congo Red (Lamb and Loy, 2005). To measure the percent or degree of damage to starch granules, milled and burned samples were treated with a Congo Red solution following a protocol similar to Lamb and Loy (2005). Dried residue samples were resuspended with 25 µL of Congo Red and absorbed the stain for 15 minutes before diluting with 100 µL of DH2O (1:4 ratio). Slides were prepped with 25 µL of the hydrated sample and a cover slip was applied but not affixed with fingernail polish (experimentation found that clear fingernail polish interfered with Congo Red stain). The liquified stained samples dried within ~45 minutes; therefore, all observations were photographed immediately. Microscope observations on slides from the milled and burned (close proximity) samples were obtained using randomly generated X and Y coordinates. Samples exposed directly to flame, however, produced fewer measurable granules, so observations were collected by scanning the entire slide. Size distributions from the control, milled, and burned samples were statistically analyzed with a Kolmogorov-Smirnov (K-S) test to determine any significant difference in the distribution of median lengths. Boxplot-stripcharts were generated in R Statistical Software (v4.0.2; R Core Team 2020) to show the quartile summary variation. Intense wildfires destroy everything in their path, including archaeological sites. 
Prehistorically, archaeological sites were regularly and intentionally burned. In what ways does burning affect those sites? With increased wildfire activity, research has begun to describe the effects of fire on archaeological materials through post-fire and experimental treatment, yet little is known about the effects of fire on microbotanical remains, such as starch granules. Although some studies address the impact of fire on starch-rich foods, there is virtually no research on the effects of fire on starch granules embedded in ground stone tools. The current study examines changes in the morphology of starch granules embedded in ground stone tools before and after exposure to flame combustion. A measurable amount of intact and identifiable starch granules was recovered from all of the treated samples. However, significantly fewer intact, identifiable granules were found as tools were exposed to higher temperatures for longer periods of time. Windows Excel Spreadsheets
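    For readers unfamiliar with the Kolmogorov-Smirnov comparison of size distributions mentioned in this record, a minimal two-sample sketch follows. The granule lengths are invented, the authors' own analysis was run in R, and the large-sample critical value used here is a textbook approximation rather than anything taken from the dataset.

```python
import math
from bisect import bisect_right

def ks_statistic(a, b):
    """Two-sample K-S statistic: largest gap between the empirical CDFs."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    for x in sorted(set(a) | set(b)):
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d

def ks_critical_value(n, m, alpha=0.05):
    """Large-sample critical value c(alpha) * sqrt((n + m) / (n * m))."""
    c_alpha = math.sqrt(-math.log(alpha / 2) / 2)
    return c_alpha * math.sqrt((n + m) / (n * m))

# Hypothetical maximum granule lengths in micrometres.
control = [18, 20, 22, 24, 26]
burned = [24, 27, 29, 31, 33]

d = ks_statistic(control, burned)
threshold = ks_critical_value(len(control), len(burned))
print(f"D = {d:.3f}, critical value = {threshold:.3f}, "
      f"significant = {d > threshold}")
```

    With samples this small the approximation is rough. An exact test, such as the one available in R or SciPy, would be preferable for real data.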

  • Open Access
    Authors: 
    Gatherer, Derek;
    Publisher: Zenodo

    The data file is a spreadsheet used to record queries made via CQPweb (https://cqpweb.lancs.ac.uk). Search Terms For clarity, in the ensuing descriptions, we use bold font for search terms and italic font for collocates and other quotations. Based on clinical descriptions of COVID-19 (reviewed by Cevik et al., 2020), we identified the following search terms: 1) “cough”, 2) “fever”, 3) “pneumonia”. To avoid confusion with years when influenza pandemics may have occurred, we added 4) “influenza” and 5) “epidemic”. Any combination of terms 1 to 3 co-occurring with term 4 alone or terms 4 and 5 together, would be indicative of a respiratory outbreak caused by, or at the least attributed to, influenza. By contrast, any combination of terms 1 to 3 co-occurring with term 5 alone, or without either of terms 4 and 5, would suggest a respiratory disease that was not confidently identified as influenza at the time. This outbreak would provide a candidate coronavirus epidemic for further investigation. Newspapers Newspapers and years searched were as follows: Belfast Newsletter (1828-1900), The Era (1838-1900), Glasgow Herald (1820-1900), Hampshire & Portsmouth Telegraph (1799-1900), Ipswich Journal (1800-1900), Liverpool Mercury (1811-1900), Northern Echo (1870-1900), Pall Mall Gazette (1865-1900), Reynold’s Daily (1850-1900), Western Mail (1869-1900) and The Times (1785-2009). The search in The Times was extended to 2009 in order to provide a comparison with the 20th century. Searches were performed using Lancaster University’s instance of the CQPweb (Corpus Query Processor) corpus analysis software (https://cqpweb.lancs.ac.uk/; Hardie, 2012). CQPweb’s database is populated from the newspapers listed, using optical character recognition (OCR), so for older publications in particular, some errors may be present (McEnery et al., 2019). 
Statistics The occurrence of each of the five search terms was calculated per million words within the annual output of each publication, in CQPweb. This is compared to a background distribution constituting the corresponding words per million for each search term over the total year range for each newspaper. Within the annual distributions, for each search term and each newspaper, we determined the years lying in the top 1% (i.e. p<0.05 after application of a Bonferroni correction), following Gabrielatos et al. (2012). These are deemed to be years when that search term was in statistically significant usage above its background level for the newspaper in which it occurs. For years when search terms were significantly elevated, we also calculated collocates at range n. Collocates, in corpus linguistics, are other words found at statistically significant usage, over their own background levels, in a window from n positions to the left to n positions to the right of the search term. In other words, they are found in significant proximity to the search term. A default value of n=10 was used throughout, unless specified. Collocation analysis therefore assists in showing how a search term associates with other words within a corpus, providing information about the context in which that search term is used. CQPweb provides a log ratio method for the quantification of the strength of collocation. COVID-19 is the first known coronavirus pandemic. Nevertheless, the seasonal circulation of the four milder coronaviruses of humans – OC43, NL63, 229E and HKU1 – raises the possibility that these viruses are the descendants of more ancient coronavirus pandemics. This proposal arises by analogy to the observed descent of seasonal influenza subtypes H2N2 (now extinct), H3N2 and H1N1 from the pandemic strains of 1957, 1968 and 2009, respectively. 
Recent historical revisionist speculation has focussed on the influenza pandemic of 1889-1892, based on molecular phylogenetic reconstructions that show the emergence of human coronavirus OC43 around that time, probably by zoonosis from cattle. If the “Russian influenza”, as The Times named it in early 1890, was not influenza but caused by a coronavirus, the origins of the other three milder human coronaviruses may also have left a residue of clinical evidence in the 19th century medical literature and popular press. In this paper, we search digitised 19th century British newspapers for evidence of previously unsuspected coronavirus pandemics. We conclude that there is little or no corpus linguistic signal in the UK national press for large-scale outbreaks of unidentified respiratory disease for the period 1785 to 1890. To view data, open in Microsoft Excel. To reproduce the data from scratch, a login is needed to CQPweb (https://cqpweb.lancs.ac.uk). This is free of charge but requires authorization, which can be applied for at the URL given.
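    Two of the quantities described in the Statistics paragraph, frequency per million words and a log ratio for collocation strength, can be sketched as below. The log ratio is taken here as the binary logarithm of the ratio of relative frequencies inside and outside the collocation window. How CQPweb handles details such as zero counts is not reproduced, and all counts are invented.

```python
import math

def per_million(hits, total_words):
    """Occurrences of a term per million words of text."""
    return hits / total_words * 1_000_000

def log_ratio(hits_in_window, window_words, hits_elsewhere, other_words):
    """log2 of (relative frequency in window / relative frequency elsewhere).

    Assumes both counts are non-zero; a real tool needs a rule for zeros.
    """
    inside = hits_in_window / window_words
    outside = hits_elsewhere / other_words
    return math.log2(inside / outside)

# Hypothetical numbers: 150 hits for a search term in a year of
# 30 million words of newspaper text.
rate = per_million(150, 30_000_000)

# Hypothetical collocate: 8 hits in 1,000 window words against
# 2 hits in 1,000 words elsewhere, i.e. four times as frequent.
strength = log_ratio(8, 1_000, 2, 1_000)
print(rate, strength)
```

    A log ratio of 2 means the collocate is four times as frequent near the search term as elsewhere. Each additional point doubles that ratio.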

  • Open Access
    Authors: 
    Prendin, Angela Luisa; Normand, Signe; Carrer, Marco; Bjerregaard Pedersen, Nanna; Matthiesen, Henning; Westergaard-Nielsen, Andreas; Elbering, Bo; Treier, Urs Albert; Hollesen, Jørgen;
    Publisher: Zenodo

    The combined effects of climate change and nutrient availability on Arctic vegetation growth are poorly understood. Archaeological sites in the Arctic could represent unique nutrient hotspots for studying the long-term effect of nutrient enrichment. In this study, we analysed a time-series of ring widths of Salix glauca L. collected at nine archaeological sites and in their natural surroundings along a climate gradient in the Nuuk fjord region, Southwest Greenland, stretching from the edge of the Greenlandic Ice Sheet in the east to the open sea in the west. We assessed the temperature-growth relationship for the last four decades distinguishing between soils with past anthropogenic nutrient enrichment (PANE) and without (controls). Along the East–West gradient, the inner fjord sites showed a stronger temperature signal compared to the outermost ones. Individuals growing in PANE soils had wider ring widths than individuals growing in the control soils and a stronger climate-growth relation, especially in the inner fjord sites. Thereby, the individuals growing on the archaeological sites seem to have benefited more from the climate warming in recent decades. Our results suggest that higher nutrient availability due to past human activities plays a role in Arctic vegetation growth and should be considered when assessing both the future impact of plants on archaeological sites and the general greening in landscapes with contrasting nutrient availability. This file includes the data used to analyse: i) time series of mean ring width of Salix glauca L. and climate from the nine archaeological sites (PANE) and in their natural surroundings (CONT) in Nuuk Fjord (West Greenland); ii) climate-growth relationship and the effect of nutrient availability; iii) climate and nutrient sensitivity accounting for the effect of insect outbreaks and their carry-over effects (Supplementary Information). 
In particular, it provides time series of raw and standardized ring width (RW and Z-score), mean June-August temperature, and the average sums of thawing (TDD) and growing degree days (GDD) for all the PANE and CONT sites. Variables are organized in columns and named with the acronyms described in Table 1, Table 2, and Figures 2-5.
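    As a rough illustration of how the standardized ring widths and degree-day sums named above are typically derived, consider the sketch below. The ring widths and temperatures are invented, and the base temperatures (0 °C for thawing, 5 °C for growing degree days) are common conventions assumed here, not values stated in the dataset description.

```python
from statistics import mean, stdev

def z_scores(values):
    """Standardize a series to zero mean and unit sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def degree_days(daily_mean_temps, base=0.0):
    """Sum of daily mean temperature excess above a base temperature."""
    return sum(max(t - base, 0.0) for t in daily_mean_temps)

# Hypothetical ring widths (mm) for one shrub over three years.
ring_widths = [0.4, 0.5, 0.6]
standardized = z_scores(ring_widths)

# Hypothetical daily mean temperatures (deg C).
temps = [-2.0, 1.0, 4.0, 7.0, 6.0]
tdd = degree_days(temps, base=0.0)   # thawing degree days (assumed base 0)
gdd = degree_days(temps, base=5.0)   # growing degree days (assumed base 5)
print(standardized, tdd, gdd)
```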

  • Open Access English
    Authors: 
    Egeland, Charles; Fadem, Cynthia; Byerly, Ryan; Henderson, Cory; Fitzgerald, Curran; Mabulla, Audax; Baquedano, Enrique; Gidna, Agniss;
    Publisher: Dryad

    Variable: Description
    Type: Type of sample (calibration = calibration coin for Delta Innov-X Analyzer; standard = NIST geological standard; geological = geological sample from lithic raw material source; artifact = archaeological specimen)
    Replicate: Replicate measurement (Yes or No)
    Source: Geological source (delta = Delta Innov-X Analyzer calibration coin; nist = National Institute of Standards and Technology (NIST) geological standard; NS = Naibor Soit; NH = Naisuisui Hill; OL = Oldonye Okule; LD = Lemagarut drainage; SS = Shifting Sand; KG = Kelogi Hills; EN = Engelosin)
    Outcrop: Individual outcrop within geological source (NSM = Naibor Soit Main Hill; NSMH = Naibor Soit Manyata Hill; NSSO = Naibor Soit Southern Outlier; NH = Naisuisui Hill; LD1 = Lemagarut Drainage 1; LD2 = Lemagarut Drainage 2; BKE = BK East; SS = Shifting Sand; KG = Kelogi Hills; EN = Engelosin; NA = Not applicable)
    Specimen: Individual find or sample number
    Material: Raw material type (QTZ = "Quartz-rich"; GN = Gneiss; FGV = Fine-grained volcanic)
    Element concentration estimate: Reported for each element (e.g., P, Cl, Ca; empty cells are "non-detect")
    Analytical error: Reported for each element (e.g., P +/-, Cl +/-, Ca +/-; no error reported for "non-detect" elements)
    The published analysis focused only on granulite specimens (n = 186) and, more specifically, on six elements (Fe, Ti, Zr, K, Sr, and Y) that had detection rates >75% in the granulite specimens (that is, these elements were detected in more than 75% of the granulite specimens). These elements were used in the predictive models from the published analysis. Two of the 186 granulite specimens were missing values for five out of the six elements and were therefore not included in the statistical analyses. Of the remaining 184 specimens, 55 had missing data for one element, and two of those 55 had missing data for two elements. 
These missing values were treated as censored data (that is, the element is present but could not be measured precisely enough for the instrument to report a value). These missing values were interpolated in one of two ways. For those specimens subjected to replicate pXRF runs (n = 7), the missing value was replaced with the mean value of the replicates. The missing values for the remaining specimens (n = 48) were replaced with the mean of the four closest (as determined in two-dimensional space) specimens with measured (rather than interpolated) values. The main data file does not include these interpolated values. Should analysts choose to use them, interpolated values can be found in the additional .csv file. The invention and proliferation of stone tool technology in the Early Stone Age (ESA) marks a watershed in human evolution. Patterns of lithic procurement, manufacture, use, and discard have much to tell us about ESA hominin cognition and land use. However, these issues cannot be fully explored outside the context of the physical attributes and spatio-temporal availability of the lithic raw materials themselves. The Olduvai Basin of northern Tanzania, which is home to both a wide variety of potential toolstones and a rich collection of ESA archaeological sites, provides an excellent opportunity to investigate the relationship between lithic technology and raw material characteristics. Here, we examine two attributes of the basin's igneous and metamorphic rocks: spatial location and fracture predictability. A total of 244 geological specimens were analyzed with non-destructive portable XRF (pXRF) to determine the geochemical distinctiveness of five primary and secondary sources, while 110 geological specimens were subjected to Schmidt rebound hardness tests to measure fracture predictability. 
Element concentrations derived via pXRF show significant differences between sources, and multivariate predictive models classify geological specimens with 75–80% accuracy. The predictive models identify Naibor Soit as the most likely source for a small sample of three lithic artifacts from Bed II, which supports the idea that this inselberg served as a source of toolstone during the early Pleistocene. Clear patterns in fracture predictability exist within and between both sources and rock types. Fine-grained volcanics show high rebound values (associated with high fracture predictability), while finer-grained metamorphics and coarse-grained gneisses show intermediate and low rebound values, respectively. Artifact data from Bed I and II suggest that fracture predictability played a role in raw material selection at some sites, but other attributes like durability, expediency, and nodule size and shape were more significant. A total of 244 rock specimens (aka "geological specimens") were collected from eight primary (six granulite outcrops, one gneiss outcrop, one phonolite outcrop) and one secondary (a seasonal drainage containing basalt blocks) lithic raw material sources in the Olduvai Basin. Rock specimens were flaked directly from the sources with a rock hammer. Only granulite specimens with visually quartz-rich compositions were selected. Five quartz-rich metamorphic artifacts (aka "archaeological specimens") from BK East, a ca. 1.5 million-year-old site on the south wall of the side gorge in Olduvai Gorge, were also included. Portable XRF (pXRF) analyses were conducted with an Innov-X Delta Classic Environmental Analyzer equipped with a 4W Au anode X-ray tube and a Si-PIN diode detector. All analyses were performed while the instrument was docked into a stable, hands-free test stand. An unweathered, non-cortical surface free of sediment matrix was placed over, and completely covered, the detector window. 
Each specimen was measured for 360 seconds using all three of the instrument's beams (120 seconds/beam). After an initial energy scale calibration test with a factory issued metal coin of known composition, the following protocol was observed: (1) a powdered sample of Standard Reference Material (SRM) 2702 with elemental concentrations certified by NIST was measured; (2) four geological/archaeological specimens were then measured; (3) the fifth geological/archaeological specimen in a series was measured five times (that is, five consecutive 360-second cycles) without being moved or reoriented; (4) after the fifth geological/archaeological specimen was measured, the SRM 2702 sample was measured once again, which initiated the next series of measurements. Element concentrations were derived with the Compton Normalization correction model and the factory-set “Soil Environmental” calibration.
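The nearest-neighbour replacement of missing values described above can be sketched as follows. This is only an illustration: the record does not say which two-dimensional space was used to determine closeness, so the coordinates here are hypothetical, and the function name `impute_from_neighbors` is an assumption rather than anything from the dataset.

```python
import math


def impute_from_neighbors(points, values, k=4):
    """Replace None entries with the mean of the k nearest measured values.

    points: (x, y) positions of the specimens in some two-dimensional space
            (an assumption; the record does not specify the space).
    values: one element concentration per specimen, None where missing.
    Only measured values are used as donors, never imputed ones.
    """
    measured = [(p, v) for p, v in zip(points, values) if v is not None]
    if not measured:
        raise ValueError("no measured values to impute from")
    filled = []
    for p, v in zip(points, values):
        if v is not None:
            filled.append(v)
            continue
        # Sort measured specimens by Euclidean distance and keep the k closest.
        nearest = sorted(measured, key=lambda pv: math.dist(p, pv[0]))[:k]
        filled.append(sum(val for _, val in nearest) / len(nearest))
    return filled


# Hypothetical coordinates and concentrations, for illustration only.
points = [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5), (10, 10)]
values = [10, 20, 30, 40, None, 100]
filled = impute_from_neighbors(points, values)
```

In this toy input the fifth specimen takes the mean of its four closest measured neighbours, and the distant sixth specimen is ignored.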

  • Open Access English
    Authors: 
    Wei, Shichao; Li, Zitong; Momigliano, Paolo; Fu, Chao; Wu, Hua; Merilä, Juha;
    Publisher: Dryad
    Project: AKA | Evolutionary Genetics of ... (218343), AKA | Centre of Excellence in E... (129662), AKA | Evolutionary and conserva... (316294), AKA | Evolutionary genetics of ... (134728)

    The role of geological events and Pleistocene climatic fluctuations as drivers of current patterns of genetic variation in extant species has been a topic of continued interest among evolutionary biologists. Nevertheless, comprehensive studies of widely distributed species are still rare, especially from Asia. Using geographically extensive sampling of many individuals and a large number of nuclear single nucleotide polymorphisms (SNPs), we studied the phylogeography and historical demography of Hyla annectans populations in southern China. Thirty-five sampled populations were grouped into seven clearly defined genetic clusters that closely match phenotype-based subspecies classification. These lineages diverged 2.32–5.23 million years ago, a timing that closely aligns with the rapid and drastic uplifting of the Qinghai-Tibet Plateau and adjacent southwest China. Demographic analyses and species distribution models indicate that different populations of this species have responded differently to past climatic changes. In the Hengduan Mountains, most populations experienced a bottleneck, whereas the populations located outside of the Hengduan Mountains have gradually declined in size since the end of the last glaciation. In addition, the levels of phenotypic and genetic divergence were strongly correlated across major clades. These results highlight the combined effects of geological events and past climatic fluctuations, as well as natural selection, as drivers of contemporary patterns of genetic and phenotypic variation in a widely distributed anuran in Asia. 'SNP_data_for_H.annectans' contains the SNP data for Hyla annectans in VCF format, used for the phylogenetic tree, genetic structure, genetic differentiation, and demographic analyses. 'Morphological_data_info' contains the snout-vent length (SVL), weight, and spot number data used for the morphological analyses and the QST-FST comparison. 'SDM_input_ascii' contains the ASCII input files used for the species distribution models (SDMs). 
'SDM_locality_info' contains the occurrence data points of five genetic clusters of H. annectans.


  • Open Access
    Authors: 
    Gravis, David; Roy, Nicolas; Ruffini-Ronzani, Nicolas; Houssiau, Laurent; Felten, Alexandre; Tumanov, Nikolay; Deparis, Olivier;
    Publisher: Zenodo

    XRD diffractograms, ToF-SIMS MS and ATR-FTIR spectrometry spectra, recorded on inks on historical parchments (pigments, inked areas). ToF-SIMS and ATR-FTIR spectra from non-inked areas of the parchments. See the read-me file for a complete description of the files and structure, and the main manuscript for the methodology. The PCA algorithm code is also provided. Article abstract: Book production by medieval scriptoria has gained growing interest in recent studies. In this context, identifying ink compositions and parchment animal species from illuminated manuscripts is of great importance. Here, we introduce time-of-flight secondary ion mass spectrometry (ToF-SIMS) as a non-invasive tool to identify both inks and animal skins in manuscripts at the same time. For this purpose, both positive and negative ion spectra in inked and non-inked areas were recorded. Chemical compositions of pigments (decoration) or black inks (text) were determined by searching for characteristic ion mass peaks. Animal skins were identified by data processing of raw ToF-SIMS spectra using Principal Component Analysis (PCA). In illuminated manuscripts from the 15th to 16th c., malachite (green), azurite (blue), and cinnabar (red) inorganic pigments, as well as iron-gall black ink, were identified. Carbon black and indigo (blue) organic pigments were also identified. Animal skins were identified in modern parchments of known animal species by a two-step PCA procedure. We believe the proposed method will find extensive application in material studies of medieval manuscripts, as it is non-invasive, highly sensitive and able to identify both inks and animal skins at the same time, even from traces of pigments and tiny scanned areas. The in-house PCA algorithm requires Python. ToF-SIMS raw data require SurfaceLab software. ATR-FTIR raw data (.0) can be read with free-licence software (Fityk). XRD diffractograms are directly exported in .txt from .xyz files. 
Analytical and data processing methods can be found in the manuscript.

  • Open Access
    Authors: 
    Oh, Inez; Schindler, Suzanne; Ghoshal, Nupur; Lai, Albert; Payne, Philip; Gupta, Aditi;
    Publisher: Dryad

    Objectives There is much interest in utilizing clinical data for developing prediction models for Alzheimer disease (AD) risk, progression, and outcomes. Existing studies have mostly utilized curated research registries, image analysis, and structured Electronic Health Record (EHR) data. However, much critical information resides in relatively inaccessible unstructured clinical notes within the EHR. Materials and Methods We developed a natural language processing (NLP)-based pipeline to extract AD-related clinical phenotypes, documenting strategies for success and assessing the utility of mining unstructured clinical notes. We evaluated the pipeline against gold-standard manual annotations performed by two clinical dementia experts for AD-related clinical phenotypes including medical comorbidities, biomarkers, neurobehavioral test scores, behavioral indicators of cognitive decline, family history, and neuroimaging findings. Results Documentation rates for each phenotype varied in the structured versus unstructured EHR. Inter-annotator agreement was high (Cohen’s kappa = 0.72–1) and positively correlated with the NLP-based phenotype extraction pipeline’s performance (average F1-score = 0.65-0.99) for each phenotype. Discussion We developed an automated NLP-based pipeline to extract informative phenotypes that may improve the performance of eventual machine-learning predictive models for AD. In the process, we examined documentation practices for each phenotype relevant to the care of AD patients and identified factors for success. Conclusion Success of our NLP-based phenotype extraction pipeline depended on domain-specific knowledge and focus on a specific clinical domain instead of maximizing generalizability. We developed a natural language processing (NLP)-based pipeline which contains independent NLP modules that target the extraction of ten clinical phenotypes relevant to Alzheimer disease dementia progression. 
The pipeline was trained on unstructured clinical notes originating from Allscripts TouchWorks associated with AD dementia patient office visits that occurred between June 1, 2013, and May 31, 2018, extracted from the Washington University in St. Louis Research Data Core (RDC), a repository of patient clinical data from BJC HealthCare and Washington University Physicians. The targeted phenotypes included neurobehavioral test scores (Clinical Dementia Rating and Mini-Mental State Exam) and their corresponding test dates, comorbidities (hypertension and depression), neuroimaging findings (presence of atrophy or infarct), behavioral indicators of dementia (repeating and misplacing), biomarker levels (total and phosphorylated tau protein levels), and family history (whether there was a family history of dementia, and if yes, which family member(s)). The clinical notes extracted from the EHR were in rich text format (RTF) contained within tab-delimited files (TXT) alongside metadata such as the patient medical record number, author, and date authored. These were preprocessed before being analyzed by the NLP-based phenotype extraction pipeline. This entailed converting the TXT files to comma-separated files (CSV), accounting for additional tab, quote, and newline characters present, and stripping the RTF formatting. Data preprocessing steps were performed using the Python Pandas and striprtf (version 0.0.10) packages. Linguamatics I2E query files (*.i2qy) and Enterprise Architect Simulation Library (EASL) code for each NLP module can be found on the Linguamatics Community webpage (https://community.linguamatics.com/queries), accessible with the creation of a free account. Linguamatics I2E software is required to open the query files (*.i2qy) directly, but the logic underlying the NLP modules can be understood by referencing the EASL code.
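As a rough illustration of the preprocessing step described above (tab-delimited TXT to CSV, with the note text cleaned along the way), here is a minimal sketch. It uses only the standard-library `csv` module rather than Pandas, and the column names (`mrn`, `author`, `note`) and the function name `tsv_to_csv` are hypothetical. The `clean` argument is a placeholder for whatever RTF-stripping function is used; the record names the striprtf package for that step, but its exact use by the authors is not shown.

```python
import csv
import io


def tsv_to_csv(tsv_text, note_column, clean=lambda s: s):
    """Convert tab-delimited text to CSV, cleaning one free-text column.

    tsv_text:    contents of a tab-delimited file with a header row.
    note_column: name of the column holding the clinical note (hypothetical).
    clean:       callable applied to each note, e.g. an RTF-stripping function
                 (an assumption; identity by default).
    The csv module handles quoted fields containing tabs, quotes or newlines.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row[note_column] = clean(row[note_column])
        writer.writerow(row)
    return out.getvalue()


# Hypothetical two-column metadata plus a note field, for illustration only.
sample = 'mrn\tauthor\tnote\n1\tA\t"hello, world"\n'
converted = tsv_to_csv(sample, "note")
```

Fields that contain commas are quoted automatically on output, so the note text survives the change of delimiter.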

  • Open Access English
    Authors: 
    Dhrangadhariya, Anjani; Müller, Henning;
    Publisher: Dryad

    This upload contains four main zip files. ds_cto_dict.zip: This zip file contains the four distant supervision dictionaries (P: participant.txt, I = intervention.txt, intervetion_syn.txt, O: outcome.txt) generated from clinicaltrials.gov using the Methodology described in Distant-CTO (https://aclanthology.org/2022.bionlp-1.34/). These dictionaries were used to create distant supervision labelling functions as described in the Labelling sources subsection of the Methodology. The data was derived from https://clinicaltrials.gov/ handcrafted_dictionaries.zip: This zip folder contains three files 1) gender_sexuality.txt: a list of possible genders and sexual orientations found across the web. The list needs to be more comprehensive. 2) endpoints_dict.txt: contains outcome names and the names of questionnaires used to measure outcomes assembled from PROM questionnaires and PROMs. and 3) comparator_dict: contains a list of idiosyncratic comparator terms like a sham, saline, placebo, etc., compiled from the literature search. The list needs to be more comprehensive. test_ebm_correctedlabels.tsv: EBM-PICO is a widely used dataset with PICO annotations at two levels: span-level or coarse-grained and entity-level or fine-grained. Span-level annotations encompass the full information about each class. Entity-level annotations cover the more fine-grained information at the entity level, with PICO classes further divided into fine-grained subclasses. For example, the coarse-grained Participant span is further divided into participant age, gender, condition and sample size in the randomised controlled trial. This dataset comes pre-divided into a training set (n=4,933) annotated through crowd-sourcing and an expert annotated gold test set (n=191) for evaluation. The EBM-PICO annotation guidelines caution about variable annotation quality. Abaho et al. developed a framework to post-hoc correct EBM-PICO outcomes annotation inconsistencies. Lee et al. 
studied annotation span disagreements suggesting variability across the annotators. Low annotation quality in the training dataset is excusable, but the errors in the test set can lead to faulty evaluation of the downstream ML methods. We evaluate 1% of the EBM-PICO training set tokens to gauge the possible reasons for the fine-grained labelling errors and use this exercise to conduct an error-focused PICO re-annotation for the EBM-PICO gold test set. The file 'test_ebm_correctedlabels.tsv' contains the error-corrected EBM-PICO gold test set. This dataset could be used as a complementary evaluation set along with the EBM-PICO test set. error_analysis.zip: This .zip file contains three .tsv files for each PICO class to identify possible errors in about 1% (about 12,962 tokens) of the EBM-PICO training set. Objective: PICO (Participants, Interventions, Comparators, Outcomes) analysis is vital but time-consuming for conducting systematic reviews (SRs). Supervised machine learning can help fully automate it, but a lack of large annotated corpora limits the quality of automated PICO recognition systems. The largest currently available PICO corpus is manually annotated, which is an approach that is often too expensive for the scientific community to apply. Depending on the specific SR question, PICO criteria are extended to PICOC (C-Context), PICOT (T-timeframe), and PIBOSO (B-Background, S-Study design, O-Other), meaning the static hand-labelled corpora need to undergo costly re-annotation as per the downstream requirements. We aim to test the feasibility of designing a weak supervision system to extract these entities without hand-labelled data. Methodology: We decompose PICO spans into their constituent entities and re-purpose multiple medical and non-medical ontologies and expert-generated rules to obtain multiple noisy labels for these entities. These labels obtained using several sources are then aggregated using simple majority voting and generative modelling approaches. 
The resulting programmatic labels are used as weak signals to train a weakly-supervised discriminative model and observe performance changes. We explore mistakes in the currently available PICO corpus that could have led to inaccurate evaluation of several automation methods. Results: We present Weak-PICO, a weakly-supervised PICO entity recognition approach using medical and non-medical ontologies, dictionaries and expert-generated rules. Our approach does not use hand-labelled data. Conclusion: Weak supervision using weak-PICO for PICO entity recognition has encouraging results, and the approach can potentially extend to more clinical entities readily. All the datasets could be opened using text editors or Google sheets. The .zip files in the dataset can be opened using the archive utility on Mac OS and unzip functionality in Linux. (All Windows and Apple operating systems support the use of ZIP files without additional third-party software)
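The simple majority-voting aggregation mentioned in the methodology can be sketched as below. This is a toy illustration, not the Weak-PICO implementation: the dictionary contents, the label codes, the abstain value, and the tie-breaking rule (abstain on ties) are all assumptions, and the generative-modelling alternative is not shown.

```python
from collections import Counter

ABSTAIN = -1  # hypothetical code for "this source has no opinion"


def dictionary_lf(terms, label):
    """Build a labelling function that votes `label` for tokens in `terms`."""
    lowered = {t.lower() for t in terms}
    return lambda token: label if token.lower() in lowered else ABSTAIN


def majority_vote(votes, abstain=ABSTAIN):
    """Return the most common non-abstain vote; abstain if none or tied."""
    counted = Counter(v for v in votes if v != abstain)
    if not counted:
        return abstain
    ranked = counted.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return abstain  # assumed tie-breaking rule
    return ranked[0][0]


# Toy dictionaries standing in for the distant-supervision sources.
labelling_functions = [
    dictionary_lf({"placebo", "saline"}, 1),
    dictionary_lf({"placebo", "sham"}, 1),
    dictionary_lf({"patients"}, 0),
]
tokens = ["placebo", "patients", "the"]
labels = [majority_vote([lf(t) for lf in labelling_functions]) for t in tokens]
```

Each token receives one vote per labelling source, and tokens that no source recognises stay unlabelled.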

  • Open Access
    Authors: 
    Chang, Charlotte;
    Publisher: Zenodo

    Social media platforms, such as Twitter, are an increasingly important source of information and are forums for discourse within and between interest groups. Research highlights how social media communities have amplified movements such as the Arab Spring, #MeToo, and Black Lives Matter. But environmental digital discourse remains underexplored. In the present article, we apply automated text analysis to 200,000 Twitter users in several countries following leading environmental nongovernmental organizations. Some issues such as public action to decarbonize society or species conservation were discussed more intensely than agriculture or marine conservation. Our results illustrate where environmental discourse diverges and converges on Twitter across countries, states, and characteristics, such as political ideology. Using the coterminous United States as a case study, we observed that the prominence of issues varies across states and, in some cases, covaries with political ideology across counties. Our findings show paths forward to characterizing environmental priorities across many issues at unprecedented scale and extent. In this repository, we provide data and code to reproduce the results in the main text of this manuscript. Twitter data querying must comply with Twitter's Terms of Service. Researchers should obtain human ethics approval from their Institutional Review Board before scraping Twitter data. These data were scraped from Twitter. Please consult the README for more information.

  • Open Access
    Authors: 
    Wilks, Stefania; Louderback, Lisbeth;
    Publisher: Zenodo

    Starch Extraction and Identification: All samples were processed for starch following standard lab procedures (see Louderback et al., 2015). Large ground stone tools were surface sampled for ~5 min with a sonicating toothbrush and DH2O; smaller tools were sampled after sonicating in a DH2O bath for 3 min. Each sample was rinsed through an Endecott mesh sieve (125µm) with DH2O and sample liquid <125µm was retained in a 50mL test tube. Sample extract >125µm was discarded. The sample liquid (presumably containing starch) was centrifuged for 3 minutes at 3,000 rpm. The supernatant was decanted and the sample pellet was transferred to a sterile 15mL tube with DH2O, mixed with a vortex, centrifuged for 3 minutes at 3,000 rpm, and decanted. Samples were resuspended with ~7mL of heavy liquid (lithium heteropolytungstate; specific gravity 2.2), vortex-mixed, and centrifuged for 15 minutes at 1,000 rpm. Heavy liquid separates the lighter organic material, including starch granules, from the heavier content. The lighter organics were collected using a pipette and transferred to a new, sterile 15mL tube. Residual heavy liquid was rinsed from the organic matter twice, each time with ~10mL of DH2O, mixed with a vortex, and centrifuged for 3 minutes at 3,000 rpm. The samples were decanted and rinsed with ~7mL of acetone, vortex-mixed, and centrifuged for 3 minutes at 3,000 rpm. After decanting one final time, processed samples were left to dry overnight before mounting on glass slides for microscopy observation. 
Starch granules were measured and described based on a set of established criteria, including maximum length through the hilum (µm), hilum position, two-dimensional shape, clarity of the extinction cross, and the presence or absence of surface features such as fissures and pressure facets (Brown and Louderback 2020; Holst et al., 2007; ICSN, 2011; Joyce et al 2021; Louderback and Pavlik, 2017; Musaubach et al., 2013; Piperno et al., 2004, 2009; Reichert, 1913; Torrence and Barton, 2016). These criteria were recorded as absent (0) or present (1) and expressed as a percentage of the occurrence. Slides were scanned with a transmitted brightfield microscope using polarizing filters and Nomarski optics (Zeiss Axioskop Imager M1, Zeiss International, Göttingen, Germany). Observations were obtained using randomly generated X and Y coordinates on the microscope stage. All starch granules observed within each field of view were measured and described. Images and measurements at 400X were captured under polarized light (POL) with a digital camera (Zeiss AxioCam MRc5) using Zen Core 3.1 imaging and measurement software. The presence of surface features was imaged and recorded in differential interference contrast (DIC) micrographs. Identification and Quantification of Treated Starch The relative proportions and arrangements of amylose and amylopectin affect both granule morphology and functionality (Vamadevan and Bertoft, 2015). They can also cause various mechanical and chemical changes, such as granule swelling (gelatinization), pasting, and loss of birefringence in response to different food processing methods (Cai et al., 2014; Crowther, 2012; Di Poala et al., 2003; Gong et al., 2011; Mason, 2009; Wang et al., 2014). Exposing starch granules to heat can cause morphological damage such that granules may be difficult to identify. 
Granules exposed to high temperatures have been shown to enlarge in size (swell) and gelatinize, losing their birefringence cross, amylose layers, and surface characteristics before dispersing away entirely (Crowther, 2012; Singh et al., 2002; Vamadevan and Bertoft, 2015). Congo Red dye (empirical formula C32H22N6O6S2Na2) reacts with granules when amylose layers are broken down (usually due to cooking or some other form of damage), staining them orange to vivid red. Undamaged starch granules (with intact amylose layers), however, are hydrophobic and, therefore, do not react with Congo Red (Lamb and Loy, 2005). To measure the percent or degree of damage to starch granules, milled and burned samples were treated with a Congo Red solution following a protocol similar to Lamb and Loy (2005). Dried residue samples were resuspended with 25µL of Congo Red and absorbed the stain for 15 minutes before diluting with 100µL of DH2O (1:4 ratio). Slides were prepped with 25µL of the hydrated sample and a cover slip was applied but not affixed with fingernail polish (experimentation found that clear fingernail polish interfered with Congo Red stain). The liquified stained samples dried within ~45 minutes; therefore, all observations were photographed immediately. Microscope observations on slides from the milled and burned (close proximity) samples were obtained using randomly generated X and Y coordinates. Samples exposed directly to flame, however, produced fewer measurable granules, so observations were collected by scanning the entire slide. Size distributions from the control, milled, and burned samples were statistically analyzed with a Kolmogorov-Smirnov (K-S) test to determine any significant difference in the distribution of median lengths. Boxplot-stripcharts were generated in R Statistical Software (v4.0.2; R Core Team 2020) to show the quartile summary variation. Intense wildfires destroy everything in their path, including archaeological sites. 
Prehistorically, archaeological sites were regularly and intentionally burned. In what ways does burning affect those sites? With increased wildfire activity, research has begun to describe the effects of fire on archaeological materials through post-fire and experimental treatment, yet, little is known about the effects of fire on microbotanical remains, such as starch granules. Although some studies address the impact of fire on starch-rich foods, there is virtually no research on the fire effects of starch granules embedded in ground stone tools. The current study examines changes in the morphology of starch granules embedded in ground stone tools before and after exposure to flame combustion. A measurable amount of intact and identifiable starch granules was recovered from all of the treated samples. However, significantly fewer intact, identifiable granules were found as tools were exposed to higher temperatures for longer periods of time. Windows Excel Spreadsheets
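The Kolmogorov-Smirnov comparison of granule size distributions mentioned in the methods can be illustrated with a small sketch. It computes only the two-sample K-S statistic (the largest gap between the two empirical cumulative distributions), not a p-value, and the granule lengths below are invented for illustration. The record does not say which software ran the test, so this is not the authors' code.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D.

    D is the maximum absolute difference between the empirical cumulative
    distribution functions of the two samples, evaluated at every observed value.
    """
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in set(a) | set(b):
        # Fraction of each sample that is <= x.
        frac_a = bisect_right(a, x) / len(a)
        frac_b = bisect_right(b, x) / len(b)
        d = max(d, abs(frac_a - frac_b))
    return d


# Hypothetical granule lengths in µm for a control and a burned sample.
control = [1, 2, 3, 4]
burned = [3, 4, 5, 6]
d_value = ks_statistic(control, burned)
```

A value of 0 means the two empirical distributions coincide, and a value of 1 means the samples do not overlap at all.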

  • Open Access
    Authors: 
    Gatherer, Derek;
    Publisher: Zenodo

    The data file is a spreadsheet used to record queries made via CQPweb (https://cqpweb.lancs.ac.uk). Search Terms For clarity, in the ensuing descriptions, we use bold font for search terms and italic font for collocates and other quotations. Based on clinical descriptions of COVID-19 (reviewed by Cevik et al., 2020), we identified the following search terms: 1) “cough”, 2) “fever”, 3) “pneumonia”. To avoid confusion with years when influenza pandemics may have occurred, we added 4) “influenza” and 5) “epidemic”. Any combination of terms 1 to 3 co-occurring with term 4 alone or terms 4 and 5 together, would be indicative of a respiratory outbreak caused by, or at the least attributed to, influenza. By contrast, any combination of terms 1 to 3 co-occurring with term 5 alone, or without either of terms 4 and 5, would suggest a respiratory disease that was not confidently identified as influenza at the time. This outbreak would provide a candidate coronavirus epidemic for further investigation. Newspapers Newspapers and years searched were as follows: Belfast Newsletter (1828-1900), The Era (1838-1900), Glasgow Herald (1820-1900), Hampshire & Portsmouth Telegraph (1799-1900), Ipswich Journal (1800-1900), Liverpool Mercury (1811-1900), Northern Echo (1870-1900) Pall Mall Gazette (1865-1900), Reynold’s Daily (1850-1900), Western Mail (1869-1900) and The Times (1785-2009). The search in The Times was extended to 2009 in order to provide a comparison with the 20th century. Searches were performed using Lancaster University’s instance of the CQPweb (Corpus Query Processor) corpus analysis software (https://cqpweb.lancs.ac.uk/; Hardie, 2012). CQPweb’s database is populated from the newspapers listed, using optical character recognition (OCR), so for older publications in particular, some errors may be present (McEnery et al., 2019). 
Statistics The occurrence of each of the five search terms was calculated per million words within the annual output of each publication, in CQPweb. This is compared to a background distribution constituting the corresponding words per million for each search term over the total year range for each newspaper. Within the annual distributions, for each search term and each newspaper, we determined the years lying in the top 1% (i.e. p<0.05 after application of a Bonferroni correction), following Gabrielatos et al. (2012). These are deemed to be years when that search term was in statistically significant usage above its background level for the newspaper in which it occurs. For years when search terms were significantly elevated, we also calculated collocates at range n. Collocates, in corpus linguistics, are other words found at statistically significant usage, over their own background levels, in a window from n positions to the left to n positions to the right of the search term. In other words, they are found in significant proximity to the search term. A default value of n=10 was used throughout, unless specified. Collocation analysis therefore assists in showing how a search term associates with other words within a corpus, providing information about the context in which that search term is used. CQPweb provides a log ratio method for the quantification of the strength of collocation. COVID-19 is the first known coronavirus pandemic. Nevertheless, the seasonal circulation of the four milder coronaviruses of humans – OC43, NL63, 229E and HKU1 – raises the possibility that these viruses are the descendants of more ancient coronavirus pandemics. This proposal arises by analogy to the observed descent of seasonal influenza subtypes H2N2 (now extinct), H3N2 and H1N1 from the pandemic strains of 1957, 1968 and 2009, respectively. 
Recent historical revisionist speculation has focussed on the influenza pandemic of 1889-1892, based on molecular phylogenetic reconstructions that show the emergence of human coronavirus OC43 around that time, probably by zoonosis from cattle. If the “Russian influenza”, as The Times named it in early 1890, was not influenza but caused by a coronavirus, the origins of the other three milder human coronaviruses may also have left a residue of clinical evidence in the 19th century medical literature and popular press. In this paper, we search digitised 19th century British newspapers for evidence of previously unsuspected coronavirus pandemics. We conclude that there is little or no corpus linguistic signal in the UK national press for large-scale outbreaks of unidentified respiratory disease for the period 1785 to 1890. To view data, open in Microsoft Excel. To reproduce the data from scratch, a login is needed to CQPweb (https://cqpweb.lancs.ac.uk). This is free of charge but requires authorization, which can be applied for at the URL given.
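The two basic quantities in the statistics description, occurrences per million words and the window of n positions either side of a search term, can be sketched as follows. This is a simplified illustration with invented text. It counts raw co-occurrences only and does not reproduce CQPweb's log ratio measure or its significance testing, and the function names are assumptions.

```python
from collections import Counter


def per_million(hits, total_words):
    """Occurrences of a search term per million words of text."""
    return hits * 1_000_000 / total_words


def window_counts(tokens, term, n=10):
    """Count words occurring within n positions either side of `term`.

    The search term itself at the centre of each window is excluded.
    """
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != term:
            continue
        left = tokens[max(0, i - n):i]
        right = tokens[i + 1:i + 1 + n]
        counts.update(left + right)
    return counts


# Invented sentence, for illustration only.
tokens = "a severe cough and fever spread as the cough worsened".split()
neighbours = window_counts(tokens, "cough", n=2)
rate = per_million(5, 2_000_000)
```

With a window of two positions, "fever" falls inside the first window around "cough" while "spread" falls outside both windows.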

  • Open Access
    Authors: 
    Prendin, Angela Luisa; Normand, Signe; Carrer, Marco; Bjerregaard Pedersen, Nanna; Matthiesen, Henning; Westergaard-Nielsen, Andreas; Elbering, Bo; Treier, Urs Albert; Hollesen, Jørgen;
    Publisher: Zenodo

    The combined effects of climate change and nutrient availability on Arctic vegetation growth are poorly understood. Archaeological sites in the Arctic could represent unique nutrient hotspots for studying the long-term effect of nutrient enrichment. In this study, we analysed a time-series of ring widths of Salix glauca L. collected at nine archaeological sites and in their natural surroundings along a climate gradient in the Nuuk fjord region, Southwest Greenland, stretching from the edge of the Greenlandic Ice Sheet in the east to the open sea in the west. We assessed the temperature-growth relationship for the last four decades distinguishing between soils with past anthropogenic nutrient enrichment (PANE) and without (controls). Along the East–West gradient, the inner fjord sites showed a stronger temperature signal compared to the outermost ones. Individuals growing in PANE soils had wider ring widths than individuals growing in the control soils and a stronger climate-growth relation, especially in the inner fjord sites. Thereby, the individuals growing on the archaeological sites seem to have benefited more from the climate warming in recent decades. Our results suggest that higher nutrient availability due to past human activities plays a role in Arctic vegetation growth and should be considered when assessing both the future impact of plants on archaeological sites and the general greening in landscapes with contrasting nutrient availability. This file includes the data used to analyse: i) time series of mean ring width of Salix glauca L. and climate from the nine archaeological sites (PANE) and in their natural surroundings (CONT) in Nuuk Fjord (West Greenland); ii) climate-growth relationship and the effect of nutrient availability; iii) climate and nutrient sensitivity accounting for the effect of insect outbreaks and their carry-over effects (Supplementary Information). 
In particular, it contains time series of raw and standardized ring width (RW and Z-score), mean June-August temperature, and the average sums of thawing degree days (TDD) and growing degree days (GDD) for all the PANE and CONT sites. Variables are organized in columns and named with the acronyms described in Table 1, Table 2, and Figures 2-5.
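As a rough illustration of the derived variables named above, the sketch below standardizes a ring-width series to Z-scores and sums degree days from daily mean temperatures. It is a minimal sketch under stated assumptions, not the authors' procedure: the base temperatures (0 °C for TDD, 5 °C for GDD), the use of the population standard deviation, and all function names and sample values are hypothetical.

```python
from statistics import mean, pstdev


def z_scores(ring_widths):
    """Standardize a ring-width series to zero mean and unit variance.

    Assumption: the population standard deviation is used; the data set
    description does not say which estimator the authors applied.
    """
    mu = mean(ring_widths)
    sigma = pstdev(ring_widths)
    return [(w - mu) / sigma for w in ring_widths]


def degree_days(daily_mean_temps, base=0.0):
    """Sum the part of each daily mean temperature that exceeds `base` (deg C)."""
    return sum(max(t - base, 0.0) for t in daily_mean_temps)


# Made-up example values, for illustration only.
rw = [0.2, 0.4, 0.6]               # ring widths in mm
temps = [-3.0, 1.0, 4.0, 7.5]      # daily mean temperatures in deg C

z = z_scores(rw)
tdd = degree_days(temps, base=0.0)  # thawing degree days (assumed base 0 deg C)
gdd = degree_days(temps, base=5.0)  # growing degree days (assumed base 5 deg C)
```

Changing `base` is the only difference between the two degree-day sums here; the thresholds actually used should be taken from the linked publication.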

  • Open Access English
    Authors: 
    Egeland, Charles; Fadem, Cynthia; Byerly, Ryan; Henderson, Cory; Fitzgerald, Curran; Mabulla, Audax; Baquedano, Enrique; Gidna, Agniss;
    Publisher: Dryad

    Variable: Description
    Type: Type of sample (calibration = calibration coin for Delta Innov-X Analyzer; standard = NIST geological standard; geological = geological sample from lithic raw material source; artifact = archaeological specimen)
    Replicate: Replicate measurement (Yes or No)
    Source: Geological source (delta = Delta Innov-x Analyzer calibration coin; nist = National Institute of Standards and Technology (NIST) geological standard; NS = Naibor Soit; NH = Naisuisui Hill; OL = Oldonye Okule; LD = Lemagarut drainage; SS = Shifting Sand; KG = Kelogi Hills; EN = Engelosin)
    Outcrop: Individual outcrop within geological source (NSM = Naibor Soit Main Hill; NSMH = Naibor Soit Manyata Hill; NSSO = Naibor Soit Southern Outlier; NH = Naisuisui Hill; LD1 = Lemagarut Drainage 1; LD2 = Lemagarut Drainage 2; BKE = BK East; SS = Shifting Sand; KG = Kelogi Hills; EN = Engelosin; NA = Not applicable)
    Specimen: Individual find or sample number
    Material: Raw material type (QTZ = "Quartz-rich"; GN = Gneiss; FGV = Fine-grained volcanic)
    Element concentration estimate: Reported for each element (e.g., P, Cl, Ca; empty cells are "non-detect")
    Analytical error: Reported for each element (e.g., P +/-, Cl +/-, Ca +/-; no error reported for "non-detect" elements)

    The published analysis focused only on granulite specimens (n = 186) and, more specifically, on six elements (Fe, Ti, Zr, K, Sr, and Y) that had detection rates >75% in the granulite specimens (that is, these elements were detected in more than 75% of the granulite specimens). These elements were used in the predictive models from the published analysis. Two of the 186 granulite specimens were missing values for five out of the six elements and were therefore not included in the statistical analyses. Of the remaining 184 specimens, 55 had missing data for one element, and two of those 55 had missing data for two elements. 
These missing values were treated as censored data (that is, the element is present but could not be measured precisely enough for the instrument to report a value). These missing values were interpolated in one of two ways. For those specimens subjected to replicate pXRF runs (n = 7), the missing value was replaced with the mean value of the replicates. The missing values for the remaining specimens (n = 48) were replaced with the mean of the four closest (as determined in two-dimensional space) specimens with measured (rather than interpolated) values. The main data file does not include these interpolated values. Should analysts choose to use them, interpolated values can be found in the additional .csv file.

The invention and proliferation of stone tool technology in the Early Stone Age (ESA) marks a watershed in human evolution. Patterns of lithic procurement, manufacture, use, and discard have much to tell us about ESA hominin cognition and land use. However, these issues cannot be fully explored outside the context of the physical attributes and spatio-temporal availability of the lithic raw materials themselves. The Olduvai Basin of northern Tanzania, which is home to both a wide variety of potential toolstones and a rich collection of ESA archaeological sites, provides an excellent opportunity to investigate the relationship between lithic technology and raw material characteristics. Here, we examine two attributes of the basin's igneous and metamorphic rocks: spatial location and fracture predictability. A total of 244 geological specimens were analyzed with non-destructive portable XRF (pXRF) to determine the geochemical distinctiveness of five primary and secondary sources, while 110 geological specimens were subjected to Schmidt rebound hardness tests to measure fracture predictability. 
Element concentrations derived via pXRF show significant differences between sources, and multivariate predictive models classify geological specimens with 75–80% accuracy. The predictive models identify Naibor Soit as the most likely source for a small sample of three lithic artifacts from Bed II, which supports the idea that this inselberg served as a source of toolstone during the early Pleistocene. Clear patterns in fracture predictability exist within and between both sources and rock types. Fine-grained volcanics show high rebound values (associated with high fracture predictability), while finer-grained metamorphics and coarse-grained gneisses show intermediate and low rebound values, respectively. Artifact data from Bed I and II suggest that fracture predictability played a role in raw material selection at some sites, but other attributes like durability, expediency, and nodule size and shape were more significant.

A total of 244 rock specimens (aka "geological specimens") were collected from eight primary (six granulite outcrops, one gneiss outcrop, one phonolite outcrop) and one secondary (a seasonal drainage containing basalt blocks) lithic raw material sources in the Olduvai Basin. Rock specimens were flaked directly from the sources with a rock hammer. Only granulite specimens with visually quartz-rich compositions were selected. Five quartz-rich metamorphic artifacts (aka "archaeological specimens") from BK East, a ca. 1.5 million-year-old site on the south wall of the side gorge in Olduvai Gorge, were also included. Portable XRF (pXRF) analyses were conducted with an Innov-X Delta Classic Environmental Analyzer equipped with a 4W Au anode X-ray tube and a Si-PIN diode detector. All analyses were performed while the instrument was docked into a stable, hands-free test stand. An unweathered, non-cortical surface free of sediment matrix was placed over, and completely covered, the detector window. 
Each specimen was measured for 360 seconds using all three of the instrument's beams (120 seconds/beam). After an initial energy scale calibration test with a factory issued metal coin of known composition, the following protocol was observed: (1) a powdered sample of Standard Reference Material (SRM) 2702 with elemental concentrations certified by NIST was measured; (2) four geological/archaeological specimens were then measured; (3) the fifth geological/archaeological specimen in a series was measured five times (that is, five consecutive 360-second cycles) without being moved or reoriented; (4) after the fifth geological/archaeological specimen was measured, the SRM 2702 sample was measured once again, which initiated the next series of measurements. Element concentrations were derived with the Compton Normalization correction model and the factory-set “Soil Environmental” calibration.
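The two replacement rules described in this record for missing element values (mean of replicate runs, otherwise mean of the four closest measured specimens) can be sketched as follows. This is an illustrative sketch only: the record does not say which two-dimensional space defines "closest", so the `x` and `y` keys, the function names, and the sample values are all hypothetical. Replaced values are returned separately rather than written back, mirroring the separation between the main data file and the additional .csv file.

```python
import math


def replicate_mean(values):
    """Mean of the replicate runs that reported a value (None = non-detect)."""
    measured = [v for v in values if v is not None]
    return sum(measured) / len(measured) if measured else None


def impute_from_neighbors(specimens, element, k=4):
    """Return {index: replacement} for specimens missing `element`.

    Each specimen is a dict with hypothetical keys "x" and "y" (its position
    in the two-dimensional space) plus one key per element. Only specimens
    with a measured value act as donors, so replaced values are never reused.
    """
    donors = [s for s in specimens if s[element] is not None]
    filled = {}
    for i, s in enumerate(specimens):
        if s[element] is not None:
            continue
        nearest = sorted(
            donors,
            key=lambda d: math.dist((s["x"], s["y"]), (d["x"], d["y"])),
        )[:k]
        if nearest:  # skip if no measured specimen is available
            filled[i] = sum(d[element] for d in nearest) / len(nearest)
    return filled


# Made-up example: one non-detect surrounded by four measured specimens.
specimens = [
    {"x": 0, "y": 0, "Sr": None},
    {"x": 1, "y": 0, "Sr": 10.0},
    {"x": 0, "y": 1, "Sr": 20.0},
    {"x": -1, "y": 0, "Sr": 30.0},
    {"x": 0, "y": -1, "Sr": 40.0},
    {"x": 10, "y": 10, "Sr": 1000.0},  # too far away to be a donor
]
filled = impute_from_neighbors(specimens, "Sr")
rep = replicate_mean([12.0, None, 14.0])
```

In this toy case the distant specimen is ignored and the non-detect is replaced by the mean of its four nearest measured neighbors.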

  • Open Access English
    Authors: 
    Wei, Shichao; Li, Zitong; Momigliano, Paolo; Fu, Chao; Wu, Hua; Merilä, Juha;
    Publisher: Dryad
    Project: AKA | Evolutionary Genetics of ... (218343), AKA | Centre of Excellence in E... (129662), AKA | Evolutionary and conserva... (316294), AKA | Evolutionary genetics of ... (134728)

    The role of geological events and Pleistocene climatic fluctuations as drivers of current patterns of genetic variation in extant species has been a topic of continued interest among evolutionary biologists. Nevertheless, comprehensive studies of widely distributed species are still rare, especially from Asia. Using geographically extensive sampling of many individuals and a large number of nuclear single nucleotide polymorphisms (SNPs), we studied the phylogeography and historical demography of Hyla annectans populations in southern China. Thirty-five sampled populations were grouped into seven clearly defined genetic clusters that closely match phenotype-based subspecies classification. These lineages diverged 2.32–5.23 million years ago, a timing that closely aligns with the rapid and drastic uplifting of the Qinghai-Tibet Plateau and adjacent southwest China. Demographic analyses and species distribution models indicate that different populations of this species have responded differently to past climatic changes. In the Hengduan Mountains, most populations experienced a bottleneck, whereas the populations located outside of the Hengduan Mountains have gradually declined in size since the end of the last glaciation. In addition, the levels of phenotypic and genetic divergence were strongly correlated across major clades. These results highlight the combined effects of geological events and past climatic fluctuations, as well as natural selection, as drivers of contemporary patterns of genetic and phenotypic variation in a widely distributed anuran in Asia. 'SNP_data_for_H.annectans' is the SNP data for Hyla annectans in VCF format, which is used for the phylogenetic tree, genetic structure, genetic differentiation, and demographic analyses. 'Morphological_data_info' contains the data on snout-vent length (SVL), weight, and spot number used for morphological analyses and the QST-FST comparison. 'SDM_input_ascii' contains the ASCII files used for the species distribution models (SDMs). 
'SDM_locality_info' contains the occurrence data points of five genetic clusters of H. annectans.
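For readers unfamiliar with the QST-FST comparison mentioned in this record, the sketch below shows the standard QST ratio for a diploid, outbreeding species: between-population variance divided by the sum of between-population variance and twice the within-population variance. It is a minimal sketch, not the authors' analysis: how the variance components were estimated from the morphological data is not stated here, and the function name and example numbers are hypothetical.

```python
def qst(var_between, var_within):
    """Quantitative trait differentiation for a diploid, outbreeding species.

    var_between: additive genetic variance among populations
    var_within:  additive genetic variance within populations
    Components estimated directly from field phenotypes only approximate
    these quantities.
    """
    if var_between < 0 or var_within < 0:
        raise ValueError("variance components must be non-negative")
    total = var_between + 2 * var_within
    if total == 0:
        raise ValueError("at least one variance component must be positive")
    return var_between / total


# Made-up example values, for illustration only.
example_qst = qst(var_between=2.0, var_within=1.0)
example_fst = 0.2  # hypothetical neutral differentiation from SNP data
exceeds_neutral = example_qst > example_fst
```

A trait QST well above the neutral FST is conventionally read as a sign of divergent selection, while similar values are consistent with drift alone.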