Advanced search in Research products
The following results are related to Digital Humanities and Cultural Heritage. Are you interested to view more results? Visit OpenAIRE - Explore.
8 Research products, page 1 of 1

  • Digital Humanities and Cultural Heritage
  • Publications
  • Research data
  • Research software
  • 05 social sciences
  • 050905 science studies
  • European Commission
  • EC|H2020
  • EU
  • OpenAIRE
  • Scientometrics

  • Open Access English
    Authors: 
    Lotta Vikström; Helena Haage; Erling Häggström Lundevaller;
    Publisher: Umeå universitet, Enheten för demografi och åldrandeforskning (CEDAR)
    Country: Sweden
    Project: EC | DISLIFE (647125)

    Historically, little is known about whether and to what extent disabled people found work and formed families. To fill this gap, this study analyses the life course trajectories of both disabled and non-disabled individuals, between the ages of 15 and 33, from the Sundsvall region in Sweden during the nineteenth century. Using micro-data that report disabilities in a population of 8,874 individuals from the parish registers digitised by the Demographic Data Base, Umeå University, we employ sequence analysis on a series of events expected to occur in the lives of young adults: getting a job, marrying and becoming a parent, while also taking into account out-migration and death. Through this method we obtain a holistic picture of the life course of disabled people. Our main findings show that their trajectories did not include work or family to the same extent as those of non-disabled people. Secondary findings concerning migration and mortality indicate that the disabled rarely out-migrated from the region and suffered premature deaths. To our knowledge, this is the first study to employ sequence analysis on a substantial number of cases to provide demographic evidence of how disability shaped human trajectories in the past during an extended period of life. Accordingly, we detail our motivation for this method, describe our analytical approach, and discuss the advantages and disadvantages associated with sequence analysis for our case study.
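The core of the sequence-analysis method described above can be illustrated with a minimal sketch: each life course becomes a string of yearly states, and trajectories are compared with an optimal-matching (edit) distance. The state codes and the two toy trajectories below are invented for illustration, not the study's actual coding scheme.

```python
# A minimal sketch of sequence analysis on life-course data.
# States (hypothetical codes): H = at home, W = working, M = married,
# P = parent, D = dead. One character per year of age, 15 through 33.

def om_distance(a: str, b: str, indel: int = 1, sub: int = 2) -> int:
    """Optimal-matching distance between two state sequences: the classic
    dynamic-programming edit distance with insertion/deletion and
    substitution costs."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i * indel
    for j in range(1, n + 1):
        d[0][j] = j * indel
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else sub
            d[i][j] = min(d[i - 1][j] + indel,      # delete from a
                          d[i][j - 1] + indel,      # insert into a
                          d[i - 1][j - 1] + cost)   # match / substitute
    return d[m][n]

non_disabled = "HHHWWWWWMMMPPPPPPPP"   # work at 18, then marriage, parenthood
disabled     = "HHHHHHHHHHHHHHHDDDD"   # never enters work; early death

print(om_distance(non_disabled, non_disabled))   # identical trajectories: 0
print(om_distance(non_disabled, disabled))       # large distance
```

A real analysis would compute this distance for all pairs among the 8,874 individuals and then cluster the resulting distance matrix to find typical trajectory types.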

  • Open Access English
    Authors: 
    Camil Demetrescu; Andrea Ribichini; Marco Schaerf;
    Publisher: Springer Verlag
    Country: Italy
    Project: EC | SecondHands (643950)

    We investigate the accuracy with which author names are reported in bibliographic records excerpted from four prominent sources: WoS, Scopus, PubMed, and CrossRef. We take as a case study 44,549 publications stored in the internal database of Sapienza University of Rome, one of the largest universities in Europe. While our results indicate generally good accuracy for all bibliographic data sources considered, we highlight a number of issues that undermine accuracy for certain classes of author names, including compound names and names with diacritics, features common to Italian and other Western languages.
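One of the error classes highlighted above, names with diacritics, can be made concrete with a short sketch: normalizing away combining marks before comparison distinguishes a pure diacritic variant from a genuine discrepancy. The records below are toy examples, not the actual Sapienza data.

```python
# Compare author-name variants after stripping diacritics, to separate
# harmless transliteration differences from real reporting errors.
import unicodedata

def strip_diacritics(s: str) -> str:
    """Remove combining marks so 'Häggström' compares equal to 'Haggstrom'."""
    decomposed = unicodedata.normalize("NFKD", s)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

def names_match(a: str, b: str) -> bool:
    """True if the names differ at most in diacritics and letter case."""
    return strip_diacritics(a).casefold() == strip_diacritics(b).casefold()

print(names_match("Mercklé", "Merckle"))   # True: only diacritics differ
print(names_match("Schaerf", "Scharf"))    # False: a real discrepancy
```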

  • Open Access English
    Authors: 
    Mercklé, Pierre; Zalc, Claire;
    Project: EC | LUBARTWORLD (818843)

    The aim of this article is to offer a detailed examination of the contributions and limits of modelling in history, taking the Holocaust as a case study. It draws on a survey that reconstructed the "persecution trajectories" of the 992 Jews of Lens during the Second World War, of whom only 527 survived: 491 were arrested, 468 were deported, and 449 were exterminated. The prosopographical data are used here to answer a simple question: is it possible to model persecution? In other words, is it possible to build a simplified yet heuristic representation of the complex causal processes that determined the chances of surviving Nazi persecution, starting from standardised data on a relatively large number of individuals? The article discusses the contributions and limits of a succession of quantitative methods: those belonging to what Andrew Abbott calls the "standard program" of the social sciences, as well as network analysis and sequence analysis. For each of them, it discusses in particular how to account for interactions between individuals, for the historicity of behaviours, and for the processes determining these chances of survival. Attempts at modelling from historical data thus yield genuine advances in knowledge, especially when they are carried out cumulatively on the same survey. By moving from a logic of individual properties to a logic of interconnected trajectories, these approaches allow a better understanding of social and local interactions, and thus offer stimulating perspectives for a microhistory of the Holocaust.

  • Restricted
    Authors: 
    Ghazal Faraj; András Micsik;
    Publisher: Springer International Publishing
    Project: EC | COURAGE (692919)

    Manually creating links between large datasets is an extremely tedious task, and although linked data production is growing massively, interconnection between datasets needs improvement. This paper presents our work on detecting and extending links between Wikidata and COURAGE entities with respect to cultural heritage data. The COURAGE project explored the methods of cultural opposition in the socialist era (c. 1950–1990), highlighting the variety of alternative cultural scenes that flourished in Eastern Europe before 1989. We describe our methods and results in discovering common entities in the two datasets, and our solution for automating this task. Furthermore, we show how it was possible to enrich the data in Wikidata and to establish new, bi-directional connections between COURAGE and Wikidata. Hence, the audience of both databases gains a more complete view of the matched entities.
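The basic entity-matching step described above can be sketched as label similarity between the two datasets. The records below are toy stand-ins for COURAGE and Wikidata entries, and the project's actual pipeline is considerably more involved (it also uses properties, not just labels).

```python
# Link candidate entities across two datasets by label similarity.
from difflib import SequenceMatcher

courage = ["Vaclav Havel", "Plastic People of the Universe"]
wikidata = ["Václav Havel", "The Plastic People of the Universe", "Charter 77"]

def best_match(label, candidates, threshold=0.8):
    """Return the most similar candidate label, or None if nothing
    reaches the similarity threshold."""
    scored = [(SequenceMatcher(None, label.lower(), c.lower()).ratio(), c)
              for c in candidates]
    score, match = max(scored)
    return match if score >= threshold else None

for entity in courage:
    print(entity, "->", best_match(entity, wikidata))
```

Matches found this way would then be reviewed or cross-checked before being written back as bi-directional links.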

  • Open Access English
    Authors: 
    Kun Sun; Haitao Liu; Wenxin Xiong;
    Project: EC | WIDE (742545)

    Scientific writings, as an essential part of human culture, have evolved over centuries into their current form. Knowing how scientific writings evolved is particularly helpful in understanding how trends in scientific culture developed. It also allows us to better understand how scientific culture was interwoven with human culture generally. The availability of massive digitized texts and the progress in computational technologies today provide a convenient and credible way to discern evolutionary patterns in scientific writings by examining diachronic linguistic changes. The linguistic changes in scientific writings reflect the genre shifts that took place with historical changes in science and scientific writing. This study investigates a general evolutionary linguistic pattern in scientific writings by merging two established computational methods: relative entropy, and word-embedding-based measures of concreteness and imageability. It thus creates a novel quantitative methodology and applies it to the examination of diachronic changes in the Philosophical Transactions of the Royal Society (PTRS, 1665–1869). The data from the two computational approaches can be well mapped to support the argument that this journal followed an evolutionary trend of increasing professionalization and specialization, but they also show that language use in the journal was greatly influenced by historical events and other socio-cultural factors. This study, as a "culturomic" approach, demonstrates that linguistic evolutionary patterns in scientific discourse have been interrupted by external factors, even though scientific discourse would likely have developed cumulatively into a professional and specialized genre. The approaches proposed here can contribute substantially to full-text analysis in scientometrics.
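The relative-entropy component of the method above can be sketched as a KL divergence between the word distributions of two periods: the larger the divergence, the more the later language has drifted from the earlier. The token samples below are invented; the study works on the full PTRS corpus.

```python
# Relative entropy (KL divergence, in bits) between two word distributions,
# as a rough measure of diachronic linguistic change.
from collections import Counter
from math import log2

def kl_divergence(p_tokens, q_tokens, smoothing=1e-9):
    """D(P || Q) over the union vocabulary, with tiny smoothing so that
    words unseen in Q do not produce infinities."""
    p, q = Counter(p_tokens), Counter(q_tokens)
    vocab = set(p) | set(q)
    pt, qt = sum(p.values()), sum(q.values())
    return sum((p[w] / pt) * log2((p[w] / pt + smoothing) /
                                  (q[w] / qt + smoothing))
               for w in vocab if p[w] > 0)

early = "the experiment was made by the gentleman".split()
late = "the experiment demonstrates a statistically significant effect".split()

print(round(kl_divergence(early, late), 3))  # divergence between the periods
print(kl_divergence(early, early) < 1e-6)    # a distribution matches itself
```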

  • Open Access English
    Authors: 
    Laurent Romary; Charles Riondet;
    Publisher: HAL CCSD
    Country: France
    Project: EC | EHRI (654164), EC | PARTHENOS (654119), EC | EHRI (261873)

    This article tackles the issue of integrating heterogeneous archival sources in a single data repository, namely the European Holocaust Research Infrastructure (EHRI) portal, whose aim is to support Holocaust research by providing online access to information about dispersed sources relating to the Holocaust (http://portal.ehri-project.eu). In this case, the problem at hand is to combine data coming from a network of archives in order to create an interoperable data space which can be used to search for, retrieve and disseminate content in the context of archival-based research. The scholarly purpose has specific consequences for our task: it assumes that the information made available to the researcher is as close as possible to the originating source, in order to guarantee that the ensuing analysis can be deemed reliable. In the EHRI network of archives, as already observed in the case of the EU Cendari project, one cannot but face heterogeneity. The EHRI portal brings together descriptions from more than 1,900 institutions. Each archive comes with a whole range of idiosyncrasies corresponding to the way it has been set up and has evolved over time. Cataloguing practices may also differ. Even the degree of digitization may range from the absence of a digital catalogue to the provision of a full-fledged online catalogue with all the necessary APIs for anyone to query and extract content. There is indeed a contrast here with the global endeavour at the international level to develop and promote standards for the description of archival content as a whole. Nonetheless, in a project like EHRI, standards should play a central role.
    They are necessary for many tasks related to the integration and exploitation of the aggregated content, namely:
  • comparing the content of the various sources, and thus developing quality-checking processes;
  • defining an integrated repository infrastructure where the content of the various archival sources can be reliably hosted;
  • querying and re-using content in a seamless way;
  • deploying tools that have been developed independently of the specificities of the information sources, for instance to visualise or mine the resulting pool of information.
    The central aspect of the work described in this paper is the assessment of the role of the EAD (Encoded Archival Description) standard as the basis for achieving the tasks described above. We have worked out a strategy of defining specific customizations of EAD that can be used at various stages of the process of integrating heterogeneous sources. In doing so, we have developed a methodology based on a specification and customization method inspired by the extensive experience of the Text Encoding Initiative (TEI) community. In the TEI framework, as we show in section 1, one can model specific subsets or extensions of the TEI guidelines while maintaining both the technical (XML schemas) and editorial (documentation) content within a single framework. This work leads us to anticipate that the method we have developed may be of wider interest within similar environments and also, as we believe, for the future maintenance of the EAD standard. Finally, this work, successfully tested and implemented in the framework of EHRI [Riondet 2017], can be seen as part of the wider endeavour of European research infrastructures in the humanities, such as CLARIN and DARIAH, to provide support for researchers to integrate the use of standards in their scholarly practices.
    This is why the general workflow studied here has been introduced as a use case in the umbrella infrastructure project PARTHENOS, which aims, among other things, at disseminating information and resources about methodological and technical standards in the humanities.
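The customization idea above can be pictured as a project-specific profile of EAD applied to incoming records. The sketch below reduces such a profile to a simple checklist of required elements: the element names follow EAD, but the checklist itself is a hypothetical, much-simplified stand-in for the schema-based customizations the article describes.

```python
# Check an EAD record against a (hypothetical) project profile expressed
# as a list of required element paths.
import xml.etree.ElementTree as ET

REQUIRED = ["eadheader/eadid", "archdesc/did/unittitle", "archdesc/did/unitdate"]

def check_profile(ead_xml, required=REQUIRED):
    """Return the required element paths missing from the record."""
    root = ET.fromstring(ead_xml)
    return [path for path in required if root.find(path) is None]

record = """<ead>
  <eadheader><eadid>it-12345</eadid></eadheader>
  <archdesc><did><unittitle>Correspondence</unittitle></did></archdesc>
</ead>"""

print(check_profile(record))  # the record lacks 'archdesc/did/unitdate'
```

In practice such a profile would be expressed as an XML schema generated from a specification document, in the TEI-inspired manner the article describes, rather than as a hard-coded list.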

  • Publication · Conference object · Contribution for newspaper or weekly magazine · 2020
    Open Access
    Authors: 
    Jeff Mitchell; Jeffrey S. Bowers;
    Publisher: International Committee on Computational Linguistics
    Country: United Kingdom
    Project: EC | M and M (741134)

    Recently, domain-general recurrent neural networks, without explicit linguistic inductive biases, have been shown to successfully reproduce a range of human language behaviours, such as accurately predicting number agreement between nouns and verbs. We show that such networks will also learn number agreement within unnatural sentence structures, i.e. structures that are not found within any natural languages and which humans struggle to process. These results suggest that the models are learning from their input in a manner that is substantially different from human language acquisition, and we undertake an analysis of how the learned knowledge is stored in the weights of the network. We find that while the model has an effective understanding of singular versus plural for individual sentences, there is a lack of a unified concept of number agreement connecting these processes across the full range of inputs. Moreover, the weights handling natural and unnatural structures overlap substantially, in a way that underlines the non-human-like nature of the knowledge learned by the network.

  • Publication · Conference object · Article · Preprint · 2018 · Embargo End Date: 01 Jan 2018
    Open Access
    Authors: 
    Hardy Hardy; Andreas Vlachos;
    Publisher: arXiv
    Country: United Kingdom
    Project: EC | SUMMA (688139)

    Recent work on abstractive summarization has made progress with neural encoder-decoder architectures. However, such models are often challenged by their lack of explicit semantic modeling of the source document and its summary. In this paper, we extend previous work on abstractive summarization using Abstract Meaning Representation (AMR) with a neural language generation stage which we guide using the source document. We demonstrate that this guidance improves summarization results by 7.4 and 10.5 points in ROUGE-2, using gold-standard AMR parses and parses obtained from an off-the-shelf parser, respectively. We also find that summarization performance using the latter is 2 ROUGE-2 points higher than that of a well-established neural encoder-decoder approach trained on a larger dataset. Code is available at https://github.com/sheffieldnlp/AMR2Text-summ. (Accepted at EMNLP 2018.)
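The ROUGE-2 scores reported above measure bigram overlap between a generated summary and a reference. A minimal sketch of the recall variant follows (toy sentences; the official metric additionally handles stemming, multiple references, and F-scores).

```python
# ROUGE-2 recall: the fraction of reference bigrams that also appear
# in the candidate summary.
def bigrams(tokens):
    return [tuple(tokens[i:i + 2]) for i in range(len(tokens) - 1)]

def rouge2_recall(candidate, reference):
    """Bigram recall of the candidate against a single reference."""
    cand = bigrams(candidate.lower().split())
    ref = bigrams(reference.lower().split())
    if not ref:
        return 0.0
    return sum(1 for bg in ref if bg in cand) / len(ref)

reference = "the model improves summarization quality"
candidate = "the model improves abstractive summarization"

print(rouge2_recall(candidate, reference))  # 2 of 4 reference bigrams match
```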

Powered by OpenAIRE graph