24 Research products, page 1 of 3
- Other research product . 2022 . Open Access . English . Authors: Kang, Taize . Publisher: Helsingin yliopisto . Country: Finland
Story generation is an artificial intelligence task in which a computer program is used to create literature or stories. This kind of task usually involves giving an initial scene, characters, background information and goals, and then letting the computer program automatically generate a storyline and complete the narrative of the story. Transformers are widely used and have achieved state-of-the-art results in many natural language processing tasks, including story generation. With the help of the attention mechanism, transformers can overcome overfitting and achieve strong results. The Generative Pre-trained Transformer (GPT) series is among the best transformer models and has attracted many researchers. In this thesis, transformer models are used to design and implement a machine learning method for the generation of very short stories. By introducing a commonsense knowledge base and a rule generator based on it, the models can learn the relationships between contexts and generate coherent narratives. Given the first sentence of a story as input, the model can complete the story. The model is based on the GPT-2 model and COINS. The dataset used is a collection of short stories. By comparing the generated results of different models in many respects, we show the effectiveness of the model. In addition, the compared results are analyzed to identify potential optimization methods.
- Other research product . 2022 . Open Access . English . Authors: Kuljukka, Tomi . Publisher: Helsingin yliopisto . Country: Finland
In this thesis, I attempt to answer the question of whether the modern psychological theory of culture of honour can be applied to the Late Copper/Early Bronze Age period of the Eurasian steppe zone. Furthermore, I ask how this affects social structure and whether archaeological evidence can support it. To study culture of honour, Indo-European sources and ethnographic research on mobile pastoralism are also examined. Through a sociocultural approach, this thesis strives to reconstruct the sociocultural background and changes originating from the Yamnaya. In this approach, theories from anthropology, ethnography, sociology, social psychology, and science of religion interact. Furthermore, sources associated with early Indo-European culture (e.g., social structure and mythology) are included. Essentially, this study aims to link the Yamnaya culture with the sociocultural theory of culture of honour. A focus of this thesis is the study of anthropomorphic stone stelae associated with the Yamnaya and adjacent cultures. The area where the stelae have been found covers the modern countries of Romania, Bulgaria, Ukraine, Moldova, North Macedonia and Russia. Moreover, general knowledge about grave goods, burial rituals, and osteological and genetic materials contributes to the overall reconstruction and interpretation process. A comprehensive outline of the ideological, social, and behavioural aspects of the Yamnaya is attempted through the use of comparative methodology. To accomplish this, the archaeological materials and their symbolic meaning are interpreted using the theoretical frameworks provided and compared to later Indo-European traditions and ethnographic studies on mobile pastoralism. Using theoretical frameworks and the comparative method, this thesis demonstrates that the sociocultural theory of culture of honour can be reflected in archaeological materials. Reflections of sociocultural behaviour can be argued to be present in burial rituals, grave goods, osteological and genetic materials, and most of all in the anthropomorphic stone stelae.
- Other research product . 2022 . Open Access . English . Authors: Zhang, Jialei . Publisher: Helsingin yliopisto . Country: Finland
This thesis reports on a case study exploring the linguistic landscapes of four churches in Helsinki and their official websites. The aim of the study was to explore multilingualism and the status of English in the linguistic landscapes of main tourist destinations in Helsinki. In particular, the study aimed to find out how different languages, particularly English, are used in the linguistic landscape of Finnish tourism, as well as the reasons for this. It also explored the differences between on-site and online linguistic landscapes to find out how they affect visitors’ experiences. The data consist of photographs collected on-site in the four churches and the different language versions of the church websites. The thesis analysed the linguistic landscapes (LL) and virtual linguistic landscapes (VLL) by categorizing the collected signs as monolingual or multilingual, by the order in which languages appear, and by the materiality of the signs as temporary or permanent. The findings revealed that numerous languages were used in the LL and VLL, but Finnish remained the dominant language, followed by English, Swedish, and Russian. Fewer languages were used on the websites than on-site. A noteworthy discovery is that English was used more frequently than Swedish, even though Swedish is one of Finland’s national languages. English was also common in these churches because it appeared on more temporary signs than permanent signs, conveying most of the current and up-to-date information to visitors. Based on the findings, the LL of the churches are mostly accessible to tourists, but the consistency of the signage could be thought out more thoroughly. With the increasing number of foreign tourists, more language versions of the LL can be added, particularly an English version, as English is the world’s lingua franca. Some LL with grammatical and spelling mistakes can also be updated in order to provide visitors with a better travel experience.
- Other research product . 2022 . Open Access . English . Authors: Bakhia, Alexander . Publisher: Helsingin yliopisto . Country: Finland
Research on the dispersal of hominin species out of Africa during the Early Pleistocene has centred on the environmental conditions into which the species dispersed. The debate is polarised. Some reconstruct this dispersal as synchronous with the expansion of grasslands, placing hominins within this environmental niche. Conversely, new data suggest that hominins were highly adaptable, capable of occupying varied habitats and other non-grassland areas, including forests. In order to test these conflicting hypotheses, reconstruction of local paleoenvironments at major hominin sites is required. The Early Pleistocene site of Dmanisi is an excellent case study. As Dmanisi is one of the major hominin sites, paleoenvironmental reconstructions of it are important for understanding its role in the hominin dispersal events. By understanding the environment hominins moved into, we can better understand the ecological factors behind their dispersal. The aim of this study is to deduce the paleodiets and the mean body mass of the herbivore community in Dmanisi to assess the biome they lived in. For dietary analyses I used mesowear, and for estimating body mass from skeletal measures I used regression equations. The mesowear scores were used in a cluster analysis where the fossil taxa were clustered with modern taxa whose diets are well known. The results indicated the dietary categories of the fossil taxa. A mean body mass value of all ungulates was calculated for Dmanisi, which was then entered into a data sheet with mean body mass values from other localities. Relative body mass indices were calculated for each locality by dividing its mean body mass value by the largest mean body mass value in the set. These values were then correlated with locality-specific net primary productivity values, to see how the mean body mass value in Dmanisi performed against its net primary productivity, and against other localities. The herbivore community mainly consisted of mixed feeders and browsers, with minimal evidence for grazing in the mesowear analysis. Ungulates were comparatively small in Dmanisi, implying that low net primary productivity, together with interspecific and intraspecific competition, led to limited resource availability in the area, which in turn resulted in a community of relatively small ungulates. The results show that Early Pleistocene hominins occupied a relatively diverse ecotone habitat in Dmanisi, consisting of shrublands and woodlands. This suggests that Dmanisi was not as grassy as previously assumed, and that the spread of grassland environments was probably not the catalyst for the hominin dispersal.
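The index and correlation steps described in this abstract can be sketched in a few lines of Python. The locality names, body mass values and net primary productivity (NPP) values below are made-up placeholders, not data from the thesis, and the choice of Pearson's r is an assumption, since the abstract does not name the correlation measure.

```python
import math

# Hypothetical inputs: mean ungulate body mass (kg) and NPP per locality.
# All numbers are illustrative placeholders, not values from the thesis.
MEAN_BODY_MASS_KG = {"Locality A": 95.0, "Locality B": 210.0, "Locality C": 380.0}
NPP = {"Locality A": 400.0, "Locality B": 650.0, "Locality C": 900.0}


def relative_body_mass_indices(mean_masses):
    """Divide each locality's mean body mass by the largest mean in the set."""
    largest = max(mean_masses.values())
    return {name: mass / largest for name, mass in mean_masses.items()}


def pearson_r(xs, ys):
    """Pearson correlation coefficient (assumed measure, not named in the abstract)."""
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


indices = relative_body_mass_indices(MEAN_BODY_MASS_KG)
names = sorted(indices)
r = pearson_r([indices[name] for name in names], [NPP[name] for name in names])
print(indices)
print(f"Correlation between relative body mass index and NPP: {r:.3f}")
```

By construction, the locality with the largest mean body mass receives an index of 1, and every other locality is expressed as a fraction of it, which makes localities comparable on a common scale before they are set against productivity.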
- Other research product . 2022 . Open Access . English . Authors: Kivisaari, Kiira . Publisher: Helsingin yliopisto . Country: Finland
Environmental sustainability and environmentally responsible management have become topics of discussion in the Finnish contemporary art field. The contemporary art event Helsinki Biennial chose to make environmental sustainability one of the cornerstones of producing the event (Taskinen et al. 2021). In this master’s thesis I assess the environmental responsibility of Helsinki Biennial and how its environmental actions compare with the expectations of the contemporary art field in Finland from a managerial perspective. The research was conducted through autoethnographic and qualitative content analysis methods. The research seeks to answer three questions: 1. What environmental actions were taken and how were they selected? (R1) 2. How do the environmental actions taken compare to the current expectations of environmentally responsible management in the Finnish contemporary art field? (R2) 3. What types of hopes and concerns about future work are identified by Helsinki Biennial organizers and other experts in the Finnish contemporary art field? (R3) The results show that Helsinki Biennial compared well against the expectations of the field. Based on the answers of the interviewees, there is a lot of potential and motivation within the field waiting to be unleashed, for example through new financing and travel policies and by increasing the knowledge and time resources of employees. This study also examined the contemporary art field's visions of the future regarding environmental matters.
- Other research product . 2022 . Open Access . English . Authors: Saari, Nelli-Johanna . Publisher: Helsingin yliopisto . Country: Finland
Ancient DNA research has become a widely applied method to examine past communities. The acidic soil in Finland has previously complicated archaeogenetic research, but advances in the field have opened up new opportunities in recent years. This thesis integrates genetic data and archival research to examine the genetic ancestry and social organisation of Early Medieval communities in Southwest Finland. In this era, the coastal region of Southwest Finland experienced diverse societal changes induced by trade networks, urbanisation and conversion to Christianity. These shifts can be observed in large inhumation burial grounds from the Crusade Period (1025/1050–1150/1200 CE). While the burial grounds contain well-preserved skeletal material, no prior ancient DNA investigations have been undertaken in the area. A total of 30 individuals from three burial grounds, Tuomala, Kansakoulunmäki and Humikkala, were studied. The sites are located in the historical Raisio and Masku parishes close to the Raisio river valley. The burial context was reconstructed with archival research, and the genetic data was extracted from skeletal samples. The resulting genome-wide data for 8 individuals from the Kansakoulunmäki burial ground was studied with exploratory population genetic and kinship analyses. The archival research produced a detailed burial context for 14 individuals from Kansakoulunmäki in Raisio and 15 individuals from Humikkala in Masku. The genetic results indicated good molecular preservation at Kansakoulunmäki and poor preservation at Tuomala and Humikkala. The Kansakoulunmäki and Humikkala individuals bore traces of diverse connections to the Baltic Sea region. The integrated results revealed possible evidence of patrilocality and potential female genetic connections to Sigtuna in Central Sweden. These findings may point towards female mobility or exogamous marriage patterns between the two regions. Kinship relations were also detected. The possible sibling relationship could indicate an Early Medieval burial practice in which close kin were buried together. The Kansakoulunmäki individuals appeared local and displayed genetic continuity with present-day Finns. This thesis adds to an emerging body of research on the ancient genetic compositions and social practices in coastal Southwest Finland in a period of transformation. The results underline the potential of interdisciplinary strategies combining genetic and archival research, as well as possibilities in the investigation of larger inhumation burial grounds. The study contributes to diverse lines of research with new data and interpretations about the Early Medieval communities, suggesting potential for further analyses both in Finland and across the Baltic Sea region.
- Other research product . 2022 . Open Access . English . Authors: Leppämäki, Tatu . Publisher: Helsingin yliopisto . Country: Finland
Ever more data is available and shared through the internet. The big data masses often have a spatial dimension and can take many forms, one of which is digital text, such as articles or social media posts. The geospatial links in these texts are made through place names, also called toponyms, but traditional GIS methods are unable to deal with the fuzzy linguistic information. 
This creates the need to transform the linguistic location information to an explicit coordinate form. Several geoparsers have been developed to recognize and locate toponyms in free-form texts: the task of these systems is to be a reliable source of location information. Geoparsers have been applied to topics ranging from disaster management to literary studies. The major language of study in geoparser research has been English, and geoparsers tend to be language-specific, which threatens to leave unexplored both the insights gained from studying smaller languages and the experiences expressed in them. This thesis seeks to answer three research questions related to geoparsing: What are the most advanced geoparsing methods? What linguistic and geographical features complicate this multi-faceted problem? And how can the reliability and usability of geoparsers be evaluated? The major contributions of this work are an open-source geoparser for Finnish texts, Finger, and two test datasets, or corpora, for testing Finnish geoparsers. One of the datasets consists of tweets and the other of news articles. All of these resources, including the relevant code for acquiring the test data and evaluating the geoparser, are shared openly. Geoparsing can be divided into two sub-tasks: recognizing toponyms amid text flows and resolving them to the correct coordinate location. Both tasks have seen a recent turn to deep learning methods and models, where the input texts are encoded as, for example, word embeddings. Geoparsers are evaluated against gold standard datasets where toponyms and their coordinates are marked. Performance is measured with equivalence-based metrics for toponym recognition and distance-based metrics for toponym resolution. Finger uses a toponym recognition classifier built on a Finnish BERT model and a simple gazetteer query to resolve the toponyms to coordinate points. The program outputs structured geodata, with input texts and the recognized toponyms and coordinate locations. While the datasets represent different text types in terms of formality and topics, there is little difference in performance when evaluating Finger against them. The overall performance is comparable to the performance of geoparsers for English texts. Error analysis reveals multiple error sources, caused either by the inherent ambiguity of the studied language and the geographical world or by the processing itself, for example by the lemmatizer. Finger can be improved in multiple ways, such as refining how it analyzes texts and creating more comprehensive evaluation datasets. Similarly, the geoparsing task should move towards more complex linguistic and geographical descriptions than just toponyms and coordinate points. Finger is not, in its current state, a ready source of geodata. However, the system has the potential to be a first step for Finnish geoparsers and a stepping stone for future applied research.
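The two kinds of evaluation metrics mentioned in this abstract can be illustrated with a minimal Python sketch. It assumes that recognition is scored by exact matches between predicted and gold character spans and that resolution error is the great-circle (haversine) distance in kilometres; the abstract does not specify either choice, and the function names and example values are hypothetical rather than taken from Finger's code.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def precision_recall_f1(predicted_spans, gold_spans):
    """Exact-match scores over (start, end) toponym spans (assumed matching rule)."""
    predicted, gold = set(predicted_spans), set(gold_spans)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical example: one correctly recognized toponym, one miss, one false alarm.
gold = {(0, 8), (30, 36)}
predicted = {(0, 8), (20, 27)}
print(precision_recall_f1(predicted, gold))

# Resolution error for a toponym resolved to a nearby but wrong point.
print(haversine_km(60.1699, 24.9384, 60.2055, 24.6559))
```

Distance-based scores are usually summarized over all correctly recognized toponyms, for example as a mean or median error, so that a handful of badly misplaced points can be told apart from a systematic offset.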
- Other research product . 2022 . Open Access . English . Authors: Koppatz, Maximilian . Publisher: Helsingin yliopisto . Country: Finland
Automatic headline generation has the potential to significantly assist editors charged with headlining articles. Approaches to automation in the headlining process can range from tools as creative aids to complete end-to-end automation. The latter is difficult to achieve, as journalistic requirements imposed on headlines must be met with little room for error, with the requirements depending on the news brand in question. This thesis investigates automatic headline generation in the context of the Finnish newsroom. The primary question I seek to answer is how well the current state of text generation using deep neural language models can be applied to the headlining process in Finnish news media. To answer this, I have implemented and pre-trained a Finnish generative language model based on the Transformer architecture. I have fine-tuned this language model for headline generation as autoregression of headlines conditioned on the article text. I have designed and implemented a variation of the Diverse Beam Search algorithm, with additional parameters, to perform the headline generation in order to generate a diverse set of headlines for a given text. The evaluation of the generative capabilities of this system was done with real-world usage in mind. I asked domain experts in headlining to evaluate a generated set of text-headline pairs. The task was to accept or reject the individual headlines on key criteria. The responses to this survey were then quantitatively and qualitatively analyzed. Based on the analysis and feedback, this model can already be useful as a creative aid in the newsroom despite being far from ready for automation. I have identified concrete improvement directions based on the most common types of errors, and this provides interesting future work.
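For readers unfamiliar with Diverse Beam Search, the following Python sketch shows the basic idea as it is commonly described: beams are split into groups, and each group is penalized for choosing tokens that earlier groups already picked at the same time step. It is not the variation developed in the thesis, whose additional parameters are not described in the abstract. The toy context-independent scoring function, the vocabulary and all names are assumptions made for illustration.

```python
import math

# Toy next-token distribution that ignores context (an assumption for
# illustration; a real system would query a language model here).
TOY_PROBS = {"a": 0.6, "b": 0.3, "c": 0.1}


def toy_log_prob(sequence, token):
    return math.log(TOY_PROBS[token])


def diverse_beam_search(log_prob_fn, vocab, num_groups, beams_per_group,
                        max_len, diversity_strength):
    """Group-wise beam search with a count-based (Hamming-style) diversity penalty."""
    # Each group holds a list of (token tuple, cumulative log probability).
    groups = [[((), 0.0)] for _ in range(num_groups)]
    for _ in range(max_len):
        picked_this_step = {}  # token -> times chosen by earlier groups at this step
        new_groups = []
        for group in groups:
            candidates = []
            for seq, score in group:
                for token in vocab:
                    new_score = score + log_prob_fn(seq, token)
                    penalty = diversity_strength * picked_this_step.get(token, 0)
                    candidates.append((new_score - penalty, new_score, seq + (token,)))
            # Rank by the penalized score, but carry the unpenalized score forward
            # (a design choice of this sketch).
            candidates.sort(key=lambda c: c[0], reverse=True)
            kept = candidates[:beams_per_group]
            new_groups.append([(seq, score) for _, score, seq in kept])
            for _, _, seq in kept:
                picked_this_step[seq[-1]] = picked_this_step.get(seq[-1], 0) + 1
        groups = new_groups
    return groups


for strength in (0.0, 2.0):
    result = diverse_beam_search(toy_log_prob, list(TOY_PROBS), num_groups=2,
                                 beams_per_group=1, max_len=3,
                                 diversity_strength=strength)
    print(strength, [group[0][0] for group in result])
```

With a penalty strength of zero, every group behaves like ordinary beam search and the groups collapse onto the same sequence; a positive strength pushes later groups towards different tokens, which is the mechanism that yields a varied set of candidate headlines.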
- Other research product . 2021 . Open Access . English . Authors: Kjellman, Martin . Publisher: Helsingin yliopisto . Country: Finland
This thesis examines how representatives of service providers for news automation perceive a) journalists and news organisations and b) the service providers’ relationship to these. By introducing new technology (natural language generation, i.e. the transformation of data into everyday language) that influences both the production and business models of news media, news automation represents a type of media innovation. The service providers represent actors peripheral to journalism. The theoretical framework takes hybrid media logics as its starting point, meaning that the power dynamics of news production are thought to be influenced by the field-specific logics of the actors involved. The hybridity metaphor is deepened by using a typology for journalistic strangers that takes into account the different roles peripheral actors adopt in relation to journalists and news organisations. Journalism is understood throughout as a professional ideology encountered by service providers who work with news organisations. Semi-structured interviews were conducted with representatives from companies that create natural language generation software used to produce journalistic text based on data. Participants were asked about their experiences working with news media and the interviews (N=6) were analysed phenomenologically. The findings form three distinct but interrelated dimensions of how the service providers perceive news media and journalism: an area that sorely needs innovators (potential) but lacks resources in terms of knowledge, money and will to innovate (obstacles), but one that they can ultimately learn from and collaborate with (solutions). Their own relationship to journalism and news media is not fixed to one single role. Instead, they alternate between challenging news media (explicit interloping) and inhabiting a supportive role (implicit interloping). 
This thesis serves as an exploration into how service providers for news automation affect the power dynamics of news production. It does so by unveiling how journalists and news organisations are perceived, and by adding further understanding to previous research on actors peripheral to journalism. In order to further untangle how service providers for news automation shift the balance of power shaping news production, future research should attempt to unify the way traditional news media actors and service providers perceive each other and their collaborations.
- Other research product . 2021 . Open Access . English . Authors: Moisio, Mikko . Publisher: Helsingin yliopisto . Country: Finland
Semantic textual similarity (STS), the procedure of determining how similar pieces of text are in terms of their meaning, is an important problem in the rapidly evolving field of natural language processing (NLP). STS accelerates major information retrieval applications dealing with natural language text, such as web search engines. For reasons of computational efficiency, text pieces are often encoded into semantically meaningful real-valued vectors, sentence embeddings, that can be compared with similarity metrics. The majority of recent NLP research has focused on a small set of the largest Indo-European languages and Chinese. Although much of the research is machine learning oriented and is thus often applicable across languages, languages with smaller speaker populations, such as Finnish, often lack the annotated data required to train, or even evaluate, complex models. BERT, a language representation framework building on transfer learning, is one of the recent quantum leaps in NLP research. BERT-type models take advantage of unsupervised pre-training, reducing the annotated data demands of supervised tasks. Furthermore, a BERT modification called Sentence-BERT enables us to extend and train BERT-type models to derive semantically meaningful sentence embeddings. However, although the annotated data demands for conventional training of a Sentence-BERT model are relatively low, such data is often unavailable for low-resource languages. Multilingual knowledge distillation has been shown to be a working strategy for extending monolingual Sentence-BERT models to new languages. This technique allows transferring and merging desired properties of two language models and, instead of annotated data, consumes bilingual parallel samples. In this thesis we study using knowledge distillation to transfer STS properties learnt from English into a model pre-trained on Finnish while bypassing the lack of annotated Finnish data. Further, we experiment with distillation using different types of data (English-Finnish bilingual, English monolingual and random pseudo samples) to observe which properties of the training data are really necessary. We acquire a bilingual English-Finnish test dataset by translating an existing annotated English dataset and use this set to evaluate the fit of our resulting models. We evaluate the performance of the models in different tasks (English, Finnish and English-Finnish cross-lingual STS) to observe how well the properties being transferred are captured, and how well the models retain the desired properties they already have. We find that knowledge distillation is indeed a feasible approach for obtaining a relatively high quality Sentence-BERT for Finnish. Surprisingly, in all setups a large portion of the desired properties is transferred to the Finnish model, and training with English-Finnish bilingual data yields the best Finnish sentence embedding model we are aware of.
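The two quantities at the heart of this abstract, a similarity metric over sentence embeddings and a distillation objective over parallel samples, can be sketched as follows. The sketch assumes cosine similarity as the metric and the commonly described mean-squared-error formulation of multilingual knowledge distillation, in which the student's embeddings of a source sentence and of its translation are both pulled towards the teacher's embedding of the source sentence. The abstract does not spell out the thesis's exact objective, and the vectors below are made-up stand-ins for model outputs.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_squared_error(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v)) / len(u)


def distillation_loss(teacher_source, student_source, student_target):
    """Assumed objective: the student should mimic the teacher's source-sentence
    embedding for both the source sentence and its translation."""
    return (mean_squared_error(teacher_source, student_source)
            + mean_squared_error(teacher_source, student_target))


# Hypothetical 3-dimensional embeddings standing in for real model outputs.
teacher_en = [0.2, 0.9, -0.1]   # teacher embedding of an English sentence
student_en = [0.1, 0.8, 0.0]    # student embedding of the same English sentence
student_fi = [0.3, 0.7, -0.2]   # student embedding of its Finnish translation

print(distillation_loss(teacher_en, student_en, student_fi))
print(cosine_similarity(student_en, student_fi))
```

Because this objective needs only sentence pairs that are translations of each other, it sidesteps the shortage of annotated similarity scores for the target language that the abstract describes.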
24 Research products, page 1 of 3
Loading
- Other research product . 2022Open Access EnglishAuthors:Kang, Taize;Kang, Taize;Publisher: Helsingin yliopistoCountry: Finland
Story generation is an artificial intelligence task in which a computer program is used to create literature or stories. This kind of task usually involves giving an initial scene, characters, background information and goals, and then letting the computer program automatically generate a storyline and complete the narrative of the story. Transformers are widely used and achieved state of the art for many different natural language processing tasks, including story generation. With the help of attention mechanism, transforms can overcome overfittting and achieved great results. Generative Pre-trained Transformer (GPT) series are one of the best transformers, which attract many researchers. In this thesis, transformer models are used to design and implement a machine learning method for the generation of very short stories. By introducing a commonsense knowledge base and a rule generator based on it, the models can learn the relationships between context and generate coherent narratives. By given the first sentence of the story as the input, the model can complete the story. The model is based on GPT-2 model and COINS. The dataset used is a collection of short stories. By comparing with the generated results of different models in many aspects, we proved the effectiveness of the model. In addition, the compared results are analyzed to find the potential optimization methods.
- Other research product . 2022Open Access EnglishAuthors:Kuljukka, Tomi;Kuljukka, Tomi;Publisher: Helsingin yliopistoCountry: Finland
In this thesis, I attempt to answer the question of whether it is possible to reflect the modern psychological theory of culture of honour to the Late Copper/ Early Bronze Age period of the Eurasian steppe zone. Furthermore, how does this affect social structure and can archaeological evidence prove it. To study culture of honour, Indo-European sources and ethnographic research on mobile pastoralism are also examined. Through a sociocultural approach, this thesis strives to reconstruct the sociocultural background and changes originating from the Yamnaya. In this approach, theories from anthropology, ethnography, sociology, social psychology, and science of religion interact. Furthermore, sources associated with early Indo-European culture (e.g., social structure and mythology) are included. Essentially, this study aims to link the Yamnaya culture with the sociocultural theory of culture of honor. A focus of this thesis is the study of anthropomorphic stone stelae associated with the Yamnaya and adjacent cultures. The area where the stelae have been found consists of the modern countries of Romania, Bulgaria, Ukraine, Moldova, North Macedonia and Russia. Moreover, general knowledge about grave goods, burial rituals, osteological and genetic materials contribute to the overall reconstruction and interpretation process. A comprehensive outline of Yamnaya's ideological, social, and behavioral aspects is attempted through the use of comparative methodology. In order to accomplish this, the archaeological materials and their symbolic meaning are interpreted using the theoretical frameworks provided and compared to later Indo-European traditions and ethnographic studies on mobile pastoralism. Using theoretical frameworks and comparative method, this thesis demonstrates that the sociocultural theory of culture of honor can be reflected in archaeological materials. 
Reflections of sociocultural behaviour can be argued to be present in burial rituals, grave goods, osteological and genetic materials and, most of all, in the anthropomorphic stone stelae.
- Other research product . 2022 . Open Access . English . Authors: Zhang, Jialei; Publisher: Helsingin yliopisto; Country: Finland
This thesis reports on a case study exploring the linguistic landscapes of four churches in Helsinki and their official websites. The aim of the study was to explore multilingualism and the status of English in the linguistic landscapes of main tourist destinations in Helsinki. In particular, the study aimed to find out how different languages, particularly English, are used in the linguistic landscape of Finnish tourism, as well as the reasons for this. It also explored the differences between on-site and online linguistic landscapes to find out how they affect visitors’ experiences. The data consist of photographs collected on-site in the four churches and the different language versions of the church websites. The thesis analysed the linguistic landscapes (LL) and virtual linguistic landscapes (VLL) by categorizing the collected signs as monolingual or multilingual, and by examining the order in which languages appear and the materiality of signs as temporary or permanent. The findings revealed that numerous languages were used in the LL and VLL, but Finnish remained the dominant language, followed by English, Swedish, and Russian. Fewer languages were used on the websites than on-site. A noteworthy discovery is that English was used more frequently than Swedish, even though Swedish is one of Finland’s national languages. English was also a common language in these churches because it appeared on more temporary signs than permanent signs, conveying most of the current and up-to-date information to visitors. Based on the findings, the LL of the churches are mostly accessible to tourists, but the consistency of the signage could be thought out more thoroughly. With the increasing number of foreign tourists, more language versions could be added to the LL, particularly versions in English, the world’s lingua franca.
Some signs in the LL that contain grammatical and spelling mistakes could also be corrected to provide visitors with a better travel experience.
- Other research product . 2022 . Open Access . English . Authors: Bakhia, Alexander; Publisher: Helsingin yliopisto; Country: Finland
Research on the dispersal of hominin species out of Africa during the Early Pleistocene has centred on the environmental conditions into which the species dispersed. The debate is polarised. Some researchers reconstruct this dispersal as synchronous with the expansion of grasslands, placing hominins within this environmental niche. Conversely, new data suggest that hominins were highly adaptable, capable of occupying varied habitats and non-grassland areas, including forests. In order to test these conflicting hypotheses, reconstruction of local paleoenvironments at major hominin sites is required. The Early Pleistocene site of Dmanisi is an excellent case study. As Dmanisi is one of the major hominin sites, its paleoenvironmental reconstruction is important for understanding its role in the hominin dispersal events. By understanding the environment hominins moved into, we can better understand the ecological factors behind their dispersal. The aim of this study is to deduce the paleodiets and the mean body mass of the herbivore community in Dmanisi to assess the biome they lived in. For dietary analyses I used mesowear, and for estimating body mass from skeletal measures I used regression equations. The mesowear scores were used in a cluster analysis where the fossil taxa were clustered with modern taxa whose diets are well known. The results indicated the dietary categories of the fossil taxa. A mean body mass value of all ungulates was calculated for Dmanisi, which was then entered into a data sheet with mean body mass values from other localities. Relative body mass indices were calculated for each locality by dividing its mean body mass value by the largest mean body mass value in the set. These values were then correlated with locality-specific net primary productivity values, to see how the mean body mass value in Dmanisi performed against its net primary productivity and against other localities.
The herbivore community mainly consisted of mixed feeders and browsers, with minimal evidence for grazing in the mesowear analysis. Ungulates were comparatively small in Dmanisi, implying that low net primary productivity and interspecific and intraspecific competition led to a lack of available resources in the area, which in turn resulted in the relatively small-bodied ungulate community. The results show that Early Pleistocene hominins occupied a relatively diverse ecotone habitat in Dmanisi, consisting of shrublands and woodlands. This suggests that Dmanisi was not as grassy as previously assumed, and that the spread of grassland environments was probably not the catalyst for the hominin dispersal.
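The relative body mass index described in this abstract can be illustrated with a minimal sketch. The locality names and all numbers below are hypothetical placeholders, not data from the thesis, and Pearson's coefficient is an assumption, since the abstract does not say which correlation measure was used.

```python
import math
from statistics import mean


def relative_body_mass_indices(mean_mass_by_locality):
    """Divide each locality's mean body mass by the largest mean in the set."""
    largest = max(mean_mass_by_locality.values())
    return {name: mass / largest for name, mass in mean_mass_by_locality.items()}


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient (assumed measure, not confirmed by the source)."""
    mx, my = mean(xs), mean(ys)
    covariance = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mx) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - my) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Hypothetical placeholder values, NOT data from the thesis.
mean_mass_kg = {"locality_a": 120.0, "locality_b": 300.0, "locality_c": 240.0}
net_primary_productivity = {"locality_a": 400.0, "locality_b": 900.0, "locality_c": 700.0}

indices = relative_body_mass_indices(mean_mass_kg)
localities = sorted(indices)
r = pearson_correlation(
    [indices[name] for name in localities],
    [net_primary_productivity[name] for name in localities],
)
print(indices)
print(round(r, 3))
```

The locality with the largest mean body mass always receives an index of 1.0, so the index expresses each community's size relative to the largest-bodied community in the comparison set.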
- Other research product . 2022 . Open Access . English . Authors: Kivisaari, Kiira; Publisher: Helsingin yliopisto; Country: Finland
Environmental sustainability and environmentally responsible management have become topics of discussion in the Finnish contemporary art field. The contemporary art event Helsinki Biennial chose to make environmental sustainability one of the cornerstones of its event production (Taskinen et al. 2021). In this master’s thesis, I assess the environmental responsibility of Helsinki Biennial and how its environmental actions relate to the expectations of the contemporary art field in Finland from a managerial perspective.
The research was conducted through autoethnographic and qualitative content analysis methods. It seeks to answer three questions: 1. What environmental actions were taken and how were they selected? (R1) 2. How do the environmental actions taken compare to the current expectations of environmentally responsible management in the Finnish contemporary art field? (R2) 3. What types of hopes and concerns about future work are identified by Helsinki Biennial organizers and other experts in the Finnish contemporary art field? (R3) The results show that Helsinki Biennial compared well with the expectations of the field. Based on the answers of the interviewees, there is a lot of potential and motivation within the field waiting to be unleashed, for example through new financing and travel policies and by increasing the knowledge and time resources of employees. This study also identified the field’s visions for the future regarding environmental matters.
- Other research product . 2022 . Open Access . English . Authors: Saari, Nelli-Johanna; Publisher: Helsingin yliopisto; Country: Finland
Ancient DNA research has become a widely applied method for examining past communities. The acidic soil in Finland has previously complicated archaeogenetic research, but advances in the field have opened up new opportunities in recent years. This thesis integrates genetic data and archival research to examine the genetic ancestry and social organisation of Early Medieval communities in Southwest Finland. In this era, the coastal region of Southwest Finland experienced diverse societal changes induced by trade networks, urbanisation and conversion to Christianity. These shifts can be observed in large inhumation burial grounds from the Crusade Period (1025/1050–1150/1200 CE). While the burial grounds contain well-preserved skeletal material, no prior ancient DNA investigations have been undertaken in the area. A total of 30 individuals from three burial grounds, Tuomala, Kansakoulunmäki and Humikkala, were studied. The sites are located in the historical Raisio and Masku parishes close to the Raisio river valley. The burial context was reconstructed with archival research, and the genetic data was extracted from skeletal samples. The resulting genome-wide data for 8 individuals from the Kansakoulunmäki burial ground was studied with exploratory population genetic and kinship analyses. The archival research produced a detailed burial context for 14 individuals from Kansakoulunmäki in Raisio and 15 individuals from Humikkala in Masku. The genetic results indicated good molecular preservation at Kansakoulunmäki and poor preservation at Tuomala and Humikkala. The Kansakoulunmäki and Humikkala individuals bore traces of diverse connections to the Baltic Sea region. The integrated results revealed possible evidence of patrilocality and potential female genetic connections to Sigtuna in Central Sweden. These findings may point towards female mobility or exogamous marriage patterns between the two regions. Kinship relations were also detected.
The possible sibling relationship could indicate an Early Medieval burial practice in which close kin were buried together. The Kansakoulunmäki individuals appeared local and displayed genetic continuity with present-day Finns. This thesis adds to an emerging body of research on the ancient genetic compositions and social practices in coastal Southwest Finland in a period of transformation. The results underline the potential of interdisciplinary strategies combining genetic and archival research, as well as possibilities in the investigation of larger inhumation burial grounds. The study contributes to diverse lines of research with new data and interpretations about the Early Medieval communities, suggesting potential for further analyses both in Finland and across the Baltic Sea region.
- Other research product . 2022 . Open Access . English . Authors: Leppämäki, Tatu; Publisher: Helsingin yliopisto; Country: Finland
Ever more data is available and shared through the internet. These big data masses often have a spatial dimension and can take many forms, one of which is digital text, such as articles or social media posts. The geospatial links in these texts are made through place names, also called toponyms, but traditional GIS methods are unable to deal with such fuzzy linguistic information.
This creates the need to transform the linguistic location information into an explicit coordinate form. Several geoparsers have been developed to recognize and locate toponyms in free-form texts: the task of these systems is to be a reliable source of location information. Geoparsers have been applied to topics ranging from disaster management to literary studies. The major language of study in geoparser research has been English, and geoparsers tend to be language-specific, which threatens to leave unexplored both the insights that smaller languages offer for geoparser development and the perspectives of their speakers. This thesis seeks to answer three research questions related to geoparsing: What are the most advanced geoparsing methods? What linguistic and geographical features complicate this multi-faceted problem? And how can the reliability and usability of geoparsers be evaluated? The major contributions of this work are an open-source geoparser for Finnish texts, Finger, and two test datasets, or corpora, for testing Finnish geoparsers. One of the datasets consists of tweets and the other of news articles. All of these resources, including the relevant code for acquiring the test data and evaluating the geoparser, are shared openly. Geoparsing can be divided into two sub-tasks: recognizing toponyms amid text flows and resolving them to the correct coordinate location, possibly from among several candidates. Both tasks have seen a recent turn to deep learning methods and models, where the input texts are encoded as, for example, word embeddings. Geoparsers are evaluated against gold standard datasets where toponyms and their coordinates are marked. Performance is measured with equivalence-based metrics for toponym recognition and distance-based metrics for toponym resolution. Finger uses a toponym recognition classifier built on a Finnish BERT model and a simple gazetteer query to resolve the toponyms to coordinate points. The program outputs structured, tabular geodata containing the input texts and any recognized toponyms with their coordinate locations.
While the datasets represent different text types in terms of formality and topics, there is little difference in performance when evaluating Finger against them. The overall performance is comparable to the performance of geoparsers of English texts. Error analysis reveals multiple error sources, caused either by the inherent ambiguity of the studied language and the geographical world or by the processing itself, for example by the lemmatizer. Finger can be improved in multiple ways, such as refining how it analyzes texts and creating more comprehensive evaluation datasets. Similarly, the geoparsing task should move towards more complex linguistic and geographical descriptions than just toponyms and coordinate points. Finger is not, in its current state, a ready source of geodata. However, the system has the potential to be a first step for geoparsers for Finnish and a stepping stone for future applied research.
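The two kinds of evaluation metric mentioned in this abstract can be sketched as follows. This is an illustrative assumption rather than Finger's actual evaluation code: recognition is scored with exact-match precision, recall and F1, and resolution with the great-circle (haversine) distance between predicted and gold coordinates. The function names and the example toponyms are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def recognition_scores(gold_toponyms, predicted_toponyms):
    """Exact-match precision, recall and F1 between two collections of toponyms."""
    gold, predicted = set(gold_toponyms), set(predicted_toponyms)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    delta_phi = phi2 - phi1
    delta_lambda = math.radians(lon2 - lon1)
    a = (math.sin(delta_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(delta_lambda / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


# Hypothetical example: one of two gold toponyms is recognized correctly.
precision, recall, f1 = recognition_scores({"Helsinki", "Turku"}, {"Helsinki", "Espoo"})
print(precision, recall, f1)
```

A resolution error for a single toponym would then be `haversine_km(gold_lat, gold_lon, predicted_lat, predicted_lon)`, which can be averaged or thresholded over a test dataset.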
- Other research product . 2022 . Open Access . English . Authors: Koppatz, Maximilian; Publisher: Helsingin yliopisto; Country: Finland
Automatic headline generation has the potential to significantly assist editors charged with headlining articles. Approaches to automation in the headlining process can range from tools that serve as creative aids to complete end-to-end automation. The latter is difficult to achieve, as the journalistic requirements imposed on headlines must be met with little room for error, and the requirements depend on the news brand in question. This thesis investigates automatic headline generation in the context of the Finnish newsroom. The primary question I seek to answer is how well the current state of text generation using deep neural language models can be applied to the headlining process in Finnish news media. To answer this, I have implemented and pre-trained a Finnish generative language model based on the Transformer architecture. I have fine-tuned this language model for headline generation as autoregression of headlines conditioned on the article text. To generate a diverse set of headlines for a given text, I have designed and implemented a variation of the Diverse Beam Search algorithm with additional parameters. The generative capabilities of this system were evaluated with real-world usage in mind. I asked domain experts in headlining to evaluate a generated set of text-headline pairs. The task was to accept or reject the individual headlines on key criteria. The survey responses were then analyzed quantitatively and qualitatively. Based on the analysis and feedback, this model can already be useful as a creative aid in the newsroom despite being far from ready for automation. I have identified concrete directions for improvement based on the most common types of errors, which provide interesting future work.
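The abstract does not describe the thesis's variation of Diverse Beam Search or its additional parameters. As background only, the sketch below shows the Hamming diversity penalty used in the standard form of the algorithm: when a beam group picks its next token, each candidate's log-probability is reduced in proportion to how many times earlier groups already chose that token at the same time step. The function name, the example tokens and the scores are hypothetical.

```python
from collections import Counter


def penalize_for_diversity(log_probs, tokens_chosen_by_earlier_groups, diversity_strength):
    """Subtract diversity_strength for every earlier group that chose the same token.

    log_probs maps each candidate token to its log-probability at this time step.
    This is the standard Hamming diversity term, not the thesis's own variation.
    """
    counts = Counter(tokens_chosen_by_earlier_groups)
    return {
        token: log_prob - diversity_strength * counts[token]
        for token, log_prob in log_probs.items()
    }


# Hypothetical next-token scores for the second beam group.
candidate_log_probs = {"election": -0.1, "vote": -0.5, "poll": -2.0}
adjusted = penalize_for_diversity(candidate_log_probs, ["election"], diversity_strength=1.0)
best_token = max(adjusted, key=adjusted.get)
print(best_token)
```

With the penalty applied, the second group prefers a token that the first group did not pick, which is what pushes the groups towards different headlines.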
- Other research product . 2021 . Open Access . English . Authors: Kjellman, Martin; Publisher: Helsingin yliopisto; Country: Finland
This thesis examines how representatives of service providers for news automation perceive a) journalists and news organisations and b) the service providers’ relationship to these. By introducing new technology (natural language generation, i.e. the transformation of data into everyday language) that influences both the production and business models of news media, news automation represents a type of media innovation. The service providers represent actors peripheral to journalism. The theoretical framework takes hybrid media logics as its starting point, meaning that the power dynamics of news production are thought to be influenced by the field-specific logics of the actors involved. The hybridity metaphor is deepened by using a typology for journalistic strangers that takes into account the different roles peripheral actors adopt in relation to journalists and news organisations. Journalism is understood throughout as a professional ideology encountered by service providers who work with news organisations. Semi-structured interviews were conducted with representatives from companies that create natural language generation software used to produce journalistic text based on data. Participants were asked about their experiences working with news media and the interviews (N=6) were analysed phenomenologically. The findings form three distinct but interrelated dimensions of how the service providers perceive news media and journalism: an area that sorely needs innovators (potential) but lacks resources in terms of knowledge, money and will to innovate (obstacles), but one that they can ultimately learn from and collaborate with (solutions). Their own relationship to journalism and news media is not fixed to one single role. Instead, they alternate between challenging news media (explicit interloping) and inhabiting a supportive role (implicit interloping). 
This thesis serves as an exploration into how service providers for news automation affect the power dynamics of news production. It does so by unveiling how journalists and news organisations are perceived, and by adding further understanding to previous research on actors peripheral to journalism. In order to further untangle how service providers for news automation shift the balance of power shaping news production, future research should attempt to unify the way traditional news media actors and service providers perceive each other and their collaborations.
- Other research product . 2021 . Open Access . English . Authors: Moisio, Mikko; Publisher: Helsingin yliopisto; Country: Finland
Semantic textual similarity (STS), the procedure of determining how similar pieces of text are in terms of their meaning, is an important problem in the rapidly evolving field of natural language processing (NLP). STS accelerates major information retrieval applications dealing with natural language text, such as web search engines. For reasons of computational efficiency, text pieces are often encoded into semantically meaningful real-valued vectors, sentence embeddings, that can be compared with similarity metrics. The majority of recent NLP research has focused on a small set of the largest Indo-European languages and Chinese. Although much of the research is machine learning oriented and is thus often applicable across languages, languages with smaller speaker populations, such as Finnish, often lack the annotated data required to train, or even evaluate, complex models. BERT, a language representation framework building on transfer learning, is one of the recent quantum leaps in NLP research. BERT-type models take advantage of unsupervised pre-training, which reduces the annotated data demands of supervised tasks. Furthermore, a BERT modification called Sentence-BERT enables us to extend and train BERT-type models to derive semantically meaningful sentence embeddings. However, although the annotated data demands for conventional training of a Sentence-BERT are relatively low, such data is often unavailable for low-resource languages. Multilingual knowledge distillation has been shown to be a working strategy for extending monolingual Sentence-BERT models to new languages. This technique allows transferring and merging desired properties of two language models and, instead of annotated data, consumes bilingual parallel samples. In this thesis we study using knowledge distillation to transfer STS properties learnt from English into a model pre-trained on Finnish while bypassing the lack of annotated Finnish data.
Further, we experiment with distillation using different types of data, English-Finnish bilingual, English monolingual and random pseudo samples, to observe which properties of the training data are really necessary. We acquire a bilingual English-Finnish test dataset by translating an existing annotated English dataset and use this set to evaluate the fit of our resulting models. We evaluate the performance of the models in different tasks, English, Finnish and English-Finnish cross-lingual STS, to observe how well the properties being transferred are captured, and how well the models retain the desired properties they already have. We find that knowledge distillation is indeed a feasible approach for obtaining a relatively high-quality Sentence-BERT for Finnish. Surprisingly, in all setups a large portion of the desired properties is transferred to the Finnish model, and training with English-Finnish bilingual data yields the best Finnish sentence embedding model we are aware of.
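Two ideas from this abstract can be sketched in a few lines. The abstract does not name the similarity metric or the training objective, so both are assumptions here: cosine similarity stands in for the comparison of sentence embeddings, and the distillation objective follows the commonly described formulation in which a student model is pushed to reproduce the teacher's embedding of the English sentence for both the English sentence and its Finnish translation. The vectors are hypothetical toy values, not model outputs.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_squared_error(u, v):
    """Mean squared difference between two vectors of equal length."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) / len(u)


def distillation_loss(teacher_english, student_english, student_finnish):
    """Assumed objective: the student should match the teacher's English embedding
    for both the English sentence and its Finnish translation."""
    return (mean_squared_error(teacher_english, student_english)
            + mean_squared_error(teacher_english, student_finnish))


# Hypothetical toy embeddings for one parallel sentence pair.
teacher_en = [0.2, 0.9, -0.4]
student_en = [0.2, 0.8, -0.4]
student_fi = [0.1, 0.9, -0.3]
print(cosine_similarity(teacher_en, student_fi))
print(distillation_loss(teacher_en, student_en, student_fi))
```

Under this objective, a translated sentence ends up close to its English source in the embedding space, which is why only parallel sentence pairs, rather than annotated similarity scores, are needed for training.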