142 Research products, page 1 of 15
- Publication . Article . 2021 . Open Access . Russian
Authors: Arzyutov, Dmitry V.; Anderson, David G.
Publisher: KTH, Historiska studier av teknik, vetenskap och miljö
Country: Sweden
What does an anthropologist’s archive look like? Where is it located? And is the anthropology of archives important for understanding anthropological thinking today? Here we answer these questions by analysing the various life histories of the archival fragments of one of the most puzzling and influential anthropologists in the history of Russian and Soviet anthropology: Sergei Mikhailovich Shirokogoroff (1887–1939). Shirokogoroff is credited as one of the authors of the etnos theory, one of the main instruments of identity politics in Russia, China, Germany and also, in part, Japan and South Africa. The transnational life histories of Shirokogoroff and his wife Elizaveta [Elizabeth] Nikolaevna (1884–1943), and of their ideas, suggest a conception of the archive not as a single whole, but as a collection of forgotten, hidden, obliterated, or, on the other hand, scrupulously preserved fragments. These fragments are not centred in one place or organized around any one reading, but they nevertheless represent “partial connections”. Moreover, as we can see today with hindsight, none of these archival fragments lay inert. They have been intertwined with local political and social ontologies. Our text has an autoethnographic quality. While illustrating separate episodes from the life of the Shirokogoroffs, we will also tell of our search for the manuscripts, which forced us onto strange paths and into strange encounters. These greatly deepened our understanding both of the life of documents and of their material links to the lives of researchers. Our article is an attempt to illustrate this complex picture, which, in the end, allows us to conclude that we have only just begun to understand the workings of the anthropologist’s archive in the history of anthropological thought. QC 20220530
- Publication . Article . 2021 . Open Access
Authors: Daniel Svensson; Sverker Sörlin; Katarina Saltzman
Publisher: Informa UK Limited
Country: Sweden
Can walking trails be understood not only as routes to history and heritage, but also as heritage in and of themselves? The paper explores the articulation of trails as a distinct landscape and mobility heritage, bridging the nature-culture divide and building on physical and intellectual movements over time. The authors aim to contribute to a better understanding of the geography of trails and trailscapes by analysing the emergence of the Swedish-Norwegian trail Finnskogleden. The trail is situated in the border region spanning the former county of Hedmark, in present-day Innlandet County, south-eastern Norway, and Värmland County in mid-western Sweden, a forested area where Finnish-speaking immigrants settled from the 16th century to the early 20th century. Archives, literature, interviews, and field visits were used to analyse the emergence and governance of the trail. The main finding is the importance of continuous articulation work by local and regional stakeholders through texts, maps, maintenance, and mobility. In conclusion, the Finn forest trailscape and its mobility heritage can be seen as an articulation of territory over time, a multilayered process drawing on various environing technologies, making the trail a transformative part of a trans-border political geography. Rörelsearvet: stigar och leder i hållbar och inkluderande kulturarvsförvaltning (Mobility heritage: paths and trails in sustainable and inclusive heritage management)
- Publication . Master thesis . Bachelor thesis . 2021 . Open Access . English
Authors: González Lopez, Angel Luis
Publisher: E.T.S. de Ingenieros Informáticos (UPM)
Countries: Sweden, Spain
Code search is one of the most common tasks for developers. The open-source software movement and the rise of social media have made this process easier, thanks to the vast public software repositories available to everyone and the Q&A sites where individuals can resolve their doubts. However, when code is poorly documented and difficult to search in a repository, or when it belongs to private enterprise frameworks that are not publicly available and therefore have no Q&A community to answer questions, searching for code snippets to resolve doubts or learn how to use an API becomes very complicated. To address this problem, this thesis studies the use of natural language in code retrieval. In particular, it studies transformer-based models such as Bidirectional Encoder Representations from Transformers (BERT), which are currently the state of the art in natural language processing but have high latency in information retrieval tasks. This project therefore proposes a multi-stage architecture that seeks to maintain the performance of standard BERT-based models while reducing the high latency usually associated with this type of framework. Experiments show that this architecture outperforms previous non-BERT-based models by +0.17 on the Top 1 (or Recall@1) metric and reduces latency, with inference times of 5% of those of standard BERT models.
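A minimal sketch of the retrieve-then-rerank idea behind a multi-stage architecture, assuming a cheap lexical-overlap first stage and taking the expensive second-stage scorer as a parameter. In the thesis that second stage would be a BERT-based model; here a Jaccard similarity stands in for it. The helpers `cheap_score`, `jaccard`, `multi_stage_search`, `recall_at_1` and the toy snippets are hypothetical, not the thesis's code.

```python
import re
from typing import Callable

def tokens(text: str) -> set[str]:
    """Lower-cased alphanumeric tokens; a stand-in for a real tokenizer."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def cheap_score(query: str, doc: str) -> int:
    """Stage 1 (hypothetical): fast lexical overlap, cheap enough to run on every document."""
    return len(tokens(query) & tokens(doc))

def jaccard(query: str, doc: str) -> float:
    """Stand-in for an expensive stage-2 scorer such as a BERT cross-encoder."""
    q, d = tokens(query), tokens(doc)
    return len(q & d) / len(q | d) if q | d else 0.0

def multi_stage_search(query: str, corpus: list[str],
                       rerank_score: Callable[[str, str], float], k: int = 3) -> list[int]:
    # Stage 1: rank the whole corpus cheaply and keep only the top-k candidates.
    candidates = sorted(range(len(corpus)),
                        key=lambda i: cheap_score(query, corpus[i]), reverse=True)[:k]
    # Stage 2: the expensive scorer runs on k documents instead of the whole corpus.
    return sorted(candidates, key=lambda i: rerank_score(query, corpus[i]), reverse=True)

def recall_at_1(rankings: list[list[int]], relevant: list[int]) -> float:
    """Fraction of queries whose top-ranked result is the relevant snippet."""
    hits = sum(1 for ranking, rel in zip(rankings, relevant) if ranking and ranking[0] == rel)
    return hits / len(relevant)

corpus = [
    "def read_file(path): return open(path).read()",
    "def write_file(path, data): open(path, 'w').write(data)",
    "def sort_list(xs): return sorted(xs)",
    "def reverse_list(xs): return xs[::-1]",
]
queries = ["read a file from path", "sort a list"]
rankings = [multi_stage_search(q, corpus, jaccard, k=2) for q in queries]
print(rankings, recall_at_1(rankings, [0, 2]))
```

The latency saving comes from calling the expensive scorer only `k` times per query rather than once per document in the corpus.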
- Publication . Conference object . Article . 2021 . Open Access
Authors: Jonas Sjöbergh; Viggo Kann
doi: 10.3384/ecp184175
Publisher: Linköping University Electronic Press
Country: Sweden
We present an online API to access a number of Natural Language Processing services developed at KTH. The services work on Swedish text. They include tokenization, part-of-speech tagging, shallow parsing, compound word analysis, word inflection, lemmatization, spelling error detection and correction, grammar checking, and more. The services can be accessed in several ways, including a RESTful interface, direct socket communication, and premade Web forms. The services are open to anyone. The source code is also freely available, making it possible to set up another server or run the tools locally. We have also evaluated the performance of several of the services and compared them with other available systems. Both the precision and the recall of the Granska grammar checker are higher than those of Microsoft Word and Google Docs. The evaluation also shows that recall improves greatly when all the grammar-checking services in the API are combined, compared with any single method, and the API makes combining services easy. QC 20230328
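The finding that combining grammar-checking services raises recall can be illustrated with a small sketch. It assumes each service returns a set of flagged error positions and that "combining" means taking their union; the alarm sets below are invented toy values, not results from the evaluation, and the function names are hypothetical rather than part of the KTH API.

```python
def precision_recall(predicted: set[int], gold: set[int]) -> tuple[float, float]:
    """Precision and recall of flagged error positions against a gold annotation."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall

def combine(*alarm_sets: set[int]) -> set[int]:
    """Combine services by flagging every position that any service flags (union)."""
    return set().union(*alarm_sets)

gold = {3, 7, 12, 20}       # toy gold-standard error positions
checker_a = {3, 7, 15}      # hypothetical output of one grammar checker
checker_b = {7, 12}         # hypothetical output of a second service

for name, alarms in [("A", checker_a), ("B", checker_b), ("A+B", combine(checker_a, checker_b))]:
    print(name, precision_recall(alarms, gold))
```

A union can only add true positives, so recall never drops, but it also inherits every service's false alarms, which is why precision can fall relative to the most precise single service.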
- Publication . Conference object . 2021 . Open Access . English
Authors: Alkathiri, Abdul Aziz; Giaretta, Lodovico; Girdzijauskas, Sarunas; Sahlgren, Magnus
Publisher: Zenodo
Country: Sweden
Project: EC | RAIS (813162)
Advanced NLP models require huge amounts of data from various domains to produce high-quality representations. It is therefore useful for a few large public and private organizations to pool their corpora during training. However, factors such as legislation and users' emphasis on data privacy may prevent centralized orchestration and data sharing among these organizations. For this specific scenario, we therefore investigate how gossip learning, a massively parallel, data-private, decentralized protocol, compares to a shared-dataset solution. We find that applying Word2Vec in a gossip learning framework is viable. Without any tuning, the results are comparable to a traditional centralized setting, with ground-truth similarity scores reduced by as little as 4.3%. Furthermore, the results are up to 54.8% better than independent local training. QC 20210423
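A toy sketch of gossip learning, assuming a push-and-merge scheme: in each round every node sends its model to one random peer, which averages it with its own model and then takes a gradient step on its private data only. To stay self-contained it fits a one-parameter mean estimator rather than Word2Vec embeddings; the helper names, learning rate, and data are hypothetical.

```python
import random

def local_step(w: list[float], data: list[list[float]], lr: float = 0.1) -> list[float]:
    # Toy objective: half the mean squared distance to this node's samples,
    # whose gradient is w - mean(data). Raw data never leaves the node.
    means = [sum(col) / len(col) for col in zip(*data)]
    return [wi - lr * (wi - mi) for wi, mi in zip(w, means)]

def gossip_round(models, datasets, rng, lr=0.1):
    n = len(models)
    for i in rng.sample(range(n), n):                     # every node sends once, in random order
        j = rng.choice([k for k in range(n) if k != i])   # random peer; no central coordinator
        merged = [(a + b) / 2 for a, b in zip(models[i], models[j])]  # peer averages both models
        models[j] = local_step(merged, datasets[j], lr)   # then trains on its own data only
    return models

def spread(models):
    """Largest per-coordinate gap between any two nodes' models."""
    return max(max(col) - min(col) for col in zip(*models))

datasets = [[[float(k)], [k + 2.0]] for k in range(4)]  # node k holds two samples with mean k + 1
rng = random.Random(0)
gossip = [[0.0] for _ in datasets]
local = [[0.0] for _ in datasets]
for _ in range(200):
    gossip = gossip_round(gossip, datasets, rng)
    local = [local_step(w, d) for w, d in zip(local, datasets)]
print(f"gossip spread {spread(gossip):.3f}, local-only spread {spread(local):.3f}")
```

With these toy settings the gossiping nodes should stay close to one another and near the global mean of 2.5, whereas independent local training leaves each node at its own local mean, 3 apart at the extremes.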
- Publication . Article . 2021 . Open Access
Authors: Chen Feng; John Peponis
Publisher: SAGE Publications
Country: Sweden
The patterns of syntactic differentiation and their causes and effects are fundamental to space syntax analysis. Often, however, differentiation is taken for granted with no reference to the dynamic process that brings it about. Here, we first show that by measuring the amount of syntactic differentiation, we can better distinguish between types of street networks. We then show that repeated local transformations of a regular street grid lead to different yet largely predictable trajectories of differentiation depending upon the rules used. Finally, we show that different paths to differentiation entail different costs in terms of undesirable properties. This allows us to better assess the likely consequences of design moves and their appropriateness relative to design intentions. QC 20210614
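As a hedged illustration of measuring syntactic differentiation, the sketch below builds the axial graph of a regular grid, applies one local transformation, and compares the spread of mean depth across lines before and after. Using the population standard deviation of mean depth as the differentiation measure is an assumption made for illustration; the paper's own measure may differ, and all helper names are hypothetical.

```python
from collections import deque
from statistics import pstdev

Graph = dict[str, set[str]]

def axial_grid(n: int) -> Graph:
    """Axial map of a regular n-by-n grid: every horizontal line H_i meets every vertical line V_j."""
    graph: Graph = {f"H{i}": set() for i in range(n)}
    graph.update({f"V{j}": set() for j in range(n)})
    for i in range(n):
        for j in range(n):
            graph[f"H{i}"].add(f"V{j}")
            graph[f"V{j}"].add(f"H{i}")
    return graph

def mean_depth(graph: Graph, start: str) -> float:
    """Average number of line-to-line steps from start to every other line (breadth-first search)."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in depth:
                depth[nbr] = depth[node] + 1
                queue.append(nbr)
    return sum(depth.values()) / (len(graph) - 1)  # assumes a connected graph

def remove_intersection(graph: Graph, a: str, b: str) -> Graph:
    """A local transformation: lines a and b no longer intersect (e.g. a block is inserted)."""
    g = {node: set(nbrs) for node, nbrs in graph.items()}
    g[a].discard(b)
    g[b].discard(a)
    return g

def differentiation(graph: Graph) -> float:
    """Assumed measure: population standard deviation of mean depth across all lines."""
    return pstdev([mean_depth(graph, v) for v in graph])

grid = axial_grid(3)
broken = remove_intersection(grid, "H1", "V1")
print(differentiation(grid), differentiation(broken))
```

In the regular grid every line has the same mean depth, so the assumed measure is zero; removing a single intersection makes the two affected lines deeper than the rest, and differentiation appears.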
- Publication . Conference object . 2021 . Open Access . English
Authors: Viktor Palmkvist; Elias Castegren; Philipp Haller; David Broman
Publisher: KTH, Programvaruteknik och datorsystem, SCS
Country: Sweden
When building a new programming language, it can be useful to compose parts of existing languages to avoid repeating implementation work. However, this is already problematic at the syntax level, as composing the grammars of language fragments can easily lead to an ambiguous grammar. State-of-the-art parser tools cannot handle ambiguity well: either the grammar cannot be handled at all, or the tools give little help to an end user who writes an ambiguous program. This composability problem is twofold: (i) how can we detect whether the composed grammar is ambiguous, and (ii) if it is ambiguous, how can we help a user resolve an ambiguous program? In this paper, we depart from the traditional view of unambiguous grammar design and enable a language designer to work with an ambiguous grammar, while giving users the tools needed to handle these ambiguities. We introduce the concept of resolvable ambiguity, wherein a user can resolve an ambiguous program by editing it, as well as an approach to computing the resolutions of an ambiguous program. Furthermore, we present a method based on property-based testing to identify whether a composed grammar is unambiguous, resolvably ambiguous, or unresolvably ambiguous. The method is implemented in Haskell and evaluated on a large set of language fragments selected from different languages. The evaluation shows that (i) the approach can handle significantly more language compositions than approaches that ban ambiguity altogether, and (ii) the approach is fast enough to be used in practice. Part of proceedings: ISBN 9781450383257, QC 20230117
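To make "an ambiguous program" and "resolving it by editing" concrete, here is a small sketch; it is not the paper's Haskell implementation. It enumerates every parse tree of a token list under the deliberately ambiguous grammar E ::= E '-' E | NUM | '(' E ')'. The grammar and function names are hypothetical.

```python
from functools import lru_cache

def parses(tokens: list[str]) -> list:
    """All parse trees of tokens under the ambiguous grammar E ::= E '-' E | NUM | '(' E ')'."""
    toks = tuple(tokens)

    @lru_cache(maxsize=None)
    def parse(i: int, j: int) -> tuple:
        """Every tree for the token span toks[i:j], memoized per span."""
        trees = []
        if j - i == 1 and toks[i].isdigit():
            trees.append(toks[i])                      # E ::= NUM
        if j - i >= 3 and toks[i] == "(" and toks[j - 1] == ")":
            trees.extend(parse(i + 1, j - 1))          # E ::= '(' E ')'; parentheses add no node
        for k in range(i + 1, j - 1):                  # E ::= E '-' E, trying every split point
            if toks[k] == "-":
                for left in parse(i, k):
                    for right in parse(k + 1, j):
                        trees.append(("-", left, right))
        return tuple(trees)

    return list(parse(0, len(toks)))

for program in ["1 - 2 - 3", "( 1 - 2 ) - 3"]:
    print(program, "->", parses(program.split()))
```

`1 - 2 - 3` has two parse trees, so the program is ambiguous; rewriting it as `( 1 - 2 ) - 3` is an edit that leaves exactly one, which is the sense in which such an ambiguity is resolvable.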
- Publication . Bachelor thesis . 2021 . Open Access . English
Authors: Bubla, Boris
Publisher: KTH, Skolan för elektroteknik och datavetenskap (EECS)
Country: Sweden
The recent development of massive multilingual transformer networks has resulted in drastic improvements in model performance. These models, however, are so large that they suffer from high inference latency and consume vast computing resources. These properties hinder widespread adoption of the models in industry and some academic settings. There is thus growing research into reducing their parameter count and increasing their inference speed, with significant interest in knowledge distillation techniques. This thesis uses the existing approach of deep self-attention distillation to develop a task-agnostic distillation of the Language-agnostic BERT Sentence Embedding (LaBSE) model. It also explores the use of the Switch Transformer architecture in distillation contexts. The result is DistilLaBSE, a task-agnostic distillation of LaBSE that is 10 times faster than LaBSE while retaining over 99% cosine similarity of its sentence embeddings on a held-out test set from the same domain as the training samples, namely the OpenSubtitles dataset. It is also shown that DistilLaBSE achieves similar scores when embedding data from two other domains, namely English tweets and customer-support banking data. This faster version of LaBSE allows industry practitioners and resource-limited academic groups to apply a more convenient version of LaBSE to their various applications and research tasks.
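The "over 99% cosine similarity" figure compares teacher (LaBSE) and student (DistilLaBSE) embeddings of the same held-out sentences. A minimal sketch of that evaluation, assuming the score is the average per-sentence cosine similarity; the toy vectors are hypothetical, not LaBSE outputs.

```python
import math

def cosine(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two vectors; insensitive to their lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def mean_cosine(teacher: list[list[float]], student: list[list[float]]) -> float:
    """Average cosine similarity between teacher and student embeddings of the same sentences."""
    if len(teacher) != len(student) or not teacher:
        raise ValueError("need one student embedding per teacher embedding")
    return sum(cosine(t, s) for t, s in zip(teacher, student)) / len(teacher)

teacher = [[1.0, 0.0], [0.0, 1.0]]   # toy teacher embeddings
student = [[2.0, 0.0], [1.0, 1.0]]   # toy student embeddings
print(round(mean_cosine(teacher, student), 4))
```

Because cosine ignores vector length, the student only has to reproduce the teacher's directions: the first toy pair scores exactly 1 despite the different magnitudes.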
- Publication . Article . 2021 . Open Access . English
Authors: Sverker Sörlin
Publisher: KTH, Historiska studier av teknik, vetenskap och miljö
Country: Sweden
Project: EC | SPHERE (787516)
Having emerged after World War II, “the environment” as a modern concept entered a phase of institutionalization in science, civic society, and politics in the years around 1970. Part of this was the founding of journals. The majority became “environmental specialist journals”, typically based in established disciplines. Some became “environmental generalist journals”, covering broad knowledge areas and often aiming to be policy relevant. A significant and early member of the latter category was Ambio, founded in 1972. This article presents an overview of the journal’s first 50 years, focusing on the main changes in scientific content, political context, and editorial direction. A key finding is that the journal reflects an increasing pluralization of “the environment” through concepts such as global change, climate change, Earth system science, the Anthropocene, resilience, and environmental governance. Another finding is that the journal has itself influenced developments by publishing work on new concepts and ideas.
- Publication . Bachelor thesis . 2021 . Open Access . English
Authors: Lazarova, Mariya
Publisher: KTH, Skolan för elektroteknik och datavetenskap (EECS)
Country: Sweden
Nowadays, with the ever-growing range of options in many areas of our lives, it is crucial to have good ways to navigate one's choices. This is why recommendation engines are becoming increasingly important. Recommenders are often based on user-item interactions. In many areas such as news and podcasts, however, by the time enough interaction data has accumulated for an item, the item has already become irrelevant. Incorporating content features is therefore desirable, as content does not depend on an item's popularity or novelty. Very often there is text describing an item, so text features are good candidates for features within recommender systems. Within natural language processing (NLP), pre-trained language models based on the Transformer architecture have brought a revolution in recent years, achieving state-of-the-art performance on many language tasks. It is therefore natural to explore how such models can play a role within recommender systems. This work lies at the intersection of NLP and recommender systems: we investigate the effects of adding BERT-based encodings of the titles and descriptions of movies and books to a recommender system. The results show that even off-the-shelf BERT models contain a considerable amount of information on movie and book similarity. They also show that BERT-based representations could be used in a recommender system for user recommendation, combining the best of collaborative and content representations. This thesis shows that adding deep pre-trained language model representations could improve a recommender system's ability to predict good items for users by up to 0.43 AUC-ROC for a shallow model and 0.017 AUC-ROC for a deeper model. It also shows that SBERT can be fine-tuned to encode item similarity, with improvements of up to 0.03 in nDCG and up to 0.05 in nDCG@10.
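AUC-ROC and nDCG@k are the metrics in which the reported improvements are expressed. A minimal sketch of both, assuming binary labels for AUC-ROC and graded relevances with the linear-gain DCG variant for nDCG (an exponential gain of 2^rel - 1 is also common); the function names and numbers are illustrative only.

```python
import math

def auc_roc(scores: list[float], labels: list[int]) -> float:
    """Probability that a random positive item is scored above a random negative one (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative item")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def ndcg_at_k(ranked_relevances: list[float], k: int) -> float:
    """nDCG@k for graded relevances listed in the order the system ranked the items."""
    def dcg(rels: list[float]) -> float:
        # Linear gain, logarithmic position discount: rank 0 divides by log2(2) = 1.
        return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(rels[:k]))
    ideal = dcg(sorted(ranked_relevances, reverse=True))
    return dcg(ranked_relevances) / ideal if ideal > 0 else 0.0

print(auc_roc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]), ndcg_at_k([0, 1], 2))
```

AUC-ROC judges how well scores separate relevant from irrelevant items overall, while nDCG@k rewards placing the most relevant items at the very top of a ranked list.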
142 Research products, page 1 of 15
Loading
- Publication . Article . 2021Open Access RussianAuthors:Arzyutov, Dmitry V.; Anderson, David G.;Arzyutov, Dmitry V.; Anderson, David G.;Publisher: KTH, Historiska studier av teknik, vetenskap och miljöCountry: Sweden
What does an anthropologist’s archive look like? Where is it located? And is the anthropology of archives important for the understanding of anthropological thinking today? Here we answer these questions by analysing the various life histories of the archival fragments of one of the most puzzling and influential anthropologists in the history of Russian and Soviet anthropology: Sergei Mikhailovich Shirokogoroff (1887–1939). Shirokogoroff is credited as being one of the authors of the etnos theory — one of the main instruments of identity politics in Russia, China, Germany and also, in part, Japan and South Africa. The transnational life histories of Shirokogoroff and his wife Elizaveta [Elizabeth] Nikolaevna (1884–1943), and of their ideas, suggests a conception of the archive not as a single whole, but instead as a collection of forgotten, hidden, obliterated, or, on the other hand, scrupulously preserved fragments. These fragments are not centred in one place or organized around any one reading, but they nevertheless represent “partial connections”. Moreover, as we can see today with hindsight, none of these archival fragments lay inert. They have been intertwined in local political and social ontologies. Our text has an autoethnograpic quality. While illustrating separate episodes from the life of the Shirokogoroffs we also will tell of our search for the manuscripts through which we were forced onto strange paths and encounters. These greatly deepened our understanding both of the life of documents and their material links to the lives of researchers. Our article is an attempt to illustrate this complex picture which, in the end, will allow us to conclude that we have only just begun to understand the workings of the anthropologist’s archive in the history of anthropological thought. QC 20220530
Average popularityAverage popularity In bottom 99%Average influencePopularity: Citation-based measure reflecting the current impact.Average influence In bottom 99%Influence: Citation-based measure reflecting the total impact.add Add to ORCIDPlease grant OpenAIRE to access and update your ORCID works.This Research product is the result of merged Research products in OpenAIRE.
You have already added works in your ORCID record related to the merged Research product. - Publication . Article . 2021Open AccessAuthors:Daniel Svensson; Sverker Sörlin; Katarina Saltzman;Daniel Svensson; Sverker Sörlin; Katarina Saltzman;Publisher: Informa UK LimitedCountry: Sweden
Can walking trails be understood not only as routes to history and heritage, but also as heritage in and of themselves? The paper explores the articulation of trails as a distinct landscape and mobility heritage, bridging the nature-culture divide and building on physical and intellectual movements over time. The authors aim to contribute to a better understanding of the geography of trails and trailscapes by analysing the emergence of the Swedish-Norwegian trail Finnskogleden. The trail is situated in the border region spanning the former county of Hedmark in present-day Innlandet County, south-eastern Norway, and Värmland County in mid-western Sweden, a forested area where Finnish-speaking immigrants settled from the 16th century to the early 20th century. Archives, literature, interviews, and field visits were used to analyse the emergence and governance of the trail. The main finding is the importance of continuous articulation work by local and regional stakeholders, through texts, maps, maintenance, and mobility. In conclusion, the Finn forest trailscape and its mobility heritage can be seen as an articulation of territory over time, a multilayered process drawing on various environing technologies, making the trail a transformative part of a trans-border political geography. Rörelsearvet: stigar och leder i hållbar och inkluderande kulturarvsförvaltning
Average popularityAverage popularity In bottom 99%Average influencePopularity: Citation-based measure reflecting the current impact.Average influence In bottom 99%Influence: Citation-based measure reflecting the total impact.add Add to ORCIDPlease grant OpenAIRE to access and update your ORCID works.This Research product is the result of merged Research products in OpenAIRE.
You have already added works in your ORCID record related to the merged Research product. - Publication . Master thesis . Bachelor thesis . 2021Open Access EnglishAuthors:González Lopez, Angel Luis;González Lopez, Angel Luis;Publisher: E.T.S. de Ingenieros Informáticos (UPM)Countries: Sweden, Spain
Code Search is one of the most common tasks for developers. The open-source software movement and the rise of social media have made this process easier thanks to the vast public software repositories available to everyone and the Q&A sites where individuals can resolve their doubts. However, in the case of poorly documented code that is difficult to search in a repository, or in the case of private enterprise frameworks that are not publicly available, so there is not a community on Q&A sites to answer questions, searching for code snippets to solve doubts or learn how to use an API becomes very complicated. In order to solve this problem, this thesis studies the use of natural language in code retrieval. In particular, it studies transformer-based models, such as Bidirectional Encoder Representations from Transformers (BERT), which are currently state of the art in natural language processing but present high latency in information retrieval tasks. That is why this project proposes a multi-stage architecture that seeks to maintain the performance of standard BERT-based models while reducing the high latency usually associated with the use of this type of framework. Experiments show that this architecture outperforms previous non- BERT-based models by +0.17 on the Top 1 (or Recall@1) metric and reduces latency with inference times 5% of those of standard BERT models. Kodsökning är en av de vanligaste uppgifterna för utvecklare. Rörelsen för öppen källkod och de sociala medierna har gjort denna process enklare tack vare de stora offentliga programvaruupplagorna som är tillgängliga för alla och de Q&A-webbplatser där enskilda personer kan lösa sina tvivel. 
När det gäller dåligt dokumenterad kod som är svår att söka i ett arkiv, eller när det gäller ramverk för privata företag som inte är offentligt tillgängliga, så att det inte finns någon gemenskap på Q&AA-webbplatser för att besvara frågor, blir det dock mycket komplicerat att söka efter kodstycken för att lösa tvivel eller lära sig hur man använder ett API. För att lösa detta problem studeras i denna avhandling användningen av naturligt språk för att hitta kod. I synnerhet studeras transformatorbaserade modeller, såsom BERT, som för närvarande är den senaste tekniken inom behandling av naturliga språk men som har hög latenstid vid informationssökning. Därför föreslås i detta projekt en arkitektur i flera steg som syftar till att bibehålla prestandan hos standard BERT-baserade modeller samtidigt som den höga latenstiden som vanligtvis är förknippad med användningen av denna typ av ramverk minskas. Experiment visar att denna arkitektur överträffar tidigare icke-BERT-baserade modeller med +0,17 på Top 1 (eller Recall@1) och minskar latensen, med en inferenstid som är 5% av den för standard BERT-modeller.
- Publication . Conference object . Article . 2021Open AccessAuthors:Jonas Sjöbergh; Viggo Kann;Jonas Sjöbergh; Viggo Kann;
doi: 10.3384/ecp184175
Publisher: Linköping University Electronic PressCountry: SwedenWe present an online API to access a number of Natural Language Processing services developed at KTH. The services work on Swedish text. They include tokenization, part-of-speech tagging, shallow parsing, compound word analysis, word inflection, lemmatization, spelling error detection and correction, grammar checking, and more. The services can be accessed in several ways, including a RESTful interface, direct socket communication, and premade Web forms. The services are open to anyone. The source code is also freely available making it possible to set up another server or run the tools locally. We have also evaluated the performance of several of the services and compared them to other available systems. Both the precision and the recall for the Granska grammar checker are higher than for both Microsoft Word and Google Docs. The evaluation also shows that the recall is greatly improved when combining all the grammar checking services in the API, compared to any one method, and combining services is made easy by the API. QC 20230328
- Publication . Conference object . 2021 . Open Access . English . Authors: Alkathiri, Abdul Aziz; Giaretta, Lodovico; Girdzijauskas, Sarunas; Sahlgren, Magnus . Publisher: Zenodo . Country: Sweden . Project: EC | RAIS (813162)
Advanced NLP models require huge amounts of data from various domains to produce high-quality representations. It can therefore be useful for a few large public and private organizations to pool their corpora during training. However, factors such as legislation and users' emphasis on data privacy may prevent centralized orchestration and data sharing among these organizations. Therefore, for this specific scenario, we investigate how gossip learning, a massively parallel, data-private, decentralized protocol, compares to a shared-dataset solution. We find that applying Word2Vec in a gossip learning framework is viable. Without any tuning, the results are comparable to a traditional centralized setting, with a reduction in ground-truth similarity scores of as little as 4.3%. Furthermore, the results are up to 54.8% better than independent local training. QC 20210423
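Gossip learning can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the protocol implementation used in the paper: each node's model is a plain list of floats standing in for Word2Vec parameters, merging is element-wise averaging, and `local_step` is a hypothetical hook where a node would train on its private corpus. Only model parameters are exchanged between nodes, which is what keeps the data private.

```python
import random
from typing import Callable, List, Optional

Vector = List[float]


def merge(local: Vector, received: Vector) -> Vector:
    """Merge two models by averaging their parameters element-wise."""
    return [(a + b) / 2.0 for a, b in zip(local, received)]


def gossip_round(
    models: List[Vector],
    rng: random.Random,
    local_step: Optional[Callable[[int, Vector], Vector]] = None,
) -> None:
    """Every node sends its model to one random peer, which merges it.

    Only parameters travel between nodes; each node's training data stays local.
    """
    n = len(models)
    for sender in rng.sample(range(n), n):  # random activation order
        receiver = rng.choice([i for i in range(n) if i != sender])
        merged = merge(models[receiver], models[sender])
        # Hypothetical hook: the receiver would continue training on its own data here.
        models[receiver] = local_step(receiver, merged) if local_step else merged


def spread(models: List[Vector]) -> float:
    """Largest per-parameter gap between nodes; 0.0 means full consensus."""
    return max(max(column) - min(column) for column in zip(*models))


rng = random.Random(0)
models = [[0.0, 4.0], [1.0, 3.0], [2.0, 2.0], [3.0, 1.0]]  # toy "embeddings", one per node
before = spread(models)
for _ in range(50):
    gossip_round(models, rng)
print(before, spread(models))
```

With the identity local step used here, each merge replaces a model by a value between two existing ones, so the spread never grows and shrinks as rounds accumulate; in a real run the local training step would pull the models back towards their own data between merges.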
- Publication . Article . 2021 . Open Access . Authors: Chen Feng; John Peponis . Publisher: SAGE Publications . Country: Sweden
The patterns of syntactic differentiation and their causes and effects are fundamental to space syntax analysis. Often, however, differentiation is taken for granted with no reference to the dynamic process that brings it about. Here, we first show that by measuring the amount of syntactic differentiation, we can better distinguish between types of street networks. We then show that repeated local transformations of a regular street grid lead to different yet largely predictable trajectories of differentiation depending upon the rules used. Finally, we show that different paths to differentiation entail different costs in terms of undesirable properties. This allows us to better assess the likely consequences of design moves and their appropriateness relative to design intentions. QC 20210614
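One way to make "amount of syntactic differentiation" concrete is sketched below. The abstract does not name the measure, so this is an assumption: it uses mean depth (the average number of steps from one street line to all others, a standard space syntax quantity) and takes the spread of mean depth over all lines as a stand-in differentiation score. The grid and the single "interrupt one street" transformation are hypothetical examples, not cases from the paper.

```python
from collections import deque
from statistics import pstdev
from typing import Dict, List

Graph = Dict[str, List[str]]  # street line -> lines it intersects


def mean_depth(graph: Graph, start: str) -> float:
    """Mean number of steps (changes of line) from start to every other line."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        line = queue.popleft()
        for neighbour in graph[line]:
            if neighbour not in depth:
                depth[neighbour] = depth[line] + 1
                queue.append(neighbour)
    others = [d for line, d in depth.items() if line != start]
    return sum(others) / len(others)


def differentiation(graph: Graph) -> float:
    """Hypothetical proxy: population std. dev. of mean depth over all lines."""
    return pstdev([mean_depth(graph, line) for line in graph])


# Regular 3x3 grid: every horizontal street meets every vertical street.
H, V = ["H1", "H2", "H3"], ["V1", "V2", "V3"]
grid: Graph = {**{h: list(V) for h in H}, **{v: list(H) for v in V}}

# One local transformation: H1 is interrupted, becoming H1a (meets V1) and H1b (meets V2, V3).
broken: Graph = {
    "H1a": ["V1"],
    "H1b": ["V2", "V3"],
    "H2": list(V),
    "H3": list(V),
    "V1": ["H1a", "H2", "H3"],
    "V2": ["H1b", "H2", "H3"],
    "V3": ["H1b", "H2", "H3"],
}

print(differentiation(grid), differentiation(broken))
```

In the regular grid every line has the same mean depth (1.4), so the score is zero; interrupting a single street already makes the lines unequal, which is the kind of local transformation whose cumulative effect the paper traces.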
- Publication . Conference object . 2021 . Open Access . English . Authors: Viktor Palmkvist; Elias Castegren; Philipp Haller; David Broman . Publisher: KTH, Programvaruteknik och datorsystem, SCS . Country: Sweden
When building a new programming language, it can be useful to compose parts of existing languages to avoid repeating implementation work. However, this is problematic already at the syntax level, as composing the grammars of language fragments can easily lead to an ambiguous grammar. State-of-the-art parser tools cannot handle ambiguity well: either the grammar cannot be handled at all, or the tools give little help to an end user who writes an ambiguous program. This composability problem is twofold: (i) how can we detect whether the composed grammar is ambiguous, and (ii) if it is ambiguous, how can we help a user resolve an ambiguous program? In this paper, we depart from the traditional view of unambiguous grammar design and enable a language designer to work with an ambiguous grammar, while giving users the tools needed to handle these ambiguities. We introduce the concept of resolvable ambiguity, wherein a user can resolve an ambiguous program by editing it, as well as an approach to computing the resolutions of an ambiguous program. Furthermore, we present a method based on property-based testing to identify whether a composed grammar is unambiguous, resolvably ambiguous, or unresolvably ambiguous. The method is implemented in Haskell and evaluated on a large set of language fragments selected from different languages. The evaluation shows that (i) the approach can handle significantly more cases of language composition than approaches that ban ambiguity altogether, and (ii) the approach is fast enough to be used in practice. Part of proceedings: ISBN 9781450383257, QC 20230117
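The two questions in the abstract can be illustrated on a toy grammar. This Python sketch is not the paper's Haskell implementation: it counts parse trees of a token sequence under an assumed expression grammar without operator precedence, randomly generates programs in the spirit of property-based testing to look for one with more than one parse, and shows that a user edit (adding parentheses) resolves an ambiguous program. The grammar and all function names are hypothetical.

```python
import random
from functools import lru_cache
from typing import List, Optional, Tuple


def count_parses(tokens: Tuple[str, ...]) -> int:
    """Count parse trees under the toy grammar
    E -> E '+' E | E '*' E | '(' E ')' | 'n'   (no precedence, so it is ambiguous)."""

    @lru_cache(maxsize=None)
    def count(i: int, j: int) -> int:  # number of parses of tokens[i:j] as an E
        if j - i <= 0:
            return 0
        if j - i == 1:
            return 1 if tokens[i] == "n" else 0
        total = 0
        if tokens[i] == "(" and tokens[j - 1] == ")":
            total += count(i + 1, j - 1)
        for k in range(i + 1, j - 1):  # try every binary operator as the root
            if tokens[k] in ("+", "*"):
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))


def random_program(rng: random.Random, depth: int = 3) -> List[str]:
    """Generate a random parenthesis-free program from the grammar."""
    if depth == 0 or rng.random() < 0.3:
        return ["n"]
    op = rng.choice(["+", "*"])
    return random_program(rng, depth - 1) + [op] + random_program(rng, depth - 1)


def find_ambiguous(rng: random.Random, trials: int = 200) -> Optional[Tuple[str, ...]]:
    """Randomised check: return a program with more than one parse, if one is found."""
    for _ in range(trials):
        program = tuple(random_program(rng))
        if count_parses(program) > 1:
            return program
    return None  # no counterexample found; this does not prove the grammar unambiguous


print(count_parses(tuple("n+n*n")))    # -> 2: n+(n*n) and (n+n)*n
print(count_parses(tuple("(n+n)*n")))  # -> 1: the user's edit leaves a single reading
print(find_ambiguous(random.Random(1)))
```

Random testing can only find evidence of ambiguity; failing to find an ambiguous program does not prove that a grammar is unambiguous.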
- Publication . Bachelor thesis . 2021 . Open Access . English . Authors: Bubla, Boris . Publisher: KTH, Skolan för elektroteknik och datavetenskap (EECS) . Country: Sweden
The recent development of massive multilingual transformer networks has resulted in drastic improvements in model performance. These models, however, are so large that they suffer from high inference latency and consume vast computing resources. Such properties hinder widespread adoption of the models in industry and in some academic settings. There is therefore growing research into reducing their parameter count and increasing their inference speed, with significant interest in knowledge distillation techniques. This thesis uses the existing approach of deep self-attention distillation to develop a task-agnostic distillation of the Language-agnostic BERT Sentence Embedding (LaBSE) model. It also explores the use of the Switch Transformer architecture in distillation contexts. The result is DistilLaBSE, a task-agnostic distillation of LaBSE that is 10 times faster than LaBSE while retaining over 99% cosine similarity of its sentence embeddings on a holdout test set from the same domain as the training samples, namely the OpenSubtitles dataset. DistilLaBSE is also shown to achieve similar scores when embedding data from two other domains, namely English tweets and customer support banking data. This faster version of LaBSE allows industry practitioners and resource-limited academic groups to apply a more convenient version of LaBSE to their various applications and research tasks.
The recent development of massive multilingual transformer networks has resulted in drastic improvements in model performance. However, these models are so large that they suffer from high inference latency and consume large computing resources. Such properties hinder widespread adoption of the models in industry and some academic settings. Thus there is growing research into reducing their parameters and increasing their inference speed, with great interest in the use of knowledge distillation techniques.
This thesis uses the existing approach of deep attention distillation to develop a task-agnostic distillation of the language-agnostic BERT sentence embedding model. It also explores the use of the Switch Transformer architecture in distillation contexts. The result is DistilLaBSE, a task-agnostic distillation of LaBSE used to create a 10x faster version of LaBSE, while retaining more than 99% cosine similarity in its sentence embeddings on a holdout test from the same domain as the training samples, namely the OpenSubtitles dataset. It is also shown that DistilLaBSE achieves similar scores when embedding data from two other domains, namely English tweets and customer support banking data. This faster version of LaBSE allows industry practitioners and resource-limited academic groups
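The "over 99% cosine similarity" figure can be read as an average over paired embeddings. The sketch assumes that the teacher (LaBSE) and the student (DistilLaBSE) each embed the same held-out sentences in the same order; the vectors are made-up toy values, not outputs of either model, and the thesis may aggregate the scores differently.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms


def mean_teacher_student_cosine(
    teacher: List[Sequence[float]], student: List[Sequence[float]]
) -> float:
    """Average cosine similarity between the teacher's and the student's embedding
    of the same held-out sentences (row i of each list = sentence i)."""
    if len(teacher) != len(student) or not teacher:
        raise ValueError("need one non-empty pair of embeddings per sentence")
    return sum(cosine(t, s) for t, s in zip(teacher, student)) / len(teacher)


# Toy 3-dimensional embeddings; real sentence embeddings are much larger.
teacher = [[0.2, 0.9, 0.1], [0.7, 0.1, 0.7]]
student = [[0.21, 0.88, 0.12], [0.69, 0.12, 0.71]]
print(f"{mean_teacher_student_cosine(teacher, student):.4f}")
```

Cosine similarity ignores vector length, so a student only has to reproduce the direction of each teacher embedding to score close to 1.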
- Publication . Article . 2021 . Open Access . English . Authors: Sverker Sörlin . Publisher: KTH, Historiska studier av teknik, vetenskap och miljö . Country: Sweden . Project: EC | SPHERE (787516)
Emerging after World War II, "the environment" as a modern concept entered a phase of institutionalization in science, civic society, and politics in the years around 1970. Part of this was the foundation of journals. The majority became "environmental specialist journals", typically based in established disciplines. Some became "environmental generalist journals", covering broad knowledge areas and often with an ambition to be policy relevant. A significant and early member of the latter category was Ambio, founded in 1972. This article presents an overview of the journal's first 50 years, with a focus on the main changes in scientific content, political context, and editorial direction. A key finding is that the journal reflects an increasing pluralization of "the environment", with concepts such as global change, climate change, Earth system science, Anthropocene, resilience, and environmental governance. Another finding is that the journal has itself influenced developments by publishing work on new concepts and ideas.
- Publication . Bachelor thesis . 2021 . Open Access . English . Authors: Lazarova, Mariya . Publisher: KTH, Skolan för elektroteknik och datavetenskap (EECS) . Country: Sweden
Nowadays, with the ever-growing availability of options in many areas of our lives, it is crucial to have good ways to navigate one's choices. This is why recommendation engines are becoming increasingly important. Recommenders are often based on user-item interaction. In many areas, such as news and podcasts, however, by the time there is enough interaction data for an item, the item has already become irrelevant. This is why incorporating content features is desirable, as the content does not depend on the popularity or novelty of an item. Very often there is text describing an item, so text features are good candidates for features within recommender systems. Within Natural Language Processing (NLP), pre-trained language models based on the Transformer architecture have brought a revolution in recent years, achieving state-of-the-art performance on many language tasks. Because of this, it is natural to explore how such models can play a role within recommender systems. This work lies at the intersection of NLP and recommender systems: we investigate the effects of adding BERT-based encodings of the titles and descriptions of movies and books to a recommender system. The results show that even off-the-shelf BERT models contain a considerable amount of information about movie and book similarity. They also show that BERT-based representations could be used in a recommender system for user recommendation to combine the best of collaborative and content representations. This thesis shows that adding deep pre-trained language model representations could improve a recommender system's ability to predict good items for users by up to 0.43 AUC-ROC for a shallow model and 0.017 AUC-ROC for a deeper model. It also shows that SBERT can be fine-tuned to encode item similarity, with improvements of up to 0.03 nDCG and up to 0.05 nDCG@10.
With the ever-growing availability of choices in many parts of our lives, it has become important to be able to navigate easily between different alternatives. This is why recommendation systems have become more important. Recommendation systems are often based on the interaction history between user and item. By the time enough data has been collected within news and podcasts to make a recommendation, the item has already become irrelevant. This is why it is desirable to introduce content features into the recommender, since the content does not depend on the popularity or novelty of the item. Very often there is text describing an item, which has made text features good candidates as features for recommendation systems. Within Natural Language Processing (NLP), pre-trained language models based on the Transformer architecture have revolutionized the field in recent years. The new architecture has achieved state-of-the-art results on several language tasks. Because of this, it has become natural to explore how such models can work within recommendation systems. This work lies between two areas, NLP and recommendation systems. It explores the effect of adding BERT-based encodings of the titles and descriptions of movies, as well as books, to a recommendation system. The results show that even off-the-shelf BERT models contain a lot of information about similarities between movies and books. The results also show that BERT representations can be used in recommendation systems for user recommendations, in combination with collaborative and item-based representations. The thesis shows that adding pre-trained deep language model representations can improve recommendation systems' ability to predict good items for users. The improvements are up to 0.43 AUC-ROC for a shallow model and 0.017 AUC-ROC for a deep model.
The thesis also shows that SBERT can be fine-tuned to encode item similarity, with improvements of up to 0.03 nDCG and up to 0.05 nDCG@10.
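The idea of combining collaborative and content representations can be sketched as follows. This is an assumption-laden illustration, not the thesis's shallow or deep model: item vectors concatenate a collaborative embedding with a text embedding (such as one a BERT model might produce from a title and description), a user is represented by the mean of the items they interacted with, and relevance is a dot product. All vectors are hypothetical toy values.

```python
from typing import List, Sequence


def item_vector(collab: Sequence[float], text: Sequence[float]) -> List[float]:
    """Hybrid item representation: collaborative part followed by text (e.g. BERT) part."""
    return list(collab) + list(text)


def user_vector(history: List[List[float]]) -> List[float]:
    """Represent a user by the mean of the item vectors they interacted with."""
    return [sum(column) / len(history) for column in zip(*history)]


def score(user: Sequence[float], item: Sequence[float]) -> float:
    """Dot-product relevance of an item for a user."""
    return sum(u * i for u, i in zip(user, item))


# Toy 2-d collaborative + 2-d text embeddings (hypothetical values).
history = [item_vector([1.0, 0.0], [1.0, 0.0]), item_vector([1.0, 0.0], [0.8, 0.2])]
user = user_vector(history)

# Brand-new items have no interactions yet, so their collaborative part is all zeros;
# only the text embedding of the title/description distinguishes them.
similar_new = item_vector([0.0, 0.0], [1.0, 0.0])
unrelated_new = item_vector([0.0, 0.0], [0.0, 1.0])
print(score(user, similar_new) > score(user, unrelated_new))
```

Because the new items have no interactions, the collaborative half contributes nothing to their scores; the text half alone ranks the item whose description resembles the user's history higher, which is the cold-start motivation given in the abstract.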