Powered by OpenAIRE graph
Found an issue? Give us feedback
image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/ Publikationer från K...arrow_drop_down
image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/
Publikationer från KTH
Bachelor thesis . 2022
addClaim

This Research product is the result of merged Research products in OpenAIRE.

You have already added 0 works in your ORCID record related to the merged Research product.

Data Collection and Layout Analysis on Visually Rich Documents using Multi-Modular Deep Learning.

Authors: Stahre, Mattias;

Data Collection and Layout Analysis on Visually Rich Documents using Multi-Modular Deep Learning.

Abstract

The use of Deep Learning methods for Document Understanding has been embraced by the research community in recent years. A requirement for Deep Learning methods and especially Transformer Networks, is access to large datasets. The objective of this thesis was to evaluate a state-of-the-art model for Document Layout Analysis on a public and custom dataset. Additionally, the objective was to build a pipeline for building a dataset specifically for Visually Rich Documents. The research methodology consisted of a literature study to find the state-of-the-art model for Document Layout Analysis and a relevant dataset used to evaluate the chosen model. The literature study also included research on how existing datasets in the domain were collected and processed. Finally, an evaluation framework was created. The evaluation showed that the chosen multi-modal transformer network, LayoutLMv2, performed well on the Docbank dataset. The custom build dataset was limited by class imbalance, although good performance for the larger classes. The annotator tool and its auto-tagging feature performed well and the proposed pipelined showed great promise for creating datasets with Visually Rich Documents. In conclusion, this thesis project answers the research questions and suggests two main opportunities. The first is to encourage others to build datasets with Visually Rich Documents using a similar pipeline to the one presented in this paper. The second is to evaluate the possibility of creating the visual token information for LayoutLMv2 as part of the transformer network rather than using a separate CNN. Användningen av Deep Learning-metoder för dokumentförståelse har anammats av forskarvärlden de senaste åren. Ett krav för Deep Learning-metoder och speciellt Transformer Networks är tillgång till stora datamängder. Syftet med denna avhandling var att utvärdera en state-of-the-art modell för analys av dokumentlayout på en offentligt tillgängligt dataset. Dessutom var målet att bygga en pipeline för att bygga en dataset specifikt för Visuallt Rika Dokument. Forskningsmetodiken bestod av en litteraturstudie för att hitta modellen för Document Layout Analys och ett relevant dataset som användes för att utvärdera den valda modellen. Litteraturstudien omfattade också forskning om hur befintliga dataset i domänen samlades in och bearbetades. Slutligen skapades en utvärderingsram. Utvärderingen visade att det valda multimodala transformatornätverket, LayoutLMv2, fungerade bra på Docbank-datasetet. Den skapade datasetet begränsades av klassobalans även om bra prestanda för de större klasserna erhölls. Annotatorverktyget och dess autotaggningsfunktion fungerade bra och den föreslagna pipelinen visade sig vara mycket lovande för att skapa dataset med VVisuallt Rika Dokument.svis besvarar detta examensarbete forskningsfrågorna och föreslår två huvudsakliga möjligheter. Den första är att uppmuntra andra att bygga datauppsättningar med Visuallt Rika Dokument med en liknande pipeline som den som presenteras i denna uppsats. Det andra är att utvärdera möjligheten att skapa den visuella tokeninformationen för LayoutLMv2 som en del av transformatornätverket snarare än att använda en separat CNN.

Country
Sweden
Related Organizations
Keywords

Maskininlärning, Multi-modulär, Datasamling, Computer Sciences, DeepLearning, Annotation, Computer Vision, Transformer Network, Inbäddning, Dataset Collection, Annotering, Machine Learning, Multi-Modal, Datavetenskap (datalogi), Labeling, Transformernätverk, Märkning, LayoutLMv2, Datorsyn, DocBank, Naturlig Språkbehandling, Natural Language Processing, Embedding, Djupinlärning

6.1 Were the research questions answered? . . . . . . . . . . . . . 57 6.2 Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . 58 6.3 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . 59

  • BIP!
    Impact byBIP!
    citations
    This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    0
    popularity
    This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
    Average
    influence
    This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    Average
    impulse
    This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.
    Average
  • citations
    This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    0
    popularity
    This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
    Average
    influence
    This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    Average
    impulse
    This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.
    Average
    Powered byBIP!BIP!
Powered by OpenAIRE graph
Found an issue? Give us feedback
citations
This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
BIP!Citations provided by BIP!
popularity
This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
BIP!Popularity provided by BIP!
influence
This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
BIP!Influence provided by BIP!
impulse
This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.
BIP!Impulse provided by BIP!
0
Average
Average
Average
Related to Research communities