Publication . Article . Other literature type . Preprint . 2021

Combining Visual and Textual Features for Semantic Segmentation of Historical Newspapers

Raphaël Barman; Maud Ehrmann; Simon Clematide; Sofia Ares Oliveira; Frédéric Kaplan
Open Access
Published: 19 Jan 2021 Journal: Journal of Data Mining and Digital Humanities (issn: 2416-5999)
Publisher: Nicolas Turenne
Country: Switzerland

The massive amounts of digitized historical documents acquired over the last decades naturally lend themselves to automatic processing and exploration. Research efforts that seek to automatically process facsimiles and extract information from them are multiplying, with document layout analysis as an essential first step. Although the identification and categorization of segments of interest in document images have seen significant progress over the last years thanks to deep learning techniques, many challenges remain, among them the use of more fine-grained segmentation typologies and the handling of complex, heterogeneous documents such as historical newspapers. Moreover, most approaches consider visual features only, ignoring the textual signal. We introduce a multimodal neural model for the semantic segmentation of historical newspapers that directly combines visual features at pixel level with text embedding maps derived from potentially noisy OCR output. Based on a series of experiments on diachronic Swiss and Luxembourgish newspapers, we investigate the predictive power of visual and textual features and their capacity to generalize across time and sources. Results show consistent improvement of multimodal models over a strong visual baseline, as well as better robustness to the wide variety of our material.
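To make the idea of a "text embedding map" concrete, the following is a minimal sketch of one way OCR token embeddings could be aligned with image pixels. It is not the authors' implementation: the function names, the box format `(x0, y0, x1, y1)`, and the choice to fuse the two modalities by concatenating channels at the input are all assumptions made for illustration, and the abstract does not state where in the network the fusion takes place.

```python
import numpy as np


def build_text_embedding_map(boxes, vectors, height, width):
    """Paint each OCR token's embedding over the pixels of its bounding box.

    boxes   : list of (x0, y0, x1, y1) pixel coordinates (hypothetical format)
    vectors : array of shape (n_tokens, dim), one embedding per token
    Pixels not covered by any token keep a zero vector.
    """
    dim = vectors.shape[1]
    emb_map = np.zeros((height, width, dim), dtype=np.float32)
    for (x0, y0, x1, y1), vec in zip(boxes, vectors):
        emb_map[y0:y1, x0:x1, :] = vec
    return emb_map


def combine_modalities(image, emb_map):
    """Stack visual channels and text channels per pixel (assumed fusion)."""
    return np.concatenate([image.astype(np.float32), emb_map], axis=-1)


# Toy page: 4x6 pixels, 3 colour channels, one OCR token with a 2-d embedding.
image = np.zeros((4, 6, 3), dtype=np.float32)
boxes = [(1, 1, 3, 2)]
vectors = np.array([[0.5, -1.0]], dtype=np.float32)

emb_map = build_text_embedding_map(boxes, vectors, height=4, width=6)
features = combine_modalities(image, emb_map)
```

In this sketch every pixel carries both its colour values and the embedding of the word printed there, so a segmentation network receiving `features` could in principle use both signals. Noisy OCR would simply yield noisier vectors in the affected boxes.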

Subjects by Vocabulary

Library of Congress Subject Headings: lcsh:History of scholarship and learning. The humanities lcsh:AZ20-999 lcsh:Bibliography. Library science. Information resources lcsh:Z

Microsoft Academic Graph classification: Newspaper, Deep learning, Segmentation, Context (language use), Image segmentation, Categorization, Multimodal learning, Natural language processing, Document layout analysis, Artificial intelligence, Computer science


computer science - computer vision and pattern recognition, computer science - computation and language, computer science - information retrieval, computer science - machine learning, historical newspapers, image segmentation, multimodal learning, deep learning, digital humanities, 000 Computer science, knowledge & systems, 410 Linguistics, 10105 Institute of Computational Linguistics, Computer Science, Computer Vision and Pattern Recognition, Computation and Language, Information Retrieval, Machine Learning, Computer Vision and Pattern Recognition (cs.CV), Computation and Language (cs.CL), Information Retrieval (cs.IR), Machine Learning (cs.LG), FOS: Computer and information sciences

Funded by
SNSF| Media Monitoring of the Past
  • Funder: Swiss National Science Foundation (SNSF)
  • Project Code: 173719
  • Funding stream: Programmes | Sinergia
Related to Research communities
Digital Humanities and Cultural Heritage