Research data . Dataset . 2022

Semantic Similarity of IT Support Tickets

Leonardo Santiago Benitez Pereira;
Open Access
Published: 12 Dec 2022
Publisher: Zenodo

Collection of 300 support tickets manually labeled for semantic similarity, obtained from an IT support company in the Florianópolis (Brazil) region. Each ticket is represented by an unstructured text field, typed by the user who opened the call. The labeling was performed in 2022 by three IT support professionals. The corpus contains tickets in multiple languages, mainly English, German, Portuguese and Spanish. All Personally Identifiable Information (PII) and other sensitive information was removed and replaced by a tag indicating the original content; for instance, the sentence "this text was written by Leonardo" becomes "this text was written by [NAME]". The removal was performed in three steps: first, the automated, machine learning-based tool AWS Comprehend PII Removal was applied; then, a sequence of custom regular expressions was applied; finally, the entire corpus was manually verified.
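The regular-expression step of the removal can be illustrated with a minimal sketch. The patterns, the tag names other than the bracketed style shown above, and the `redact` function are all hypothetical; the expressions actually used for this dataset are not given in the description. Free-form names such as the [NAME] example are not something patterns like these would catch reliably, which is presumably why the automated tool and the manual check were also part of the process.

```python
import re

# Hypothetical patterns for illustration only; these are NOT the
# expressions used to build the dataset.
# Order matters: e-mail and IP addresses are replaced before the
# looser phone pattern gets a chance to match their digits.
PII_PATTERNS = [
    ("EMAIL", re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")),
    ("IP_ADDRESS", re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")),
    ("PHONE", re.compile(r"\+?\d[\d\s().-]{7,}\d")),
]


def redact(text: str) -> str:
    """Replace each match with a tag naming the kind of content removed."""
    for tag, pattern in PII_PATTERNS:
        text = pattern.sub(f"[{tag}]", text)
    return text


if __name__ == "__main__":
    print(redact("contact leo@example.com from 10.0.0.1"))
```

Replacing each match with a tag rather than deleting it keeps the sentence structure intact, so the redacted tickets remain usable for similarity modeling.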


Natural Language Processing, IT Support, IT Ticket, Semantic Text Similarity

Related to Research communities
Digital Humanities and Cultural Heritage