Powered by OpenAIRE graph
Publikationer från KTH
Bachelor thesis . 2022


Cluster-assisted Grading : Comparison of different methods for pre-processing, text representation and cluster analysis in cluster-assisted short-text grading

Authors: Båth, Jacob;

Abstract

School teachers spend approximately 30 percent of their time grading exams and other assessments. As education becomes increasingly digitized, a research field has emerged that aims to reduce the time spent on grading by automating it. This is an easy task for multiple-choice questions but much harder for open-ended questions requiring free-text answers, even though the latter have been shown to be superior for knowledge assessment and learning consolidation. While previous work has reported promising results of up to 90 percent grading accuracy, it remains problematic to use a system that gives the wrong grade in 10 percent of cases. This has given rise to a line of research that focuses on assisting teachers in the grading process instead of fully replacing them. Cluster analysis has been the most popular tool for this: similar answers are grouped together so that teachers can process groups of answers at once instead of evaluating each answer one at a time. This approach has been shown to decrease the time spent on grading substantially. However, the methods for performing the clustering vary widely between studies, leaving no apparent choice of methodology for real-world implementation. Using several techniques for pre-processing, text representation and clustering, this work compared various methods for clustering free-text answers by evaluating them on a dataset containing almost 400 000 student answers. The results showed that using all of the tested pre-processing techniques led to the best performance, although the difference compared with minimal pre-processing was small. Sentence embeddings were the best-performing text representation. However, it remains an open question how they should be used when spelling and grammar are part of the assessment, as they lack the ability to identify such errors. A suitable choice of clustering algorithm is one where the number of clusters can be specified, as determining this automatically proved to be difficult. Teachers can then easily adjust the number of clusters based on their judgement.

Sammanfattning (Swedish abstract, translated)

School teachers spend roughly 30 percent of their time grading tests and other assessments. As more education is digitized, researchers are trying to find ways to automate grading in order to reduce the administrative burden on teachers. Multiple-choice questions have the advantage that they can easily be graded automatically, whereas open-ended questions that require a freely formulated answer have proven to be a better tool for measuring students' understanding. These types of questions are, however, considerably harder to grade automatically, which has led to research on automatic grading of them. Although previous research has achieved results of up to 90 percent accuracy, it is still problematic that errors occur in the remaining 10 percent of cases. This has led to research that focuses on making grading easier for teachers instead of replacing them. Cluster analysis has been the most popular approach for achieving this: similar answers are grouped together, which makes it possible to grade several answers at once. This method has been shown to reduce grading time significantly, but the methods used for the cluster analysis have varied widely, which makes it difficult to know what an implementation in a real-world scenario should look like. By using different techniques for text processing, text representation and choice of clustering algorithm, this work compares different methods for clustering free-text answers by evaluating them on almost 400 000 real student answers. The results show that more text processing is generally better, although the differences are small. Using so-called sentence embeddings led to the best results when different text representation techniques were compared. However, this technique has more difficulty identifying grammar and spelling errors, and how this should be handled is a question for future research. A suitable choice of clustering algorithm is one where the number of clusters can be set by the user, since determining it automatically proved difficult. Teachers can then adjust the number of clusters if there are too few or too many.
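To make the pipeline described in the abstract concrete (pre-processing, text representation, then clustering with a user-chosen number of clusters), here is a minimal sketch in plain Python. It is not the thesis implementation. It assumes a very light pre-processing step (lowercasing and tokenising), a smoothed TF-IDF representation with L2 normalisation, and average-linkage agglomerative clustering on cosine similarity. The function names `preprocess`, `tfidf_vectors`, `cosine` and `cluster_answers`, as well as the example answers, are hypothetical. Sentence embeddings, which the abstract reports as the best-performing representation, would need an external model and are therefore not shown.

```python
import math
import re
from collections import Counter


def preprocess(text):
    """Minimal pre-processing (an assumption): lowercase, keep alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(answers):
    """Return one L2-normalised TF-IDF vector (as a dict) per answer."""
    docs = [Counter(preprocess(a)) for a in answers]
    n = len(docs)
    # Document frequency: in how many answers each term occurs.
    df = Counter(term for doc in docs for term in doc)
    vectors = []
    for doc in docs:
        # Smoothed IDF, so a term occurring in every answer still gets weight 1.
        vec = {t: tf * (math.log((1 + n) / (1 + df[t])) + 1) for t, tf in doc.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity of two already-normalised sparse vectors."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def cluster_answers(answers, n_clusters):
    """Average-linkage agglomerative clustering that stops at n_clusters groups."""
    vectors = tfidf_vectors(answers)
    clusters = [[i] for i in range(len(answers))]

    def linkage(a, b):
        # Mean pairwise similarity between the members of two clusters.
        total = sum(cosine(vectors[i], vectors[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Merge the pair of clusters that are most similar on average.
        pairs = ((x, y) for x in range(len(clusters)) for y in range(x + 1, len(clusters)))
        x, y = max(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]

    labels = [0] * len(answers)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Hypothetical student answers, for illustration only.
answers = [
    "Photosynthesis converts light into chemical energy",
    "photosynthesis converts light into chemical energy.",
    "Plants use light to make chemical energy",
    "The mitochondria is the powerhouse of the cell",
    "Mitochondria are the powerhouse of the cell",
    "I do not know",
]
labels = cluster_answers(answers, n_clusters=3)
print(labels)
```

Because `n_clusters` is an explicit argument, a teacher-facing tool built along these lines could simply rerun the clustering with a larger or smaller value when the groups look too coarse or too fine, which matches the recommendation in the abstract.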

Country
Sweden
Keywords

Machine Learning, Computer and Information Sciences, Automatic Grading, Cluster Analysis, Data- och informationsvetenskap, Automatic Short Answer Grading, Natural Language Processing

1 Introduction
  1.1 Introduction and background
  1.2 Objective and research question
  1.3 Purpose
  1.4 Research methodology
  1.5 Delimitations
  1.6 Structure of the thesis

2 Background
  2.1 Pre-processing and text representations
    2.1.1 Bag-of-words (BoW)
    2.1.2 TF-IDF
    2.1.3 N-grams and Chargrams
    2.1.4 FastText
  2.2 Cluster analysis
    2.2.1 K-means
    2.2.2 Mean shift
    2.2.3 Agglomerative clustering
    2.2.4 Determining the number of clusters
    2.2.5 Cluster evaluation
  2.3 Related work
    2.3.1 Automatic Short Answer Grading
    2.3.2 Assisted short answer grading
    2.3.3 The benefits of assisted grading

    3.3.1 Pre-processing
    3.3.2 Text representations
    3.3.3 Vector normalization
    3.3.4 Clustering
  3.4 Evaluation

4 Results and Analysis
  4.1 Clustering algorithms
  4.2 Text representations
  4.3 Pre-processing
  4.4 Varying question characteristics
    4.4.1 Varying answer length
    4.4.2 Varying number of answers
  4.5 Clustered example question

5 Discussion
  5.1 Pre-processing
  5.2 Word representations
  5.3 Clustering
  5.4 Practical use
  5.5 Validity
  5.6 Ethical aspects

6 Conclusions and Future work
  6.1 Conclusions
  6.2 Future work


Related to Research communities
Digital Humanities and Cultural Heritage