What question did this study set out to answer?

The aim is to determine the semantic similarity of student answers to reference answers using advanced text models.

February 22, 2026

Automatic Determination of Semantic Similarity of Student Answers with the Reference Answer Using Modern Models

Key points

Used the neural network language models BERT, GPT, and Mamba, as well as stylometric text features, to evaluate text similarity.
Conducted experiments on two corpora: the Text Similarity corpus and a custom corpus.
Assessed quality using precision, recall, and F-measure.
Neural network models achieved an F-measure of about 86% for the larger Text Similarity corpus.
Neural network models reached an F-measure of only 50–56% on the custom corpus.
Stylometric features achieved an 80% F-measure on the custom corpus and matched the neural network models on the Text Similarity corpus.
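The results above are reported as precision, recall, and F-measure. Below is a minimal sketch of how these metrics are computed for a binary "similar / not similar" labeling. It assumes the F-measure is the standard F1 score (the harmonic mean of precision and recall) on the positive class. The summary does not say whether the authors used a different averaging scheme, so treat this as illustrative only.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Binary precision, recall and F1 (harmonic mean) for the class `positive`.

    Assumption: labels are 1 = "similar to reference", 0 = "not similar".
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    # True positives, false positives and false negatives for the positive class
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical gold labels and predictions for six student answers
p, r, f = precision_recall_f1([1, 1, 0, 1, 0, 0], [1, 0, 0, 1, 1, 0])
print(round(p, 3), round(r, 3), round(f, 3))
```

In this toy example there are 2 true positives, 1 false positive, and 1 false negative. Precision, recall, and F1 therefore all come out to 2/3.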

Abstract

This paper presents a study of modern text models applied to identifying the semantic similarity of English-language texts. Determining the semantic similarity of texts is an important component of many areas of natural language processing, including machine translation, information retrieval, question-answering systems, and artificial intelligence in education. The authors address the problem of classifying how similar student answers are to the teacher’s reference answer. The study covers three kinds of text representation. The first is the neural network language models BERT and GPT, which have previously been used to determine the semantic similarity of texts. The second is the new neural network model Mamba. The third is the stylometric features of the text. Experiments are carried out on two text corpora: the Text Similarity corpus from open sources and a custom corpus collected with the help of philologists. Solution quality is assessed by precision, recall, and the F-measure. All neural network language models achieve a similar F-measure of about 86% on the larger Text Similarity corpus and 50–56% on the custom corpus. The successful application of the Mamba model is a completely new result. However, the most interesting finding concerns the use of stylometric feature vectors. These achieve an 80% F-measure on the custom corpus and the same quality as the neural network models on the Text Similarity corpus.
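The abstract does not list which stylometric features were used or how answer pairs were fed to the classifier. The sketch below builds a stylometric vector for each text and represents a student/reference pair as the absolute difference of the two vectors. It uses four hypothetical features: average word length, average sentence length, type-token ratio, and punctuation rate. A classifier trained on labeled pairs would then predict "similar" or "not similar" from that difference vector. All of these choices are assumptions made for illustration, not the authors' method.

```python
import re
import string


def stylometric_vector(text):
    """Hypothetical stylometric features (NOT the paper's actual feature set)."""
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = len(words) or 1  # avoid division by zero on empty input
    n_chars = len(text) or 1
    avg_word_len = sum(len(w) for w in words) / n_words
    avg_sent_len = len(words) / (len(sentences) or 1)
    type_token_ratio = len({w.lower() for w in words}) / n_words
    punct_rate = sum(ch in string.punctuation for ch in text) / n_chars
    return [avg_word_len, avg_sent_len, type_token_ratio, punct_rate]


def pair_features(student, reference):
    """Assumed pair representation: element-wise |student - reference|."""
    s = stylometric_vector(student)
    r = stylometric_vector(reference)
    return [abs(a - b) for a, b in zip(s, r)]


vec = stylometric_vector("The cat sat. The dog ran!")
diff = pair_features("The cat sat. The dog ran!", "A cat was sitting on the mat.")
```

For "The cat sat. The dog ran!" the vector works out as follows. The text has 6 words of 3 letters in 2 sentences, giving an average word length of 3.0 and an average sentence length of 3.0. It has 5 distinct lowercase words out of 6, and 2 punctuation marks in 25 characters. The resulting vector is approximately [3.0, 3.0, 0.833, 0.08].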
