
Automatic Assessment of Exam Essays Based on Morphological, Lexical, Syntactic, and Discursive Factors

Student: Panteleeva Irina

Supervisor: Olga Lyashevskaya

Faculty: Faculty of Humanities

Educational Programme: Fundamental and Computational Linguistics (Bachelor)

Final Grade: 9

Year of Graduation: 2019

In the course of our research, we addressed the question of which features of text complexity best reflect the level of language proficiency. Using different machine learning approaches, we confirmed the hypothesis that some text metrics are more important than others. In total, we identified 59 features, which fall into five groups: lexical, morphological, syntactic, discursive, and L1 interference. We established differences between the two essay genres, graph descriptions and opinion essays. In addition, the analyzed features helped us understand how the text criteria differ between the beginning and the end of an essay.

The research addressed the following questions: Which features are most important when evaluating an essay? Which features are most strongly correlated? Do genre features of the text play a role in essay evaluation? Is there a significant difference between the beginning and the end of an essay? Which research methods work best for evaluating essays automatically? Accordingly, we pursued the following aims: to identify the features that most influence the assessment; to develop a method for the automatic evaluation of essays; and to create an application based on the results of this study.

We determined which text features are most relevant for assessing essays written in English by Russian students. We analyzed 3440 texts from the Russian Error-Annotated English Learner Corpus and calculated the values of the text criteria for each of them. We then used machine learning and statistical analysis methods to predict the grade an essay would receive. The best performance was achieved by a random forest classifier trained on unbalanced data with TF-IDF vectors added as a feature: precision 0.85, recall 0.89, F1-score 0.85.

The outcomes of this research allowed us to create a tool that gives reasonable feedback on the level of language proficiency, taking into account the text features that are significant for the target audience.
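To make the pipeline above more concrete, here is a minimal standard-library sketch of three of its ingredients: a few handcrafted lexical features, plain TF-IDF weights, and the precision/recall/F1 scores used for evaluation. It is an illustration under stated assumptions, not the thesis's actual implementation: the helper names (`tokenize`, `lexical_features`, `tfidf`, `precision_recall_f1`) and the three features shown are hypothetical stand-ins for the 59 features described above, and the TF-IDF variant is the unsmoothed textbook form.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical simple tokenizer: lowercase alphabetic tokens only.
    return re.findall(r"[a-z']+", text.lower())


def lexical_features(text):
    """Three illustrative lexical features (hypothetical, not the thesis's set)."""
    tokens = tokenize(text)
    n = len(tokens)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return {
        # Share of distinct word forms among all tokens.
        "type_token_ratio": len(set(tokens)) / n if n else 0.0,
        # Average token length in characters.
        "mean_word_length": sum(map(len, tokens)) / n if n else 0.0,
        # Average number of tokens per sentence.
        "mean_sentence_length": n / len(sentences) if sentences else 0.0,
    }


def tfidf(docs):
    """Unsmoothed TF-IDF: tf = count / doc length, idf = log(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks)
        vectors.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return vectors


def precision_recall_f1(y_true, y_pred, positive):
    """Per-class precision, recall and F1 for the label `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    print(lexical_features("The cat sat. The dog ran."))
    print(tfidf(["a b", "a c"]))
    print(precision_recall_f1(["hi", "hi", "lo", "lo"], ["hi", "lo", "hi", "lo"], "hi"))
```

In a full setup, the handcrafted feature values and the TF-IDF vector of each essay would be concatenated into one input vector for a classifier such as scikit-learn's `RandomForestClassifier`, and `TfidfVectorizer` would typically replace the hand-rolled `tfidf`. With several grade classes, the per-class scores returned by `precision_recall_f1` are usually combined with a macro or weighted average, which is why a reported F1-score need not equal the harmonic mean of the reported precision and recall.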

Full text (added June 4, 2019)

Student theses at HSE must be completed in accordance with the University Rules and the regulations specified by each educational programme.

Summaries of all theses must be published and made freely available on the HSE website.

The full text of a thesis can be published in open access on the HSE website only if the authoring student (copyright holder) agrees, or, if the thesis was written by a team of students, if all the co-authors (copyright holders) agree. After a thesis is published on the HSE website, it obtains the status of an online publication.

Student theses are objects of copyright and their use is subject to limitations in accordance with the Russian Federation’s law on intellectual property.

In the event that a thesis is quoted or otherwise used, reference to the author’s name and the source of quotation is required.
