
Extending the Ability to Diagnose Data Missingness Mechanism through Machine Learning

Student: Ivanovskaya Liliya

Supervisor: Alexey Rotmistrov

Faculty: Faculty of Social Sciences

Educational Programme: Sociology (Bachelor)

Year of Graduation: 2024

In sociological research, data often contain gaps for various reasons, for example because respondents refuse to answer a particular question. When working with incomplete data, researchers must decide whether the missing values can be ignored without distorting the data structure or whether imputation is needed. Diagnosing the missing data mechanism (MCAR, MAR, or MNAR) indicates whether missing values in a particular variable can be ignored in the analysis. Current diagnostic methods are limited in their ability to distinguish random from non-random missingness. The purpose of this study is to examine whether machine learning techniques can improve the diagnosis of missing data mechanisms. Several machine learning models were trained on generated subsamples with gaps, and a confusion matrix and quality metrics such as accuracy were calculated for each model. As a result, the HistGradientBoostingClassifier correctly classified 57% of the sample observations and was particularly good at identifying MNAR. This allowed us to conclude that machine learning methods are applicable both for diagnosing missing data mechanisms in general and for separating non-random missing data from random missing data, and to highlight the potential of further work in this direction to improve the quality of diagnostics.
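The summary does not describe how the subsamples with gaps were generated or which features the models used, so the following Python sketch is only an illustration under stated assumptions, not the thesis's procedure. It shows one simple way to introduce gaps into a variable under each of the three mechanisms, and how a confusion matrix and accuracy can be computed from true and predicted mechanism labels. The function names, the 30% missingness rate, and the deterministic "largest values go missing" rule are all hypothetical choices made for brevity.

```python
import random

MECHANISMS = ["MCAR", "MAR", "MNAR"]


def make_sample(n, rng):
    """Two correlated variables: x stays fully observed, y will receive gaps."""
    xs = [rng.gauss(0, 1) for _ in range(n)]
    ys = [0.7 * x + rng.gauss(0, 1) for x in xs]
    return xs, ys


def missing_mask(xs, ys, mechanism, rng, rate=0.3):
    """Return a list of booleans, True where y is treated as missing.

    Assumed, simplified rules (not taken from the thesis):
      MCAR: positions are chosen uniformly at random;
      MAR:  y is missing where the observed x is largest;
      MNAR: y is missing where y itself is largest.
    """
    n = len(ys)
    k = int(rate * n)
    if mechanism == "MCAR":
        idx = set(rng.sample(range(n), k))
    elif mechanism == "MAR":
        idx = set(sorted(range(n), key=lambda i: xs[i])[-k:])
    elif mechanism == "MNAR":
        idx = set(sorted(range(n), key=lambda i: ys[i])[-k:])
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    return [i in idx for i in range(n)]


def confusion_matrix(true_labels, predicted_labels, labels=MECHANISMS):
    """Rows are true mechanisms, columns are predicted mechanisms."""
    pos = {label: j for j, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(true_labels, predicted_labels):
        matrix[pos[t]][pos[p]] += 1
    return matrix


def accuracy(matrix):
    """Share of cases on the diagonal, i.e. correctly classified."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


if __name__ == "__main__":
    rng = random.Random(0)
    xs, ys = make_sample(200, rng)
    for mechanism in MECHANISMS:
        mask = missing_mask(xs, ys, mechanism, rng)
        print(mechanism, "missing values:", sum(mask))
```

In a setup like the one the summary describes, each generated subsample would be labelled with the mechanism that produced its gaps, a classifier would predict that label from features of the subsample, and `confusion_matrix` and `accuracy` would then summarize how often each mechanism is recognized or confused with another.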

Student Theses at HSE must be completed in accordance with the University Rules and regulations specified by each educational programme.

Summaries of all theses must be published and made freely available on the HSE website.

The full text of a thesis can be published in open access on the HSE website only if the authoring student (copyright holder) agrees, or, if the thesis was written by a team of students, if all the co-authors (copyright holders) agree. After a thesis is published on the HSE website, it obtains the status of an online publication.

Student theses are objects of copyright and their use is subject to limitations in accordance with the Russian Federation’s law on intellectual property.

In the event that a thesis is quoted or otherwise used, reference to the author’s name and the source of quotation is required.
