Ontology-Controlled Feature Engineering in Machine Learning Task with Applications to Chemistry and Material Science

Student: Alexander Glushko

Supervisor: Alexey Neznanov

Faculty: Faculty of Computer Science

Educational Programme: Data Science (Master)

Final Grade: 10

Year of Graduation: 2024

The current development of information technologies has a huge effect on all areas of our lives and forces us to change established approaches and processes. These changes are clearly visible in areas that rely heavily on digital products but lack knowledge formalization, such as chemistry and materials science, which have historically had problems with tacit knowledge. In materials science, major research is often conducted without linking data from field and computational experiments to formalized knowledge, or even without explicitly specified data schemas. This increases the inherent complexity of further analysis of the data from a particular study and fragments individual studies, which completely precludes meta-analysis of the data. An important factor here is the extremely limited resources and "digital skills" of research teams. Field experiment data are in most cases limited to literally a few experiments, since it is too expensive to perform thousands of syntheses or grow hundreds of different crystals. In such circumstances, readily interpretable machine learning methods linked to well-established knowledge bases are key to solving the problem. Using knowledge bases makes it possible to harmonize data from previously disparate experiments and enables the use of advanced data analysis algorithms. Note that an emphasis on data quality is mandatory in such settings. This paper presents an overview of modern ontologies, both metaontologies and specialized ontologies (chemistry, materials science), reviews approaches to analyzing the results of computational experiments with machine learning methods, and describes a methodology of ontology-controlled feature engineering in machine learning. The described methodology has been tested in our joint work with MIREA - Russian Technological University on research into the antimicrobial activity and AI-guided synthesis of nanoscale titanium(IV) oxides.

Full text (added May 23, 2024)
