HSE Researchers Develop Novel Approach to Evaluating AI Applications in Education

Researchers at HSE University have proposed a novel approach to assessing AI's competency in educational settings. The approach is grounded in psychometric principles and has been empirically tested using the GPT-4 model. This marks the first step in evaluating the true readiness of generative models to serve as assistants for teachers or students. The results have been published on arXiv.

Each year, artificial intelligence plays a larger role in education, prompting developers to address crucial questions about how to assess AI's capabilities, particularly in teaching and learning. Researchers at HSE University have introduced a novel psychometrics-based approach to creating effective benchmarks for evaluating the professional competencies of large language models (LLMs), such as GPT. The approach is based on Bloom's taxonomy, which, despite the availability of numerous benchmarks (tests for language models), is rarely used for verifying results.

A distinctive feature of the proposed methodology is that it compares tasks across different levels of complexity, from basic (knowledge) to advanced (application of knowledge), and takes these levels into account when evaluating tasks. This is essential for assessing the quality of the model's recommendations across diverse situations and determining the extent to which it can be trusted in the educational context. As part of the study, the researchers developed and tested over 3,900 unique assignments, categorised into 16 content areas, including teaching methods, educational psychology, and classroom management. The experiment was conducted using the Russian-language version of the GPT-4 model.
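To make the idea of comparing results across complexity levels and content areas more concrete, here is a minimal Python sketch. It is not the researchers' code: the record layout, the `accuracy_by` helper, and the toy answers are all assumptions made for illustration, and the numbers are not results from the study.

```python
from collections import defaultdict

# Hypothetical graded responses: (content_area, complexity_level, is_correct).
# These toy records are illustrative only and are NOT results from the study.
responses = [
    ("teaching methods", "knowledge", True),
    ("teaching methods", "knowledge", True),
    ("teaching methods", "application", False),
    ("educational psychology", "knowledge", True),
    ("educational psychology", "application", True),
    ("classroom management", "knowledge", False),
    ("classroom management", "application", False),
]


def accuracy_by(records, key_index):
    """Return the share of correct answers grouped by the chosen field."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for record in records:
        key = record[key_index]
        totals[key] += 1
        correct[key] += int(record[2])
    return {key: correct[key] / totals[key] for key in totals}


# Group the same responses two ways: by complexity level and by content area.
by_level = accuracy_by(responses, 1)
by_area = accuracy_by(responses, 0)

for level, share in by_level.items():
    print(f"{level}: {share:.2f}")
```

Grouping the same answers both ways is what lets a benchmark of this kind show whether a model's errors cluster at a particular level of complexity or in a particular content area.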

'We have developed a new approach that goes beyond conventional testing,' explains Elena Kardanova, lead author of the project and Academic Supervisor at the Centre for Psychometrics and Measurement in Education of the HSE Institute of Education. 'Our approach is demonstrated through a comprehensive new benchmark (the term for language model tests) designed for AI in pedagogy. This benchmark is grounded in psychometric principles and emphasises key competencies essential for teaching.

'Today's AI models, such as ChatGPT, possess an impressive ability to process and generate text quickly, making them potential assistants in educational settings. However, our results indicate that the model struggles with more complex tasks that require a deeper understanding and the ability to think adaptively. For example, AI excels at retrieving known facts but demonstrates lower proficiency in applying this information to address real-world pedagogical challenges. In particular, ChatGPT is not always successful in solving theoretical problems, which can sometimes appear basic even to average students.'

'The approach we have developed clearly highlights a key issue with AI today: you never know where to expect an error to occur. A model can make mistakes even in the simplest tasks, which are considered the core of an academic discipline. Our test highlights key issues both in the area of knowledge and in the application of that knowledge, thereby paving the way to address these core challenges. Addressing these issues is crucial if we want to rely on such models as assistants for teachers, and even more so for students. An assistant that requires everything to be rechecked, which is currently the case, is unlikely to inspire a desire to use it,' says Yaroslav Kuzminov, Academic Supervisor of HSE University.

Among the potential scenarios for AI use in education, scientists worldwide cite assisting teachers in creating educational materials, automating the assessment of student responses, developing adaptive curricula, and quickly generating analytics on student academic performance. According to the authors, AI can be a powerful tool for teachers, especially in the face of increasing workloads. However, there is still a need to improve the models and approaches used for their training and evaluation.

'The test we conducted helped us understand not only, and not so much, how to train large generative models, but also why concerns about teachers being replaced with artificial intelligence are, at the very least, premature. Indeed, it is impossible to overlook the breakthrough of generative models serving as teacher assistants: they can already attempt to develop curricula, compile reading lists for lessons, and, in some cases, grade assignments. Nevertheless, we still encounter the model's hallucinations, where it invents answers to questions when it lacks information about a phenomenon, or misunderstands the context. In general, if we want tools based on generative models to be used in pedagogical practice and earn epistemic trust, there is still much work to be done,' says Taras Pashchenko, Head of the HSE Laboratory for Curriculum Design, commenting on the test results.

The research team plans to continue refining the benchmark by incorporating more complex tasks that can assess AI abilities such as information analysis and evaluation.

'Our upcoming papers will focus on both introducing new types of benchmarks and discussing academic techniques. Such techniques will be developed to further train models and mitigate the risks of hallucinations, loss of context, and errors in core knowledge. The main goal we aim to achieve is to ensure models are stable in their knowledge and to develop methods for testing this stability with even greater accuracy. Otherwise, they will remain merely tools that facilitate copying and imitation of knowledge,' notes Ekaterina Kruchinskaya, Senior Lecturer at the HSE Department of Higher Mathematics.

Date

21 November 2024
