HSE Researchers Develop Novel Approach to Evaluating AI Applications in Education
Researchers at HSE University have proposed a novel approach to assessing AI's competency in educational settings. The approach is grounded in psychometric principles and has been empirically tested using the GPT-4 model. This marks the first step in evaluating the true readiness of generative models to serve as assistants for teachers or students. The results have been published on arXiv.
Each year, artificial intelligence plays a progressively larger role in education, prompting developers to address crucial questions about how to assess AI's capabilities, particularly in the context of its role in teaching and learning. Researchers at HSE University have introduced a novel psychometrics-based approach to creating effective benchmarks for evaluating the professional competencies of large language models (LLMs), such as GPT. The approach is based on Bloom's taxonomy, which is rarely used to verify results despite the large number of existing benchmarks (tests for language models).
A distinctive feature of the proposed methodology is that it compares tasks at different levels of complexity, from basic (knowledge) to advanced (application of knowledge), and accounts for these levels when evaluating tasks. This is essential for assessing the quality of the model's recommendations across diverse situations and determining the extent to which it can be trusted in the educational context. As part of the study, the researchers developed and tested over 3,900 unique assignments, categorised into 16 content areas, including teaching methods, educational psychology, and classroom management. The experiment was conducted using the Russian-language version of the GPT-4 model.
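To make the idea of level-by-level evaluation concrete, here is a minimal sketch of how benchmark responses could be summarised by complexity level and by content area. It is an illustration only, not the researchers' actual method: the record layout, the `accuracy_by` helper and the toy rows are all hypothetical, and the numbers are invented rather than taken from the study.

```python
from collections import defaultdict

# Hypothetical item records: (content_area, complexity_level, answered_correctly).
# These rows are invented for illustration and are NOT results from the study.
TOY_RESPONSES = [
    ("teaching methods", "knowledge", True),
    ("teaching methods", "application", False),
    ("educational psychology", "knowledge", True),
    ("educational psychology", "application", True),
    ("classroom management", "knowledge", False),
    ("classroom management", "application", False),
]


def accuracy_by(responses, key_index):
    """Return the share of correct answers grouped by the chosen field."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for record in responses:
        key = record[key_index]
        totals[key] += 1
        correct[key] += int(record[2])  # True counts as 1, False as 0
    return {key: correct[key] / totals[key] for key in totals}


# Group the same responses two ways: by complexity level and by content area.
by_level = accuracy_by(TOY_RESPONSES, 1)
by_area = accuracy_by(TOY_RESPONSES, 0)

for level, share in by_level.items():
    print(f"{level}: {share:.2f}")
```

Reporting the two groupings separately is what would let a reader see whether a model is weaker on application tasks than on knowledge tasks, or weaker in one content area than another. A real psychometric analysis would go further than raw shares of correct answers.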
'We have developed a new approach that goes beyond conventional testing,' explains Elena Kardanova, lead author of the project and Academic Supervisor at the Centre for Psychometrics and Measurement in Education of the HSE Institute of Education. 'Our approach is demonstrated through a comprehensive new benchmark (the term for language model tests) designed for AI in pedagogy. This benchmark is grounded in psychometric principles and emphasises key competencies essential for teaching.'
'Today's AI models, such as ChatGPT, possess an impressive ability to process and generate text quickly, making them potential assistants in educational settings. However, our results indicate that the model struggles with more complex tasks that require a deeper understanding and the ability to think adaptively. For example, AI excels at retrieving known facts but demonstrates lower proficiency in applying this information to address real-world pedagogical challenges. In particular, ChatGPT is not always successful in solving theoretical problems, some of which would seem basic even to an average student.'
'The approach we have developed clearly highlights a key issue with AI today: you never know where to expect an error to occur. A model can make mistakes even in the simplest tasks, which are considered the core of an academic discipline. Our test highlights key issues both in the area of knowledge and in the application of that knowledge, thereby paving the way to address these core challenges. Addressing these issues is crucial if we want to rely on such models as assistants for teachers, and even more so for students. An assistant that requires everything to be rechecked, which is currently the case, is unlikely to inspire a desire to use it,' says Yaroslav Kuzminov, Academic Supervisor of HSE University.
Among the potential scenarios for AI use in education, scientists worldwide cite assisting teachers in creating educational materials, automating the assessment of student responses, developing adaptive curricula, and quickly generating analytics on student academic performance. According to the authors, AI can be a powerful tool for teachers, especially in the face of increasing workloads. However, there is still a need to improve the models and approaches used for their training and evaluation.
'The test we conducted helped us understand not so much how to train large generative models as why concerns about teachers being replaced with artificial intelligence are, at the very least, premature. Indeed, it is impossible to overlook the breakthrough of generative models serving as teacher assistants: they can already attempt to develop curricula, compile reading lists for lessons, and, in some cases, grade assignments. Nevertheless, we still encounter the model's hallucinations, where it invents answers to questions when it lacks information about a phenomenon, or misunderstands the context. In general, if we want tools based on generative models to be used in pedagogical practice and earn epistemic trust, there is still much work to be done,' says Taras Pashchenko, Head of the HSE Laboratory for Curriculum Design, commenting on the test results.
In the future, the research team plans to continue refining the benchmark by adding more complex tasks that assess AI abilities such as the analysis and evaluation of information.
'Our upcoming papers will focus on both introducing new types of benchmarks and discussing academic techniques. Such techniques will be developed to further train models and mitigate the risks of hallucinations, loss of context, and errors in core knowledge. The main goal we aim to achieve is to ensure models are stable in their knowledge and to develop methods for testing this stability with even greater accuracy. Otherwise, they will remain merely tools that facilitate copying and imitation of knowledge,' notes Ekaterina Kruchinskaya, Senior Lecturer at the HSE Department of Higher Mathematics.