
Efficient Neural Network Algorithms for Multimodal Emotion Recognition

Student: Vo Ngoc Bich Uyen

Supervisor: Andrey Savchenko

Faculty: Faculty of Computer Science

Educational Programme: Math of Machine Learning (Master)

Final Grade: 7

Year of Graduation: 2024

This thesis examines multimodal emotion recognition (MER) and proposes an efficient neural network ensemble that uses facial expressions and speech patterns to improve accuracy in real-world scenarios. Motivated by the limitations of unimodal approaches, we explore the fusion of visual and audio information, a strategy widely recognized for its potential to capture nuanced and complex emotional states. A comprehensive literature review reveals the increasing significance of deep learning techniques for multimodal fusion and temporal modeling. Our proposed model consists of an ensemble of pre-trained neural networks for feature extraction from video, followed by early fusion with Transformer encoders or a multilayer perceptron (MLP). To ensure computational efficiency and robustness to varying frame rates, we implement smoothing and an adaptive frame rate technique based on multiple testing corrections. This allows us to dynamically adjust the level of detail in our analysis, balancing accuracy against computational cost. We conduct experiments on Aff-Wild2, a large-scale dataset of in-the-wild videos exhibiting spontaneous emotions. Results show that our proposed model, specifically the multimodal ensemble with MLP fusion, outperforms several existing methods, particularly in expression classification. We also demonstrate that smoothing techniques, such as mean and median filtering, improve the stability and accuracy of frame-level emotion predictions. The findings of this thesis contribute to the advancement of emotion recognition technology by demonstrating the value of multimodal fusion, temporal modeling, smoothing, and adaptive frame rate techniques. The developed model holds significant potential for applications in human-computer interaction, mental health assessment, and other domains where understanding emotional states is crucial.
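
The mean and median filtering of frame-level predictions mentioned in the summary can be illustrated with a minimal sketch. It is not the thesis implementation: the function names `smooth_scores` and `predict_labels`, the centered window, the window size of 3, and the toy two-class scores are all assumptions made for illustration only.

```python
from statistics import mean, median
from typing import Callable, List, Sequence

Frame = Sequence[float]  # per-class scores for one video frame


def smooth_scores(
    scores: Sequence[Frame],
    window: int = 5,
    reducer: Callable[[Sequence[float]], float] = mean,
) -> List[List[float]]:
    """Smooth per-frame class scores with a centered sliding window.

    The window is truncated at the start and end of the video
    (an assumption of this sketch, not a detail from the thesis).
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    n = len(scores)
    smoothed = []
    for t in range(n):
        lo, hi = max(0, t - half), min(n, t + half + 1)
        neighbourhood = scores[lo:hi]
        # Reduce each class column independently over the window.
        smoothed.append([reducer(column) for column in zip(*neighbourhood)])
    return smoothed


def predict_labels(scores: Sequence[Frame]) -> List[int]:
    """Pick the index of the highest-scoring class for every frame."""
    return [max(range(len(frame)), key=frame.__getitem__) for frame in scores]


# Hypothetical scores for five frames and two classes.
# Frame 2 is a single-frame "flicker" towards class 1.
raw = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.3, 0.7],
    [0.9, 0.1],
    [0.8, 0.2],
]

raw_labels = predict_labels(raw)
mean_labels = predict_labels(smooth_scores(raw, window=3, reducer=mean))
median_labels = predict_labels(smooth_scores(raw, window=3, reducer=median))

print(raw_labels)
print(mean_labels)
print(median_labels)
```

In this toy input, both filters remove the isolated change of label in the middle frame, which is the kind of stabilising effect the summary attributes to smoothing. A larger window suppresses more noise but also delays or blurs genuine changes of expression.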

Full text (added May 30, 2024)

