GameFuse Serves as a Scientific Testing Ground for Exploring Non-Trivial Technological Approaches
Students and staff of the ‘Game Engineering and Interactive Systems’ workshop at MIEM HSE have been exploring approaches to processing multitext data, i.e. data about objects represented by a collection of texts with associated meta-information. The team produced the GameFuse dataset, which project leader Feodor Zakharov presented at the IEMTRONICS 2024 conference; the team's paper received the 2024 Reviewers’ Choice Best Paper Certificate.
IEMTRONICS 2024 is the annual International IOT, Electronics and Mechatronics Conference hosted by Imperial College London, UK.
The GameFuse dataset was assembled as part of a study that used video game data to analyse methods of processing objects described by multiple texts. The research team included fourth-year HSE MIEM students Feodor Zakharov, who led the research, Ruslan Molokanov and Egor Litvinenko (Bachelor's Programme 'Applied Mathematics'), and Karina Malyshkina (Bachelor's Programme 'Information Science and Computation Technology'), as well as Ilya Semichasnov, Head of the Game Engineering and Interactive Systems Workshop. Professor Alexander Belov is the academic supervisor of the project.
Feodor Zakharov
'We were looking for a way to process objects represented by multiple texts. In fact, almost any object can be described by a series of texts. For example, a computer game can be represented through feedback from gamers, reviews from journalists, and descriptions from developers,' explained Feodor Zakharov, describing the team's objective. 'These texts differ substantially, yet modern processing methods do not take these differences into account at all. Since existing solutions can effectively analyse individual texts, or objects represented by several data types such as images and text, we started thinking about how to combine these approaches and apply them to our objective.'
Since a review of the literature revealed no papers focusing on the multitext modality and no publicly available data for analysis, the team collected its own data and built the GameFuse dataset, which meets the following conditions:
- each object is described by a varying number of texts;
- the texts are diverse enough to make it possible to select several clusters of texts with similar properties;
- meta-information is attached to each text.
Multitext data can describe many aspects of the life of an individual or of society. The authors of the paper chose to focus on video games because a large amount of publicly available information on this topic meets the conditions described above, which greatly simplifies data collection and processing.
The dataset includes information on 13,117 games. Each game is described by two types of texts: player comments and critic reviews. Player comments are brief, unstructured texts that convey players' emotions rather than an objective account of their experience. Reviews, on the other hand, are detailed texts that, although subjective, reflect both the positive and negative aspects of a game. Altogether, approximately one million comments and 70,000 reviews were collected.
Over the course of the study, the team developed three neural network models for processing multitext data, whose architectures, according to the authors, have no analogues to date.
'The novelty lies in our attempt to emulate how humans comprehend texts while reading. We read one text at a time, retaining in memory what we have read before. The brain perceives each text we have read as a coherent and meaningful statement. When aggregating information, we specifically took into account the essential features of each type of text,' says Feodor Zakharov. 'For instance, when analysing reviews in a specialised magazine, we focused on comprehensive descriptions of the subject matter (information), whereas with online user comments our primary focus was on the emotions conveyed by the author.'
For comparison, the team also trained two models that rely exclusively on classical NLP methods. Each of the models is based on BERT (Bidirectional Encoder Representations from Transformers), a transformer-based language model widely used to obtain representations of text.
Two experiments were conducted as part of the study. In the first, the researchers' task was to predict whether a game has specific tags, such as ‘sports,’ ‘shooter,’ and others. In the second, which aimed to predict a game's market popularity, the prediction targets were commercial data about the game, including lower and upper estimates of the number of purchases on the platform.
The results of both series of experiments confirmed the following hypotheses:
- the proposed processing method outperforms classical NLP methods in tasks that require obtaining a representation of a multitext object;
- processing different types of texts separately improves prediction results;
- adding meta-information about texts can improve predictions, provided this information is relevant to what is being predicted.
Ilya Semichasnov, Head of the workshop ‘Game Engineering and Interactive Systems’ at HSE University
'The workshop uses AI technology in many of its projects. What the team is doing is remarkable: combining sophisticated machine learning research with the development of gaming software. GameFuse is a flagship project in this area; it has attracted interest from investors and accelerators and also serves as an excellent scientific testing ground for exploring non-trivial technological approaches. I am sincerely proud of the project and confident that it will not be the team's last achievement in AI.'
Feodor Zakharov says that throughout his studies at HSE University he wanted to tackle a genuine scientific problem, so when the opportunity arose, he devoted himself to it. 'What struck me most during the work was the great amount of freedom at every stage of the research, from formulating hypotheses and reviewing the scientific literature to conducting experiments and analysing their results,' he notes. 'It was truly enjoyable to make full use of this freedom by experimenting with various solutions!'
Alexander Belov
'It is no secret that computer game development, as well as the games themselves, attract increasing attention from the IT community today. Unfortunately, there are very few specialised scientific conferences in the field of game engineering,' says Alexander Belov, Professor and Head of the School of Applied Mathematics at HSE University. 'Therefore, once the project on applying AI methods to analyse texts written by computer game users had produced results, the decision was made to present them at the annual international conference IEMTRONICS 2024. Considerable time and effort went into preparing the paper, but it was well worth it.'