Cluster Analysis of Banks by Machine Learning Methods on Open Source Data

Student: Bochkarev Sergei

Supervisor:

Faculty: Faculty of Economic Sciences

Educational Programme: Economics (Bachelor)

Year of Graduation: 2024

The Russian banking sector has faced many obstacles over the last five years: the COVID-19 pandemic, the start of the special military operation (SVO), sanctions, and the outflow of specialists. The resulting infrastructural, operational, and market risks call for new models for determining the risk profile of banks. The study aims to develop a clustering-based method for determining banks' risk profiles. The method complements traditional risk identification tools, and the clustering results offer an alternative approach to, and interpretation of, banks' propensity to specific risks. The Bank of Russia (CBR) can use the model to assess bank risks, identify specific patterns, and conduct intra-cluster stress tests. Data for the study were collected by parsing the websites of banki.ru, Kommersant, and Frank RG. The novelty of the study lies in the use of several types of data: bank financial indicators and regulatory ratios, composition of bank portfolios, client reviews, and news. The author used clustering methods established in the literature as well as modern ML approaches. For each type of data, several methods were applied and their parameters were tuned. K-means and spectral clustering gave the best results. The resulting clusters were interpreted and tested for statistical significance. Even at the 5% level, at least five clusters were found to be significant. The key novelty of the work is the merging of the data types into one dataset. Clustering quality metrics and the B-cubed metric were higher for the merged model, but its results are less interpretable. The modelling results show that Russian banks can be divided by risk profile, that different data sources give similar results, and that clustering quality is higher in the merged model. The results are consistent with previous studies. Future work could expand the sample of banks, add detail to the indicators in the datasets, and apply advanced neural-network clustering methods.
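The summary does not say how the B-cubed metric was computed, so the following is only a minimal sketch of the standard definition, not the author's implementation. It assumes each bank has one predicted cluster label and one reference label; the function name `b_cubed` and the toy labels are hypothetical and are not taken from the thesis data.

```python
from collections import Counter


def b_cubed(predicted, reference):
    """Return (precision, recall, F1) under the standard B-cubed definition.

    predicted: cluster label assigned to each item (e.g. each bank)
    reference: reference label of each item, in the same order
    """
    if len(predicted) != len(reference):
        raise ValueError("label sequences must have the same length")
    n = len(predicted)
    if n == 0:
        raise ValueError("label sequences must not be empty")

    cluster_sizes = Counter(predicted)             # items per predicted cluster
    class_sizes = Counter(reference)               # items per reference class
    overlap = Counter(zip(predicted, reference))   # items sharing cluster and class

    # Per-item precision: share of the item's cluster that has its reference label.
    precision = sum(
        overlap[(c, g)] / cluster_sizes[c] for c, g in zip(predicted, reference)
    ) / n
    # Per-item recall: share of the item's reference class found in its cluster.
    recall = sum(
        overlap[(c, g)] / class_sizes[g] for c, g in zip(predicted, reference)
    ) / n
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical toy example: six banks, two predicted clusters, two reference groups.
predicted_labels = [0, 0, 0, 1, 1, 1]
reference_labels = ["a", "a", "b", "b", "b", "b"]
p, r, f = b_cubed(predicted_labels, reference_labels)
print(round(p, 4), round(r, 4), round(f, 4))  # 0.7778 0.75 0.7636
```

A higher B-cubed score means the two labelings agree more closely item by item, which is one way to compare a clustering built on merged data with one built on a single data source.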

Student Theses at HSE must be completed in accordance with the University Rules and regulations specified by each educational programme.

Summaries of all theses must be published and made freely available on the HSE website.

The full text of a thesis can be published in open access on the HSE website only if the authoring student (copyright holder) agrees, or, if the thesis was written by a team of students, if all the co-authors (copyright holders) agree. After a thesis is published on the HSE website, it obtains the status of an online publication.

Student theses are objects of copyright and their use is subject to limitations in accordance with the Russian Federation’s law on intellectual property.

In the event that a thesis is quoted or otherwise used, reference to the author’s name and the source of quotation is required.
