
Jobs From Headhunter Portal

Student: Bashkirov Egor

Supervisor: Elena Kantonistova

Faculty: Faculty of Computer Science

Educational Programme: Machine Learning and Data-Intensive Systems (Master)

Final Grade: 7

Year of Graduation: 2024

This thesis addresses the task of predicting salaries from job descriptions. Its main goal is to explore how a range of data analysis and machine learning methods can be applied to this problem. A system for collecting and preparing the data was implemented, including parsers that gather additional geographic features and strip HTML tags from job descriptions. An exploratory data analysis was carried out: the relationship between the features and the target variable was investigated, and the most informative features were identified, such as the region name, the professional role, the currency in which the salary is stated, work experience, employment type, and whether the salary is given before or after tax. MAE and MAPE were chosen as the quality metrics for evaluating the models. The data were enriched and processed using geocoding, TF-IDF vectorization, and embeddings extracted from the rubert-tiny model, and a number of models, including random forest and CatBoost, were trained on the processed data. The quality of each trained model was evaluated, the experimental results were tabulated, and the best of the approaches studied was identified. The best quality was achieved by a CatBoost model built on a set of features together with the job description, with hyperparameters selected by exhaustive search.
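The two evaluation metrics named above can be illustrated with a short sketch. This is a minimal plain-Python illustration, not the author's code; the function names mae and mape and the salary figures are hypothetical. MAE is measured in the same units as the salary, while MAPE is a relative error in percent and is undefined when a true salary is zero.

```python
from typing import Sequence


def _check(y_true: Sequence[float], y_pred: Sequence[float]) -> None:
    # Both metrics need paired, non-empty inputs.
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error: the average of |y - y_hat|, in salary units."""
    _check(y_true, y_pred)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent: average of |y - y_hat| / |y|."""
    _check(y_true, y_pred)
    if any(t == 0 for t in y_true):
        raise ValueError("MAPE is undefined when a true value is zero")
    return 100.0 * sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Hypothetical true and predicted monthly salaries.
    true = [100_000, 80_000, 150_000]
    pred = [110_000, 76_000, 150_000]
    print(mae(true, pred))   # about 4666.67
    print(mape(true, pred))  # about 5.0 (percent)
```

In practice, scikit-learn provides sklearn.metrics.mean_absolute_error and sklearn.metrics.mean_absolute_percentage_error; note that the latter returns a fraction rather than a percentage.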

Full text (added June 3, 2024)

Student theses at HSE must be completed in accordance with the University Rules and the regulations specified by each educational programme.

Summaries of all theses must be published and made freely available on the HSE website.

The full text of a thesis can be published in open access on the HSE website only if the authoring student (copyright holder) agrees, or, if the thesis was written by a team of students, if all the co-authors (copyright holders) agree. After a thesis is published on the HSE website, it obtains the status of an online publication.

Student theses are objects of copyright and their use is subject to limitations in accordance with the Russian Federation’s law on intellectual property.

In the event that a thesis is quoted or otherwise used, reference to the author’s name and the source of quotation is required.
