Application of Machine Learning Methods for Determining Optimal Tobacco Sales Points for Small and Medium-sized Enterprises

Student: Dement`ev Ivan

Supervisor: Margarita Burova

Faculty: Faculty of Computer Science

Educational Programme: Master of Data Science (Master)

Year of Graduation: 2024

This thesis presents a step-by-step process for building a ready-to-use service that predicts sales at new tobacco retail points in Moscow. The service applies machine learning methods to data from the "Honest Sign" state marking system. Development comprises four key stages: collecting and analyzing marking data, parsing the necessary characteristics and directories, training machine learning models, and launching a Telegram bot.

The first stage, data collection and analysis, involves connecting to the marking system databases, extracting the data (over 7 billion records), analyzing them statistically and with expert input, identifying dependencies and anomalies, and aggregating the data to streamline the machine learning process.

The second stage, parsing characteristics, relies on open services and covers geodata such as the number of buildings of various types around the sales point and the distances to subway stations and competitors. Because this stage is the most complex and resource-intensive, the work is limited to one city, Moscow, and one format, the tobacco shop.

The third stage, model training, applies several methods: linear regression, decision trees, and model ensembles (Random Forest, Gradient Boosting, XGBoost). The main success metric is the mean absolute percentage error (MAPE), which reached 0.46, the best result among existing comparable solutions.

The final stage, the user interface, is implemented as a Telegram bot, chosen for its popularity among users, ease of development, and user-friendliness. The work thus illustrates how modern machine learning technologies can be applied effectively to current commercial challenges in retail.

Looking ahead, there are plans to expand the service to other regions and other types of retail points, including chain retail and grocery stores, as well as liquor stores.
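As a rough illustration of two ideas mentioned in the abstract, the Python sketch below computes simple geodata features for a candidate sales point (distance to the nearest subway station and the number of competitors within a radius) and evaluates predictions with MAPE. It is a minimal sketch and not the thesis implementation: the function names, the 0.5 km radius, and the use of the haversine formula are all assumptions made for this example, and MAPE is expressed here as a fraction.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, used by the haversine formula


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def geo_features(point, stations, competitors, radius_km=0.5):
    """Hypothetical feature builder: nearest-subway distance and nearby competitor count.

    `point` is a (lat, lon) tuple; `stations` and `competitors` are lists of
    (lat, lon) tuples. The radius is an illustrative assumption.
    """
    lat, lon = point
    nearest = min(haversine_km(lat, lon, s_lat, s_lon) for s_lat, s_lon in stations)
    nearby = sum(
        1
        for c_lat, c_lon in competitors
        if haversine_km(lat, lon, c_lat, c_lon) <= radius_km
    )
    return {"dist_to_nearest_subway_km": nearest, "competitors_within_radius": nearby}


def mape(actual, predicted):
    """Mean absolute percentage error as a fraction; actual values must be non-zero."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)
```

Features of this kind could then be passed, together with aggregated sales data, to any of the regression models named in the abstract. The coordinates and sales figures supplied to these functions would come from the parsed open data, which is not reproduced here.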
