
Machine Learning for Solving Items Classification Problems in E-com Services

Student: Artem Makarov

Supervisor: Tamara Voznesenskaya

Faculty: Faculty of Computer Science

Educational Programme: Data Science and Business Analytics (Bachelor)

Year of Graduation: 2024

In the rapidly growing e-commerce industry, as in any business, competitor analysis plays an important role, and its results can influence business decisions. Assortment analytics combined with competitor data supports the commercial unit by providing recommendations for category management. This work aims to create an algorithm that determines the category of a product from its name alone, using machine learning classification methods, so that competitors' products can be labelled according to a specific category tree. The study uses real data from one of the largest e-grocery services. Data preprocessing techniques, embeddings, and CatBoost were applied to create, train, and evaluate the models. The model was successfully integrated into the company's relevant processes, saving 33.5 man-hours of manual labelling for every 10,000 items. The work also explored neural network methods, such as CNNs, which increased category prediction accuracy to 0.97 and were identified as the most effective way to implement the solution.
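The task described above, predicting a category from a product name alone, can be illustrated with a minimal sketch. This is not the thesis pipeline: in place of embeddings with CatBoost or a CNN it substitutes a simple multinomial naive Bayes classifier over character trigrams, written with the Python standard library only. The class name `NameCategoryClassifier`, the helper `char_ngrams`, the category labels, and the product names are all hypothetical and made up for illustration.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(name, n=3):
    """Split a lowercased, space-padded product name into character n-grams."""
    text = f" {name.lower().strip()} "
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class NameCategoryClassifier:
    """Multinomial naive Bayes over character n-grams (illustrative baseline,
    a stand-in for the embedding + CatBoost models named in the abstract)."""

    def fit(self, names, categories):
        self.ngram_counts = defaultdict(Counter)   # category -> n-gram counts
        self.category_counts = Counter(categories)  # category -> number of items
        self.vocabulary = set()
        for name, category in zip(names, categories):
            grams = char_ngrams(name)
            self.ngram_counts[category].update(grams)
            self.vocabulary.update(grams)
        return self

    def predict(self, name):
        total_items = sum(self.category_counts.values())
        vocab_size = len(self.vocabulary)
        best_category, best_score = None, -math.inf
        for category, item_count in self.category_counts.items():
            counts = self.ngram_counts[category]
            denom = sum(counts.values()) + vocab_size  # Laplace smoothing
            score = math.log(item_count / total_items)  # log prior
            for gram in char_ngrams(name):
                score += math.log((counts[gram] + 1) / denom)
            if score > best_score:
                best_category, best_score = category, score
        return best_category


# Made-up toy data; the real study used data from an e-grocery service.
train_names = [
    "Whole milk 3.2% 1 l",
    "Skim milk 0.5% 900 ml",
    "Rye bread sliced 400 g",
    "Wheat bread loaf 500 g",
]
train_categories = ["dairy", "dairy", "bakery", "bakery"]

model = NameCategoryClassifier().fit(train_names, train_categories)
milk_prediction = model.predict("Organic milk 1 l")
bread_prediction = model.predict("White bread 300 g")
```

Character n-grams are used here because product names tend to be short and contain abbreviations and units, so subword features can still match when whole words differ. A production model would need far more data and a proper evaluation split.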

Student Theses at HSE must be completed in accordance with the University Rules and regulations specified by each educational programme.

Summaries of all theses must be published and made freely available on the HSE website.

The full text of a thesis can be published in open access on the HSE website only if the authoring student (copyright holder) agrees, or, if the thesis was written by a team of students, if all the co-authors (copyright holders) agree. After a thesis is published on the HSE website, it obtains the status of an online publication.

Student theses are objects of copyright and their use is subject to limitations in accordance with the Russian Federation’s law on intellectual property.

In the event that a thesis is quoted or otherwise used, reference to the author’s name and the source of quotation is required.
