Software Suite for Streaming Data Backfill Using YT Distributed Computing System

Student: Akhundov Aleksei

Supervisor: Nikolay Pavlochev

Faculty: Faculty of Computer Science

Educational Programme: Software Engineering (Bachelor)

Year of Graduation: 2024

This work develops a solution for recalculating data in stream processing, with the aim of making the recalculation scalable and minimizing the preparation workload for data engineers. Stream processing is actively used today for analytics and real-time decision-making. Apache Flink, a well-known stream processing engine and framework, is widely used by data engineers to create and execute data transformations. While these transformations are in operation, they often need to be recalculated because of errors or new requirements. This recalculation process, known as backfill, requires significantly more computing resources than regular transformations because of the larger volume of data. This work introduces a new approach to streaming data backfill that uses batch processing: it natively executes the user’s Flink transformation while partly relying on the MapReduce model on the YT distributed computing system. The approach is scalable and requires minimal additional coding, which reduces both the workload for data engineers and the potential for errors.

The work contains 45 pages, 3 chapters, 10 figures, 5 tables, 24 sources and 3 appendices.

Keywords: distributed computing systems, data processing, data streaming, MapReduce, real-time data processing
