
Service for Extracting Data from Structured Documents. Client Part.

Student: Kim Makar

Supervisor: Hadi Saleh

Faculty: Faculty of Computer Science

Educational Programme: Software Engineering (Bachelor)

Year of Graduation: 2024

Document management is an essential and time-consuming process that both businesses and individuals rely on. This work addresses the digitization of document management. It examines the client part of a system that extracts data from structured documents in various formats using optical character recognition (OCR). The subject area of the system, its architecture, and the techniques used are discussed. The system as a whole consists of three components: the client part, the server part, and the recognition module. The client part considered in this work is a web application whose key components are: a document markup module, which transmits its results to the server side of the system; a role system for configuring user access to data in the system; user authentication and authorization; administration tools for managing users and monitoring the state of the system as a whole; and analytics and monitoring tools for tracking user data and actions. The web application interacts with the backend over the Internet. The developed web application is expected to simplify the overall workflow and reduce the risk of errors associated with manual document processing. This work contains 40 pages, 3 chapters, 27 images, 22 sources, and 6 appendices.

Keywords: data extraction, document labeling, document workflow digitization, optical character recognition (OCR), administration and monitoring.

Student theses at HSE must be completed in accordance with the University's rules and the regulations specified by each educational programme.

Summaries of all theses must be published and made freely available on the HSE website.

The full text of a thesis can be published in open access on the HSE website only if the authoring student (copyright holder) agrees, or, if the thesis was written by a team of students, if all the co-authors (copyright holders) agree. After a thesis is published on the HSE website, it obtains the status of an online publication.

Student theses are objects of copyright and their use is subject to limitations in accordance with the Russian Federation’s law on intellectual property.

In the event that a thesis is quoted or otherwise used, reference to the author’s name and the source of quotation is required.
