
Multi-document Summarization Techniques for News Summaries

Student: Egorov Anton

Supervisor: Dmitry Ilvovsky

Faculty: Faculty of Computer Science

Educational Programme: Master of Data Science (Master)

Final Grade: 8

Year of Graduation: 2024

Automated text summarization has various commercial and educational applications. This research focuses on multi-document summarization of news reports: a task in which multiple reports on the same event must be summarized in one concise and coherent text. We discussed the major challenges of this task and reviewed the main techniques used in this field. We also proposed a hybrid model that combines extractive and abstractive techniques to reduce redundancy and keep the generated summaries diverse. We experimented with a number of popular machine learning models and compared their performance on two datasets consisting of news reports and their human-written summaries. Our findings suggest that general-purpose NLP models do not always perform well on multi-document summarization, and that it is important to develop techniques that address the problems specific to this task.

Full text (added June 4, 2024)

Student theses at HSE must be completed in accordance with the University Rules and regulations specified by each educational programme.

Summaries of all theses must be published and made freely available on the HSE website.

The full text of a thesis can be published in open access on the HSE website only if the authoring student (copyright holder) agrees, or, if the thesis was written by a team of students, if all the co-authors (copyright holders) agree. After a thesis is published on the HSE website, it obtains the status of an online publication.

Student theses are objects of copyright and their use is subject to limitations in accordance with the Russian Federation’s law on intellectual property.

In the event that a thesis is quoted or otherwise used, reference to the author’s name and the source of quotation is required.
