
Entity Disambiguation to Wikipedia for Languages with Different Corpora Volumes

Student: Nikishina Irina

Supervisor: Anastasiya A. Bonch-Osmolovskaya

Faculty: Faculty of Humanities

Educational Programme: Computational Linguistics (Master)

Year of Graduation: 2019

This paper is devoted to Entity Disambiguation to Wikipedia for languages with different corpora volumes. Disambiguation is one of the most crucial parts of successful communication. However, in natural language understanding, especially in semantic interpretation and disambiguation, there is ample room for improvement. At the same time, Entity Disambiguation can dramatically improve results on related tasks, including but not limited to Named Entity Classification, Coreference Resolution and Relation Extraction, regardless of language and corpus size. Unfortunately, Entity Disambiguation is currently performed mostly for high-resource languages. Therefore, the aim of the study is to develop a language-independent neural network approach to the Entity Disambiguation task. We achieve this by linking each entity mention in the text to the appropriate entity in a knowledge base compiled from Wikipedia. We also compare different approaches to learning distributed representations of tokens. Additionally, the study demonstrates the importance of enriching joint embeddings with information about the knowledge base structure.
