Master
2021/2022
Unstructured Data Analysis
Category 'Best Course for New Knowledge and Skills'
Type:
Elective course (Applied Statistics with Network Analysis)
Area of studies:
Applied Mathematics and Informatics
Delivered by:
International Laboratory for Applied Network Research
When:
Year 1, modules 1 and 2
Mode of studies:
offline
Open to:
students of all HSE University campuses
Instructors:
Ilia Karpov
Master’s programme:
Applied Statistics with Network Analysis
Language:
English
ECTS credits:
4
Contact hours:
40
Course Syllabus
Abstract
This course focuses on applied methods and existing tools for information retrieval: web scraping, data preprocessing, and natural language processing. All methods considered in this course require basic knowledge of discrete mathematics and probability theory. For instance, most NLP and IR methods use conditional probability. In this course, we show how contemporary approaches are implemented in existing software packages (preferably Python frameworks) and demonstrate how these methods can be used to solve real-world problems.
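As an illustration of how conditional probability appears in NLP, the sketch below estimates bigram probabilities P(next word | previous word) by maximum likelihood from raw counts. It is a minimal example and not taken from the course materials: the function name `bigram_conditional_probs` and the toy corpus are assumptions made for illustration only.

```python
from collections import Counter


def bigram_conditional_probs(tokens):
    """Estimate P(next | previous) by maximum likelihood from a token list."""
    # Count adjacent token pairs (bigrams).
    bigram_counts = Counter(zip(tokens, tokens[1:]))
    # Count each token only where it starts a bigram (the last token starts none).
    context_counts = Counter(tokens[:-1])
    return {
        (prev, nxt): count / context_counts[prev]
        for (prev, nxt), count in bigram_counts.items()
    }


# Hypothetical toy corpus, whitespace-tokenized.
corpus = "the cat sat on the mat the cat ran".split()
probs = bigram_conditional_probs(corpus)

# "the" is followed by "cat" in 2 of its 3 occurrences as a context.
print(probs[("the", "cat")])
```

Real language models add smoothing for unseen bigrams; this sketch omits it to keep the conditional-probability estimate visible.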
Learning Objectives
- Show how contemporary approaches are implemented in existing software packages (preferably Python frameworks) and demonstrate how these methods can be used to solve real-world problems.
Expected Learning Outcomes
- Know and apply basic methods of text processing and analysis
- Know the ethical aspects of text processing
- Be able to solve language modeling tasks
- Be able to solve specialized tasks on text data
Course Contents
- Introduction. Statistical text analysis
- Vector models of word representation
- Text classification
- Sequence classification
- Pretrained language models
- Syntactic parsing
- Machine translation
- Text generation
- Data annotation, active learning
- Question answering systems
- Multimodal methods
- Multilingual methods
- Text processing in medicine
- Information retrieval
- Ethical issues in text processing
Interim Assessment
- 2021/2022, 2nd module: 0.6 * Final exam + 0.4 * Cumulative mark for the work during the module
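The weighting above can be worked through with a short sketch. The function name `interim_grade` and the sample marks are hypothetical, and the example assumes both components are given on the same scale; the syllabus does not state a rounding rule, so none is applied here.

```python
def interim_grade(final_exam, cumulative_mark):
    """Weighted interim grade: 0.6 * final exam + 0.4 * cumulative mark."""
    return 0.6 * final_exam + 0.4 * cumulative_mark


# Hypothetical marks: 8 for the final exam, 6 for work during the module.
grade = interim_grade(final_exam=8, cumulative_mark=6)
print(round(grade, 2))  # 0.6 * 8 + 0.4 * 6 = 4.8 + 2.4
```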
Bibliography
Recommended Core Bibliography
- Manning, C. D., & Schütze, H. (1999). Foundations of Statistical Natural Language Processing. Cambridge, Mass: The MIT Press. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=24399
Recommended Additional Bibliography
- Cohen, S. (2019). Bayesian Analysis in Natural Language Processing: Second Edition. San Rafael: Morgan & Claypool Publishers. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=2102157