Bachelor's programme
2022/2023
Natural Language Processing
Best by the criterion "Usefulness of the course for broadening horizons and well-rounded development"
Best by the criterion "Novelty of the knowledge gained"
Status:
Elective course (Software Engineering)
Field of study:
09.03.04. Software Engineering
When taught:
4th year, modules 1 and 2
Study format:
Without an online course
Audience:
Open to all HSE University campuses
Instructors:
Бурашников Евгений Павлович
Language:
English
Credits:
8
Contact hours:
35
Course Syllabus
Abstract
The course is aimed at mastering the basics of natural language processing (NLP), a dynamic interdisciplinary field. It covers the methods and approaches used in many real-world NLP applications, such as language modeling, text classification, sentiment analysis, summarization, and machine translation. Students will not only use existing NLP libraries and software packages but also learn the principles behind their design and the mathematical models that underlie modern computational linguistics. The course also involves practical programming tasks in Python and experiments with texts written in English and Russian. Prerequisites: programming skills in Python and general knowledge of linguistics.
Learning Objectives
- To develop students' theoretical knowledge and practical skills in the fundamentals of natural language processing.
Expected Learning Outcomes
- Apply basic approaches to word embeddings, such as count-based methods, Word2Vec, and GloVe
- Apply classic machine learning methods, such as Naive Bayes, SVM, and logistic regression (LR), and deep learning approaches, such as FCN, CNN, and LSTM, to the text classification problem
- Apply open-source text preprocessing libraries, such as Natasha and NLTK, to solve the following common tasks: expanding contractions, lowercasing, removing punctuation, removing digits and words containing digits, removing stopwords, rephrasing text, stemming and lemmatization, and removing extra white space
- Apply various text generation techniques, such as n-gram LMs and neural LMs
- Apply attention mechanisms and transformers to seq2seq problems
- Apply special data preprocessing techniques and architectures such as BERT to the NER problem
- Apply SDA and Semi-SDA to the domain adaptation problem
- Apply the modern BERT architecture
- Apply the BERT architecture and its modifications to the QA problem
- Apply LDA, NMF, and LSA to the topic modeling problem
- Apply various heuristic approaches to improve the quality of text generation
- Apply modern neural network approaches to solve the problems of summarizing news and reviews
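The preprocessing tasks listed in the learning outcomes can be chained into a small pipeline. This is a minimal sketch using only the Python standard library rather than Natasha or NLTK; the contraction map and stopword list are tiny hypothetical stand-ins for real resources, and stemming, lemmatization, and rephrasing are omitted because they need a morphological library. Contractions are expanded before punctuation is stripped, since removing the apostrophe first would turn "won't" into "wont".

```python
import re
import string

# Hypothetical, deliberately tiny resources for illustration only; a real
# pipeline would use fuller lists (e.g. NLTK's stopword corpus).
CONTRACTIONS = {
    "can't": "cannot",
    "won't": "will not",
    "don't": "do not",
    "it's": "it is",
    "i'm": "i am",
}
STOPWORDS = {"a", "an", "the", "is", "it", "i", "am", "and", "of", "to", "in"}

# One case-insensitive regex matching any known contraction as a whole word.
_CONTRACTION_RE = re.compile(
    r"\b(" + "|".join(re.escape(c) for c in CONTRACTIONS) + r")\b",
    re.IGNORECASE,
)


def expand_contractions(text: str) -> str:
    # Look up the lowercased match, so "Don't" and "don't" both expand.
    return _CONTRACTION_RE.sub(lambda m: CONTRACTIONS[m.group(0).lower()], text)


def preprocess(text: str) -> list[str]:
    text = expand_contractions(text)  # expand contractions (before punctuation is removed)
    text = text.lower()  # lower case
    text = text.translate(str.maketrans("", "", string.punctuation))  # remove ASCII punctuation
    tokens = text.split()  # splitting on any whitespace also drops extra white space
    tokens = [t for t in tokens if not any(ch.isdigit() for ch in t)]  # drop digits and words containing digits
    tokens = [t for t in tokens if t not in STOPWORDS]  # remove stopwords
    return tokens


print(preprocess("I'm sure it's   the BEST course of 2022, and it won't disappoint users123!"))
# ['sure', 'best', 'course', 'will', 'not', 'disappoint']
```

Note that `string.punctuation` covers ASCII characters only, so typographic apostrophes and quotes, as well as Cyrillic-specific punctuation, would need extra handling for real English or Russian texts.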
Course Contents
- Word embedding
- Text classification
- Text preprocessing methods
- Language Modeling
- Seq2seq models
- Named Entity Recognition
- Domain Adaptation
- Transfer learning
- Question Answering
- Topic Modeling
- Text generation
- Text summarization
- Style transfer
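For the language modeling topic, a count-based bigram model is the simplest instance of the n-gram LMs named in the learning outcomes. The sketch below assumes add-one (Laplace) smoothing, which the syllabus does not prescribe, and uses `<s>` and `</s>` as illustrative sentence-boundary tokens; the function names are hypothetical and out-of-vocabulary words are not handled.

```python
from collections import Counter


def train_bigram_lm(sentences):
    """Train a bigram LM with add-one (Laplace) smoothing on tokenized sentences.

    Returns (prob, vocab), where prob(prev, word) estimates P(word | prev)
    and vocab is the set of tokens that can follow a context.
    """
    context_counts = Counter()  # how often each token appears as a context
    bigram_counts = Counter()   # how often each (prev, word) pair occurs
    vocab = set()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        context_counts.update(padded[:-1])
        bigram_counts.update(zip(padded[:-1], padded[1:]))
        vocab.update(padded[1:])  # possible next tokens (includes </s>, excludes <s>)
    V = len(vocab)

    def prob(prev, word):
        # Add one to every bigram count; adding V to the denominator keeps
        # the distribution over vocab summing to 1 for each context.
        return (bigram_counts[(prev, word)] + 1) / (context_counts[prev] + V)

    return prob, vocab


def sentence_prob(prob, tokens):
    """Probability of a whole sentence as the product of its bigram probabilities."""
    padded = ["<s>"] + tokens + ["</s>"]
    p = 1.0
    for prev, word in zip(padded[:-1], padded[1:]):
        p *= prob(prev, word)
    return p


corpus = [["i", "like", "nlp"], ["i", "like", "python"]]
prob, vocab = train_bigram_lm(corpus)
print(prob("like", "nlp"))  # (1 + 1) / (2 + 5) = 2/7
print(sentence_prob(prob, ["i", "like", "nlp"]))
```

On this two-sentence toy corpus the model assigns "i like nlp" the probability (3/7)(3/7)(2/7)(1/3) = 6/343 ≈ 0.0175. On longer texts one would normally sum log-probabilities instead of multiplying raw probabilities, to avoid floating-point underflow.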
Assessment Elements
- Text classification problem
- Named entity recognition
- Question Answering
- Text summarization
Interim Assessment
- 2022/2023, 2nd module: 0.25 * Text summarization + 0.25 * Named entity recognition + 0.25 * Question Answering + 0.25 * Text classification problem
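The interim assessment formula is an equally weighted average of the four assessment elements. A minimal sketch of the arithmetic follows; it assumes all four scores are on the same scale (HSE University's 10-point scale is assumed here), uses hypothetical example scores, and applies no rounding, because the syllabus does not state a rounding rule.

```python
# Weights taken from the interim assessment formula. The grading scale
# (assumed 10-point) and any rounding rule are assumptions, not syllabus facts.
WEIGHTS = {
    "Text summarization": 0.25,
    "Named entity recognition": 0.25,
    "Question Answering": 0.25,
    "Text classification problem": 0.25,
}


def interim_grade(scores: dict[str, float]) -> float:
    # Refuse to compute a grade if any assessment element is missing.
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(w * scores[name] for name, w in WEIGHTS.items())


# Hypothetical scores for illustration only.
example = {
    "Text summarization": 8,
    "Named entity recognition": 6,
    "Question Answering": 7,
    "Text classification problem": 9,
}
print(interim_grade(example))  # 0.25 * (8 + 6 + 7 + 9) = 7.5
```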
Bibliography
Recommended Core Bibliography
- Eisenstein, J. (2019). Introduction to Natural Language Processing. MIT Press.
- Yu, C., Wang, J., Chen, Y., & Huang, M. (2019). Transfer Learning with Dynamic Adversarial Adaptation Network. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsarx&AN=edsarx.1909.08184
Recommended Additional Bibliography
- Kedia, A., & Rasu, M. (2020). Hands-On Python Natural Language Processing: Explore Tools and Techniques to Analyze and Process Text with a View to Building Real-world NLP Applications. Packt Publishing.