Master
2022/2023
Natural Language Processing
Type:
Elective course
Area of studies:
Applied Mathematics and Informatics
Delivered by:
Big Data and Information Retrieval School
Where:
Faculty of Computer Science
When:
2 year, 2 module
Mode of studies:
distance learning
Online hours:
82
Open to:
students of one campus
Instructors:
Maria Tikhonova
Master’s programme:
Master of Data Science (distance learning)
Language:
English
ECTS credits:
4
Contact hours:
8
Course Syllabus
Abstract
Natural language processing (NLP) is an important field of computer science, artificial intelligence, and linguistics aimed at developing systems that can understand and generate natural language at the human level. Modern NLP systems are predominantly based on machine learning (ML) and deep learning (DL) algorithms and have demonstrated impressive results in a wide range of NLP tasks, such as summarization, machine translation, named entity recognition, relation extraction, sentiment analysis, speech recognition, and topic modeling. We interact with such systems and use products involving NLP on a daily basis, which makes it exciting to learn how these systems work. This course covers the main topics in NLP, ranging from text preprocessing techniques to state-of-the-art neural architectures. We hope to foster interest in the field by combining the theoretical basis with practical applications of the material.
Learning Objectives
- The learning objective is to acquire knowledge of classical and advanced approaches to NLP, including the use of linguistic tools and the development of NLP systems.
Expected Learning Outcomes
- Know the basic NLP tasks
- Be able to preprocess text
- Be able to solve a simple text classification task (a minimal pipeline is sketched after this list)
- Learn the inner workings of count-based vector representation models, including their advantages and disadvantages
- Know how to compute a probability distribution with the softmax function commonly used in neural embedding models (see the softmax sketch after this list)
- Learn the technical details of the word2vec and fastText models
- Learn the details of the extrinsic evaluation of word embedding models and be able to distinguish it from intrinsic methods
- Learn the similarity measures most commonly used with vector representations (cosine similarity appears in the softmax sketch)
- Learn the concept of language modeling and solidify knowledge of the tasks that can be solved with language models
- Learn the inner workings of count-based language models and smoothing methods
- Learn how to calculate the number of model parameters in a simple manner
- Learn how to compute the probability of a sequence given a set of model hypotheses (a toy bigram model is sketched after this list)
- Solidify knowledge of the greedy search decoding method and the details of special tokens
- Solidify knowledge of the named entity recognition task, specifically the widely used IOB tagging scheme and the task's evaluation metrics
- Understand the attention mechanism
- Be able to compute attention scores and distinguish between attention functions (a scaled dot-product sketch follows this list)
- Be able to apply and evaluate different decoding techniques
- Be able to identify the limitations of the encoder-decoder architecture
- Understand the BERT architecture and its usage
- Understand the architecture of ELMo models
- Understand the GPT architecture
- Compare different pre-trained models and know how they differ from each other
- Be able to evaluate pre-trained models and know different techniques for pre-trained model compression
- Be able to solve simple question-answering tasks
- Know the particularities of different QA tasks
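The short Python sketches below illustrate a few of the outcomes above. They are minimal illustrations, not course materials: every corpus, vector, and number in them is invented. First, a toy text-classification pipeline (preprocessing via scikit-learn's CountVectorizer, then a linear classifier) on an assumed four-sentence sentiment corpus:

```python
# Minimal sketch: count-based text classification on a toy corpus.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

texts = ["the movie was great", "what a wonderful film",
         "the movie was terrible", "a boring, awful film"]
labels = [1, 1, 0, 0]  # 1 = positive, 0 = negative (toy labels)

vectorizer = CountVectorizer(lowercase=True)  # count-based vectors
X = vectorizer.fit_transform(texts)

clf = LogisticRegression().fit(X, labels)
print(clf.predict(vectorizer.transform(["a great film"])))  # likely [1]
```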
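Next, the softmax function, which turns arbitrary scores into a probability distribution, together with cosine similarity, the similarity measure most often used with embeddings; the scores and vectors are made up:

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax: subtract the max before exponentiating."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

probs = softmax(np.array([2.0, 1.0, 0.1]))
print(probs, probs.sum())  # non-negative values summing to 1

def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

u, v = np.array([0.9, 0.8, 0.1]), np.array([0.85, 0.82, 0.15])
print(cosine_similarity(u, v))  # close to 1 for similar vectors
```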
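A toy count-based bigram language model with add-one (Laplace) smoothing, showing how the probability of a sequence decomposes into a product of bigram probabilities; the two-sentence corpus and the <s>/</s> boundary tokens are illustrative:

```python
import math
from collections import Counter

corpus = [["<s>", "i", "like", "nlp", "</s>"],
          ["<s>", "i", "like", "deep", "learning", "</s>"]]

unigram = Counter(w for sent in corpus for w in sent)
bigram = Counter(p for sent in corpus for p in zip(sent, sent[1:]))
V = len(unigram)  # vocabulary size, used by the smoothing term

def prob(word, prev):
    """Add-one smoothed bigram probability P(word | prev)."""
    return (bigram[(prev, word)] + 1) / (unigram[prev] + V)

def sequence_logprob(sent):
    """log P(sentence) = sum of log bigram probabilities."""
    return sum(math.log(prob(w, prev)) for prev, w in zip(sent, sent[1:]))

print(sequence_logprob(["<s>", "i", "like", "nlp", "</s>"]))
```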
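Finally, scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V, the attention function used in the Transformer; the query/key/value matrices here are random toy data:

```python
import numpy as np

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])  # raw attention scores
    weights = softmax(scores, axis=-1)       # one distribution per query
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 4))  # 2 queries of dimension 4
K = rng.normal(size=(3, 4))  # 3 keys
V = rng.normal(size=(3, 4))  # 3 values
out, w = scaled_dot_product_attention(Q, K, V)
print(out.shape, w.sum(axis=-1))  # (2, 4); each weight row sums to 1
```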
Course Contents
- Text preprocessing and text classification
- Embeddings
- Language Modelling and Sequence Tagging
- Machine Translation and Transformers
- Sesame Street: Transfer Learning
- Question Answering and Chat-bots
Interim Assessment
- 2022/2023, 2nd module: 0.5 * Programming Assignments + 0.2 * Final Project + 0.3 * Quizzes
Bibliography
Recommended Core Bibliography
- Bird, S., Loper, E., & Klein, E. (2009). Natural Language Processing with Python. O’Reilly Media.
- Liu, Y., & Zhang, M. (2018). Neural Network Methods for Natural Language Processing. Computational Linguistics, 44(1), 193. https://doi.org/10.1162/COLI_r_00312
Recommended Additional Bibliography
- Manning, C. D., & Schütze, H. (1999). Foundations of Statistical Natural Language Processing. Cambridge, MA: The MIT Press. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=24399