Postgraduate (PhD) Studies
2020/2021
Research Problems in Natural Language Processing
Status:
Elective course
Field of study:
09.06.01 Informatics and Computer Engineering
Delivered at:
Faculty of Computer Science
Delivered in:
2nd year, 1st semester
Mode of study:
without an online course
Instructors:
Браславский Павел Исаакович
Language:
English
Credits:
5
Contact hours:
40
Course Syllabus
Abstract
This course covers recent advances in natural language processing. In the late 2010s, a paradigm shift occurred in NLP due to the increasing power of deep learning. We discuss neural approaches to morphological analysis and syntactic parsing. Applications such as question answering and machine translation are introduced, along with the neural networks used for these tasks. Transfer learning techniques, including language-model pre-training and domain adaptation, are presented.
Learning Objectives
- The learning objective of the course “Research Problems in Natural Language Processing” is to provide students with advanced techniques and deeper theoretical and practical knowledge of modern NLP tasks, such as: • distributional semantics; • topic modelling; • sequence labelling; • structured learning; • text classification and clustering; • unsupervised information extraction.
Expected Learning Outcomes
- Knowledge of models such as word embeddings, latent Dirichlet allocation, conditional random fields, structured SVM, and convolutional and recurrent neural networks, and of their application to POS tagging and syntactic parsing
- Knowledge of ongoing developments in NLP
- Ability to design, develop, and evaluate NLP programs in Python
- Hands-on experience with large-scale NLP problems
Course Contents
- Introduction to NLP, basic concepts. Basic definitions of NLP tasks and methods, an introduction to linguistics, evaluation metrics, and language resources (an evaluation-metrics sketch follows this list).
- Text preprocessing: tokenization, POS tagging, syntactic parsing. Rule-based and machine-learning-based tokenization and POS tagging; constituency and dependency grammars; syntactic parsing (tagging sketch below).
- Topic modelling. Vector space model and dimensionality reduction: latent semantic indexing, latent Dirichlet allocation, dynamic topic models, hierarchical Dirichlet process, autoencoders (LDA sketch below).
- Distributional semantics. Embedding models: positive pointwise mutual information matrix decomposition, singular value decomposition, word2vec, GloVe, StarSpace, AdaGram, etc. (word2vec sketch below).
- Sequence labelling. Named entity recognition, relation and event extraction, and POS tagging as sequence labelling tasks. Hidden Markov models, maximum-entropy Markov models, conditional random fields, recurrent neural networks (Viterbi sketch below).
- Structured learning. Syntactic parsing and semantic role labelling as structured prediction tasks. Structured SVM and the structured perceptron (perceptron sketch below).
- Text classification and clustering. Baseline methods for text classification: naïve Bayes, logistic regression, fastText, convolutional neural networks, hard attention for recurrent neural networks (naïve Bayes sketch below).
- Unsupervised information extraction. The OpenIE paradigm; subject-verb-object (SVO) triple extraction, classification, and clustering; temporal analysis of textual data (SVO sketch below).
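A minimal sketch of the evaluation-metrics material from the first topic: precision, recall, and F1 computed in plain Python over made-up gold and predicted labels (the data is illustrative, not from the course).

gold = ["ENT", "O", "ENT", "ENT", "O"]
pred = ["ENT", "ENT", "O", "ENT", "O"]

tp = sum(g == p == "ENT" for g, p in zip(gold, pred))            # true positives
fp = sum(g != "ENT" and p == "ENT" for g, p in zip(gold, pred))  # false positives
fn = sum(g == "ENT" and p != "ENT" for g, p in zip(gold, pred))  # false negatives

precision = tp / (tp + fp)   # 2/3
recall = tp / (tp + fn)      # 2/3
f1 = 2 * precision * recall / (precision + recall)   # harmonic mean: 2/3
print(round(precision, 3), round(recall, 3), round(f1, 3))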
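For the preprocessing topic, a tokenization and POS-tagging sketch; NLTK is an assumed library choice (the syllabus does not prescribe one), and the resource names follow classic NLTK releases.

import nltk

nltk.download("punkt", quiet=True)                       # word tokenizer models
nltk.download("averaged_perceptron_tagger", quiet=True)  # POS tagger model

text = "Neural parsers changed NLP in the late 2010s."
tokens = nltk.word_tokenize(text)   # ['Neural', 'parsers', 'changed', ...]
print(nltk.pos_tag(tokens))         # e.g. [('Neural', 'JJ'), ('parsers', 'NNS'), ...]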
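For the topic-modelling topic, a latent Dirichlet allocation sketch; scikit-learn and the three toy documents are assumptions for illustration.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "neural networks learn word embeddings",
    "latent dirichlet allocation models topics",
    "recurrent networks label token sequences",
]
X = CountVectorizer().fit_transform(docs)            # document-term count matrix
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(X)                    # per-document topic mixtures
print(doc_topics.round(2))                           # each row sums to 1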
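For distributional semantics, a word2vec sketch with gensim (gensim >= 4 is assumed; the corpus is a toy, so the neighbours it produces are not meaningful at this scale).

from gensim.models import Word2Vec

sentences = [
    ["king", "rules", "the", "kingdom"],
    ["queen", "rules", "the", "kingdom"],
    ["parser", "reads", "the", "sentence"],
]
# sg=1 selects skip-gram; vector_size is the embedding dimensionality
model = Word2Vec(sentences, vector_size=16, window=2, min_count=1, sg=1, seed=0)
print(model.wv.most_similar("king", topn=2))   # nearest neighbours by cosine similarity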
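For sequence labelling, Viterbi decoding for a toy hidden Markov model POS tagger; all probabilities are invented for illustration, not estimated from a corpus.

states = ["NOUN", "VERB"]
start = {"NOUN": 0.6, "VERB": 0.4}                         # P(first tag)
trans = {"NOUN": {"NOUN": 0.3, "VERB": 0.7},               # P(tag_t | tag_{t-1})
         "VERB": {"NOUN": 0.8, "VERB": 0.2}}
emit = {"NOUN": {"dogs": 0.5, "bark": 0.1, "sleep": 0.4},  # P(word | tag)
        "VERB": {"dogs": 0.1, "bark": 0.5, "sleep": 0.4}}

def viterbi(words):
    # V[t][s]: probability of the best tag path ending in state s at position t
    V = [{s: start[s] * emit[s][words[0]] for s in states}]
    back = []
    for w in words[1:]:
        col, ptr = {}, {}
        for s in states:
            best = max(states, key=lambda p: V[-1][p] * trans[p][s])
            col[s] = V[-1][best] * trans[best][s] * emit[s][w]
            ptr[s] = best
        V.append(col)
        back.append(ptr)
    path = [max(states, key=lambda s: V[-1][s])]   # best final state
    for ptr in reversed(back):                     # follow back-pointers
        path.append(ptr[path[-1]])
    return path[::-1]

print(viterbi(["dogs", "bark"]))   # -> ['NOUN', 'VERB']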
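For structured learning, a structured-perceptron sketch for sequence tagging; brute-force enumeration of tag sequences stands in for Viterbi decoding, so it only scales to the toy data shown (all names and examples are illustrative).

from itertools import product
from collections import defaultdict

TAGS = ("N", "V")

def features(words, tags):
    # sparse indicator features: emissions and tag-bigram transitions
    f = defaultdict(int)
    prev = "<s>"
    for w, t in zip(words, tags):
        f[("emit", w, t)] += 1
        f[("trans", prev, t)] += 1
        prev = t
    return f

def predict(w, words):
    # argmax over all tag sequences (exponential; fine for toy inputs)
    return max(product(TAGS, repeat=len(words)),
               key=lambda tags: sum(w[k] * v for k, v in features(words, tags).items()))

data = [(["dogs", "bark"], ("N", "V")),
        (["cats", "sleep"], ("N", "V"))]
w = defaultdict(int)
for _ in range(5):                            # a few perceptron epochs
    for words, gold in data:
        pred = predict(w, words)
        if pred != gold:                      # structured update: w += f(gold) - f(pred)
            for k, v in features(words, gold).items():
                w[k] += v
            for k, v in features(words, pred).items():
                w[k] -= v

print(predict(w, ["dogs", "sleep"]))          # -> ('N', 'V')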
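For text classification, the naïve Bayes baseline as a scikit-learn pipeline; the library choice and the four-line training set are assumptions for illustration.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_texts = ["great movie", "awful plot", "loved it", "boring and bad"]
train_labels = ["pos", "neg", "pos", "neg"]

# tf-idf features feeding a multinomial naive Bayes classifier
clf = make_pipeline(TfidfVectorizer(), MultinomialNB())
clf.fit(train_texts, train_labels)
print(clf.predict(["loved the movie"]))   # expected: ['pos']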
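For unsupervised information extraction, a rough SVO-triple extractor over dependency parses in the OpenIE spirit; spaCy and its en_core_web_sm model are assumed to be installed, and real OpenIE systems handle far more constructions than this.

import spacy

nlp = spacy.load("en_core_web_sm")   # assumes the English model was downloaded

def svo_triples(text):
    # pair each verb's nominal subject with its direct object
    triples = []
    for token in nlp(text):
        if token.pos_ == "VERB":
            subjects = [c for c in token.children if c.dep_ == "nsubj"]
            objects = [c for c in token.children if c.dep_ == "dobj"]
            triples += [(s.text, token.lemma_, o.text) for s in subjects for o in objects]
    return triples

print(svo_triples("Marie Curie discovered polonium."))   # [('Curie', 'discover', 'polonium')]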
Assessment Elements
- Homework
- Attendance at all lectures and seminars
- Exam
Interim Assessment
- Interim assessment (1 semester): 0.5 * Exam + 0.4 * Homework + 0.1 * Attendance at all lectures and seminars
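For illustration only, assuming HSE's 10-point scale and hypothetical scores of Exam = 8, Homework = 9, and Attendance = 10, the interim grade would be 0.5 * 8 + 0.4 * 9 + 0.1 * 10 = 8.6.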
Bibliography
Recommended Core Bibliography
- Manning, C. D., & Schütze, H. (1999). Foundations of Statistical Natural Language Processing. Cambridge, Mass: The MIT Press. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=24399
- Liu, Y., & Zhang, M. (2018). Neural Network Methods for Natural Language Processing. Computational Linguistics, (1), 193. https://doi.org/10.1162/COLI_r_00312
Recommended Additional Bibliography
- Cohen, S. (2019). Bayesian Analysis in Natural Language Processing (2nd ed.). San Rafael: Morgan & Claypool Publishers. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=2102157