Bachelor's Programme 2023/2024

Natural Language Processing

Status: Compulsory course
Field of study: 09.03.04. Software Engineering
Delivered: 4th year, modules 2 and 3
Mode of study: without an online course
Audience: own campus only
Language: English
Credits: 6
Contact hours: 72

Course Syllabus

Abstract

The course introduces the basics of natural language processing (NLP), a dynamic interdisciplinary field. It covers the methods and approaches used in many real-world NLP applications, such as language modeling, text classification, sentiment analysis, summarization, and machine translation. Students will not only use existing NLP libraries and software packages but also learn the principles behind their design and the mathematical models that underlie modern computational linguistics. The course also involves practical programming tasks in Python and experiments with texts written in English and Russian. Prerequisites: programming skills in Python and general knowledge of linguistics.

Learning Objectives

  • Develop students' theoretical knowledge and practical skills in the fundamentals of natural language processing.

Expected Learning Outcomes

  • Apply basic approaches to word embeddings, such as count-based methods, Word2Vec, and GloVe
  • Apply classic machine learning methods such as Naive Bayes, SVM, and logistic regression (LR), and deep learning approaches such as fully connected networks (FCN), CNN, and LSTM, to the text classification problem
  • Apply open-source libraries for text preprocessing, such as Natasha and NLTK, to the following common tasks: expanding contractions, lowercasing, removing punctuation, removing words that contain digits, removing stopwords, rephrasing text, stemming and lemmatization, and removing extra whitespace
  • Apply various text-generation techniques such as n-gram language models and neural language models
  • Apply attention mechanisms and transformers to seq2seq problems
  • Apply special data preprocessing techniques and architectures such as BERT to the NER problem
  • Apply the modern BERT architecture
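
The preprocessing tasks listed above can be illustrated with a minimal sketch. It uses only the Python standard library rather than Natasha or NLTK, so stemming, lemmatization, and rephrasing are left out. The `CONTRACTIONS` and `STOPWORDS` tables and the `preprocess` function are hypothetical names introduced purely for illustration; they are not part of the course materials.

```python
import string
from typing import List

# Hypothetical, deliberately tiny lookup tables (assumptions for illustration only).
CONTRACTIONS = {"don't": "do not", "it's": "it is", "can't": "cannot"}
STOPWORDS = {"the", "a", "an", "is", "it", "and", "of", "to"}


def preprocess(text: str) -> List[str]:
    """Return cleaned tokens for a raw English string."""
    # 1. Lowercase.
    text = text.lower()
    # 2. Expand contractions via plain substring replacement.
    for short, full in CONTRACTIONS.items():
        text = text.replace(short, full)
    # 3. Split on whitespace (this also collapses repeated spaces)
    #    and drop tokens that contain any digit.
    tokens = [t for t in text.split() if not any(ch.isdigit() for ch in t)]
    # 4. Strip ASCII punctuation from each remaining token.
    table = str.maketrans("", "", string.punctuation)
    tokens = [t.translate(table) for t in tokens]
    # 5. Remove empty tokens and stopwords.
    return [t for t in tokens if t and t not in STOPWORDS]
```

The order of the steps is a design choice: contractions are expanded before punctuation is stripped, because removing the apostrophe first would prevent the lookup from matching.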

Course Contents

  • Word embedding
  • Text classification
  • Text preprocessing methods
  • Language Modeling
  • Seq2seq models
  • Named Entity Recognition
  • Domain Adaptation
  • Transfer learning
  • Question Answering
  • Topic Modeling
  • Text generation
  • Text summarization
  • Style transfer

Assessment Elements

  • Text classification (non-blocking)
  • Text summarization (blocking)

Interim Assessment

  • 2023/2024 3rd module
    0.4 * Text classification + 0.6 * Text summarization
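
The weighting formula above can be sketched as a small calculation. The weights come from the formula; the function name `interim_grade`, the dictionary keys, and the sample scores are assumptions made for illustration. No rounding is applied, since the syllabus does not state a rounding rule.

```python
# Weights taken from the interim assessment formula above.
WEIGHTS = {"text_classification": 0.4, "text_summarization": 0.6}


def interim_grade(scores: dict) -> float:
    """Weighted sum of the two assessment elements (hypothetical helper)."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Made-up example scores, not real student data.
example = {"text_classification": 8, "text_summarization": 6}
grade = interim_grade(example)
```

For the sample scores, the calculation is 0.4 * 8 + 0.6 * 6 = 3.2 + 3.6 = 6.8 (up to floating-point rounding).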

Bibliography

Recommended Core Bibliography

  • Eisenstein, J. (2019). Introduction to Natural Language Processing. MIT Press.
  • Yu, C., Wang, J., Chen, Y., & Huang, M. (2019). Transfer Learning with Dynamic Adversarial Adaptation Network. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsarx&AN=edsarx.1909.08184

Recommended Additional Bibliography

  • Kedia, A., & Rasu, M. (2020). Hands-On Python Natural Language Processing: Explore Tools and Techniques to Analyze and Process Text with a View to Building Real-world NLP Applications. Packt Publishing.

Authors

  • Emelyanova Mariia Maksimovna
  • Burashnikov Evgeniy Pavlovich