Master's programme 2024/2025

Introduction to NLP

Field of study: 45.04.03. Fundamental and Applied Linguistics
When taught: 1st year, 4th module
Mode of study: without an online course
Audience: own campus only
Programme: Applied Linguistics and Text Analytics
Language: English
Credits: 3

Course Syllabus

Abstract

Natural Language Processing (NLP) is the field concerned with applying machine learning algorithms to text and speech. The course covers the basics of NLP, the mathematical methods used in NLP, sentiment analysis, working with databases, and related topics.
Learning Objectives

  • Gain knowledge of the statistical methods of Natural Language Processing.
  • Get acquainted with vector representations of data.
  • Study how models are built in NLP.
  • Get acquainted with NLP libraries.
Expected Learning Outcomes

  • Students can process texts using basic string manipulations, sentiment analysis, and topic modeling.
  • Students know the history of the discipline and its subfields.
  • Students have the skills to work with unstructured text data.
  • Students understand the concept of k-nearest neighbors classification and can write a Python program for it.
  • Students understand the concept of naive Bayes classification and can write a Python program for it.
  • Students can use Python to perform text preprocessing: word normalization (spelling correction, stemming, lemmatization, stopword removal, case folding), tokenization, and n-gram creation.
  • Students understand the transformer architecture.
  • Students are aware of different types of machine learning techniques, such as supervised and unsupervised learning.
  • Students are aware of ways to collect data by scraping web pages.
  • Students are aware of topic modeling.
  • Students are aware of two different topic modeling algorithms: LSA and LDA.
  • Students can apply the basics of topic modeling, are familiar with the main approaches to text summarization, simplification, and generation, and can write example programs in Python.
  • Students are aware of the motivations behind converting human language into mathematical structures.
  • Students are aware of the different types of vector representation techniques.
  • Students can apply the NLP transduction and induction process.
  • Students can solve benchmark tasks such as CoLA, SST-2, and Winograd schemas.
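
The preprocessing steps named in the outcomes above (case folding, tokenization, stopword removal, n-gram creation) can be illustrated with a minimal sketch. It uses only the Python standard library; the stopword list and the function names are hypothetical, chosen for illustration, and are not taken from the course materials. Stemming, lemmatization, and spelling correction are left out because they normally rely on an external library.

```python
import re

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch);
# practical work would normally use a fuller list from an NLP library.
STOPWORDS = {"a", "an", "the", "is", "are", "of", "to", "and", "in", "on"}


def tokenize(text):
    """Case-fold the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def remove_stopwords(tokens):
    """Drop tokens that appear in the stopword list."""
    return [t for t in tokens if t not in STOPWORDS]


def ngrams(tokens, n):
    """Return all contiguous n-grams as tuples; empty if there are fewer than n tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = remove_stopwords(tokenize("The cat is on the mat."))
bigrams = ngrams(tokens, 2)
print(tokens)
print(bigrams)
```

The regular expression keeps only ASCII letters and digits, which is a simplification: text in other scripts would need a different tokenization rule.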
Course Contents

  • Introduction
  • Basic Feature Extraction Methods
  • Developing a Text Classifier
  • Collecting Text Data from the Web
  • Topic Modeling
  • Text Summarization and Text Generation
  • Vector Representation
  • Sentiment Analysis
Assessment Elements

  • non-blocking Practical Work 1 "Basic Feature Extraction Methods"
  • non-blocking Practical Work 2 "Developing a Text Classifier"
  • non-blocking Practical Work 3 "Collecting Text Data from the Web"
  • non-blocking Practical Work 4 "Topic Modeling"
  • non-blocking Practical Work 5 "Text Summarization and Text Generation"
  • non-blocking Practical Work 6 "Vector Representation"
  • non-blocking Practical Work 7 "Sentiment Analysis"
  • non-blocking Practical Work 8 "Model Architecture of the Transformer"
  • non-blocking Practical Work 9 "NLP Task with Transformers"
  • non-blocking Activity in Lectures
  • non-blocking Creative Task
Interim Assessment

  • 2024/2025 4th module
    0.1 * Practical Work 1 "Basic Feature Extraction Methods" + 0.1 * Practical Work 2 "Developing a Text Classifier" + 0.1 * Practical Work 3 "Collecting Text Data from the Web" + 0.1 * Practical Work 4 "Topic Modeling" + 0.1 * Practical Work 5 "Text Summarization and Text Generation" + 0.1 * Practical Work 6 "Vector Representation" + 0.1 * Practical Work 7 "Sentiment Analysis" + 0.1 * Practical Work 8 "Model Architecture of the Transformer" + 0.1 * Practical Work 9 "NLP Task with Transformers" + 0.05 * Activity in Lectures + 0.05 * Creative Task
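
As a worked illustration of the formula above, the sketch below computes the weighted sum in Python. The weights are copied from the formula; the dictionary keys are hypothetical shorthand labels for the assessment elements, and the example scores and the 10-point scale are assumptions. The syllabus does not state a rounding rule, so none is applied here.

```python
# Weights copied from the interim assessment formula.
# Keys are hypothetical shorthand labels for the assessment elements.
WEIGHTS = {
    "feature_extraction": 0.1,        # "Basic Feature Extraction Methods"
    "text_classifier": 0.1,           # "Developing a Text classifier"
    "web_data": 0.1,                  # "Collecting Text Data from the Web"
    "topic_modeling": 0.1,            # "Topic Modeling"
    "summarization_generation": 0.1,  # "Text summarization and Text Generation"
    "vector_representation": 0.1,     # "Vector Representation"
    "sentiment_analysis": 0.1,        # "Sentiment Analysis"
    "transformer_architecture": 0.1,  # "Model Architecture of the Transformer"
    "transformer_task": 0.1,          # "NLP Task with Transformers"
    "lecture_activity": 0.05,
    "creative_task": 0.05,
}


def interim_grade(scores):
    """Return the weighted sum of scores; every assessment element must be present."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# The weights add up to 1, so the result stays on the same scale as the inputs.
total_weight = sum(WEIGHTS.values())

# Hypothetical scores on an assumed 10-point scale.
example_scores = {name: 8 for name in WEIGHTS}
example_scores["creative_task"] = 10
grade = interim_grade(example_scores)
print(round(total_weight, 2), round(grade, 2))
```

Because the nine practical works carry 0.1 each and the two remaining elements 0.05 each, no single element dominates the result.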
Bibliography

Recommended Core Bibliography

  • Beysolow, T. (2018). Applied Natural Language Processing with Python: Implementing Machine Learning and Deep Learning Algorithms for Natural Language Processing. Berkeley, CA: Apress. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1892182
  • Indurkhya, N., & Damerau, F. J. (2010). Handbook of Natural Language Processing. Chapman and Hall/CRC. 704 pp.
  • Eisenstein, J. (2019). Introduction to Natural Language Processing.
  • Nfn Bahrawi. (2019). Online Realtime Sentiment Analysis Tweets by Utilizing Streaming API Features From Twitter. Jurnal Penelitian Pos Dan Informatika, (1), 53. https://doi.org/10.17933/jppi.2019.090105
  • Pozzi, F., et al. (2016). Sentiment Analysis in Social Networks. Morgan Kaufmann Publishers. Books 24x7 electronic library system.
  • Jurafsky, D. (2009). Speech and Language Processing: An Introduction to Natural Language Processing, Computational Lin...

Recommended Additional Bibliography

  • Dale, R., Moisl, H., & Somers, H. (Eds.). (2000). Handbook of Natural Language Processing. CRC Press. 1015 pp.
  • Natural Language Processing and Information Systems. (2017). Springer.

Authors

  • Kuptsov Pavel Vladimirovich
  • Stankevich Nataliia Vladimirovna