Master's programme
2023/2024
Introduction to NLP
Status:
Compulsory course (Applied Linguistics and Text Analytics)
Field of study:
45.04.03. Fundamental and Applied Linguistics
Delivered at:
Faculty of Humanities (Nizhny Novgorod)
Delivered in:
1st year, 4th module
Mode of study:
without an online course
Audience:
open to all HSE University campuses
Instructors:
Stankevich Natalia Vladimirovna
Programme:
Applied Linguistics and Text Analytics
Language:
English
Credits:
3
Contact hours:
40
Course Syllabus
Abstract
Natural Language Processing (NLP) is a field that applies machine learning algorithms to text and speech. The course covers the basics of NLP, the mathematical methods used in NLP, sentiment analysis, working with databases, and related topics.
Learning Objectives
- The purpose of the course is to gain knowledge of the statistical methods of Natural Language Processing.
- Get acquainted with vector representations of data.
- Study how models are built in NLP.
- Get acquainted with NLP libraries.
Expected Learning Outcomes
- Students can process texts using basic string manipulation, as well as sentiment analysis and topic modeling.
- Students know the history of the discipline and its subfields.
- Students have the skills to work with unstructured text data.
- Students understand the concept of k-nearest neighbors classification and can write a Python program for it.
- Students understand the concept of naive Bayes classification and can write a Python program for it.
- Students can use Python to perform text preprocessing: word normalization (spelling correction, stemming, lemmatization, stopword removal, case folding), tokenization, and n-gram creation.
- Students understand the transformer architecture.
- Students are aware of different types of machine learning techniques, such as supervised and unsupervised learning.
- Students are aware of ways to collect data by scraping web pages.
- Students are aware of topic modelling.
- Students are aware of two different algorithms, LSA and LDA.
- Students can apply the basics of topic modelling, are familiar with the main approaches to text summarization, simplification, and text generation, and can write example programs in Python.
- Students are aware of the motivations behind converting human language into mathematical structures.
- Students are aware of the different types of vector representation techniques.
- Students can apply the NLP transduction and induction processes.
- Students can apply CoLA, SST-2, and Winograd schemas to solve tasks.
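The preprocessing steps named in the outcomes above (case folding, tokenization, stopword removal, n-gram creation) can be sketched in plain Python. This is a minimal illustration and not course material: the stopword list, the regular expression, and the function names are assumptions chosen for the example. A real pipeline would normally rely on an NLP library and would add stemming or lemmatization.

```python
import re

# Hypothetical, deliberately tiny stopword list used only for illustration.
STOPWORDS = {"the", "a", "an", "is", "of", "and", "to", "in"}


def tokenize(text):
    """Case-fold the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def remove_stopwords(tokens, stopwords=STOPWORDS):
    """Drop every token that appears in the stopword set."""
    return [token for token in tokens if token not in stopwords]


def ngrams(tokens, n):
    """Return all contiguous n-token windows as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = remove_stopwords(tokenize("The cat sat in the hat."))
bigrams = ngrams(tokens, 2)
print(tokens)   # ['cat', 'sat', 'hat']
print(bigrams)  # [('cat', 'sat'), ('sat', 'hat')]
```

Building the n-grams after stopword removal is a design choice of this sketch; building them before removal keeps the original word adjacency instead.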
Course Contents
- Introduction
- Basic Feature Extraction Methods
- Developing a Text Classifier
- Collecting Text Data from the Web
- Topic Modelling
- Text Summarization and Text Generation
- Vector Representation
- Sentiment Analysis
- Transformers for Natural Language Processing
Assessment Elements
- Practical Work 1 "Basic Feature Extraction Methods"
- Practical Work 2 "Developing a Text Classifier"
- Practical Work 3 "Collecting Text Data from the Web"
- Practical Work 4 "Topic Modeling"
- Practical Work 5 "Text Summarization and Text Generation"
- Practical Work 6 "Vector Representation"
- Practical Work 7 "Sentiment Analysis"
- Practical Work 8 "Model Architecture of the Transformer"
- Practical Work 9 "NLP Task with Transformers"
- Lecture Activity
- Creative Task
Interim Assessment
- 2023/2024 4th module: 0.05 * Lecture Activity + 0.05 * Creative Task + 0.1 * Practical Work 1 "Basic Feature Extraction Methods" + 0.1 * Practical Work 2 "Developing a Text Classifier" + 0.1 * Practical Work 3 "Collecting Text Data from the Web" + 0.1 * Practical Work 4 "Topic Modeling" + 0.1 * Practical Work 5 "Text Summarization and Text Generation" + 0.1 * Practical Work 6 "Vector Representation" + 0.1 * Practical Work 7 "Sentiment Analysis" + 0.1 * Practical Work 8 "Model Architecture of the Transformer" + 0.1 * Practical Work 9 "NLP Task with Transformers"
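The weighting formula above can be checked with a short calculation. The sketch below is illustrative only: the dictionary keys, the function name, and the sample scores are hypothetical, and a 10-point scale is assumed. The syllabus does not state a rounding rule, so none is applied.

```python
import math

# Weights taken from the interim assessment formula; the key names are hypothetical.
WEIGHTS = {"lecture_activity": 0.05, "creative_task": 0.05}
WEIGHTS.update({f"practical_work_{i}": 0.1 for i in range(1, 10)})


def interim_grade(scores):
    """Return the weighted sum of the scores for all assessment elements."""
    return sum(weight * scores[name] for name, weight in WEIGHTS.items())


# The eleven weights add up to 1, so the result stays on the input scale.
total_weight = sum(WEIGHTS.values())
print(math.isclose(total_weight, 1.0))

# Hypothetical scores: 8 for every element except a 10 for the creative task.
sample_scores = {name: 8 for name in WEIGHTS}
sample_scores["creative_task"] = 10
grade = interim_grade(sample_scores)
print(grade)
```

Because the creative task carries a weight of only 0.05, raising its score from 8 to 10 lifts the result by 0.1 of a point.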
Bibliography
Recommended Core Bibliography
- Beysolow, T. (2018). Applied Natural Language Processing with Python : Implementing Machine Learning and Deep Learning Algorithms for Natural Language Processing. [Berkeley, CA]: Apress. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1892182
- Indurkhya N., Damerau F. J. Handbook of natural language processing. – Chapman and Hall/CRC, 2010. – 704 pp.
- Introduction to natural language processing, Eisenstein, J., 2019
- Nfn Bahrawi. (2019). Online Realtime Sentiment Analysis Tweets by Utilizing Streaming API Features From Twitter. Jurnal Penelitian Pos Dan Informatika, (1), 53. https://doi.org/10.17933/jppi.2019.090105
- Pozzi F. et al. Sentiment Analysis in Social Networks. - Morgan Kaufmann Publishers, 2016. - Books 24x7 e-library.
- Speech and language processing. An introduction to natural language processing, computational lin..., Jurafsky, D., 2009
- Transformers for machine learning : a deep dive, Kamath, U., 2022
Recommended Additional Bibliography
- Dale R., Moisl H., Somers H. (ed.). Handbook of natural language processing. – CRC Press, 2000. – 1015 pp.
- Natural Language Processing and Information Systems. (2017). Springer.
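- Осваиваем архитектуру Transformer : разработка современных моделей с помощью передовых методов обработки естественного языка [Mastering the Transformer Architecture: Developing Modern Models with Advanced Natural Language Processing Methods; in Russian], Йылдырым, С. [Yıldırım, S.], 2022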