Master's Programme 2024/2025

Natural Language Processing

Field of study: 38.04.01. Economics
When taught: Year 1, Module 1
Mode of study: without an online course
Audience: home campus only
Instructor: Surkov Anton Yurevich
Programme: Data Analytics for Business and Economics
Language: English
Credits: 3

Course Syllabus

Abstract

Prerequisites: strong knowledge of and skills in Python (numpy, pandas, scikit-learn), mathematical statistics, and machine learning modeling.

Natural language processing (NLP) is an important field of computer science, artificial intelligence, and linguistics that aims to develop systems able to understand and generate natural language at the human level. Modern NLP systems are predominantly based on machine learning (ML) and deep learning (DL) algorithms and have demonstrated impressive results in a wide range of NLP tasks, such as summarization, machine translation, named entity recognition, relationship extraction, sentiment analysis, speech recognition, and topic modeling. We interact with such systems and use products involving NLP on a daily basis, which makes it exciting to learn how these systems work. This course covers the main topics in NLP, ranging from text preprocessing techniques to state-of-the-art neural architectures. We hope to foster interest in the field by combining the theoretical basis with practical applications of the material.
Learning Objectives

  • Student can preprocess text data with Python and train machine learning models for various NLP tasks.
Expected Learning Outcomes

  • Student is able to work with strings and arrays of strings in Python using built-in libraries as well as the pandas, NLTK, and spaCy libraries, and to create functions and classes for simple exploratory analysis.
  • Student is able to process raw text: remove stop words, stem or lemmatize text, and create BoW and TF-IDF representations using NLTK and sklearn.
  • Student is able to split data, fit sklearn models for classification and regression, and assess the models' performance.
  • Student understands the basic concepts of topic modelling and is able to build topic models with the gensim library.
  • Student knows the basics of word2vec and doc2vec models and is able to build embeddings from raw text.
  • Student is able to search relevant documents with BM25(F) and cosine similarity.
  • Student knows the basics of neural network training and is familiar with RNN and attention-based architectures. Student is able to build a simple RNN and train it with PyTorch Lightning. Student is able to fine-tune LLMs using the transformers library.
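
As an illustration of the text-processing and representation outcomes above, the following is a minimal, dependency-free sketch of stop-word removal, TF-IDF weighting, and cosine similarity. It is not course material: the course itself names NLTK and sklearn for these steps, and the stop-word list, function names, smoothed IDF variant, and toy documents here are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical toy stop-word list (an assumption for this sketch);
# in practice a library-provided list would normally be used.
STOP_WORDS = {"the", "a", "an", "is", "of", "and", "to", "in", "on"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document (raw tf times smoothed idf)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}
    return [{t: tf * idf[t] for t, tf in Counter(tokens).items()} for tokens in tokenized]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


docs = [
    "The cat sat on the mat",
    "A cat and a dog",
    "Stock prices fell in the market",
]
vectors = tfidf_vectors(docs)
sim_cat_docs = cosine(vectors[0], vectors[1])   # documents share the term "cat"
sim_unrelated = cosine(vectors[0], vectors[2])  # documents share no terms
```

The first two documents share a term, so their similarity is positive, while documents with no terms in common score zero.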
Course Contents

  • Processing text with Python
  • Building numerical representations and modelling
  • Introduction to topic modelling
  • Embeddings: powerful representations of words and documents
  • Introduction to information retrieval
  • Deep learning in NLP
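
For the information retrieval topic, a rough sketch of BM25 ranking might look like the following. It implements plain Okapi BM25 with a commonly used non-negative IDF variant, not the field-weighted BM25F also mentioned in the outcomes. The function name, the default parameters `k1=1.5` and `b=0.75`, and the toy corpus are assumptions for illustration only.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, corpus_tokens, k1=1.5, b=0.75):
    """Score every tokenized document in corpus_tokens against the query."""
    n_docs = len(corpus_tokens)
    avg_len = sum(len(d) for d in corpus_tokens) / n_docs
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(t for d in corpus_tokens for t in set(d))
    scores = []
    for doc in corpus_tokens:
        tf = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            # Rarer terms receive a larger IDF; the "1 +" keeps it non-negative.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical toy corpus, already tokenized by whitespace.
corpus = [
    "interest rates rise as inflation grows".split(),
    "the central bank cuts interest rates".split(),
    "football season starts next week".split(),
]
scores = bm25_scores("inflation rates".split(), corpus)
best = max(range(len(corpus)), key=scores.__getitem__)
```

Here `best` is the index of the highest-scoring document; the first document matches both query terms, so it ranks above the second, which matches only one.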
Assessment Elements

  • blocking HW_1
    After every second seminar, students receive a homework assignment, which must be completed no later than the next seminar.
  • non-blocking HW_2
    After every second seminar, students receive a homework assignment, which must be completed no later than the next seminar.
Interim Assessment

  • 2024/2025 1st module
    0.571 * HW_1 + 0.429 * HW_2
Bibliography

Recommended Core Bibliography

  • Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., & Rubin, D. B. (2013). Bayesian Data Analysis, Third Edition. Chapman & Hall/CRC Press. ISBN 9781439898208. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&db=nlebk&AN=1763244
  • Bird, S., Loper, E., & Klein, E. (2009). Natural Language Processing with Python. O’Reilly Media.
  • Davies, J., Goker, A., & Wiley InterScience (Online service). (2009). Information Retrieval : Searching in the 21st Century. Chichester, U.K.: Wiley. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=308933
  • Kelleher, J. D. (2019). Deep Learning.
  • Kamath, U., Liu, J., & Whitaker, J. (2019). Deep Learning for NLP and Speech Recognition. Springer.

Recommended Additional Bibliography

  • Raschka, S., & Mirjalili, V. (2019). Python Machine Learning : Machine Learning and Deep Learning with Python, Scikit-learn, and TensorFlow 2, 3rd Edition. Packt Publishing. ISBN 9781789958294. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&db=nlebk&AN=2329991
  • Kedia, A., & Rasu, M. (2020). Hands-On Python Natural Language Processing : Explore Tools and Techniques to Analyze and Process Text with a View to Building Real-world NLP Applications. Packt Publishing.
  • Barber, D. (2012). Bayesian Reasoning and Machine Learning. Cambridge: Cambridge eText. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=432721
  • Garreta, R., & Moncecchi, G. (2013). Learning Scikit-learn : Machine Learning in Python: Experience the Benefits of Machine Learning Techniques by Applying Them to Real-world Problems Using Python and the Open Source Scikit-learn Library. Birmingham, UK: Packt Publishing. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=673033
  • Kublik, S. (2022). GPT-3 : The Ultimate Guide to Building NLP Products with OpenAI API.
  • Hardeniya, N. (2015). NLTK Essentials. Birmingham, UK: Packt Publishing. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1044817
  • Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. Retrieved from http://www.deeplearningbook.org
  • Perkins, J. (2014). Python 3 Text Processing with NLTK 3 Cookbook. Birmingham: Packt Publishing. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=836632
  • Dominich, S. (2008). The Modern Algebra of Information Retrieval. Springer.

Authors

  • SURKOV ANTON YUREVICH
  • BUDKO VIKTORIYA ALEKSANDROVNA