Master
2024/2025
Natural Language Processing
Type:
Elective course (Data Analytics for Business and Economics)
Area of studies:
Economics
Delivered by:
Department of Management
When:
1 year, 1 module
Mode of studies:
offline
Open to:
students of one campus
Instructors:
Anton Surkov
Master’s programme:
Data Analytics for Business and Economics
Language:
English
ECTS credits:
3
Course Syllabus
Abstract
Prerequisites: strong knowledge of and skills in Python (numpy, pandas, scikit-learn), mathematical statistics, and machine learning modeling. Natural language processing (NLP) is an important field at the intersection of computer science, artificial intelligence, and linguistics that aims to develop systems able to understand and generate natural language at a human level. Modern NLP systems are predominantly based on machine learning (ML) and deep learning (DL) algorithms and have demonstrated impressive results in a wide range of NLP tasks, such as summarization, machine translation, named entity recognition, relationship extraction, sentiment analysis, speech recognition, and topic modeling. We interact with such systems and use products involving NLP on a daily basis, which makes it exciting to learn how these systems work. This course covers the main topics in NLP, ranging from text preprocessing techniques to state-of-the-art neural architectures. We hope to foster interest in the field by combining the theoretical basis with practical applications of the material.
Learning Objectives
- Student can preprocess text data with Python and train machine learning models for various NLP tasks.
Expected Learning Outcomes
- Student is able to work with strings and arrays of strings in Python using built-in libraries and the pandas, NLTK, and spaCy libraries, and to create functions and classes for simple exploratory analysis.
- Student is able to process raw text: remove stop words, stem or lemmatize text, and create BoW and TF-IDF representations using NLTK and sklearn.
- Student is able to split data, fit sklearn models for classification and regression, and assess the models' performance.
- Student understands the basic concepts of topic modelling and is able to build topic models with the gensim library.
- Student knows the basics of word2vec and doc2vec models and is able to build embeddings from raw text.
- Student is able to search relevant documents with BM25(F) and cosine similarity.
- Student knows the basics of neural network training and is familiar with RNN and attention-based architectures. Student is able to build a simple RNN and train it with PyTorch Lightning. Student is able to fine-tune LLMs using the transformers library.
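To make the BoW/TF-IDF and cosine-similarity outcomes above concrete, here is a minimal sketch in plain Python. It is an illustration only, not the course's own material: the stop-word list, the sample documents, and the helper names (`tokenize`, `tfidf_vectors`, `cosine`) are assumptions made for this example, and the unsmoothed weighting `tf * log(N / df)` is one common textbook variant that differs from the smoothed default used by sklearn.

```python
import math
from collections import Counter

# Tiny illustrative stop-word list (an assumption; NLTK ships a much larger one).
STOP_WORDS = {"the", "a", "is", "on", "and"}


def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [tok for tok in text.lower().split() if tok not in STOP_WORDS]


def tfidf_vectors(docs):
    """Return one sparse {term: weight} dict per document, weight = tf * idf."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append({
            term: (count / len(toks)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical sample documents.
docs = [
    "the cat sat on the mat",
    "a cat and a dog",
    "stock markets fell sharply",
]
vecs = tfidf_vectors(docs)
sim_related = cosine(vecs[0], vecs[1])    # both mention "cat"
sim_unrelated = cosine(vecs[0], vecs[2])  # no shared terms
```

In this toy corpus the first two documents share the term "cat", so their similarity is positive, while the first and third share no terms and score zero. In practice the same steps would more likely be done with NLTK for preprocessing and sklearn for vectorization.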
Course Contents
- Processing text with Python
- Building numerical representations and modelling
- Introduction to topic modelling
- Embeddings: powerful representations of words and documents
- Introduction to information retrieval
- Deep learning in NLP
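As a small illustration of the information retrieval topic, the following sketch ranks documents with a plain Okapi BM25 scoring function. It is a hedged example under stated assumptions rather than the course's implementation: the function name `bm25_scores`, the sample corpus, the parameter values `k1=1.5` and `b=0.75`, and the non-negative IDF form `log((N - df + 0.5) / (df + 0.5) + 1)` are all choices made here. It covers single-field BM25 only, not the multi-field BM25F variant.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with Okapi BM25.

    k1 controls term-frequency saturation and b controls length
    normalization; the defaults are common choices, not prescribed values.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency of each term across the corpus.
    df = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Non-negative IDF variant (the "+ 1" inside the log).
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            length_norm = 1.0 - b + b * len(toks) / avg_len
            score += idf * tf[term] * (k1 + 1.0) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores


# Hypothetical corpus and query.
corpus = [
    "neural networks for text classification",
    "topic models with latent dirichlet allocation",
    "text retrieval with bm25 ranking",
]
scores = bm25_scores("text retrieval", corpus)
best = max(range(len(corpus)), key=scores.__getitem__)
```

Here the third document matches both query terms and ranks first, the first matches only "text", and the second matches nothing and scores zero.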
Assessment Elements
- HW_1: After every second seminar, students are given homework, which must be completed no later than the next seminar.
- HW_2: After every second seminar, students are given homework, which must be completed no later than the next seminar.
Bibliography
Recommended Core Bibliography
- Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., & Rubin, D. B. (2013). Bayesian Data Analysis, Third Edition. Chapman & Hall/CRC Press. ISBN 9781439898208. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&db=nlebk&AN=1763244
- Bird, S., Loper, E., & Klein, E. (2009). Natural Language Processing with Python. O’Reilly Media.
- Davies, J., Goker, A., & Wiley InterScience (Online service). (2009). Information Retrieval : Searching in the 21st Century. Chichester, U.K.: Wiley. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=308933
- Kelleher, J. D. (2019). Deep Learning.
- Kamath, U., Liu, J., & Whitaker, J. (2019). Deep Learning for NLP and Speech Recognition. Springer.
Recommended Additional Bibliography
- Raschka, S., & Mirjalili, V. (2019). Python Machine Learning : Machine Learning and Deep Learning with Python, Scikit-learn, and TensorFlow 2, 3rd Edition. Packt Publishing. ISBN 9781789958294. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&db=nlebk&AN=2329991
- Kedia, A., & Rasu, M. (2020). Hands-On Python Natural Language Processing : Explore Tools and Techniques to Analyze and Process Text with a View to Building Real-world NLP Applications. Packt Publishing.
- Barber, D. (2012). Bayesian Reasoning and Machine Learning. Cambridge: Cambridge eText. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=432721
- Garreta, R., & Moncecchi, G. (2013). Learning Scikit-learn : Machine Learning in Python: Experience the Benefits of Machine Learning Techniques by Applying Them to Real-world Problems Using Python and the Open Source Scikit-learn Library. Birmingham, UK: Packt Publishing. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=673033
- Kublik, S. (2022). GPT-3 : The Ultimate Guide to Building NLP Products with OpenAI API.
- Hardeniya, N. (2015). NLTK Essentials. Birmingham, UK: Packt Publishing. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1044817
- Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. Retrieved from http://www.deeplearningbook.org
- Perkins, J. (2014). Python 3 Text Processing with NLTK 3 Cookbook. Birmingham: Packt Publishing. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=836632
- Dominich, S. (2008). The Modern Algebra of Information Retrieval. Springer.