2024/2025

Natural Language Processing

Status: Mago-Lego
When taught: Module 1
Audience: own campus
Instructors: Сурков Антон Юрьевич
Language: Russian
Credits: 3

Course Syllabus

Abstract

Prerequisites: strong knowledge of and skills in Python (numpy, pandas, scikit-learn), mathematical statistics, and machine learning modeling.

Natural language processing (NLP) is an important field of computer science, artificial intelligence, and linguistics aimed at developing systems that can understand and generate natural language at a human level. Modern NLP systems are predominantly based on machine learning (ML) and deep learning (DL) algorithms and have demonstrated impressive results in a wide range of tasks such as summarization, machine translation, named entity recognition, relation extraction, sentiment analysis, speech recognition, and topic modeling. We interact with such systems and use NLP-powered products daily, which makes it exciting to learn how they work. This course covers the main topics in NLP, from text preprocessing techniques to state-of-the-art neural architectures. By combining theoretical foundations with practical applications, we aim to foster interest in the field.

Course Objective

  • Students can preprocess text data with Python and train machine learning models for various NLP tasks.
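
As a minimal sketch of the kind of pipeline this objective describes, the snippet below turns raw text into TF-IDF features and trains a classifier with scikit-learn. The toy corpus and labels are hypothetical, and the choice of `TfidfVectorizer` with scikit-learn's built-in English stop-word list followed by `LogisticRegression` is an illustrative assumption, not a setup prescribed by the course.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy sentiment corpus: 1 = positive, 0 = negative
texts = [
    "I loved this movie, it was great",
    "What a wonderful and great film",
    "The plot was boring and the acting was bad",
    "I hated it, a terrible and boring movie",
]
labels = [1, 1, 0, 0]

# Lowercase, drop English stop words, weight terms by TF-IDF, then fit a linear classifier
model = make_pipeline(
    TfidfVectorizer(lowercase=True, stop_words="english"),
    LogisticRegression(),
)
model.fit(texts, labels)

# Vocabulary learned by the vectorizer (stop words such as "the" are excluded)
vocab = model.named_steps["tfidfvectorizer"].vocabulary_
print(sorted(vocab))
print(model.predict(["a great and wonderful movie"]))
```

With a realistic dataset, one would hold out a test set (for example with `sklearn.model_selection.train_test_split`) and report a metric such as accuracy or F1; the four-sentence corpus here is only large enough to show the API.
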
Intended Learning Outcomes

  • Students can work with strings and arrays of strings in Python using built-in tools and the pandas, NLTK, and spaCy libraries, and can write functions and classes for simple exploratory analysis.
  • Students can process raw text: remove stop words, stem or lemmatize it, and build bag-of-words (BoW) and TF-IDF representations with NLTK and scikit-learn.
  • Students can split data, fit scikit-learn models for classification and regression, and assess model performance.
  • Students understand the basic concepts of topic modelling and can build topic models with the gensim library.
  • Students know the basics of the word2vec and doc2vec models and can build embeddings from raw text.
  • Students can search for relevant documents using BM25(F) and cosine similarity.
  • Students know the basics of neural network training and are familiar with RNN and attention-based architectures. They can build a simple RNN and train it with PyTorch Lightning, and can fine-tune LLMs using the transformers library.
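
The retrieval outcome can be illustrated with a small, dependency-free sketch: it scores documents against a query with Okapi BM25 and, for comparison, with cosine similarity over raw term-count vectors. The parameter values k1 = 1.5 and b = 0.75, the whitespace tokenizer, the "+1 inside the logarithm" IDF variant, and the toy corpus are all assumptions chosen for illustration. BM25F, which weights document fields separately, is not shown.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase whitespace tokenizer (an assumption for this sketch)
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / n_docs
    # Document frequency: in how many documents each term occurs
    df = Counter(term for d in tokenized for term in set(d))
    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # IDF variant with +1 inside the log, so the weight is never negative
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation (k1) and document-length normalization (b)
            denom = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


def cosine_similarity(text_a, text_b):
    """Cosine similarity between two texts represented as term-count vectors."""
    va, vb = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    dot = sum(count * vb[term] for term, count in va.items())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical toy corpus and query
docs = [
    "bm25 ranks documents for a query",
    "cosine similarity compares two vectors",
    "word2vec learns word embeddings",
]
query = "rank documents with bm25"

bm25 = bm25_scores(query, docs)
cos = [cosine_similarity(query, d) for d in docs]
best = max(range(len(docs)), key=lambda i: bm25[i])
print(docs[best])
```

Note that exact token matching means "rank" does not match "ranks"; stemming or lemmatization before indexing would address this.
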
Course Content

  • Processing text with Python
  • Building numerical representations and modelling
  • Introduction to topic modelling
  • Embeddings: powerful representations of words and documents
  • Introduction to information retrieval
  • Deep learning in NLP
Assessment Components

  • HW_1 (blocking)
    Homework is assigned after every second seminar and must be completed no later than the next seminar.
  • HW_2 (non-blocking)
    Homework is assigned after every second seminar and must be completed no later than the next seminar.
Interim Assessment

  • 2024/2025 1st module
    0.571 * HW_1 + 0.429 * HW_2
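
The weights 0.571 and 0.429 sum to 1 and are approximately 4/7 and 3/7, so HW_1 and HW_2 contribute in a roughly 4:3 ratio. A worked example, assuming both homework grades are on the same scale (for example, 0 to 10) and that no rounding is applied; the syllabus does not state a rounding rule, and this sketch does not model what the blocking status of HW_1 implies. The function name `interim_grade` is hypothetical.

```python
def interim_grade(hw_1, hw_2):
    """Weighted interim grade: 0.571 * HW_1 + 0.429 * HW_2 (no rounding applied)."""
    return 0.571 * hw_1 + 0.429 * hw_2


# Hypothetical grades: 0.571 * 8 + 0.429 * 6 = 4.568 + 2.574
grade = interim_grade(8, 6)
print(round(grade, 3))  # 7.142
```
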
Reading List

Recommended Core Reading

  • Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., & Rubin, D. B. (2013). Bayesian Data Analysis (3rd ed.). Chapman & Hall/CRC Press. ISBN 9781439898208. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&db=nlebk&AN=1763244
  • Bird, S., Loper, E., & Klein, E. (2009). Natural Language Processing with Python. O’Reilly Media.
  • Davies, J., & Goker, A. (2009). Information Retrieval: Searching in the 21st Century. Chichester, UK: Wiley. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=308933
  • Kelleher, J. D. (2019). Deep Learning.
  • Kamath, U., Liu, J., & Whitaker, J. (2019). Deep Learning for NLP and Speech Recognition. Springer.

Recommended Additional Reading

  • Raschka, S., & Mirjalili, V. (2019). Python Machine Learning: Machine Learning and Deep Learning with Python, Scikit-learn, and TensorFlow 2 (3rd ed.). Packt Publishing. ISBN 9781789958294. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&db=nlebk&AN=2329991
  • Kedia, A., & Rasu, M. (2020). Hands-On Python Natural Language Processing: Explore Tools and Techniques to Analyze and Process Text with a View to Building Real-world NLP Applications. Packt Publishing.
  • Barber, D. (2012). Bayesian Reasoning and Machine Learning. Cambridge: Cambridge University Press. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=432721
  • Garreta, R., & Moncecchi, G. (2013). Learning Scikit-learn: Machine Learning in Python: Experience the Benefits of Machine Learning Techniques by Applying Them to Real-world Problems Using Python and the Open Source Scikit-learn Library. Birmingham, UK: Packt Publishing. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=673033
  • Kublik, S. (2022). GPT-3: The Ultimate Guide to Building NLP Products with OpenAI API.
  • Hardeniya, N. (2015). NLTK Essentials. Birmingham, UK: Packt Publishing. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1044817
  • Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. Retrieved from http://www.deeplearningbook.org
  • Perkins, J. (2014). Python 3 Text Processing with NLTK 3 Cookbook. Birmingham, UK: Packt Publishing. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=836632
  • Dominich, S. (2008). The Modern Algebra of Information Retrieval. Springer.

Authors

  • Сурков Антон Юрьевич
  • Будько Виктория Александровна