Bachelor's programme
2024/2025
Data Mining and Fundamentals of Machine Learning
Status:
Elective course (Fundamental and Applied Linguistics)
Field of study:
45.03.03. Fundamental and Applied Linguistics
Delivered at:
Faculty of Humanities (Nizhny Novgorod)
When:
3rd year, modules 3 and 4
Mode of study:
with an online course
Online hours:
20
Open to:
students of the home campus
Language:
English
Credits:
3
Course Syllabus
Abstract
The course introduces students to the basic approaches and principles of data mining, the main machine learning methods and their limitations, and the main methods of quality evaluation.
Learning Objectives
- The purpose of the course is to familiarize students with the basic principles and methods of data analysis and machine learning.
Expected Learning Outcomes
- Trains logistic regression and KNN, understands quality metrics.
- Trains classification based on decision trees and ensemble models
- Trains SVM-based classification models with various parameters
- Trains clustering models, understands clustering evaluation
- Performs a spectrum of machine learning tasks
- Reduces the dimensionality with various methods
- Trains polynomial regression and understands its quality metrics, identifies overfitting and underfitting, estimates quality using cross-validation
- Trains linear regression, understands its quality metrics
- Prepares data for machine learning algorithms
- Independently conducts a reproducible experiment following a full pipeline: 1) formulates a problem, analyzes previous work and scientific papers on the subject; 2) performs preliminary dataset analysis, data preprocessing, feature engineering and selection; 3) selects machine learning methods, trains, evaluates and compares models; 4) visualizes and explains the results.
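As a rough illustration of steps 2 and 3 of the pipeline in the last outcome, the sketch below compares two models with k-fold cross-validation under a fixed random seed. It is only a minimal example using the Python standard library: the synthetic dataset, the seed value, and all function names (`make_data`, `fit_mean`, `fit_linear`, `cross_val_mse`) are assumptions made for illustration and are not part of the course materials. Steps 1 and 4 (literature analysis, visualization) are not covered.

```python
import random
from statistics import mean

SEED = 42  # assumed value; fixing the seed makes the experiment reproducible


def make_data(n=60, seed=SEED):
    """Synthetic data: y = 2x + 1 plus Gaussian noise (illustrative only)."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2 * x + 1 + rng.gauss(0, 1) for x in xs]
    return xs, ys


def fit_mean(xs, ys):
    """Baseline model: always predict the mean of the training targets."""
    m = mean(ys)
    return lambda x: m


def fit_linear(xs, ys):
    """Ordinary least squares for a single feature."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


def mse(model, xs, ys):
    """Mean squared error of a fitted model on the given points."""
    return mean((model(x) - y) ** 2 for x, y in zip(xs, ys))


def cross_val_mse(fit, xs, ys, k=5, seed=SEED):
    """Average held-out MSE over k folds of a seeded shuffle."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        scores.append(mse(model, [xs[i] for i in fold], [ys[i] for i in fold]))
    return mean(scores)


xs, ys = make_data()
results = {
    name: cross_val_mse(fit, xs, ys)
    for name, fit in [("baseline", fit_mean), ("linear", fit_linear)]
}
for name, score in results.items():
    print(f"{name}: CV MSE = {score:.3f}")
```

Because both the data generation and the fold shuffle use the same fixed seed, rerunning the script gives identical scores, which is the property a reproducible experiment needs.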
Course Contents
- Introduction. Examples of practical tasks
- Exploratory data analysis
- Linear regression
- Polynomial regression. The concept of overfitting and regularization
- Classification problem. Logistic regression. The KNN algorithm. Naïve Bayes Classifier.
- Classification algorithms: decision trees and ensembles.
- Support vector machines
- Machine learning approaches to Named Entity Recognition.
- Unsupervised machine learning tasks. Dimensionality reduction.
- Unsupervised machine learning tasks. The task of clustering
- Topic Modelling
Assessment Elements
- Homework
- Class work
- Practical project
- Exam. The exam is conducted orally (a survey based on course materials).
- Tests
Interim Assessment
- 2024/2025 4th module: 0.1 * Class work + 0.2 * Exam + 0.3 * Homework + 0.2 * Practical project + 0.2 * Tests
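The weighted sum above can be checked with a short calculation. The sketch below is only an illustration: the weights come from the formula, while the function name `interim_grade`, the example scores, and the 10-point scale are assumptions, and no rounding rule is implied.

```python
# Weights taken from the interim assessment formula for the 4th module.
WEIGHTS = {
    "Class work": 0.1,
    "Exam": 0.2,
    "Homework": 0.3,
    "Practical project": 0.2,
    "Tests": 0.2,
}


def interim_grade(scores):
    """Return the weighted sum of the assessment element scores (unrounded)."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores on an assumed 10-point scale.
example_scores = {
    "Class work": 8,
    "Exam": 7,
    "Homework": 9,
    "Practical project": 6,
    "Tests": 10,
}
print(round(interim_grade(example_scores), 2))
```

The weights sum to 1, so a student with the same score on every element receives exactly that score as the interim grade.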
Bibliography
Recommended Core Bibliography
- Sarkar, D., Bali, R., & Sharma, T. (2018). Practical Machine Learning with Python : A Problem-Solver’s Guide to Building Real-World Intelligent Systems. [United States]: Apress. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1667293
Recommended Additional Bibliography
- Müller, A. C., & Guido, S. (2017). Introduction to Machine Learning with Python : A Guide for Data Scientists (First edition). O’Reilly Media.