Bachelor 2023/2024

Data Mining and Elements of Machine Learning

Type: Elective course (Fundamental and Applied Linguistics)
Area of studies: Fundamental and Applied Linguistics
Delivered by: Department of Applied Mathematics and Informatics
When: 3rd year, modules 3 and 4
Mode of studies: offline
Open to: students of one campus
Instructors: Maxim Kazakov, Kozlova Anastasia Vladimirovna
Language: English
ECTS credits: 3
Contact hours: 64

Course Syllabus

Abstract

The course introduces students to the basic approaches and principles of data mining, the main methods of machine learning and their limits, and the main methods of quality evaluation.
Learning Objectives

  • The purpose of the course is to familiarize students with the basic principles and methods of data analysis and machine learning.
Expected Learning Outcomes

  • Trains logistic regression and KNN models, understands quality metrics.
  • Trains classifiers based on decision trees and ensemble models
  • Trains SVM-based classification models with various parameters
  • Trains clustering models, understands clustering evaluation
  • Performs a spectrum of machine learning tasks
  • Reduces the dimensionality with various methods
  • Trains polynomial regression and understands its quality metrics, identifies overfitting and underfitting, estimates quality with cross-validation
  • Trains linear regression, understands its quality metrics
  • Prepares data for machine learning algorithms
  • Independently conducts a reproducible experiment by a full pipeline: 1) formulate a problem, analyze previous work and scientific papers on the subject; 2) perform preliminary dataset analysis, data preprocessing, feature engineering and selection; 3) select machine learning methods, train, evaluate and compare models; 4) visualize and explain the results.
  • Works with text data: preprocesses and encodes it.
  • Solves a topic modeling task. Has a basic understanding of non-negative matrix factorization and latent Dirichlet allocation.
Course Contents

  • Introduction. Examples of practical tasks
  • Exploratory data analysis
  • Linear regression
  • Polynomial regression. The concept of overfitting and regularization
  • Classification problem. Logistic regression. The KNN algorithm. Naïve Bayes Classifier.
  • Classification algorithms: decision trees and ensembles.
  • Support vector machines
  • Machine learning approaches to named entity recognition.
  • Unsupervised machine learning tasks. Dimension reduction.
  • Unsupervised machine learning tasks. The task of clustering
  • Topic Modelling
Assessment Elements

  • non-blocking Exam
    The exam is conducted orally (a survey based on course materials).
  • non-blocking Class work
  • non-blocking Homework
  • non-blocking Practical project
  • non-blocking Tests
Interim Assessment

  • 2023/2024 4th module
    0.1 * Class work + 0.2 * Exam + 0.3 * Homework + 0.2 * Practical project + 0.2 * Tests
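
The weighted sum above can be worked through with a short sketch. The weights come from the formula itself; the function name `interim_grade`, the example scores, and the 10-point scale are assumptions made for illustration only, and no rounding rule is implied because the syllabus does not state one.

```python
# Weights copied from the interim assessment formula.
WEIGHTS = {
    "Class work": 0.1,
    "Exam": 0.2,
    "Homework": 0.3,
    "Practical project": 0.2,
    "Tests": 0.2,
}


def interim_grade(scores):
    """Return the weighted sum of the component scores (hypothetical helper)."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores on an assumed 10-point scale.
example_scores = {
    "Class work": 8,
    "Exam": 7,
    "Homework": 9,
    "Practical project": 6,
    "Tests": 8,
}

# 0.1*8 + 0.2*7 + 0.3*9 + 0.2*6 + 0.2*8 = 7.7
print(round(interim_grade(example_scores), 2))
```

Homework carries the largest weight (0.3), so in this sketch a one-point change in the homework score moves the result by 0.3, compared with 0.1 for class work.
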
Bibliography

Recommended Core Bibliography

  • Sarkar, D., Bali, R., & Sharma, T. (2018). Practical Machine Learning with Python: A Problem-Solver’s Guide to Building Real-World Intelligent Systems. [United States]: Apress. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1667293

Recommended Additional Bibliography

  • Müller, A. C., & Guido, S. (2017). Introduction to Machine Learning with Python: A Guide for Data Scientists (1st ed.). O’Reilly Media.

Authors

  • Durandin Oleg Vladimirovich
  • Klimova Margarita Andreevna