Bachelor's programme
2021/2022
Mathematical Methods of Data Analysis
Best by the criterion "Usefulness of the course for broadening one's horizons and all-round development"
Best by the criterion "Novelty of the knowledge gained"
Status:
Compulsory course (Software Engineering)
Field of study:
09.03.04. Software Engineering
Where taught:
Faculty of Computer Science
When taught:
3rd year, modules 1 and 2
Mode of study:
with an online course
Online hours:
10
Audience coverage:
for its own campus
Instructors:
Воронкова Анастасия Михайловна,
Кохтев Вадим Михайлович,
Кузина Анна Вадимовна,
Нарцев Андрей Дмитриевич,
Пащенко Анатолий Владиславович,
Полунина Полина Алексеевна,
Тихонова Мария Ивановна,
Ульянкин Филипп Валерьевич
Language:
English
Credits:
5
Contact hours:
60
Course Syllabus
Abstract
This course presents the foundations of a rapidly developing scientific field called intelligent data analysis, or machine learning. This field studies algorithms that automatically adjust to data and extract valuable structure and dependencies from it. Because machine learning algorithms adapt to the data automatically, they are an especially convenient tool for analyzing large volumes of data with complex and diverse structure, which is common in the modern "information era". The course covers the most common machine learning problems, including classification, regression, dimensionality reduction, clustering, collaborative filtering and ranking, and presents the best-known and most widely used algorithms for solving them. For each algorithm, its assumptions about the data, its advantages and disadvantages, and its connections with other algorithms are analyzed to provide an in-depth and critical understanding of the subject. Much attention is given to developing practical skills: students apply the studied algorithms to real data, critically analyze their output, and solve theoretical problems that highlight important concepts of the course. Machine learning algorithms are applied using the Python programming language and its scientific extensions, which are also taught during the course. The course is designed for students of the bachelor's program "Software Engineering" at the Faculty of Computer Science, HSE.
Learning Objectives
- distinguish the major data analysis problems that are solved with machine learning
- recognise the major algorithms and be able to apply them to solve these problems
- understand and be able to reproduce core machine learning algorithms
- understand the relationships between algorithms and their advantages and disadvantages
- be able to use the Python data analysis libraries numpy, scipy, pandas, matplotlib and scikit-learn
- understand which kinds of algorithms are more appropriate for which kinds of data
- know how to transform data to make it more suitable for machine learning algorithms
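As a minimal illustration of the kind of data transformation meant in the last objective, the sketch below standardizes every feature column to zero mean and unit variance in plain Python. The function name `standardize` is hypothetical; in practice the same transformation is usually done with numpy or with scikit-learn's `StandardScaler`.

```python
from statistics import fmean, pstdev


def standardize(rows):
    """Return a copy of `rows` with every column scaled to zero mean and unit variance.

    `rows` is a list of equal-length lists of numbers, one list per object.
    Constant columns are only centred, to avoid division by zero.
    """
    columns = list(zip(*rows))
    means = [fmean(col) for col in columns]
    stds = [pstdev(col) for col in columns]  # population std, as StandardScaler uses
    return [
        [(x - m) / s if s > 0 else x - m for x, m, s in zip(row, means, stds)]
        for row in rows
    ]


X = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
print(standardize(X))
```

After the transformation both columns are on the same scale, even though the second one was originally ten times larger, which matters for distance-based and gradient-based methods.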
Expected Learning Outcomes
- To name the basic components of a binary tree
- To be able to derive the algorithm in the general case and for particular loss functions, and see the connections between them
- To be able to derive the continual learning update
- To be able to derive the CUSUM algorithm
- To be able to derive the updates for k-means
- To be able to state and explain the one-class SVM optimization problem
- To be able to state the optimization problem for spectral clustering
- To be able to derive and analyse the closed-form solution from scratch
- To be able to explain why the error rate cannot be used for gradient-based training of a classifier
- To be able to formulate PCA as a probabilistic model
- To be able to formulate PCA as a sequential optimization problem
- To be able to carry out the derivations for GMM (Gaussian mixture models)
- To be able to obtain the PCA solution from the SVD
- To be able to write pseudocode for the gradient descent algorithm
- To be familiar with extensions of the model to GLMs (Poisson, logistic) and to the multi-output case
- To be familiar with the key objects of the course
- To formally define the binary classification problem
- To derive impurity criteria for regression and classification problems
- To derive statistical results for the bootstrap and ML algorithms in simple cases
- To explain the properties of different quality metrics
- To formulate statistical concepts: bootstrap, bias, variance
- To formulate the optimization objective for logistic regression and SVM
- To become familiar with anomaly detection based on convex-hull methods and disorder testing
- To become familiar with the EM algorithm
- To become familiar with a variety of approaches to clustering: metric-based, graph-based and hierarchical
- To get the idea of the functional gradient and its projection
- To know how trees can be applied to unsupervised problems
- To know pruning and regularization strategies
- To understand and be able to derive the properties of L1 and L2 regularization
- To understand the basic idea of proximal updates in the context of machine learning problems
- To understand the bias-variance tradeoff in machine learning tasks
- To understand the computational complexity of kernel methods
- To understand the concepts of overfitting and cross-validation, and be able to derive LOOCV
- To understand the connection between the course contents and applications
- To understand the connection between machine learning algorithms and statistical methods such as the bootstrap
- To understand the course goals
- To understand the foundations of gradient approaches and the role of each component (learning rate, preconditioning)
- To understand the general idea of logistic regression
- To understand the geometric interpretation of a linear classifier and the corresponding notation
- To understand implementation tricks
- To understand regularization techniques
- To understand what a kernel and the kernel trick are
- To understand what a support vector is
- To write pseudocode for greedy tree construction
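One of the outcomes above asks for pseudocode for gradient descent. A minimal sketch, assuming a one-feature linear regression y ≈ w·x + b trained with the mean squared error loss and a fixed learning rate; the function name and default values are illustrative and not taken from the course materials.

```python
def gradient_descent(xs, ys, lr=0.05, n_iter=2000):
    """Fit y ≈ w * x + b by batch gradient descent on the mean squared error."""
    n = len(xs)
    w, b = 0.0, 0.0  # start from the zero model
    for _ in range(n_iter):
        # residuals of the current model on every training object
        residuals = [w * x + b - y for x, y in zip(xs, ys)]
        # gradient of L(w, b) = (1/n) * sum(residual ** 2)
        grad_w = 2.0 / n * sum(r * x for r, x in zip(residuals, xs))
        grad_b = 2.0 / n * sum(residuals)
        # step against the gradient, scaled by the learning rate
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # generated by y = 2x + 1
print(gradient_descent(xs, ys))
```

With a learning rate that is too large the updates overshoot and diverge; for this toy data a step of 0.05 is small enough for the iterates to converge to w ≈ 2, b ≈ 1.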
Course Contents
- Introduction to Machine Learning
- Basic gradient optimization
- Linear Regression Model
- Linear Classification
- Logistic Regression and SVM
- Decision Trees
- Bagging, Random Forest and Bias-Variance Tradeoff
- Gradient boosting
- Clustering and Anomaly Detection
- EM and PCA
- Bayesian Linear Regression
- Gaussian Processes (GP) for regression and classification tasks
- MLPs and DNNs for classification
- Deep Generative Models
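For the clustering topic, a minimal sketch of the standard k-means (Lloyd) iteration alternates two updates: assign each point to its nearest centroid, then move each centroid to the mean of the points assigned to it. Points are assumed to be tuples of coordinates, and the initial centroids are passed in explicitly to keep the example deterministic; the function name and interface are hypothetical.

```python
import math


def kmeans(points, centroids, n_iter=100):
    """Lloyd's k-means: alternate assignment and centroid-update steps (n_iter >= 1)."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(n_iter):
        # Assignment step: index of the nearest centroid for every point
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its cluster
        new_centroids = []
        for k, c in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centroids.append(c)  # keep an empty cluster's centroid in place
        if new_centroids == centroids:  # converged: centroids stopped moving
            break
        centroids = new_centroids
    return centroids, labels


pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
print(kmeans(pts, [(0.0, 0.0), (10.0, 10.0)]))
```

Each step can only decrease the within-cluster sum of squared distances, so the loop stops at a local optimum; the result depends on the initial centroids.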
Bibliography
Recommended Core Bibliography
- Bishop, C. M. (n.d.). Pattern Recognition and Machine Learning. Australian National University. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsbas&AN=edsbas.EBA0C705
- Hastie, T., Tibshirani, R., & Friedman, J. H. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction (2nd ed., corrected 7th printing). New York: Springer. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=277008
Recommended Additional Bibliography
- Mohri, M., Rostamizadeh, A., & Talwalkar, A. (2018). Foundations of Machine Learning (2nd ed.). The MIT Press.