Mathematical Methods of Data Analysis

Bachelor's programme, 2022/2023

Best course by the criterion "Novelty of the knowledge gained"

Status: Compulsory course (Software Engineering)

Field of study: 09.03.04. Software Engineering

Delivered by: Department of Big Data and Information Retrieval

Where: Faculty of Computer Science

When: 3rd year, modules 1 and 2

Mode of study: with an online course

Online hours: 10

Open to: students of the home campus

Instructors: Буденный Семен Андреевич, Быков Кирилл Валерьевич, Воронкова Анастасия Михайловна, Егоров Андрей Вадимович, Полунина Полина Алексеевна, Тихонова Мария Ивановна

Language: English

Credits: 5

Contact hours: 60

Abstract

This course presents the foundations of a rapidly developing scientific field known as intelligent data analysis, or machine learning. The field studies algorithms that automatically adjust to data and extract valuable structure and dependencies from it. This automatic adjustment makes machine learning algorithms an especially convenient tool for analyzing large volumes of data with complicated and diverse structure, which is common in the modern "information era". The course covers the most common machine learning problems, including classification, regression, dimensionality reduction, clustering, collaborative filtering and ranking, and presents the best-known and most widely used algorithms for solving them. For each algorithm, its data assumptions, advantages and disadvantages, and connections with other algorithms are analyzed to provide an in-depth and critical understanding of the subject. Much attention is given to developing practical skills. Students are asked to apply the studied algorithms to real data, critically analyze their output, and solve theoretical problems that highlight important concepts of the course. Machine learning algorithms are applied using the Python programming language and its scientific libraries, which are also taught during the course. The course is designed for students of the bachelor's program "Software Engineering" at the Faculty of Computer Science, HSE.

Learning Objectives

distinguish the major data analysis problems solved with machine learning
recognise and be able to apply the major algorithms for solving these problems
understand and be able to reproduce core machine learning algorithms
understand the dependencies between algorithms, their advantages and disadvantages
be able to use Python data analysis libraries: NumPy, SciPy, pandas, Matplotlib and scikit-learn
understand which kinds of algorithms are more appropriate for which kinds of data
know how to transform data to make it more suitable for machine learning algorithms

Expected Learning Outcomes

Can derive the update steps for k-means
Can state and explain the one-class SVM optimization problem
To be able to explain why the error rate cannot be used in gradient-based training of a classifier
To be familiar with the key objects of the course
To formulate statistical concepts: bootstrap, bias, variance
To formulate the optimization objectives for logistic regression and SVM

Course Contents

Introduction to Machine Learning
Basic gradient optimization
Linear Regression Model
Linear Classification
Logistic Regression and SVM
Decision Trees
Bagging, Random Forest and Bias-Variance Tradeoff
Gradient boosting
Clustering and Anomaly Detection
Dimensionality reduction: PCA, SVD
Testing your model: A/A and A/B tests
From Multilayer Perceptron to Deep Neural Networks
Machine Learning: Business Applications
Basic computer vision (CV) and the convolutional layer
Summary

Assessment Elements

Exam
Homework assignments
Homework: the average grade across all homework assignments

Interim Assessment

2022/2023 2nd module
0.7 * Homework assignments + 0.3 * Exam
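
As a worked illustration of the formula above, the following minimal Python sketch computes the interim grade from a list of homework grades and an exam grade. The function name `interim_grade` and the sample grades are hypothetical. The sketch applies no rounding, because the syllabus does not state a rounding rule.

```python
from statistics import mean

HOMEWORK_WEIGHT = 0.7  # weight taken from the syllabus formula
EXAM_WEIGHT = 0.3      # weight taken from the syllabus formula


def interim_grade(homework_grades, exam_grade):
    """Combine grades as 0.7 * Homework + 0.3 * Exam.

    Homework is the plain average of all homework grades, as stated under
    Assessment Elements. No rounding is applied: the syllabus does not
    specify a rounding rule, so that step is left out of this sketch.
    """
    if not homework_grades:
        raise ValueError("at least one homework grade is required")
    homework = mean(homework_grades)
    return HOMEWORK_WEIGHT * homework + EXAM_WEIGHT * exam_grade


# Hypothetical grades, for illustration only.
example = interim_grade([8, 9, 10], exam_grade=7)
print(round(example, 2))
```

With these sample grades the homework average is 9, so the result is 0.7 * 9 + 0.3 * 7 = 8.4.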

Bibliography

Recommended Core Bibliography

Christopher M. Bishop. (n.d.). Pattern Recognition and Machine Learning. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsbas&AN=edsbas.EBA0C705
Trevor Hastie, Robert Tibshirani, et al. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd edition, 2017. Free from the publisher: https://web.stanford.edu/~hastie/ElemStatLearn/printings/ESLII_print12.pdf

Recommended Additional Bibliography

Mehryar Mohri, Afshin Rostamizadeh, & Ameet Talwalkar. (2018). Foundations of Machine Learning, Second Edition. The MIT Press.

Authors

Оруджева Альбина Александровна
