2022/2023

Computational Learning Theory

Status: MagoLego
Delivered in: Module 1
Online hours: 82
Audience: home campus only
Language: English
Credits: 5
Contact hours: 8

Course Syllabus

Abstract

This course introduces machine learning and its paradigms. We provide the basic tools and concepts for the theoretical justification of the reliability of statistical learning, including the concentration of measure phenomenon, empirical process theory, VC dimension, and Rademacher complexity. These tools are then applied to the analysis of popular algorithms in supervised and unsupervised learning. Students are expected to have a solid background in calculus, linear algebra, and probability theory in order to follow the course fluently.

Learning Objectives

  • Understand why overfitting is often a problem and why the nearest neighbor algorithm fails in high dimensions.
  • Construct variants of the halving and weighted majority algorithms and prove mistake bounds for them. Understand the realizable and agnostic settings for online learning, and why a hypothesis class is needed to obtain non-trivial mistake bounds.
  • Prove bounds on the number of mistakes of online algorithms under a margin assumption. Prove a mistake bound for the perceptron algorithm.
  • Understand the necessity of a hypothesis class (the no-free-lunch theorem). Understand the definitions of risk, sample complexity, and PAC-learnability, and prove bounds on them.
  • Analyze the optimality of learning estimators with Rademacher complexity.
  • Derive probabilistic bounds for bounded random variables and apply them.
  • Construct bounds for the excess risk with McDiarmid's inequality and Massart's lemma.
  • Write maximal inequalities and concentration bounds for the Rademacher complexity of VC classes.
  • Analyze the SVM algorithm with Rademacher complexity.
  • Formulate the classic regression problem.
  • Obtain convergence rates of the algorithm as the sample size grows.
  • Explain how classic regression can be improved with regularization techniques.
  • Formulate the dimensionality reduction problem and provide examples.
  • Show probabilistic guarantees for PCA under certain assumptions.
  • Implement the K-means algorithm (a minimal sketch follows this list).
  • Formulate the clustering problem.
  • Reason about the convergence and solution quality of K-means.
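
As a minimal illustration of the clustering objectives above, the following is a sketch of Lloyd's K-means algorithm in Python/NumPy. The function name, random initialization, and stopping rule are illustrative choices made here, not requirements of the course assignments.

import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    rng = np.random.default_rng(seed)
    # Initialize centroids with k distinct points drawn from the data.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid moves to the mean of its cluster.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # centroids stopped moving: a local optimum has been reached
        centroids = new_centroids
    return centroids, labels

Each iteration never increases the within-cluster sum of squares, which is the basis of the convergence argument referred to in the last objective.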

Expected Learning Outcomes

  • Understand why overfitting is often a problem and why the nearest neighbor algorithm fails in high dimensions
  • Analyze the optimality of learning estimators with Rademacher complexity
  • Construct bounds for the excess risk with McDiarmid's inequality and Massart's lemma
  • Write maximal inequalities and concentration bounds for the Rademacher complexity of VC classes
  • Analyze the SVM algorithm with Rademacher complexity
  • Obtain convergence rates of the algorithm as the sample size grows
  • Explain how classic regression can be improved with regularization techniques
  • Show probabilistic guarantees for PCA under certain assumptions
  • Implement the K-means algorithm
  • Reason about the convergence and solution quality of K-means

Course Contents

  • 1. A mathematical definition of good learning
  • 2. Rademacher Complexity
  • 3. VC-Dimensions and SVM Classification
  • 4. Regression
  • 5. Dimensionality Reduction
  • 6. Clustering

Assessment Elements

  • non-blocking Quizzes
    Weekly quizzes.
  • non-blocking Programming Assignment
  • non-blocking Staff-graded Assignment
    Weekly assignments.

Interim Assessment

  • 2022/2023 1st module
    0.036 * Programming Assignment + 0.544 * Quizzes + 0.42 * Staff-graded Assignment
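    The three weights sum to 1, so the interim grade is a weighted average of the assessment elements. For example, with hypothetical marks of 8 for the Programming Assignment, 9 for Quizzes, and 7 for the Staff-graded Assignment (a 10-point scale is assumed here), the grade would be 0.036 · 8 + 0.544 · 9 + 0.42 · 7 = 0.288 + 4.896 + 2.94 ≈ 8.12 before rounding.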

Bibliography

Recommended Core Bibliography

  • Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer. 745 pp.
  • Mehryar Mohri, Afshin Rostamizadeh, & Ameet Talwalkar. (2018). Foundations of Machine Learning, Second Edition. The MIT Press.

Recommended Additional Bibliography

  • Kulkarni, S., Harman, G., & Wiley InterScience (Online service). (2011). An Elementary Introduction to Statistical Learning Theory. Hoboken, N.J.: Wiley. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=391376

Authors

  • BAUVENS BRUNO FREDERIK L.
  • Литвишкина Ален Витальевна