Master's programme
2024/2025
Machine Learning
Status:
Elective course (Data Analytics and Social Statistics)
Field of study:
01.04.02. Applied Mathematics and Informatics
Delivered by:
Faculty of Social Sciences
When:
1st year, module 3
Mode of study:
With online course
Online hours:
40
Open to:
Students of one campus
Programme:
Data Analytics and Social Statistics
Language:
English
Credits:
3
Contact hours:
8
Course Syllabus
Abstract
Machine learning rests on statistical learning theory, which draws on statistics and functional analysis. The goal of the course is to study the properties of learning algorithms within a statistical framework. This study serves a two-fold purpose: on the one hand, it provides strong guarantees for existing algorithms; on the other, it suggests new algorithmic approaches that are potentially more powerful. The course covers the theory and methods of statistical learning in detail, in particular complexity regularization (i.e., how to choose the complexity of a model that must be learned from data). This issue is at the heart of the most successful and popular machine learning algorithms today and is critical to their success. The course is an elective and is taught using both R and Python.
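As an illustration of choosing model complexity from data, the sketch below picks a polynomial degree by hold-out validation. This is only one simple stand-in for complexity regularization, not a method prescribed by the course; the synthetic data, the function name `select_degree`, and the degree range are all assumptions made for the example.

```python
import numpy as np

def select_degree(x_train, y_train, x_val, y_val, max_degree=8):
    """Return (best_degree, validation MSE per degree) via hold-out validation."""
    val_mse = {}
    for degree in range(1, max_degree + 1):
        # Fit a polynomial of the given degree on the training split only.
        coeffs = np.polyfit(x_train, y_train, degree)
        # Measure the error on data the fit has not seen.
        residuals = y_val - np.polyval(coeffs, x_val)
        val_mse[degree] = float(np.mean(residuals ** 2))
    # The chosen complexity is the degree with the lowest validation error.
    best_degree = min(val_mse, key=val_mse.get)
    return best_degree, val_mse

# Hypothetical data: a cubic signal with Gaussian noise.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=150)
y = x ** 3 - x + rng.normal(0.0, 0.1, size=150)

best_degree, val_mse = select_degree(x[:100], y[:100], x[100:], y[100:])
print(best_degree, val_mse[best_degree])
```

A degree that is too low cannot represent the cubic signal, while a degree that is too high starts to fit the noise; the validation error is what arbitrates between the two.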
Learning Objectives
- The course gives students an important foundation to develop and conduct their own research as well as to evaluate research of others.
Expected Learning Outcomes
- Be able to critically review published empirical research that uses applied statistical methods.
- Be able to criticize constructively and identify existing issues with applied linear models in published work.
- Be able to calculate sizes of training sets for several machine learning tasks in the context of PAC-learning (and hence calculate VC-dimensions).
- Be trained in mathematical skills such as abstract thinking, formal thinking, and problem solving.
- Have an in-depth understanding of boosting algorithms and a few other algorithms for machine learning.
- Have a theoretical understanding of several online learning algorithms and of learning with expert advice.
- Know several paradigms for model selection in statistical learning theory (Structural Risk Minimization, Maximum Likelihood, Minimum Description Length, etc.).
- Know the basic concepts from statistical learning theory.
- Know the link between cryptography and computational limitations of statistical learning.
- Know the theoretical foundations that explain why some machine learning algorithms are successful in a large range of applications, with special emphasis on statistics.
- Be able to apply the basic concepts from machine learning theory.
- Be able to correctly identify the type of machine learning problem at hand, e.g. classification, regression, or clustering.
- Be able to differentiate between supervised and unsupervised learning methods and to understand their benefits and limitations.
- Have a theoretical understanding of key supervised learning methods sufficient to apply decision trees, linear regression, logistic regression, quantile regression, and regression variants for non-Gaussian distributions of the target variable.
- Be able to differentiate between and correctly apply the most common approaches to ensemble learning (random forests, gradient boosting, stacking, blending, etc.), as well as to explain their benefits and limitations.
- Be able to identify and tackle issues related to overfitting and model instability.
- Be able to apply basic tools and approaches to automated text processing, as well as to incorporate text data into machine learning solutions.
- Be able to systematize and prioritize best practices in experiment tracking and sustainable ML development.
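One of the outcomes above concerns calculating training-set sizes in the PAC-learning setting. As a minimal sketch, the function below evaluates the standard bound for a finite hypothesis class in the realizable case, m >= (ln|H| + ln(1/delta)) / epsilon. This is an assumption chosen for simplicity: it is not the VC-dimension bound the outcome also mentions, and the function name and example numbers are hypothetical.

```python
import math

def pac_sample_size(hypothesis_count, epsilon, delta):
    """Training-set size sufficient for a finite hypothesis class (realizable case).

    With at least this many i.i.d. examples, any hypothesis consistent with the
    sample has true error at most epsilon with probability at least 1 - delta.
    """
    if hypothesis_count < 1:
        raise ValueError("hypothesis_count must be at least 1")
    if not (0.0 < epsilon < 1.0 and 0.0 < delta < 1.0):
        raise ValueError("epsilon and delta must lie strictly between 0 and 1")
    bound = (math.log(hypothesis_count) + math.log(1.0 / delta)) / epsilon
    return math.ceil(bound)

# Example: 2**10 hypotheses, error at most 10%, confidence 95%.
print(pac_sample_size(2 ** 10, epsilon=0.1, delta=0.05))  # prints 100
```

The bound grows only logarithmically in the number of hypotheses and in 1/delta, but linearly in 1/epsilon, so halving the target error roughly doubles the required sample.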
Course Contents
- Section 1. Introduction to classification and regression problems in machine learning.
- Section 2. Model evaluation, key metrics for classification and regression.
- Section 3. Text mining.
- Section 4. Ensemble learning.
- Section 5. Unsupervised learning.
- Section 6. Association rules: theory and applications.
- Section 7. Advanced regression analysis.
- Section 8. Model explainability.
- Section 9. Basics of machine learning development.
Interim Assessment
- 2024/2025 3rd module: 0.5 * Final project + 0.3 * Graded quizzes + 0.2 * Mid-term homework
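The weighting formula can be checked with a few lines of Python. The sketch below simply applies the three weights stated above; the dictionary keys, the example scores, and the idea that all components share one grading scale are assumptions, and no rounding rule is implied.

```python
# Weights taken from the interim assessment formula.
WEIGHTS = {
    "final_project": 0.5,
    "graded_quizzes": 0.3,
    "midterm_homework": 0.2,
}

def interim_grade(scores):
    """Weighted sum of component scores; all scores are assumed to share one scale."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise KeyError(f"missing component scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)

# Hypothetical scores for one student.
example = {"final_project": 8, "graded_quizzes": 7, "midterm_homework": 9}
print(round(interim_grade(example), 2))  # prints 7.9
```

Because the weights sum to 1, the result always stays on the same scale as the component scores.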
Bibliography
Recommended Core Bibliography
- Harman, G., & Kulkarni, S. (2007). Reliable Reasoning: Induction and Statistical Learning Theory. Cambridge, Mass: A Bradford Book. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=189264
- Haroon, D. (2017). Python Machine Learning Case Studies: Five Case Studies for the Data Scientist. [Berkeley, CA]: Apress. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1623520
- Kulkarni, S., & Harman, G. (2011). An Elementary Introduction to Statistical Learning Theory. Hoboken, N.J.: Wiley. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=391376
Recommended Additional Bibliography
- Lantz, B. (2019). Machine Learning with R: Expert Techniques for Predictive Modeling (3rd ed.). Birmingham, UK: Packt Publishing. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=2106304
- Murphy, K. P. (2012). Machine Learning: A Probabilistic Perspective. Cambridge, Mass: The MIT Press. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=480968
- Ramasubramanian, K., & Singh, A. (2017). Machine Learning Using R. [Place of publication not identified]: Apress. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1402990
- Sarkar, D., Bali, R., & Sharma, T. (2018). Practical Machine Learning with Python: A Problem-Solver’s Guide to Building Real-World Intelligent Systems. [United States]: Apress. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1667293