Bachelor's program
2023/2024
Introduction to Statistical Learning
Status:
Elective course (Applied Data Analysis)
Field of study:
01.03.02. Applied Mathematics and Informatics
Taught at:
Faculty of Computer Science
When taught:
3rd year, modules 1 and 2
Mode of study:
without an online course
Audience:
students of this campus
Language:
English
Credits:
4
Contact hours:
56
Course Syllabus
Abstract
This course introduces students to the elements of machine learning (ML), including supervised and unsupervised methods such as linear and logistic regression, splines, decision trees, support vector machines, bootstrapping, random forests, boosting, regularized methods, etc. The course covers classic ML. Weekly or biweekly team-based Kaggle competitions are set in Python. Other assignments (quizzes and theoretical derivations) are highly individualized and autograded with Moodle LMS tools. Participation in lectures, seminars and the class forum is assessed and graded. Prerequisites: calculus 1, vector calculus, linear algebra, probability/statistics, and programming in a high-level language such as Python. This course takes a more practical (hands-on) approach than Fundamentals of Statistical Learning.
Learning Objectives
- The course aims to help students understand the process of learning from data, to familiarize them with a wide variety of algorithmic and model-based methods for extracting information from data, and to teach them to choose, apply and evaluate suitable methods on various datasets through model selection and evaluation of predictive performance.
Expected Learning Outcomes
- Know the basic concepts of statistical learning theory.
- Build features suitable for the selected machine learning models.
- Evaluate the performance of models.
- Tune models to improve their prediction and classification performance.
- Build machine learning models on the given datasets in Python.
- Build and interpret data visualizations in Python.
Course Contents
- Math Essentials. Intro to Python in Google Colab
- Intro to Statistical learning
- Linear Regression (SLR) and K-Nearest Neighbors (KNN)
- Classification with Logistic Regression, LDA, QDA, KNN
- Resampling methods. CV, Bootstrap
- Linear model selection and regularization
- Non-linear regression
- Decision Trees, Bagging, Random Forest, Boosting
- Support Vector Machines and Classifiers
- Unsupervised methods: PCA and clustering (k-Means, Hierarchical Clustering, DBSCAN)
Assessment Elements
- Home assignments: the grade for this category is calculated cumulatively from the beginning of the course.
- Exam: an individualized, timed, (possibly) proctored and otherwise constrained test designed to prevent cheating. In general, expect 60 questions in 60 minutes, some of which you may have seen in quizzes. The exam is graded according to the marking scheme that comes with the exam assignment. Each problem and its subparts are worth a certain number of points, and these points sum to 100, the maximum grade for the exam on the 100-point scale. The student receives the assigned number of points for a correct answer to each part of a question, and partial credit may also be awarded.
- Quizzes: the grade for this category is calculated cumulatively from the beginning of the course.
- Test: an individualized, timed, (possibly) proctored and otherwise constrained test designed to prevent cheating. In general, expect 60 questions in 60 minutes, some of which you may have seen in quizzes. The test is graded according to the marking scheme that comes with the test assignment. Each problem and its subparts are worth a certain number of points, and these points sum to 100, the maximum grade for the test on the 100-point scale. The student receives the assigned number of points for a correct answer to each part of a question, and partial credit may also be awarded.
- Participation: the grade for this category is calculated cumulatively from the beginning of the course.
Interim Assessment
- 2023/2024, 2nd module: 0.2 * Exam + 0.3 * Home assignments + 0.1 * Participation + 0.2 * Quizzes + 0.2 * Test
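The interim grade is a weighted sum whose weights add up to 1. A minimal Python sketch of that arithmetic follows. It assumes every component is already on the same 0-100 scale (the syllabus states this only for the exam and the test) and it ignores rounding, which is not specified here; the dictionary keys and the sample scores are hypothetical.

```python
# Weights taken from the 2nd-module interim assessment formula.
# Assumption: every component score is on the same 0-100 scale.
WEIGHTS = {
    "exam": 0.2,
    "home_assignments": 0.3,
    "participation": 0.1,
    "quizzes": 0.2,
    "test": 0.2,
}


def interim_grade(scores: dict[str, float]) -> float:
    """Return the weighted sum of the component scores (no rounding rules applied)."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores for one student.
example = {"exam": 80, "home_assignments": 90, "participation": 100, "quizzes": 70, "test": 60}
print(round(interim_grade(example), 2))  # prints 79.0
```

Because the weights sum to 1, a student with 100 in every component gets exactly 100.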
Bibliography
Recommended Core Bibliography
- James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An Introduction to Statistical Learning: With Applications in R. Springer.
Recommended Additional Bibliography
- Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer. 745 pp.