Bachelor 2023/2024

Introduction to Statistical Learning

Type: Elective course (Data Science and Business Analytics)
Area of studies: Applied Mathematics and Information Science
When: 3rd year, modules 1 and 2
Mode of studies: offline
Open to: students of one campus
Instructors: Alexey Boldyrev, Kirill Bykov, Языкова Татьяна Владимировна
Language: English
ECTS credits: 4
Contact hours: 56

Course Syllabus

Abstract

This course introduces students to the elements of machine learning (ML), including supervised and unsupervised methods such as linear and logistic regression, splines, decision trees, support vector machines, bootstrapping, random forests, boosting, and regularized methods. The course covers classic ML. Weekly or biweekly team-based Kaggle competitions are conducted in the Python programming language. Other assignments (quizzes and theoretical derivations) are highly individualized and autograded with tools in Moodle LMS. Participation in lectures, seminars, and the class forum is assessed and graded. Prerequisites: calculus 1, vector calculus, linear algebra, probability/statistics, and computer programming in a high-level language such as Python. This course offers a more practical (hands-on) approach than Fundamentals of Statistical Learning.
Learning Objectives

  • The course aims to help students understand the process of learning from data, to familiarize them with a wide variety of algorithmic and model-based methods for extracting information from data, and to teach them to apply and evaluate suitable methods on various datasets through model selection and predictive performance evaluation.
Expected Learning Outcomes

  • Know the basic concepts of statistical learning theory
  • Build features suitable for the selected machine learning models
  • Evaluate the performance of the models
  • Tune models to improve their prediction and classification performance
  • Construct machine learning models on the proposed datasets in Python
  • Build and interpret data visualizations in Python
Course Contents

  • Math Essentials. Intro to Python in Google Colab
  • Intro to Statistical learning
  • Linear Regression (SLR) and K-Nearest Neighbors (KNN)
  • Classification with Logistic Regression, LDA, QDA, KNN
  • Resampling methods. CV, Bootstrap
  • Linear model selection and regularization
  • Non-linear regression
  • Decision Trees, Bagging, Random Forest, Boosting
  • Support Vector Machines and Classifiers
  • Clustering methods. PCA, k-Means, Hierarchical Clustering, DBSCAN
Assessment Elements

  • non-blocking Home assignments
    The grade for this category is calculated cumulatively from the beginning of the course.
  • non-blocking Exam
    These are individualized, timed, (possibly) proctored, and otherwise constrained tests designed to prevent cheating. In general, expect 60 questions in 60 minutes, some of which you may have seen in quizzes. The exam is assessed according to the marking scheme that comes with the exam assignment. Each problem and its subparts are worth a certain number of points; the points sum to 100, which is the maximum grade for the exam on the 100-point scale. The student is awarded the assigned number of points for a correct answer to each part of a question, and partial credit may also be awarded.
  • non-blocking Quizzes
    The grade for the current category is calculated as cumulative from the beginning of the course.
  • non-blocking Test
    These are individualized, timed, (possibly) proctored, and otherwise constrained tests designed to prevent cheating. In general, expect 60 questions in 60 minutes, some of which you may have seen in quizzes. The test is assessed according to the marking scheme that comes with the test assignment. Each problem and its subparts are worth a certain number of points; the points sum to 100, which is the maximum grade for the test on the 100-point scale. The student is awarded the assigned number of points for a correct answer to each part of a question, and partial credit may also be awarded.
  • non-blocking Participation
    The grade for the current category is calculated as cumulative from the beginning of the course.
Interim Assessment

  • 2023/2024 2nd module
    0.2 * Exam + 0.3 * Home assignments + 0.1 * Participation + 0.2 * Quizzes + 0.2 * Test
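
The weighted sum above can be sketched in Python, the language used in the course. This is only an illustration: the weights are copied from the formula, but the function name `interim_grade`, the example scores, and the assumption that every component is reported on the same 100-point scale are hypothetical and not stated in the syllabus.

```python
# Weights copied from the interim assessment formula (2023/2024, 2nd module).
WEIGHTS = {
    "Exam": 0.2,
    "Home assignments": 0.3,
    "Participation": 0.1,
    "Quizzes": 0.2,
    "Test": 0.2,
}


def interim_grade(scores):
    """Return the weighted sum of the component scores.

    Assumption: all components use the same scale (here, 0 to 100).
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores for one student, for illustration only.
example_scores = {
    "Exam": 80,
    "Home assignments": 90,
    "Participation": 100,
    "Quizzes": 70,
    "Test": 60,
}

grade = interim_grade(example_scores)
print(round(grade, 2))
```

With these made-up scores the components contribute 16, 27, 10, 14, and 12 points respectively, so home assignments carry the most weight. Any rounding rule applied to the final grade is not specified here and is left out of the sketch.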
Bibliography

Recommended Core Bibliography

  • Gareth James, Daniela Witten, Trevor Hastie, & Robert Tibshirani. (2013). An Introduction to Statistical Learning: With Applications in R. Springer.

Recommended Additional Bibliography

  • Trevor Hastie, Robert Tibshirani, & Jerome Friedman. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer. 745 pp.

Authors

  • BOLDYREV ALEKSEY SERGEEVICH
  • MELNIKOV OLEG
  • Стоякина Елена Игоревна
  • Karpov Maksim Evgenevich