Bachelor
2023/2024
Fundamentals of Statistical Learning
Type:
Elective course (Data Science and Business Analytics)
Area of studies:
Applied Mathematics and Information Science
Delivered by:
Big Data and Information Retrieval School
Where:
Faculty of Computer Science
When:
3rd year, modules 1 and 2
Mode of studies:
offline
Open to:
students of one campus
Language:
English
ECTS credits:
4
Contact hours:
56
Course Syllabus
Abstract
This course introduces students to the elements of classic machine learning (ML), covering supervised and unsupervised methods such as linear and logistic regression, splines, decision trees, support vector machines, bootstrapping, random forests, boosting, and regularized methods. Weekly or biweekly team-based Kaggle competitions are run in Python. Other assignments (quizzes and theoretical derivations) are highly individualized and autograded with tools in the Moodle LMS. Participation in lectures, seminars, and the class forum is assessed and graded. Prerequisites: calculus 1, vector calculus, linear algebra, probability/statistics, and computer programming in a high-level language such as Python. This course offers a more theoretical approach than Introduction to Statistical Learning.
Learning Objectives
- The course aims to help students understand the process of learning from data, to familiarize them with a wide variety of algorithmic and model-based methods for extracting information from data, and to teach them to apply and evaluate suitable methods on various datasets through model selection and predictive performance evaluation.
Expected Learning Outcomes
- Know the basic concepts of statistical learning theory
- Build features suitable for the selected machine learning models
- Evaluate the performance of the models
- Tune models to improve their prediction and classification performance
- Construct machine learning models on the proposed datasets in Python
- Build and interpret data visualizations in Python
Course Contents
- Math Essentials. Intro to Python in Google Colab
- Intro to Statistical learning
- Linear Regression (SLR) and K-Nearest Neighbors (KNN)
- Classification with Logistic Regression, LDA, QDA, KNN
- Resampling methods. CV, Bootstrap
- Linear model selection and regularization
- Non-linear regression
- Decision Trees, Bagging, Random Forest, Boosting
- Support Vector Machines and Classifiers
- Clustering methods. PCA, k-Means, Hierarchical Clustering, DBSCAN
Assessment Elements
- Home assignments. The grade for this category is cumulative from the beginning of the course.
- Quizzes. The grade for this category is cumulative from the beginning of the course.
- Participation. The grade for this category is cumulative from the beginning of the course.
- Exam. The exam is an individualized, timed, (possibly) proctored and otherwise constrained test designed to prevent cheating. In general, expect 60 questions in 60 minutes, some of which you may have seen in quizzes. The exam is graded according to the marking scheme that comes with the exam assignment. Each problem and its subparts are worth a certain number of points; the points sum to 100, which is the maximum exam grade on the 100-point scale. The student is awarded the assigned number of points for a correct answer to each part of a question, and partial credit may also be awarded.
- Test. The test is an individualized, timed, (possibly) proctored and otherwise constrained assessment designed to prevent cheating. In general, expect 60 questions in 60 minutes, some of which you may have seen in quizzes. The test is graded according to the marking scheme that comes with the test assignment. Each problem and its subparts are worth a certain number of points; the points sum to 100, which is the maximum test grade on the 100-point scale. The student is awarded the assigned number of points for a correct answer to each part of a question, and partial credit may also be awarded.
Interim Assessment
- 2023/2024 2nd module: 0.2 * Exam + 0.3 * Home assignments + 0.1 * Participation + 0.2 * Quizzes + 0.2 * Test
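As a worked illustration of the formula above, the following Python sketch computes the weighted interim grade. The weights are copied from the formula. The function name `interim_grade`, the dictionary keys, and the sample scores are hypothetical, and the sketch assumes every category is reported on the same 100-point scale, which the syllabus states only for the exam and the test.

```python
# Weights copied from the interim assessment formula; they sum to 1.0.
WEIGHTS = {
    "exam": 0.2,
    "home_assignments": 0.3,
    "participation": 0.1,
    "quizzes": 0.2,
    "test": 0.2,
}


def interim_grade(scores):
    """Return the weighted interim grade for a dict of category scores.

    Assumes all categories share one scale (here, 0 to 100).
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing categories: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores, for illustration only.
example_scores = {
    "exam": 80,
    "home_assignments": 90,
    "participation": 100,
    "quizzes": 70,
    "test": 60,
}
example_grade = interim_grade(example_scores)
print(round(example_grade, 2))
```

With these sample scores the terms are 16 + 27 + 10 + 14 + 12, so home assignments contribute the largest share, in line with their 0.3 weight.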
Bibliography
Recommended Core Bibliography
- Hastie, T., Tibshirani, R., & Friedman, J. (2002). The Elements of Statistical Learning: Data Mining, Inference and Prediction. Springer. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsbas&AN=edsbas.C9BC2266
Recommended Additional Bibliography
- James, G. et al. An introduction to statistical learning. – Springer, 2013. – 426 pp.