Bachelor's programme
2021/2022
Machine Learning 1
Status:
Compulsory course
Field of study:
01.03.02. Applied Mathematics and Informatics
Delivered by:
Faculty of Computer Science
When:
3rd year, modules 1-4
Mode of study:
with an online course
Online hours:
10
Open to:
students of the home campus
Instructors:
Boldyrev Aleksei Sergeevich,
Vankov Daniil Andreevich,
Melnikov Oleg,
Chervontsev Sergei Sergeevich
Language:
English
Credits:
8
Contact hours:
120
Course Syllabus
Abstract
This course introduces students to the elements of machine learning, including supervised and unsupervised methods such as linear and logistic regression, splines, decision trees, support vector machines, bootstrapping, random forests, boosting, and regularized methods, as well as several topics in deep learning, such as artificial neural networks, recurrent neural networks, convolutional neural networks, transformers and attention mechanisms, and auto-encoders. In the first two modules (Sep-Dec), DSBA and ICEF students use the Python programming language and popular packages, such as pandas, scikit-learn and TensorFlow, to investigate and visualize datasets and to develop machine learning models that solve theoretical and data-driven problems. In the next two modules (Jan-Jun), DSBA/ICEF students use the R programming language and dive deeper into mathematical, statistical, and algorithmic concepts. Prerequisites: at least one semester each of calculus on the real line, vector calculus, linear algebra, probability and statistics, and computer programming in a high-level language such as Python or R.
Learning Objectives
- The course aims to help students understand the process of learning from data, to familiarize them with a wide variety of algorithmic and model-based methods for extracting information from data, and to teach them to apply and evaluate suitable methods on various datasets through model selection and predictive performance evaluation.
Expected Learning Outcomes
- Build and interpret data visualizations in the Python and R programming languages
- Build features suitable for the selected machine learning models
- Construct machine learning models on the proposed data sets in R
- Evaluate performance of the models
- Tune models to improve their prediction and classification performance
Course Contents
- Math Essentials. Intro to Python in Google Colab
- Intro to Statistical learning
- Linear Regression (SLR) & K-Nearest Neighbors (KNN)
- Classification with Logistic Regression, LDA, QDA, KNN
- Resampling methods. CV, Bootstrap
- Linear model selection & regularization
- Non-linear regression
- Decision Trees, Bagging, Random Forest, Boosting
- Support Vector Machines/Classifiers
- Clustering methods. PCA, k-Means, Hierarchical Clustering, DBSCAN
- Artificial Neural Networks (ANN)
- Convolutional Neural Networks (CNN)
- Recurrent Neural Networks (RNN) and Long Short-Term Memory (LSTM) Networks
- Transformer and Attention Layers
Assessment Elements
- Quizzes: All questions and answers are in English. They closely follow the textbook, lectures, seminars and material posted in LMS, and include questions about the syllabus and the ethics/integrity/honor code.
- Homework assignments: Students will likely work in groups of about two. Collaboration outside the group is allowed only at a high level. See the grading rubric and syllabus for further instructions.
- Exam: Exams are individualized, timed, (possibly) proctored, and otherwise constrained to prevent cheating. In general, expect 60 questions in 60 minutes, some of which you may have seen in quizzes. In addition, the coursework project and one exam are administered by the University of London (UoL), but their grades count towards the grade in this course.
- Coursework Project (CP) in the R programming language: Administered by LSE/UoL.
- Participation: See the syllabus for more information.
- Tests: There will be a test at the end of each of the 4 modules. The examination locations are TBD. An in-class test is closed book: no notes, calculators, or phones. A take-home test is open book and open internet, but collaboration is not allowed. Test questions differ from homework questions: homework deepens your understanding, while tests measure it. Each test is cumulative. Do not book travel that conflicts with the test dates.
Interim Assessment
- 2021/2022 2nd module: 0.1 * Participation + 0.4 * Exam + 0.2 * Quizzes + 0.3 * Homework assignments
- 2021/2022 4th module: 0.3 * Homework assignments + 0.1 * Participation + 0.2 * Quizzes + 0.4 * Exam
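The two interim formulas use the same weights, so the grade is a single weighted sum of the four components. A minimal Python sketch of that arithmetic follows. The helper name `interim_grade`, the 0-10 score scale, and the example scores are assumptions made for illustration only; the syllabus does not specify a scale or a rounding rule.

```python
# Weights copied from the interim assessment formula.
WEIGHTS = {
    "participation": 0.1,
    "exam": 0.4,
    "quizzes": 0.2,
    "homework": 0.3,
}


def interim_grade(scores):
    """Return the weighted sum of component scores (hypothetical helper)."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Illustrative scores on an assumed 0-10 scale; not real data.
example = {"participation": 10, "exam": 7, "quizzes": 8, "homework": 9}
grade = interim_grade(example)
print(round(grade, 2))
```

Because the weights sum to 1, the result stays on the same scale as the component scores, and the exam carries the largest share of the grade.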
Bibliography
Recommended Core Bibliography
- James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). Data for An Introduction to Statistical Learning with Applications in R (R package, version 1.0; maintainer: Trevor Hastie). Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsbas&AN=edsbas.28D80286
Recommended Additional Bibliography
- Hastie, T., Tibshirani, R., & Friedman, J. H. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction (2nd ed., corrected 7th printing). New York: Springer. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=277008