Master
2021/2022
Applied Machine Learning
Category 'Best Course for Broadening Horizons and Diversity of Knowledge and Skills'
Type:
Elective course (Business Analytics and Big Data Systems)
Area of studies:
Business Informatics
Delivered by:
Department of Business Informatics
Where:
Graduate School of Business
When:
1 year, 3 module
Mode of studies:
distance learning
Online hours:
16
Open to:
students of all HSE University campuses
Instructors:
Sergey Lisitsyn
Master’s programme:
Business Analytics and Big Data Systems
Language:
English
ECTS credits:
5
Contact hours:
28
Course Syllabus
Abstract
Machine learning is the field of study concerned with automatically finding dependencies in data. It makes it possible to solve a range of problems without explicitly programming rules. Owing to advances in computing and in the field itself, over the last decade machine learning has become an essential component of products ranging from web services to banking. In this course, students review the essential concepts of machine learning and then practice applying machine learning methods to business tasks. The course emphasizes practice and considers various aspects of solving real-world problems. It covers popular methods such as linear models, gradient boosting, and neural networks. Finally, the course examines the best practices of major companies that use machine learning.
Learning Objectives
- Learn to frame a business problem as a machine learning problem
- Practice fitting models to solve essential machine learning problems such as regression and classification
- Learn to design and develop machine learning systems
- Learn to re-use pre-trained models to lower the development cost of machine learning systems
Expected Learning Outcomes
- Can identify a problem suitable for machine learning
- Able to apply the gradient boosting approach to solve classification and regression problems
- Able to fit a logistic regression model on a given dataset
- Able to fit and interpret a decision tree model on a given dataset
- Able to identify a clustering problem
- Able to identify classification, regression, and clustering problems
- Able to identify overfitting
- Able to identify a suitable metric for a machine learning system
- Able to train a neural network given a dataset
- Able to use pre-trained models
- Can fit a clustering model given a dataset
- Can identify a recommender problem
- Knows at least a few modern applications of machine learning
- Knows the essential rules to develop and support machine learning systems
- Knows the limitations of linear models
- Knows the relationship between model complexity and overfitting
- Understands the boosting approach to create an ensemble of models
- Understands the concept of differentiable programming
- Understands the concept of embeddings
- Understands the concept of non-parametric learning
- Understands the essential methods for recommenders: collaborative filtering, content-based, and matrix factorization
- Understands the idea of convolution as the base operation for images and audio data
- Understands the universality of the gradient boosting approach
Course Contents
- Scope of machine learning
- Machine learning problems
- Linear models for regression and classification
- Decision trees and ensembles
- Overfitting
- Boosting and gradient boosting
- Recommender systems and embeddings
- Non-parametric methods for classification and regression
- Clustering
- Metrics of machine learning
- Neural networks
- Convolutional neural networks
- Machine learning in production systems
Assessment Elements
- Homework №1: A student should provide a Jupyter notebook.
- Homework №2: A student should either provide a Jupyter notebook to the professor or participate in an in-class Kaggle competition.
- Test: There is no time limit for submitting test responses. Tests are administered online without proctoring.
- Written exam: The exam is taken in written form, online, as a programming assignment. The MS Teams platform is used to communicate with students. Students may not involve any other person in their programming assignment. Any interaction with other students that gives an advantage on the assignment is prohibited, as is any plagiarism. Students may use any Internet resources and may ask the professor to clarify the assignment.
Interim Assessment
- 2021/2022 3rd module: 0.25 * Homework №1 + 0.25 * Homework №2 + 0.1 * Test + 0.4 * Written exam
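The interim assessment formula can be checked with a short calculation. The sketch below is illustrative only: the weights come from the formula above, while the 0 to 10 grading scale, the sample scores, and the names `WEIGHTS` and `interim_grade` are assumptions made for this example. The syllabus does not state a rounding rule, so none is applied.

```python
# Weights are taken from the interim assessment formula in the syllabus.
WEIGHTS = {
    "homework_1": 0.25,
    "homework_2": 0.25,
    "test": 0.10,
    "written_exam": 0.40,
}


def interim_grade(scores: dict[str, float]) -> float:
    """Return the weighted sum of the four assessment elements."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores on an assumed 0-10 scale.
example = {"homework_1": 8, "homework_2": 6, "test": 10, "written_exam": 7}
print(round(interim_grade(example), 2))
```

Because the weights sum to 1, the result stays on the same scale as the individual element scores.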
Bibliography
Recommended Core Bibliography
- D. Sculley, Todd Phillips, Dietmar Ebner, Vinay Chaudhary, & Michael Young. (n.d.). Machine Learning: The High-Interest Credit Card of Technical Debt. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsbas&AN=edsbas.BAEF1F2C
- Goodfellow, I. (2016). Deep Learning.
- Hastie, T., Tibshirani, R., & Friedman, J. H. (2009). The Elements of Statistical Learning : Data Mining, Inference, and Prediction (Vol. Second edition, corrected 7th printing). New York: Springer. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=277008
- Murphy, K. P. (2012). Machine Learning: A Probabilistic Perspective.
- Harrington, P. (2012). Machine Learning in Action.
- Mitchell, T. M. (1997). Machine Learning.
- Bishop, C. M. (2006). Pattern Recognition and Machine Learning.
- Segaran, T. (2007). Programming Collective Intelligence : Building Smart Web 2.0 Applications. Beijing: O’Reilly Media. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=415280
Recommended Additional Bibliography
- Caselles-Dupré, H., Lesaint, F., & Royo-Letelier, J. (2018). Word2Vec applied to Recommendation: Hyperparameters Matter. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsarx&AN=edsarx.1804.04212