MDS Courses
Python Basic
Course author: Kirill Chmel
This course will introduce you to the Python programming language, which is widely used in data analysis and machine learning. It can become an efficient tool for achieving first small and then increasingly ambitious goals. By the end of this course, you will have a good understanding of the basic constructs of the Python programming language.
Course topics:
● Background information about Python
● How to handle different data types
● Conditional constructions
● For and while loops
● How to create functions
During this course, you will be expected to write multiple programs in Python, take tests and complete one project per week.
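As a small taste of the constructs covered in this course (an illustrative sketch, not actual course material), here is a short program combining a function, a for loop, and a conditional:

```python
# A function definition, a for loop, and a conditional in one sketch.
def count_even(numbers):
    """Return how many numbers in the list are even."""
    count = 0
    for n in numbers:
        if n % 2 == 0:  # conditional construct
            count += 1
    return count

result = count_even([1, 2, 3, 4, 10])  # -> 3
```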
Python Advanced
Course authors: Yury Gorishniy, Dmitrii Borisov
This course is a continuation of the Python Basic course featuring more complex topics. During this six-week course, you will learn new data structures and patterns that are used by both data scientists and software engineers on a daily basis. In fact, these new concepts are at the heart of many programming languages and frameworks beyond Python, so you can use them to solve a wide range of problems. By the end of the course, you will have a complete set of must-have tools enabling you to dive into specific fields where Python is required, such as data science, machine learning, deep learning and many others.
Course topics:
● New data structures: sets and dictionaries
● Sorting
● Iterators and Python-specific tools for working with them
● Working with files
● Error handling
● Classes and object-oriented programming
● Useful modules of the standard library
During the course, you will gain hands-on experience by writing multiple programs and completing tests.
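To give a flavour of the topics listed above (an illustrative sketch, not course material), the following snippet touches sets, dictionaries, iterators, and error handling:

```python
# Sets, dictionaries, iterators, and error handling in one short sketch.
words = ["spam", "eggs", "spam", "ham"]

unique = set(words)                  # a set drops duplicates
counts = {}                          # a dictionary maps word -> count
for w in words:
    counts[w] = counts.get(w, 0) + 1

squares = (n * n for n in range(5))  # a generator is a lazy iterator
total = sum(squares)                 # consumes the iterator: 0+1+4+9+16

try:
    counts["missing"]                # absent key raises KeyError
except KeyError:
    handled = True                   # error handling with try/except
```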
Discrete Mathematics
Course authors: Vladimir Podolskii, Stepan Kuznetsov, Ilya V. Schurov
This course encompasses various topics in discrete mathematics that are relevant for data analysis.
We will begin with a brief introduction to combinatorics, a branch of mathematics concerned with counting. Familiarity with this topic is critical for anyone who wants to work in data analysis or computer science. We will learn how to put our new knowledge into practice: for example, we will count the number of features in a dataset and estimate the time required for a Python program to run.
Next, we will use our knowledge of combinatorics to study basic probability theory. Probability is the cornerstone of data analysis, and we will consider it in much more detail later. However, this section will give you a taste of probability theory and introduce important concepts that will be essential for the Algorithms and Data Structures course.
Finally, we will study a combinatorial structure that is highly relevant for data analysis, namely graphs. Graphs can be found everywhere around us, and we will provide you with numerous examples proving this statement. In this course, we will focus on social network graphs. You will learn the most important notions of graph theory, have a look at how social network graphs work and study their basic properties. At the end of the course, you will be expected to complete a project related to social graphs.
Course topics:
● Basic combinatorics
● Advanced combinatorics
● Discrete probability
● Introduction to graphs
● Basic graph parameters
● Social graphs
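Two of the topics above, counting and graphs, can be illustrated in a few lines of Python (a sketch with made-up data, not course material):

```python
import math

# Counting: the number of ways to choose 2 features out of 10
pairs = math.comb(10, 2)  # C(10, 2) = 45

# A tiny social graph as an adjacency dictionary
graph = {
    "Alice": {"Bob", "Carol"},
    "Bob": {"Alice"},
    "Carol": {"Alice"},
}
# The degree of a vertex is the number of its neighbours
degrees = {person: len(friends) for person, friends in graph.items()}
```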
Calculus
Course author: Anton Savostianov
This course will cover calculus fundamentals which are essential for more advanced courses in data science. We will begin with a basic introduction to concepts related to functional mappings. Then we will study limits (for sequences and for single- and multivariate functions), differentiability (again, starting from a single variable and building up to multiple variables), and integration. This will become our foundation before we proceed to an introduction to basic optimization. At the end of the course, you will be expected to complete a final programming project showcasing the use of an optimization routine in machine learning, which will allow you to test your practical skills and gain relevant hands-on experience in programming. You will also have access to additional materials: interactive plots in the GeoGebra environment used during lectures, bonus PDF files with more information on the general methods and further insights into the discussed topics, and optional programming tasks that can be used to deepen your knowledge and get a taste of real-life cases.
Course topics:
● Introduction: numerical sets, functions, limits
● Limits and multivariate functions
● Derivatives and linear approximations: single-variable functions
● Derivatives and linear approximations: multivariate functions
● Integrals: anti-derivative, area under curve, multivariate functions
● Optimization
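Two of the topics above, differentiation and optimization, can be sketched numerically (an illustration only; the course develops the theory properly):

```python
# A numerical derivative via the symmetric difference quotient,
# then gradient descent to minimise f(x) = (x - 3)^2.
def derivative(f, x, h=1e-6):
    return (f(x + h) - f(x - h)) / (2 * h)

def f(x):
    return (x - 3) ** 2

x = 0.0
for _ in range(200):
    x -= 0.1 * derivative(f, x)  # step against the gradient

# x converges to the minimiser, 3
```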
Linear Algebra
Course authors: Vsevolod Chernyshev, Dmitri Piontkovski
The key goal of this course is to explain the most essential concepts of linear algebra which can be used in data analysis and machine learning. It will also help you improve your practical skills needed to use linear algebra methods in machine learning and data analysis.
This course covers the fundamentals of handling data in vector and matrix form. You will learn how to solve systems of linear algebraic equations, find basic matrix decompositions and make decisions about their applicability.
In addition to basic theory, we will show you how to use some of the basic tools of data analysis and machine learning relying on the application of linear algebra to linear regression, binary classifiers, dimensionality reduction, and principal component analysis.
Course topics:
● Systems of linear equations and linear classifier
● Full rank decomposition and systems of linear equations
● Dimensionality reduction
● Linear operators, eigenvectors and eigenvalues, walks on graphs
● Distances and operators in Euclidean space
● Singular value decomposition and Principal Component Analysis. Final project
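To preview the first topic above, solving a system of linear equations, here is a minimal pure-Python sketch for the 2x2 case using Cramer's rule (an illustration; the course covers the general methods):

```python
# Solve the 2x2 linear system A x = b with Cramer's rule.
def solve2x2(a, b):
    (a11, a12), (a21, a22) = a
    det = a11 * a22 - a12 * a21     # determinant of A
    if det == 0:
        raise ValueError("matrix is singular")
    x1 = (b[0] * a22 - a12 * b[1]) / det
    x2 = (a11 * b[1] - b[0] * a21) / det
    return [x1, x2]

# 2x + y = 5, x + 3y = 10  ->  x = 1, y = 3
solution = solve2x2([[2, 1], [1, 3]], [5, 10])
```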
Probability Theory
Course author: Ilya V. Schurov
Probability theory is the cornerstone of mathematical statistics and data analysis. In statistics, we assume that the analyzed data is obtained as a result of random experiments. For example, opinion poll results largely depend on the sample composition. If we want to make conclusions which can be extrapolated to other samples, first, we have to study the actual data generation process. We can do that by modelling this process using a system of random variables. In the Probability Theory course, we will begin with the basic notions of probability theory (conditional probability and independence) and then study discrete and continuous random variables and their properties. The law of large numbers and the central limit theorem are the key topics of this course. We also discuss how to study probabilistic processes using computer simulations.
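The computer-simulation approach mentioned above can be sketched in a few lines (an illustration, not course material): the empirical frequency of a fair coin landing heads approaches the true probability 0.5, as the law of large numbers predicts.

```python
import random

# Law of large numbers by simulation: average many fair-coin flips.
random.seed(0)  # fixed seed for a reproducible experiment
flips = [random.random() < 0.5 for _ in range(100_000)]
empirical = sum(flips) / len(flips)  # close to the true probability 0.5
```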
Algorithms and Data Structures – I
Course authors: Gleb Pogudin, Olga Abakumova
This course highlights basic algorithmic techniques and ideas for computational problems frequently arising in practical applications: searching and sorting, divide-and-conquer algorithms, greedy algorithms, and dynamic programming. The course provides in-depth theoretical material: you will learn how to sort data and how sorting can help in your search for specific data; how to break a large problem into smaller pieces and solve them recursively; when it makes sense to proceed greedily; and how dynamic programming can be used in genomic studies. You will also gain hands-on experience in solving computational problems, designing new algorithms, and effectively implementing solutions.
Course topics:
● Algorithm complexity
● Sorting data
● Linear and binary search
● Divide-and-conquer algorithms
● Greedy algorithms
● Dynamic programming
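As a preview of how sorting helps in the search for specific data (an illustrative sketch, not course material), here is binary search over sorted data:

```python
# Binary search: after sorting, lookups take O(log n) comparisons.
def binary_search(items, target):
    """Return the index of target in sorted items, or -1 if absent."""
    lo, hi = 0, len(items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if items[mid] == target:
            return mid
        elif items[mid] < target:
            lo = mid + 1   # target is in the right half
        else:
            hi = mid - 1   # target is in the left half
    return -1

data = sorted([7, 2, 9, 4])   # [2, 4, 7, 9]
idx = binary_search(data, 7)  # -> 2
```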
Algorithms and Data Structures – II
Course author: Andrey Kharatyan
A good algorithm usually comes hand in hand with a good set of data structures, as this ensures effective data manipulation. In this course, we will study the common data structures which are widely used for solving various computational problems. You will learn how these data structures can be implemented in different programming languages and try doing it yourself during our programming assignments. This will help you understand the inner workings of specific built-in implementations of data structures and know what to expect from them. The course will also highlight the typical use cases for given data structures.
This course covers the following topics:
● Queues
● Stacks and deques
● Heap and priority queues
● Binary search trees
● Hash tables
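One of the structures above, the priority queue, has a built-in implementation in Python's standard library; a minimal sketch of its typical use (illustrative, not course material):

```python
import heapq

# A min-heap as a priority queue: push and pop are O(log n),
# and pop always returns the smallest remaining item.
tasks = []
heapq.heappush(tasks, (2, "write report"))
heapq.heappush(tasks, (1, "fix bug"))
heapq.heappush(tasks, (3, "refactor"))

first = heapq.heappop(tasks)  # (1, "fix bug"): lowest number comes out first
```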
Basic Statistics
Course author: Ilya V. Schurov
The course in Basic Statistics will introduce you to the key tools used in statistical analysis. We will begin with exploratory data analysis in Python and Pandas, descriptive statistics, and data visualization. Then we will proceed to statistical hypothesis testing, the main concept of frequentist statistical framework. Here, we will discuss the conditions required to make generalizations about the underlying data-generation process based on the given data. In particular, we will introduce different types of statistical estimates (point estimates and confidence intervals) and discuss their properties (consistency and unbiasedness). Finally, we will learn how to estimate the relationship between two random variables using correlational analysis.
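The point estimates and confidence intervals mentioned above can be sketched with the standard library alone (an illustration with made-up data; the course uses Pandas and treats the theory properly):

```python
import statistics

# Point estimates from a sample, plus a rough 95% confidence interval
# for the mean using the normal approximation.
sample = [4.8, 5.1, 4.9, 5.3, 5.0, 4.7, 5.2, 5.0]
mean = statistics.mean(sample)
sd = statistics.stdev(sample)                # unbiased: n - 1 in the denominator
half_width = 1.96 * sd / len(sample) ** 0.5  # normal approximation
interval = (mean - half_width, mean + half_width)
```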
SQL
Course author: Kirill S. Gomenyuk
Course objectives:
● To introduce future data analysts to relational data sources
● To teach students how to use the language tools of modern DBMS to extract and prepare data
Practical skills:
● Building SQL queries in a PostgreSQL database
● Preparing analytical reports on relational databases
● Data preparation for further analysis
● Retrieving data from a database at the application level (e.g. with Python)
Pre-requisites:
● Beginner programming skills
● Basic knowledge of discrete mathematics (working with sets)
Other information:
● Students will learn to write SQL queries for several different databases
● The course includes three parts: theoretical foundation for data manipulation, the key features of SQL language, and SQL application for solving practical tasks.
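As a self-contained illustration of querying a relational table from application code (the course itself uses PostgreSQL; this sketch substitutes Python's built-in SQLite to stay dependency-free):

```python
import sqlite3

# An in-memory relational table and an analytical GROUP BY query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("alice", 10.0), ("bob", 5.0), ("alice", 7.5)],
)

# Total spend per customer, largest first
rows = conn.execute(
    "SELECT customer, SUM(amount) FROM orders "
    "GROUP BY customer ORDER BY SUM(amount) DESC"
).fetchall()
# rows == [("alice", 17.5), ("bob", 5.0)]
conn.close()
```

The same `SELECT ... GROUP BY ... ORDER BY` query would run unchanged against PostgreSQL.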
Data Scraping
Course author: Ilya Golubev
Data scraping is the process of importing information from websites, spreadsheets, PDF files and other data sources. Machine learning without a well-prepared dataset will never yield good results, but datasets of proper quality, suitable for use in machine learning, are very hard to find. Data scraping solves this problem by automating the preparation of such datasets. This course will examine text file encodings, network interaction with web servers, the fundamentals of HTML, the XML and JSON data storage and exchange formats, interaction with servers via APIs, and working with non-static sites. Python and its libraries will be used to retrieve the data. At the end of the course, you will be expected to complete a data scraping project.
Course topics:
● Processing Excel/XML/JSON/PDF files using Python
● IP, DNS, HTTP; GET and POST requests
● HTML basics
● The BeautifulSoup library; automation with Selenium
● Using APIs
● Project preparation
The Internet is a great source of information, and the good thing is that it is at arm’s length nowadays. However, the amount of data may seem overwhelming; it comes in many forms, tends to grow exponentially fast and sometimes gets hard to cope with.
In this course, we will help you master the tools that are necessary to transform the seemingly immense ocean of data into meaningful, useful information. We will examine most common data formats, study the Internet architecture, investigate the structure of a webpage and learn how to create one of our own, as well as dive into the concept of API. Finally, we will consolidate the knowledge we acquired by implementing a project.
By the end of the course, you will know how to deal with a complex practical task like data scraping and will have completed your own project.
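A tiny, self-contained taste of what parsing a webpage looks like, using only the standard library (BeautifulSoup and Selenium, covered in the course, provide much richer interfaces on top of the same idea):

```python
from html.parser import HTMLParser

# Collect the href attributes of all <a> tags in an HTML snippet.
class LinkCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

page = '<html><body><a href="/docs">Docs</a><a href="/blog">Blog</a></body></html>'
parser = LinkCollector()
parser.feed(page)
# parser.links == ["/docs", "/blog"]
```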
Data Analysis: Machine Learning
Course author: Anna Kuzina
It’s impossible to write precise algorithms for many modern problems. However, it may be possible to automatically extract algorithms from data and use them to solve a problem while producing results of appropriate quality. Machine learning is concerned with exactly that: methods for automatic data processing and analysis. This course introduces the most popular supervised (linear models for classification and regression, decision trees, and ensembles) and unsupervised (clustering and dimensionality reduction) methods and discusses quality measurement and assessment in detail. All topics will include homework in Python based on real-life datasets.
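The simplest of the supervised methods above, linear regression, fits in a few lines of pure Python via its closed-form least-squares solution (an illustrative sketch with toy data, not course material):

```python
# Simple linear regression by ordinary least squares:
# closed-form slope a and intercept b for y ~ a * x + b.
def fit_line(xs, ys):
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    b = my - a * mx
    return a, b

xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]        # exactly y = 2x + 1
a, b = fit_line(xs, ys)  # -> a == 2.0, b == 1.0
```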
Readings:
● Hastie T., Tibshirani R., Friedman J. The Elements of Statistical Learning (2nd edition). Springer, 2009.
● Bishop C. M. Pattern Recognition and Machine Learning. Springer, 2006.
● Mohri M., Rostamizadeh A., Talwalkar A. Foundations of Machine Learning. MIT Press, 2012.
● Murphy K. Machine Learning: A Probabilistic Perspective. MIT Press, 2012.
● Mohammed J. Zaki, Wagner Meira Jr. Data Mining and Analysis. Fundamental Concepts and Algorithms. Cambridge University Press, 2014.
● Willi Richert, Luis Pedro Coelho. Building Machine Learning Systems with Python. Packt Publishing, 2013.
Decision Making: Applied Machine Learning
Course authors: Andrey Zimovnov, Evgeny Kovalev
The course in Machine Learning Basics encompasses all essential mathematical methods for building models, but we always have to consider additional factors when dealing with a real-life problem. This course will give you an in-depth understanding of important steps such as data preparation and exploratory data analysis, feature extraction for complex data, and learning with non-standard loss functions. You will also learn the key approaches to ranking, time series forecasting, and recommender systems.
Readings:
● Hastie T., Tibshirani R., Friedman J. The Elements of Statistical Learning (2nd edition). Springer, 2009.
● Bishop C. M. Pattern Recognition and Machine Learning. Springer, 2006.
● Mohri M., Rostamizadeh A., Talwalkar A. Foundations of Machine Learning. MIT Press, 2012.
● Murphy K. Machine Learning: A Probabilistic Perspective. MIT Press, 2012.
● Mohammed J. Zaki, Wagner Meira Jr. Data Mining and Analysis. Fundamental Concepts and Algorithms. Cambridge University Press, 2014.
● Willi Richert, Luis Pedro Coelho. Building Machine Learning Systems with Python. Packt Publishing, 2013.
Applied Statistics
Course author: Evgeny Ryabenko
Statistical methods are widely used in data science. For example, A/B-testing is usually used for measuring the effects of product variations and making data-driven decisions. In this course, you will study hypothesis testing, statistical model analysis, time series models, and A/B-testing design.
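The hypothesis-testing idea behind A/B-testing can be sketched with a simple permutation test (an illustration with made-up numbers, not course material): how often does a random relabelling of the two groups produce a mean difference at least as large as the one observed?

```python
import random

# A one-sided two-sample permutation test on toy data.
random.seed(1)
a = [12, 14, 15, 13, 16]   # e.g. metric under variant A
b = [10, 11, 12, 10, 11]   # e.g. metric under variant B
observed = sum(a) / len(a) - sum(b) / len(b)

pool = a + b
trials = 10_000
count = 0
for _ in range(trials):
    random.shuffle(pool)   # relabel the observations at random
    diff = sum(pool[:5]) / 5 - sum(pool[5:]) / 5
    if diff >= observed:
        count += 1
p_value = count / trials   # small p-value: the difference is unlikely by chance
```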
Readings:
● Hastie T., Tibshirani R., Friedman J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, 2001.
● Wasserman L. All of Nonparametric Statistics. Springer, 2006.
● Bishop C.M. Pattern Recognition and Machine Learning. Springer, 2006.
● David Mackay J.C. Information Theory, Inference, and Learning Algorithms. Cambridge, 2007.
● Grimmett G., Stirzaker D. Probability and Random Processes. Oxford University Press, 2001.
● Forrester A., Sobester A., Keane A. Engineering Design via Surrogate Modelling. A Practical Guide. Wiley, 2008.
● Lee J.A., Verleysen M. Nonlinear Dimensionality Reduction. Springer, 2007.
Computational Complexity
Course authors: Sergei Obiedkov, Bruno Frederik Bauwens
When dealing with computational tasks, you need to have a certain intuition and always be able to establish the complexity of a given problem. Is it something that will take your computer fractions of a second to process or is it going to take years? This course covers the topic of computational complexity, and it will help you develop intuition allowing you to categorize computational tasks in terms of their complexity. The cornerstone of this course is the notion of NP-completeness which shows that numerous natural computational problems can be equivalent complexity-wise. We will also discuss complexity classes related to probabilistic computations.
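The "fractions of a second or years" contrast is easy to feel with a brute-force search (an illustrative sketch, not course material): subset sum, one of the classic NP-complete problems, examined by enumerating all subsets, so each extra item doubles the work.

```python
from itertools import combinations

# Brute-force subset sum: try all 2^n subsets, smallest first.
def subset_sum(items, target):
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            if sum(combo) == target:
                return combo
    return None

hit = subset_sum([3, 34, 4, 12, 5, 2], 9)  # a subset summing to 9
```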
Introduction to Deep Learning
Course authors: Andrey Zimovnov, Ekaterina Lobacheva, Alexey Kovalev
This course covers the basics of modern neural networks and their applications in computer vision and natural language understanding. We will begin by discussing the stochastic optimization methods that are essential for training deep neural networks. You will learn the most popular building blocks of neural networks, including fully connected, convolutional, and recurrent layers.
Then you will use these building blocks to define complex modern architectures in the TensorFlow and Keras frameworks. For the course project, you will be expected to implement a deep neural network for image captioning in order to produce a text description for an input image.
Course topics:
● Optimization for machine learning
● Feed-forward networks, backpropagation
● Convolutional neural networks
● Autoencoders
● Recurrent neural networks
● Project on image captioning
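The second topic above, backpropagation, boils down to the chain rule; here is one training step for a single sigmoid neuron with squared loss (an illustrative sketch, not course material):

```python
import math

# One gradient step for a single sigmoid neuron with squared loss.
def sigmoid(z):
    return 1 / (1 + math.exp(-z))

w, b = 0.5, 0.0   # parameters
x, y = 1.0, 1.0   # one training example
lr = 0.1          # learning rate

out = sigmoid(w * x + b)             # forward pass
grad_out = 2 * (out - y)             # dLoss/dOut for (out - y)^2
grad_z = grad_out * out * (1 - out)  # chain rule through the sigmoid
w -= lr * grad_z * x                 # backward pass: update parameters
b -= lr * grad_z
```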
Readings:
● Ian Goodfellow and Yoshua Bengio and Aaron Courville. Deep Learning. MIT Press, 2016.
● Sam Abrahams, Danijar Hafner, Erik Erwitt, Ariel Scarpinelli. TensorFlow For Machine Intelligence: A hands-on introduction to learning algorithms. Bleeding Edge Press; 1 edition (July 23, 2016).
Computational Learning Theory
Course authors: Nikita Puchkin, Maksim Kaledin, Bruno Frederik Bauwens
To solve practical machine learning tasks with good intuition, you first need to understand the underlying theoretical foundations. This course will give you insight into these foundations.
We will study the classical theoretical models that capture (to a certain extent) the models used in practice. We will examine the PAC-learning setting, discuss the classical notion of VC-dimension and its importance, and provide a brief introduction to statistical learning theory.
Large Scale Machine Learning part 1
Course authors: Mikhail Anukhin, Oleg Ivchenko, Julia Ivanova
The rapid growth in the volume of available data has made machine learning quite popular in the modern world. It is often impossible to store a whole dataset on one computer; therefore, data needs to be processed in a distributed manner. This course will highlight the MapReduce paradigm, introduce the Hadoop and Spark systems, and explain approaches to big data processing based on these technologies.
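The MapReduce idea can be sketched entirely in local Python (an illustration of the paradigm, not of Hadoop or Spark themselves): map emits key-value pairs, the shuffle groups them by key, and reduce aggregates each group.

```python
from itertools import groupby

# A MapReduce-style word count, run locally.
lines = ["big data", "big models"]

# Map: emit (word, 1) for every word
mapped = [(word, 1) for line in lines for word in line.split()]

# Shuffle: group the pairs by key
shuffled = groupby(sorted(mapped), key=lambda pair: pair[0])

# Reduce: sum the counts within each group
counts = {word: sum(n for _, n in pairs) for word, pairs in shuffled}
# counts == {"big": 2, "data": 1, "models": 1}
```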
Readings:
● Bastiaan Sjardin. Large Scale Machine Learning with Python. Packt Publishing; 1 edition (August 3, 2016).
● Tom White. Hadoop: The Definitive Guide: Storage and Analysis at Internet Scale. O'Reilly Media; 4 edition (April 11, 2015).
Large Scale Machine Learning part 2
Course authors:
The LSML 1 course provides a good understanding of big data storage technologies, but sometimes we need to train models in a distributed manner, as well. This course discusses distributed training of popular machine learning models, like linear models, decision trees and ensembles, and closely examines large-scale recommender systems.
Readings:
● Bastiaan Sjardin. Large Scale Machine Learning with Python. Packt Publishing; 1 edition (August 3, 2016).
● Tom White. Hadoop: The Definitive Guide: Storage and Analysis at Internet Scale. O'Reilly Media; 4 edition (April 11, 2015).
Optimizations for Machine Learning
Course authors: Evgeny Bobrov, Dmitriy Kropotov
In machine learning, optimization is what fits a model to a given dataset, and it directly affects the model's quality. Many classic optimization algorithms prove ineffective with big data and require modifications tailored to specific problems. This course examines both classical and modern techniques for continuous optimization and discusses their applications in machine learning. The course topics include one-dimensional optimization, gradient descent and its modifications, Newton’s method, L-BFGS, constrained optimization, non-convex optimization, and many others.
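One of the listed topics, Newton's method for one-dimensional optimization, fits in a few lines (an illustrative sketch on a toy function, not course material): using the second derivative gives very fast convergence near a minimum.

```python
# Newton's method for 1D optimization: x <- x - f'(x) / f''(x).
def newton_minimize(df, d2f, x0, steps=20):
    x = x0
    for _ in range(steps):
        x -= df(x) / d2f(x)
    return x

# Minimise f(x) = x^4 - 3x^2 + 2 starting from x0 = 2;
# the nearby minimiser is x = sqrt(1.5).
df = lambda x: 4 * x ** 3 - 6 * x      # first derivative
d2f = lambda x: 12 * x ** 2 - 6        # second derivative
x_star = newton_minimize(df, d2f, 2.0)
```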
Readings:
● J. Nocedal, S. Wright. Numerical Optimization, Springer, 2006.
● A. Ben-Tal, A. Nemirovski. Optimization III. Lecture Notes, 2013.
● Y. Nesterov. Introductory Lectures on Convex Optimization: A Basic Course, Springer, 2003.
● S. Boyd, L. Vandenberghe. Convex Optimization, Cambridge University Press, 2004.
● D. Bertsekas. Convex Analysis and Optimization, Athena Scientific, 2003.
Advanced Algorithms
Course authors: Fyodor Strok, Andrey Lyashko, Alexandr Borzunov
The Advanced Algorithms course examines complex algorithms which are often used in machine learning, data collection and processing. The algorithms highlighted in this course demonstrate extreme asymptotic efficiency in terms of speed and memory consumption. For example, they will allow you to implement fast machine learning models that work in real time. Familiarity with these algorithms will not only improve your coding skills and allow you to create effective programs but also help you prepare for the technical part of a job interview. Regardless of the position you are applying for, most interviews include algorithmic puzzles. This course will give you an opportunity to test your abilities in solving such problems.
Natural Language Processing
Course authors: Mariya Tikhonova, Alyona Fenogenova, Vladislav Mihailov
This course encompasses a wide range of tasks related to natural language processing, from basic to advanced ones: sentiment analysis, summarization, and dialogue state tracking, to name only a few. Upon completing the course, you will be able to single out NLP tasks in your daily work, propose relevant approaches, and make judgements as to which techniques are most likely to work well in a particular case. The final project will feature one of the most popular topics in today’s NLP: you will build your own conversational chatbot that provides search assistance on the StackOverflow website. The project will incorporate course-related practical assignments, and you will gain hands-on experience with tasks such as text classification, named entity recognition, and duplicate detection.
During the lectures, we will try to find a balance between traditional and deep learning techniques in NLP and examine them in parallel. For example, we will discuss word alignment models in machine translation and establish their similarity to the attention mechanism in encoder-decoder neural networks. The core techniques will not be black-boxed; on the contrary, you will gain an in-depth understanding of the inner workings of each technique. To get the most from this course, you need to be familiar with the basics of linear algebra and probability theory, the machine learning setup, and deep neural networks. Some materials will be based on papers published only a month earlier, so you will gain access to cutting-edge NLP research.
Course topics:
● Text classification
● Language modelling and sequence tagging
● Vector space models
● Sequence to sequence tasks
● Dialogue systems
● Final project
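The simplest of the vector space models listed above is the bag of words: a text becomes its word counts. A minimal sketch (illustrative, not course material):

```python
from collections import Counter

# Bag of words: represent a text by its word counts, the simplest
# vector space model used for text classification.
def bag_of_words(text):
    return Counter(text.lower().split())

doc = "The cat sat on the mat"
vector = bag_of_words(doc)
# vector["the"] == 2, vector["cat"] == 1, absent words count 0
```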
Readings:
● Christopher Manning and Hinrich Schütze. Foundations of Statistical Natural Language Processing. MIT Press, 1999.
● Dan Jurafsky and James H. Martin. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition. Prentice Hall; 2nd edition (May 16, 2008)
DevOps
Course authors: Alexander Mikhalevich, Nikita Starichkov, Yuriy Badanin
DevOps is a set of software development practices that combine software development (Dev) and information-technology operations (Ops) to shorten the systems development life cycle, while frequently delivering features, fixes, and updates in close alignment with the given business objectives. Graduates often lack practical skills and experience required for professional success in the IT industry. The DevOps course will give you an opportunity to develop and polish relevant skills needed for large-scale complex projects, including system design, system deployment, support, version control systems, virtualization, etc. This valuable hands-on experience will allow you to start working on your own industrial-level projects and effectively collaborate with your team members if you are hired by an IT company. During the course, you will have an opportunity to solve many practical problems focused on various aspects of a product life cycle.
Course topics:
● System version control (git), compiling with CMake
● Bug tracking and debugging
● System architecture
● Continuous integration/deployment/delivery
● Development methodologies
● Virtualization and containerization
Deep Generative Models
Course authors: Ivan Savin, Anatoliy Bardukov
Applications of deep generative models can be found in a variety of domains nowadays. This course closely examines the modern architectures of generative models and the corresponding training algorithms. Lectures will cover industrial applications of such techniques, including variational autoencoders (VAE), generative adversarial networks (GAN), autoregressive models and normalising flows. The course will give you an in-depth understanding of the pros and cons of each method so that you can select and apply the best-fitting model. We will also discuss possible future applications of the wide spectrum of deep generative models in our everyday lives.
Readings:
● Ian Goodfellow and Yoshua Bengio and Aaron Courville. Deep Learning. MIT Press, 2016.
● Sam Abrahams, Danijar Hafner, Erik Erwitt, Ariel Scarpinelli. TensorFlow For Machine Intelligence: A hands-on introduction to learning algorithms. Bleeding Edge Press; 1 edition (July 23, 2016).
Computer Vision
Course author: Artem Filatov
Thanks to deep learning, computer vision, already a rapidly developing field, has experienced a real breakthrough. With deep learning, many new applications of computer vision techniques have been introduced and are now becoming indispensable parts of our everyday lives. Such applications include face recognition and indexing, image stylization, and machine vision in self-driving cars.
This course will introduce you to computer vision, starting from the basics all the way to cutting-edge deep learning models. We will discuss both image and video recognition, including image classification and annotation, object recognition and image search, various object detection techniques, motion estimation, object tracking in video files, human action recognition, and, finally, image stylization, editing and new image generation. As part of the course project, you will learn how to build a face recognition and manipulation system and gain an understanding of the internal mechanics of this technology, which is arguably the most well-known application of computer vision and AI often found in movies and TV shows.
Course topics:
● Introduction to image processing and computer vision
● Convolutional features for image recognition
● Object detection
● Object tracking
● Image segmentation
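The operation behind the convolutional features above is a small sliding-window sum; here is a pure-Python valid-mode 2D convolution (strictly, a cross-correlation, as in most deep learning libraries) detecting a vertical edge in a toy image (an illustrative sketch, not course material):

```python
# Valid-mode 2D convolution (cross-correlation) over nested lists.
def conv2d(image, kernel):
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            row.append(sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw)
            ))
        out.append(row)
    return out

# A [1, -1] kernel responds where intensity drops from left to right.
edge = conv2d([[1, 1, 0],
               [1, 1, 0],
               [1, 1, 0]], [[1, -1]])
```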
Readings:
● Ian Goodfellow and Yoshua Bengio and Aaron Courville. Deep Learning. MIT Press, 2016.
● Sam Abrahams, Danijar Hafner, Erik Erwitt, Ariel Scarpinelli. TensorFlow For Machine Intelligence: A hands-on introduction to learning algorithms. Bleeding Edge Press; 1 edition (July 23, 2016).
C++
Course author: Sergey Shershakov
This course closely examines C++, which is one of the most popular languages used for developing efficient applications. The first part of the course briefly explains the basic language constructs, which should be familiar to students who took a course in Python: input-output, variables, conditional operators, loops, functions, and containers. One of the most important topics is the use of templates, the Standard Template Library (STL), and iterators. We will also explore the basics of object-oriented programming: classes, objects, and methods. During the course, you will have an opportunity to solve many practical problems.
Course topics:
● Input-output, variables, conditional operator, loops, functions
● Containers: vector, string
● STL containers: set, map
● STL containers: unordered set, priority queue
● STL algorithms
● OOP basics
Bayesian Methods for Machine Learning
Course author: Denis Rakitin
Bayesian methods are used in many areas from game development to drug discovery. They give superpowers to many machine learning algorithms, e.g. handling missing data and extracting extensive information from small datasets. Bayesian methods allow us to estimate uncertainty in predictions, which is a desirable feature for many fields like medicine. When applied to deep learning, Bayesian methods make it possible to perform a hundred-fold model compression and automatic hyperparameter tuning, thereby saving both our time and money. During the six-week course, we will be studying the basics of Bayesian methods, from defining a probabilistic model to making predictions based on this model. We will also discuss how the workflow can be automated and accelerated by using certain advanced techniques. Then, we will have a look at the applications of Bayesian methods in deep learning and learn how to generate new images. And finally, we will discuss how new cures for severe diseases can be discovered using Bayesian methods.
Course topics:
● Introduction to Bayesian methods. Conjugate priors
● Expectation-Maximization algorithm
● Variational inference
● Markov chain Monte Carlo
● Variational Autoencoder
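The first topic above, conjugate priors, has a famously compact example: with a Beta prior on a coin's heads probability, the posterior after binomial observations is again a Beta whose parameters are just incremented counts (an illustrative sketch, not course material).

```python
# Beta-binomial conjugacy: observing heads and tails simply
# increments the Beta prior's parameters.
def beta_binomial_update(a, b, heads, tails):
    return a + heads, b + tails

# Start from the uniform prior Beta(1, 1), observe 7 heads and 3 tails
a, b = beta_binomial_update(1, 1, heads=7, tails=3)
posterior_mean = a / (a + b)  # (1 + 7) / (1 + 7 + 1 + 3) = 8 / 12
```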
Readings:
● Barber D. Bayesian Reasoning and Machine Learning. Cambridge University Press, 2012.
● Murphy K.P. Machine Learning: A Probabilistic Perspective. The MIT Press, 2012.
● Bishop C.M. Pattern Recognition and Machine Learning. Springer, 2006.
● Mackay D.J.C. Information Theory, Inference, and Learning Algorithms. Cambridge University Press, 2003.
Final Project
The final project is a large-scale assignment that involves solving a practical or research problem. You can select one of the proposed projects. The course includes several milestone assessments to check whether you have successfully completed a given course section. Your final project can become a strong point on your resume.