Master's programme
2023/2024
Applied Data Science
Rated best for "Usefulness of the course for your future career"
Rated best for "Usefulness of the course for broadening horizons and well-rounded development"
Status: Compulsory course (Business Analytics and Big Data Systems)
Field of study: 38.04.05. Business Informatics
Taught by: Department of Business Informatics
Where taught: Graduate School of Business
When taught: Year 1, Module 2
Study format: with an online course
Online hours: 20
Audience: own campus only
Instructors: Калмыкова Надежда Сергеевна
Degree programme: Business Analytics and Big Data Systems
Language: English
Credits: 3
Contact hours: 24
Course Syllabus
Abstract
Data science is the field of study concerned with discovering dependencies in data automatically. Such technology makes it possible to solve a wide range of problems without explicitly programming the rules. Thanks to advances in computing and in the field itself, machine learning has over the last decade become an essential component of products ranging from web services to banking. In this course, students first review the essential concepts of machine learning and then practice applying machine learning methods to business tasks. The course emphasizes practical work and considers various aspects of solving real-world problems. It covers widely used methods such as linear models, gradient boosting, and clustering. Finally, the course examines the best practices of major companies that leverage machine learning technology.
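To illustrate what discovering a dependency in data "without explicitly programming the rules" means, here is a minimal sketch in plain Python (not taken from the course materials). It estimates a linear relationship from example pairs by ordinary least squares instead of hard-coding it; the function name `fit_simple_linear_regression` and the toy data are hypothetical.

```python
from statistics import mean


def fit_simple_linear_regression(xs, ys):
    """Fit y ~ a + b*x by ordinary least squares (closed form for a single feature)."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Slope: sum of cross-deviations divided by sum of squared x-deviations
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    # Intercept: the fitted line passes through the point of means
    a = y_bar - b * x_bar
    return a, b


# Hypothetical toy data produced by the rule y = 3 + 2x;
# the rule itself is never written into the model, only recovered from the data.
xs = [0, 1, 2, 3, 4]
ys = [3, 5, 7, 9, 11]
a, b = fit_simple_linear_regression(xs, ys)
print(a, b)  # 3.0 2.0
```

In practice the same idea scales to many features and noisy data, where library implementations are normally used instead of a hand-written formula.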
Learning Objectives
- Learn to identify the machine learning problem that addresses a given business problem
- Practice fitting models to solve essential machine learning problems such as regression and classification
- Learn to design and develop machine learning systems
Expected Learning Outcomes
- Able to apply the gradient boosting approach to solve classification and regression problems
- Demonstrate the use of the main Pandas methods
- Analyze the performance of a model and report the results
- Able to fit and interpret decision tree and k-nearest neighbors models on a given dataset
- Able to identify and correctly state classification, regression, and clustering problems
- Able to fit logistic and linear regression models on a given dataset
- Identify a suitable metric for a machine learning system
- Describe the bagging approach to creating an ensemble of models
- Transform raw data into features suitable for modeling
- Apply data transformations to improve the accuracy of an algorithm
- Able to reduce the dimensionality of the original data
- Describe the main approaches to grouping similar data points
- Describe methods and models for time series forecasting
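Several of the outcomes above (fitting a k-nearest neighbors model, choosing a metric, analyzing model performance) can be illustrated with a minimal from-scratch sketch. It assumes Euclidean distance, simple majority voting and accuracy as the evaluation metric; the helper names and toy data are hypothetical, and in practice a library implementation such as scikit-learn's `KNeighborsClassifier` would normally be used.

```python
from collections import Counter
from math import dist


def knn_predict(train_X, train_y, x, k=3):
    """Predict the label of x by majority vote among its k nearest training points."""
    # Sort training indices by Euclidean distance to the query point, keep the k closest
    nearest = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], x))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Share of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy data: two well-separated groups of points in 2D
train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["A", "A", "A", "B", "B", "B"]
test_X = [(0.5, 0.5), (5.5, 5.5), (1, 1)]
test_y = ["A", "B", "A"]

preds = [knn_predict(train_X, train_y, x, k=3) for x in test_X]
print(preds, accuracy(test_y, preds))  # ['A', 'B', 'A'] 1.0
```

Accuracy is used here only because the toy classes are balanced; with imbalanced classes, metrics such as precision, recall or ROC AUC are usually more informative.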
Course Contents
- Exploratory Data Analysis with Pandas
- Visual Data Analysis
- Classification, Decision Trees, and k-Nearest Neighbors
- Linear Classification and Regression
- Bagging and Random Forest
- Feature Engineering and Feature Selection
- Unsupervised Learning: Principal Component Analysis and Clustering
- Time Series Analysis with Python
- Gradient Boosting
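The gradient boosting topic can be illustrated with a minimal from-scratch sketch for one-dimensional regression. It assumes squared-error loss and depth-1 regression trees (stumps) as weak learners; the function names and toy data are hypothetical, and real projects would normally rely on libraries such as scikit-learn, XGBoost, LightGBM or CatBoost.

```python
def fit_stump(xs, residuals):
    """Find the threshold split on x that minimizes the squared error of the residuals.

    Assumes xs contains at least two distinct values.
    """
    best = None
    for threshold in sorted(set(xs))[:-1]:  # excluding the max keeps the right side non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left) + sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def gradient_boosting(xs, ys, n_estimators=50, learning_rate=0.1):
    """Gradient boosting for squared loss with regression stumps as weak learners."""
    base = sum(ys) / len(ys)  # initial constant prediction: the mean of the targets
    stumps = []
    preds = [base] * len(xs)
    for _ in range(n_estimators):
        # For the loss 0.5 * (y - F)^2 the negative gradient w.r.t. F is the residual y - F
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # Move predictions a small step (learning_rate) towards the fitted residuals
        preds = [p + learning_rate * stump(x) for x, p in zip(xs, preds)]

    def predict(x):
        return base + sum(learning_rate * s(x) for s in stumps)

    return predict


# Hypothetical toy data with a step at x = 3.5
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, 1, 5, 5, 5]
model = gradient_boosting(xs, ys, n_estimators=50, learning_rate=0.1)
print(model(2), model(5))  # close to 1 and 5
```

The small learning rate means each stump corrects only part of the remaining error, which is the usual trade-off in boosting: more estimators are needed, but the ensemble tends to generalize better than one that fits the residuals in a single large step.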
Bibliography
Recommended Core Bibliography
- Géron, A. (2019). Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems (2nd ed.). O’Reilly Media.
- Embarak, O. (2018). Data Analysis and Visualization Using Python: Analyze Data to Create Visualizations for BI Systems. Apress.
- Garg, H. (2018). Mastering Exploratory Analysis with Pandas: Build an End-to-end Data Analysis Workflow with Python. Packt Publishing.
- Hamilton, J. D. (2020). Time Series Analysis. Princeton University Press.
- Müller, A. C., & Guido, S. (2017). Introduction to Machine Learning with Python: A Guide for Data Scientists (1st ed.). O’Reilly Media.
- Lee, W.-M. (2019). Python Machine Learning. John Wiley & Sons.
- Yang, X.-S. (2019). Introduction to Algorithms for Data Mining and Machine Learning. Academic Press.
Recommended Additional Bibliography
- Raschka, S., & Mirjalili, V. (2019). Python Machine Learning: Machine Learning and Deep Learning with Python, Scikit-learn, and TensorFlow 2 (3rd ed.). Packt Publishing. ISBN 9781789958294. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&db=nlebk&AN=2329991
- Nelli, F. (2015). Python Data Analytics: Data Analysis and Science Using Pandas, Matplotlib and the Python Programming Language. Berkeley, CA: Apress. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1056488
- Nelli, F. (2018). Python Data Analytics: With Pandas, NumPy, and Matplotlib (2nd ed.). New York, NY: Apress. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1905344