2024/2025
Data Mining
Type:
Mago-Lego
Delivered by:
Department of Statistics and Data Analysis
When:
1, 2 module
Open to:
students of one campus
Instructors:
Ruslan Iskyandyarov
Language:
English
ECTS credits:
6
Course Syllabus
Abstract
Data mining is the creation of new knowledge to solve business problems by using business knowledge to discover and interpret patterns in data. The course covers both the methodology of the data mining process and a practical guide to data preparation, modeling, and deploying analytics into business processes. It covers the following main types of modeling techniques: classification, regression, clustering, anomaly detection, and association rules. Special attention is given to hands-on data analysis using available software tools.
Learning Objectives
- To introduce students to the concept of the data mining process aimed at solving business problems. To provide knowledge of the basic data mining techniques. To develop practical skills in data analysis, model building, and model quality evaluation.
Expected Learning Outcomes
- Aggregates data using Python.
- Explores and analyzes data with Python.
- Knows the methodology of the data mining process.
- Understands and has practical skills in data preparation.
- Understands and has practical skills in modeling: classification, regression, clustering, association rules, and anomaly detection.
- Understands and has practical skills in evaluating modeling results.
- Analyzes data sets and uses Python to visualize results.
Course Contents
- 1. Data Mining process
- 2. Data preparation
- 3. Classification
- 4. Regression
- 5. Clustering techniques
- 6. Outlier detection
- 7. Association rules
- Literature for the course "Modern Methods of Statistical Data Processing (in English)"
- Literature
Assessment Elements
- Seminar participation: solving assignments and answering the instructor's questions
- Project preparation and defense (in teams): independent data exploration, formulating and testing hypotheses, data analysis, and building basic machine learning models
- Final test on course topics: a final online test covering the course topics
- Presentation (in teams): a short (10-15 minute) talk on a topic from the list, plus 5 quiz questions prepared for the audience
- Final data analysis: independent exploration of a provided dataset and answers to questions about it
Interim Assessment
- 2024/2025 2nd module: 0.05 * Seminar participation + 0.05 * Seminar participation + 0.1 * Presentation (in teams) + 0.1 * Presentation (in teams) + 0.25 * Final data analysis + 0.15 * Final test on course topics + 0.3 * Project preparation and defense (in teams)
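The weighted sum above can be checked with a short Python sketch. The weights are copied from the formula. The element names are English renderings of the assessment elements, and the example scores and the 10-point scale are hypothetical assumptions, not course data. The sketch also assumes that each repeated term in the formula uses the same score for that element.

```python
# Weights copied term by term from the interim assessment formula.
# Element names are English renderings of the assessment elements.
WEIGHTS = [
    (0.05, "Seminar participation"),
    (0.05, "Seminar participation"),
    (0.10, "Presentation (in teams)"),
    (0.10, "Presentation (in teams)"),
    (0.25, "Final data analysis"),
    (0.15, "Final test on course topics"),
    (0.30, "Project preparation and defense (in teams)"),
]


def interim_grade(scores):
    """Return the weighted sum of element scores.

    Assumption: a repeated term in the formula reuses the same score.
    """
    return sum(weight * scores[name] for weight, name in WEIGHTS)


# Hypothetical scores on an assumed 10-point scale.
example_scores = {
    "Seminar participation": 8,
    "Presentation (in teams)": 9,
    "Final data analysis": 7,
    "Final test on course topics": 6,
    "Project preparation and defense (in teams)": 8,
}

if __name__ == "__main__":
    total_weight = sum(weight for weight, _ in WEIGHTS)
    print(f"Total weight: {total_weight:.2f}")
    print(f"Interim grade: {interim_grade(example_scores):.2f}")
```

Because the weights sum to 1, the result stays on the same scale as the individual element scores.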
Bibliography
Recommended Core Bibliography
- Álvaro Scrivano. (2019). Coding with Python. Minneapolis: Lerner Publications ™. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1947372
- Aman Kedia, & Mayank Rasu. (2020). Hands-On Python Natural Language Processing : Explore Tools and Techniques to Analyze and Process Text with a View to Building Real-world NLP Applications. Packt Publishing.
- Hajba, G. L. (2018). Website Scraping with Python: Using BeautifulSoup and Scrapy. Berkeley, CA: Apress.
- Chen, D. Y. (2023). Pandas for Everyone: Python Data Analysis.
- McKinney, W. (2017). Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython.
- Robert Nisbet, John Elder, & Gary D. Miner. (2009). Handbook of Statistical Analysis and Data Mining Applications. Academic Press.
Recommended Additional Bibliography
- Linoff, G. (2016). Data Analysis Using SQL and Excel (2nd ed.). Wiley.