Bachelor's programme 2023/2024

Competitive Data Analysis

Status: Elective course (Applied Data Analysis)
Field of study: 01.03.02. Applied Mathematics and Informatics
When: 4th year, 3rd module
Mode of study: without an online course
Audience: own campus only
Language: English
Credits: 4
Contact hours: 20

Course Syllabus

Abstract

The course surveys successful practices in competitive data analysis. It analyses winning solutions in detail, along with simple solutions that perform well on real data. Particular attention is paid to visualizing tabular data, finding peculiarities in the data (gaps, inconsistencies, anomalies), and generating new features.
Learning Objectives

  • To study modern approaches to fitting high-performance models for real-world data analysis problems
  • To master modern tools for building machine learning models
  • To learn how to preprocess data and generate new features from various sources such as text and images
  • To know the basics of exploratory data analysis
  • To be able to quickly come up with simple models for solving problems and to know how to make them progressively more complex to improve quality
  • To be able to find peculiarities in data: missing values, inaccuracies, anomalous values, etc.
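
As an illustration of the last objective, here is a minimal sketch of a first-pass check on a single column: it counts missing entries and flags anomalous values. The 1.5 * IQR rule, the function name `summarize_column`, and the sample data are assumptions made for this example; the course does not prescribe a particular method.

```python
from statistics import quantiles


def summarize_column(values):
    """Count missing entries (None) and flag outliers with the 1.5 * IQR rule."""
    present = [v for v in values if v is not None]
    missing = len(values) - len(present)

    # Quartiles of the observed values (needs at least two data points).
    q1, _, q3 = quantiles(present, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr

    outliers = [v for v in present if v < low or v > high]
    return {"missing": missing, "outliers": outliers}


# Made-up "age" column with two gaps and one implausible value.
ages = [23, 25, None, 24, 26, 25, 240, None, 22]
report = summarize_column(ages)
print(report)
```

On real tabular data the same idea is usually applied per column with a data-frame library; the stdlib version above only keeps the example self-contained.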
Expected Learning Outcomes

  • Acquire knowledge of different algorithms and learn how to efficiently tune their hyperparameters and achieve top performance.
  • Be able to design reliable cross-validation schemes that help you benchmark your solutions and avoid overfitting or underfitting on unseen (test) data.
  • Gain experience in analysing and interpreting data. You will become aware of inconsistencies, high noise levels, errors and other data-related issues such as leakages, and you will learn how to overcome them.
  • Get exposed to past (winning) solutions and their code, and learn how to read them.
  • Master the art of combining different machine learning models and learn how to ensemble.
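
Two of the outcomes above, cross-validation and ensembling, can be sketched in a few lines. The code below is an illustrative assumption, not course material: `k_fold_indices` and `blend` are hypothetical helper names, the folds are contiguous and unshuffled, and the prediction values are made up.

```python
def k_fold_indices(n_samples, n_folds):
    """Yield (train_idx, valid_idx) pairs; each sample is validated exactly once."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [
        n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
        for i in range(n_folds)
    ]
    start = 0
    for size in fold_sizes:
        valid = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, valid
        start += size


def blend(predictions, weights=None):
    """Weighted average of per-model prediction lists (equal weights by default)."""
    n_models = len(predictions)
    if weights is None:
        weights = [1.0 / n_models] * n_models
    n_rows = len(predictions[0])
    return [
        sum(w * model_preds[i] for w, model_preds in zip(weights, predictions))
        for i in range(n_rows)
    ]


folds = list(k_fold_indices(n_samples=10, n_folds=3))
blended = blend([[0.2, 0.8], [0.4, 0.6]])  # two hypothetical models, two rows
```

In practice the indices are usually shuffled or stratified before splitting, and blend weights are chosen on validation folds rather than on the test data.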
Course Contents

  • Introductory lecture
  • Exploratory data analysis, data visualization
  • Simple methods for solving complex problems
Assessment Elements

  • non-blocking Homework 1
    Project: Participation in a real data analysis competition.
  • non-blocking Homework 2
    Project: Preparing exploratory dataset analysis (EDA).
Interim Assessment

  • 2023/2024 3rd module
    0.5 * Homework 1 + 0.5 * Homework 2
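
A worked instance of the formula above, as a small sketch. The function name `interim_grade` and the two homework scores are hypothetical; the grading scale and any rounding rule are not stated in this syllabus and are not assumed here.

```python
def interim_grade(homework_1, homework_2):
    """Interim grade: equally weighted average of the two homework scores."""
    return 0.5 * homework_1 + 0.5 * homework_2


# Hypothetical scores: 8 for Homework 1 and 6 for Homework 2.
grade = interim_grade(8, 6)
print(grade)  # 7.0
```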
Bibliography

Recommended Core Bibliography

  • Christopher M. Bishop. (n.d.). Pattern Recognition and Machine Learning. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsbas&AN=edsbas.EBA0C705
  • Mehryar Mohri, Afshin Rostamizadeh, & Ameet Talwalkar. (2018). Foundations of Machine Learning, Second Edition. The MIT Press.

Recommended Additional Bibliography

  • Cady, F. (2017). The Data Science Handbook. Hoboken, NJ: Wiley. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1456617