Bachelor 2021/2022

How to Win a Data Science Competition: Learn from Top Kagglers

Area of studies: Applied Mathematics and Information Science
When: 4th year, modules 2 and 3
Mode of studies: distance learning
Online hours: 32
Open to: students of all HSE University campuses
Language: English
ECTS credits: 6
Contact hours: 6

Course Syllabus

Abstract

If you want to break into competitive data science, this course is for you. Participating in predictive modelling competitions can help you gain practical experience and sharpen your data modelling skills in domains such as credit, insurance, marketing, natural language processing, sales forecasting and computer vision, to name a few. You do this in a competitive setting against thousands of participants, each trying to build the most predictive algorithm. Pushing each other to the limit can result in better performance and smaller prediction errors. Consistently achieving high ranks can help accelerate your career in data science. In this course, you will learn to analyse and solve such predictive modelling tasks competitively.
Learning Objectives

  • To study modern approaches to fitting high-performance models for real-world data analysis problems
  • To understand how to solve predictive modelling competitions efficiently and learn which of the skills obtained apply to real-world tasks
  • To learn how to preprocess data and generate new features from various sources such as text and images
  • To master modern tools for building machine learning models
  • To learn advanced feature engineering techniques, such as generating mean encodings, using aggregated statistical measures or finding nearest neighbours, as a means to improve predictions
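
As an illustration of the mean-encoding technique named in the last objective, here is a minimal sketch in plain Python. It is not taken from the course materials: the function name `oof_mean_encode`, the round-robin fold assignment and the fallback value are assumptions made for this example. Each row is encoded with the mean target of its category computed on the other folds only, which is one common way to limit target leakage.

```python
from collections import defaultdict


def oof_mean_encode(categories, targets, n_folds=5):
    """Out-of-fold mean (target) encoding for one categorical column.

    Hypothetical helper for illustration: each row is encoded with the
    mean target of its category computed on the other folds only, so a
    row never contributes its own target to its encoding.
    """
    if n_folds < 2:
        raise ValueError("n_folds must be at least 2")
    n = len(categories)
    encoded = [0.0] * n
    for fold in range(n_folds):
        sums = defaultdict(float)
        counts = defaultdict(int)
        train_sum, train_count = 0.0, 0
        # Rows with i % n_folds == fold are held out; the rest are training rows.
        for i in range(n):
            if i % n_folds != fold:
                sums[categories[i]] += targets[i]
                counts[categories[i]] += 1
                train_sum += targets[i]
                train_count += 1
        # Assumed fallback: mean target of the training folds, used for
        # categories that do not appear in the training folds.
        fallback = train_sum / train_count if train_count else 0.0
        for i in range(fold, n, n_folds):
            c = categories[i]
            encoded[i] = sums[c] / counts[c] if counts[c] else fallback
    return encoded


if __name__ == "__main__":
    print(oof_mean_encode(["a", "a", "b", "b"], [1, 0, 1, 1], n_folds=2))
```

In practice the folds would usually be shuffled and the encoding smoothed towards the global mean; both are left out here to keep the sketch short.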
Expected Learning Outcomes

  • Acquire knowledge of different algorithms and learn how to tune their hyperparameters efficiently to achieve top performance.
  • Be able to design reliable cross-validation schemes that help you benchmark your solutions and avoid overfitting or underfitting on unseen (test) data.
  • Gain experience in analysing and interpreting data. You will become aware of inconsistencies, high noise levels, errors and other data-related issues such as leakages, and you will learn how to overcome them.
  • Get exposed to past (winning) solutions and their code, and learn how to read them.
  • Master the art of combining different machine learning models and learn how to build ensembles.
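
Two of the outcomes above, cross-validation and ensembling, can be sketched in a few lines of plain Python. This is an illustrative example only, not course material: `k_fold_indices` and `blend` are hypothetical helper names, the folds are contiguous and unshuffled, and the blend is a simple weighted average of model predictions.

```python
def k_fold_indices(n_samples, n_folds):
    """Yield (train_idx, valid_idx) pairs for contiguous K-fold splits."""
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        # The first `extra` folds get one additional sample.
        size = base + (1 if fold < extra else 0)
        valid = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, valid
        start += size


def blend(predictions, weights):
    """Weighted average of several models' predictions for the same rows."""
    total = sum(weights)
    n_rows = len(predictions[0])
    return [
        sum(w * p[i] for w, p in zip(weights, predictions)) / total
        for i in range(n_rows)
    ]


if __name__ == "__main__":
    for train_idx, valid_idx in k_fold_indices(5, 2):
        print(train_idx, valid_idx)
    print(blend([[0.0, 1.0], [1.0, 1.0]], [1, 1]))
```

Blend weights would normally be chosen on the validation folds rather than on the test data, so that the ensemble itself does not overfit.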
Course Contents

  • Strategies for participation in competitions
  • Deep learning tricks
  • Leaks in the data and how to use them
Assessment Elements

  • Online course (non-blocking)
    Coursera course “How to Win a Data Science Competition: Learn from Top Kagglers”
  • Competition (non-blocking)
Interim Assessment

  • 2021/2022 3rd module
    0.6 * Competition + 0.4 * Online course
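
As a worked example of the formula above, the short sketch below computes the interim grade from the two assessment elements. The function name `interim_grade` and the sample scores are hypothetical, and a 10-point grading scale is assumed; the syllabus itself does not state the scale or any rounding rule.

```python
def interim_grade(competition, online_course):
    """Interim grade: 0.6 * Competition + 0.4 * Online course."""
    return 0.6 * competition + 0.4 * online_course


if __name__ == "__main__":
    # Hypothetical scores on an assumed 10-point scale.
    print(round(interim_grade(8, 10), 2))
```

With these sample scores, the competition contributes 4.8 points and the online course 4.0 points, for a total of 8.8 before any rounding.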
Bibliography

Recommended Core Bibliography

  • Mehryar Mohri, Afshin Rostamizadeh, & Ameet Talwalkar. (2018). Foundations of Machine Learning, Second Edition. The MIT Press.

Recommended Additional Bibliography

  • Christopher M. Bishop. (n.d.). Pattern Recognition and Machine Learning. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsbas&AN=edsbas.EBA0C705

Authors

  • BIRSHERT ALEKSEY DMITRIEVICH
  • SHABALIN ALEKSANDR MIKHAYLOVICH
  • SADRTDINOV ILDUS RUSTEMOVICH