Data Science for Economics

Academic Year: 2024/2025
Language of instruction: English
ECTS credits: 4
Course type: Elective course
When: 4th year, modules 1 and 2

Instructors


Khassan, Yana

Course Syllabus

Abstract

The course consists of three parts: (1) an introduction to programming; (2) an overview of the most commonly used machine learning algorithms; and (3) time permitting, an introduction to causal inference and applications of machine learning algorithms to causal inference.

In the first part of the course, students will learn basic programming in the R language. These skills will allow them to implement all the methods taught subsequently. Students will also learn how to explore and analyse structured and unstructured data sets. This introduction to programming will also be useful in subsequent courses in econometrics and economics. To build intuition, we will first solve each problem in a brute-force manner and then explore R packages and built-in functions that handle the problem more efficiently.

In the second part of the course, we will focus on the most commonly used machine learning algorithms. We will cover regression techniques (parametric, nonparametric and high-dimensional), classification methods, resampling methods, model selection, unsupervised learning and, time permitting, text analysis.

In the last part of the course, we will cover research papers that have recently applied machine learning methods to causal inference in economics.

Course prerequisites: Statistics; Mathematics for Economists. In the second part of the course I will present and derive statistical properties of various estimators. To follow this part, students need a certain level of mathematical maturity. In practice, this means that students should have done some simple mathematical proofs before taking this class (for example, they should understand “epsilon-delta” arguments in the context of limits of sequences).
Learning Objectives

  • The objective of this course is to provide students with a hands-on introduction to data science in economics (or, more broadly, to data science in the social sciences).
  • By the end of the course, students should be able to write simple computer programs in R;
  • implement basic machine learning algorithms;
  • understand the assumptions and statistical properties of machine learning algorithms;
  • use machine learning algorithms to solve real-world business problems.
Expected Learning Outcomes

  • analyse unbiasedness and consistency and obtain the asymptotic distribution of these estimators
  • apply basic ideas of statistical learning
  • be able to avoid unnecessary control structures
  • be able to manipulate basic data structures used in R
  • be able to study properties of kernel estimators using Monte Carlo techniques and apply these estimators to real datasets in R
  • be able to use nonparametric techniques
  • derive the OLS estimator in both the univariate and the multivariate case
  • explain how machine learning methods are used in causal inference in the “conditional on observables” framework
  • explain how data science is used in industry and in academia
  • extract and summarize data from .html and .xml files
  • implement algorithms (maximal margin classifier, support vector classifier, support vector machine) in R
  • implement algorithms (regression trees, classification trees, bagging, random forests, boosting) in R
  • implement both model selection techniques and bootstrap methods in R
  • apply the data.table framework
  • implement linear model selection estimators in R
  • implement logit, linear discriminant analysis and quadratic discriminant analysis in R
  • implement the OLS estimator in R, analyse its properties using Monte Carlo simulations and apply OLS estimation techniques to real data
  • implement PCA, K-means clustering and hierarchical clustering in R
  • solve data science problems implementing control structures and functions
  • write basic regular expressions
Course Contents

  • Introduction to Data Science in Economics
  • Control Structures and Functions
  • Vectorized Computation and Data Aggregation
  • Working With Text and the Web
  • Introduction to Statistical Learning
  • Large Sample Properties of OLS
  • Classification
  • Resampling Methods
  • Linear Model Selection and Regularization
  • Nonparametric Estimation
  • Tree-Based Methods
  • Support Vector Machines
  • Unsupervised Learning
  • Topics in Causal Inference
Assessment Elements

  • blocking Final Exam
    In order to get a passing grade for the course, the student must sit all parts of the examination.
  • non-blocking assignment 1
  • non-blocking assignment 2
  • non-blocking assignment 3
  • non-blocking assignment 4
Interim Assessment

  • 2024/2025 2nd module
    0.6 * Final Exam + 0.1 * assignment 1 + 0.1 * assignment 2 + 0.1 * assignment 3 + 0.1 * assignment 4
Bibliography

Recommended Core Bibliography

  • Alexandre Belloni, Victor Chernozhukov, & Christian Hansen. (2014). High-Dimensional Methods and Inference on Structural and Treatment Effects. Journal of Economic Perspectives, 28(2), 29. https://doi.org/10.1257/jep.28.2.29
  • Einav, L., & Levin, J. (2014). Economics in the age of big data. Science, 346(6210), 1–6. https://doi.org/10.1126/science.1243089
  • Gareth James, Daniela Witten, Trevor Hastie, & Rob Tibshirani. (2013). Data for An Introduction to Statistical Learning with Applications in R (R package, version 1.0). Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsbas&AN=edsbas.28D80286
  • Grolemund, G. (2014). Hands-On Programming with R.
  • Sendhil Mullainathan, & Jann Spiess. (2017). Machine Learning: An Applied Econometric Approach. Journal of Economic Perspectives, 31(2), 87. https://doi.org/10.1257/jep.31.2.87
  • Susan Athey. (2018). The Impact of Machine Learning on Economics. NBER Chapters, 507. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsrep&AN=edsrep.h.nbr.nberch.14009
  • Wickham, H., & Grolemund, G. (2016). R for Data Science: Import, Tidy, Transform, Visualize, and Model Data (1st ed.). Sebastopol, CA: O’Reilly Media. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1440131

Recommended Additional Bibliography

  • Efron, B. (2017). Computer Age Statistical Inference: Algorithms, Evidence, and Data Science.
  • McKinney, W. (2012). Python for Data Analysis : Data Wrangling with Pandas, NumPy, and IPython. Sebastopol, CA: O’Reilly Media. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=495822
  • Murphy, K. P. (2012). Machine Learning : A Probabilistic Perspective. Cambridge, Mass: The MIT Press. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=480968
  • Wickham, H. (2015). Advanced R, Second Edition. Boca Raton, FL: Chapman and Hall/CRC. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=934735

Authors

  • DEMESHEV BORIS BORISOVICH