2024/2025
Nonparametric Theory and Data Analysis
Type:
Mago-Lego
Delivered by:
International Laboratory for Applied Network Research
When:
Modules 3 and 4
Open to:
students of one campus
Language:
English
ECTS credits:
6
Course Syllabus
Abstract
This course covers a distinct branch of statistical theory: non-parametric methods of statistical data analysis (non-parametric statistics, NPS). NPS is often used alongside more “classical” approaches based on Gaussian statistics, but it is structured differently and calls for its own approach to understanding and interpretation. A growing share of business decisions rests on data measured on categorical and rank scales, so this type of analysis is becoming increasingly relevant. Throughout the course, students gain a theoretical and practical understanding of how to carry out a non-parametric analysis, which types of data it applies to, what needs to be taken into account, and how to interpret the results. The course gives special attention to rank regression and loglinear regression as particular cases of working with data that do not follow a Gaussian distribution. All coursework is done in R using specialized packages.
Learning Objectives
- Understand what non-parametric statistics is and how it changes the process of data analysis;
- Formulate a standard procedure for diagnosing whether a dataset calls for non-parametric statistics;
- Gain a comprehensive understanding of how to formulate problems that compare and evaluate the effects of various factors on processes of a categorical nature;
- Master the principles of interpreting data and drawing analytical conclusions, with applications to business-related data.
Expected Learning Outcomes
- Able to fit a logistic regression model on a given dataset
- Able to use R programming language for complex statistical computations
- Can test parametric and nonparametric hypotheses
- Implement causal inference methods (matching, instrumental variables, regression discontinuity, difference-in-differences, fixed effects)
- Identify which causal assumptions are necessary for each type of statistical method
- Become familiar with non-parametric statistics
- Students become familiar with the data loading process and EDA principles.
- Students are able to use different R packages to visualize the distribution of data and to interpret the results.
- Students are able to use statistical functions to test variables for a non-normal distribution and its key characteristics, in order to then choose a fundamentally different set of functions and methods for further statistical data analysis
- Students get acquainted with the special packages pdfCluster and BayesBinMix, and with functions suited to DBSCAN approaches (clues, base R).
- Students are introduced to the theoretical background and challenges of working with non-parametric data in order to obtain statistically valid inferences and interpretations
Course Contents
- Section 1: Course Structure. Types of data for NPS/EDA. The EDA framework
- Section 2: Statistical Distribution (Part 1): Theoretical Genesis.
- Section 3: Statistical Distribution (Part 2): Applied Principles.
- Section 4: NPS Cluster Analysis: General Framework and Genesis
- Section 5: Non-Parametric Regression Adventures (Part 1): The Genesis of NP-reg and ordinal/nominal models
- Sections 6-8: Non-Parametric Regression Adventures (Part 2). Log-Linear Models. The Genesis of Contingency Tables
Assessment Elements
- Practice Task 1: In this assignment, you familiarize yourself with the structure of the R language and its specifics in preparation for the course material. The assignment consists of several exercises aimed at mastering basic (and advanced) R functions, as well as learning to work with MS Visual Studio Code / VSCodium and the appropriate extensions. Tasks may be test-based or practical (practical tasks are assumed to be completed in Jupyter Notebook).
- Practice Task 2: This task covers sections 2-4. It is a series of practical exercises; to complete it, answer all the questions in order. The work is done in R, in the Jupyter Notebook environment.
- Practice Task 3: This task covers sections 5-8. It is a series of practical exercises; to complete it, answer all the questions in order. The work is done in R, in the Jupyter Notebook environment.
- Final Project: The laboratory work is done in groups of up to 3/4 people. The task is to prepare a presentation of 15 smaller and over 47 larger slides that presents a mini-study of a chosen dataset, with an emphasis on the material of sections No. 2 to No. 6 (8). The laboratory work should also reflect all the procedures of non-parametric data analysis.
Interim Assessment
- 2024/2025 4th module: 0.35 * Final Project + 0.15 * Practice Task 1 + 0.25 * Practice Task 2 + 0.25 * Practice Task 3
Bibliography
Recommended Core Bibliography
- Agresti, A., & Finlay, B. (2014). Statistical Methods for the Social Sciences. Pearson. ISBN 9781292034898. Retrieved from https://search.ebscohost.com/login.aspx?direct=true&db=nlebk&AN=1418314
- Agresti, A. (2013). Categorical Data Analysis (3rd ed.). Hoboken, NJ: Wiley. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=769330
- Agresti, A. (2015). Foundations of Linear and Generalized Linear Models. Hoboken, New Jersey: Wiley. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=941245
- Agresti, A. (2017). Statistics: The Art and Science of Learning From Data, Global Edition. Pearson.
- Bruce E. Hansen, Donald W. K. Andrews, A. Ronald Gallant, Douglas W. Nychka, & James G. MacKinnon. (n.d.). Semi-Nonparametric Maximum Likelihood Estimation. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsbas&AN=edsbas.7BD2F74E
- Corder, G. W., & Foreman, D. I. (2014). Nonparametric Statistics: A Step-by-Step Approach (2nd ed.). Hoboken, New Jersey: Wiley. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=798830
- Treves, F. (2013). Topological Vector Spaces, Distributions and Kernels. [N.p.]: Dover Publications. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1151250
- Härdle, W., Müller, M., Sperlich, S. A., & Werwatz, A. (2004). Nonparametric and Semiparametric Models. Switzerland, Europe: Springer. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsbas&AN=edsbas.121C8F13
- Myatt, G. J., & Johnson, W. P. (2014). Making Sense of Data I: A Practical Guide to Exploratory Data Analysis and Data Mining (2nd ed.). Hoboken, New Jersey: Wiley. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=809795
- Wasserman, L. (2006). All of Nonparametric Statistics. Springer Science & Business Media. 270 pp.
Recommended Additional Bibliography
- Wickham, H., & Grolemund, G. (2016). R for Data Science: Import, Tidy, Transform, Visualize, and Model Data (1st ed.). Sebastopol, CA: O’Reilly Media. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1440131