Nonparametric Theory and Data Analysis

2024/2025

Category 'Best Course for Career Development'

Category 'Best Course for Broadening Horizons and Diversity of Knowledge and Skills'

Category 'Best Course for New Knowledge and Skills'

Type: Mago-Lego

Delivered by: International Laboratory for Applied Network Research

When: 3rd and 4th modules

Open to: students of one campus

Instructors: Stanislav Pashkov

Language: English

ECTS credits: 6


Abstract

This course covers a distinct branch of statistical theory: non-parametric methods of statistical data analysis (non-parametric statistics, NPS). These methods are often used alongside more “classical” approaches based on Gaussian (normal) statistics, but they rest on different assumptions and call for their own approach to understanding and interpretation. More and more business decisions are now based on data measured on categorical and rank (ordinal) scales, so this type of data analysis is becoming increasingly relevant. Throughout the course, students will gain a theoretical and practical understanding of how to carry out non-parametric data analysis: which types of data it applies to, what needs to be taken into account, and how to interpret the results. Particular attention is paid to rank regression and log-linear regression as special cases of modeling data that do not follow a Gaussian distribution. All coursework is done in the R language using specialized packages.

Learning Objectives

Understand what non-parametric statistics is and how it differs from parametric approaches in data analysis;
Formulate a standard procedure for diagnosing whether data require non-parametric methods;
Gain a comprehensive understanding of models for comparing and evaluating the effects of various factors on processes of a categorical nature;
Master the principles of data interpretation and of drawing analytical conclusions, with applications to business data.

Expected Learning Outcomes

Able to fit a logistic regression model on a given dataset
Able to use R programming language for complex statistical computations
Can test parametric and nonparametric hypotheses
Implement causal inference methods (matching, instrumental variables, regression discontinuity, difference-in-differences, fixed effects)
Identify which causal assumptions are necessary for each type of statistical method
Become familiar with non-parametric statistics
Students become familiar with the data loading process and the principles of exploratory data analysis (EDA).
Students are able to use various R packages to visualize data distributions and to interpret the results.
Students are able to use statistical functions to test variables for departures from the normal distribution and to describe the associated key characteristics, so that they can then choose a fundamentally different set of functions and methods for further statistical analysis.
Students get acquainted with the specialized packages pdfCluster and BayesBinMix, as well as functions suited to DBSCAN-type approaches (clues, base R).
Students are introduced to the theoretical background and challenges of working with data that require non-parametric methods, in order to obtain statistically valid inferences and interpretations.

Course Contents

Section 1: Course Structure. Types of data for NPS/EDA. The EDA framework
Section 2: Statistical Distribution (Part 1): Theoretical Genesis.
Section 3: Statistical Distribution (Part 2): Applied Principles.
Section 4: NPS Cluster Analysis: General Framework and Genesis
Section 5: Non-Parametric Regression Adventures (Part 1): The Genesis of NP-reg and ordinal/nominal models
Sections 6-8: Non-Parametric Regression Adventures (Part 2). Log-Linear Models. The Genesis of Contingency Tables

Assessment Elements

Practice Task 1
In this assignment, you will familiarize yourself with the structure and specifics of the R language in order to get started with the course material. The assignment consists of several exercises aimed at mastering basic (and advanced) R functions and at learning to work in Microsoft Visual Studio Code / VSCodium with the appropriate extensions. Tasks may be quiz-style or practical (the practical tasks are expected to be completed in Jupyter Notebook).
Practice Task 2
This task covers Sections 2-4 and consists of a series of practical exercises. Answer all the questions in sequence to complete the task. The work is done in R in the Jupyter Notebook environment.
Practice Task 3
This task covers Sections 5-8 and consists of a series of practical exercises. Answer all the questions in sequence to complete the task. The work is done in R in the Jupyter Notebook environment.
Final Project
The laboratory work is done in groups of up to 3-4 people. The task is to prepare a presentation of at least 15 and at most 47 slides presenting a mini-study of a chosen dataset, with an emphasis on the material of Sections 2 to 6 (8). The work should also document all the non-parametric data analysis procedures applied.

Interim Assessment

2024/2025 4th module
0.35 * Final Project + 0.15 * Practice Task 1 + 0.25 * Practice Task 2 + 0.25 * Practice Task 3

Bibliography

Recommended Core Bibliography

Agresti, A., & Finlay, B. (2014). Statistical Methods for the Social Sciences. Pearson. ISBN 9781292034898. Retrieved from https://search.ebscohost.com/login.aspx?direct=true&db=nlebk&AN=1418314
Agresti, A. (2013). Categorical Data Analysis (3rd ed.). Hoboken, NJ: Wiley. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=769330
Agresti, A. (2015). Foundations of Linear and Generalized Linear Models. Hoboken, NJ: Wiley. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=941245
Agresti, A. (2017). Statistics: The Art and Science of Learning From Data, Global Edition. Pearson.
Hansen, B. E., Andrews, D. W. K., Gallant, A. R., Nychka, D. W., & MacKinnon, J. G. (n.d.). Semi-Nonparametric Maximum Likelihood Estimation. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsbas&AN=edsbas.7BD2F74E
Corder, G. W., & Foreman, D. I. (2014). Nonparametric Statistics: A Step-by-Step Approach (2nd ed.). Hoboken, NJ: Wiley. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=798830
Treves, F. (2013). Topological Vector Spaces, Distributions and Kernels. Dover Publications. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1151250
Härdle, W., Müller, M., Sperlich, S. A., & Werwatz, A. (2004). Nonparametric and Semiparametric Models. Springer. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsbas&AN=edsbas.121C8F13
Myatt, G. J., & Johnson, W. P. (2014). Making Sense of Data I: A Practical Guide to Exploratory Data Analysis and Data Mining (2nd ed.). Hoboken, NJ: Wiley. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=809795
Wasserman, L. (2006). All of Nonparametric Statistics. Springer Science & Business Media. 270 pp.

Recommended Additional Bibliography

Wickham, H., & Grolemund, G. (2016). R for Data Science: Import, Tidy, Transform, Visualize, and Model Data (1st ed.). Sebastopol, CA: O’Reilly Media. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1440131

Authors

Pavlova Irina Anatolevna
Pashkov Stanislav Georgievich
