Master
2020/2021
Data Analysis
Category 'Best Course for Broadening Horizons and Diversity of Knowledge and Skills'
Type:
Compulsory course (System and Software Engineering)
Area of studies:
Software Engineering
Delivered by:
School of Software Engineering
Where:
Faculty of Computer Science
When:
Year 1, modules 3 and 4
Mode of studies:
offline
Instructors:
Alisa Melikyan
Master’s programme:
Software and Systems Engineering
Language:
English
ECTS credits:
4
Contact hours:
64
Course Syllabus
Abstract
The course is taught to master's students of the Faculty of Computer Science at NRU HSE in the third and fourth modules of the first year of study. The course carries 4 credits. Classroom instruction takes 64 hours: 24 hours of lectures and 40 hours of seminars. Assessment includes in-class tasks, homework, control work, and an examination. The main purpose of the course is to teach students how to apply different data analysis methods to real data.
Learning Objectives
- give students an introduction to the most widely used data analysis methods
- explain the data analysis methods using real data and concentrating on complications that may occur during the analysis in real-life research
- teach students how to organize their own research project using the knowledge obtained during the course
- explain how to use data analysis tools in the most effective way to perform the research tasks
Expected Learning Outcomes
- select appropriate methods of data analysis depending on the research question and types of empirical data
- prepare empirical data for further analysis
- formulate research hypotheses and construct models
- create a regression model and describe it
- create a factor model and describe it
- create a cluster model and describe it
Course Contents
- Introduction to data analysis
  Statistical packages and programming languages for data analysis. Data sources. Working with data (exploring data, entering new data, coding variables, preparing data for analysis, exporting and importing data, modifying data).
- Descriptive data analysis
  Frequency analysis. Graphical analysis. Statistical characteristics: measures of central tendency, dispersion, standard deviation, standard error of the mean, confidence interval, percentile values, measures of distribution symmetry (skewness) and peakedness (kurtosis). Normal distribution, Z-standardization, Kolmogorov-Smirnov test of normality. Working with multiple response questions.
- Investigating relationships between variables
  Cross tabulation analysis. Formulating and testing hypotheses. Level of significance and Type I error. Chi-square test. Correlation coefficients: bivariate, part and partial. T-tests. ANOVA. Non-parametric tests.
- Regression analysis
  Objectives of regression analysis. Graphical representation of regression line. Simple and multiple linear regression. Logistic regression. Interpreting results of regression analysis. Multicollinearity. Heteroscedasticity. Dummy variables. Regression model limitations and diagnostics.
- Factor analysis
  Factor analysis steps. Evaluating applicability of data for factor analysis. Methods of factor analysis. Factor loading, rotation. Saving factors as new variables. Interpreting factors.
- Cluster analysis
  Cluster analysis steps. Evaluating applicability of data for cluster analysis. Methods of cluster analysis: hierarchical and k-means. Saving cluster membership information as new variable. Characterizing clusters.
- Panel data analysis
  Advantages and problems of using panel data. Classification of panel data models. Panel data regression estimation methods. Models with fixed and random effects. Criteria for choosing the optimal model.
- Time series analysis
  Stationary and non-stationary time series. Forecasting values for future periods. Autoregressive, integrated and moving average models (ARIMA).
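As a rough illustration of three of the topics listed above (Z-standardization, the bivariate correlation coefficient, and simple linear regression), the following minimal sketch uses only the Python standard library. The data values and all function names are hypothetical and chosen for illustration; the syllabus does not prescribe a particular tool or implementation.

```python
import math


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def sample_std(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def z_scores(values):
    """Z-standardization: subtract the mean, divide by the standard deviation."""
    m, s = mean(values), sample_std(values)
    return [(v - m) / s for v in values]


def pearson_r(x, y):
    """Bivariate (Pearson) correlation coefficient between x and y."""
    mx, my = mean(x), mean(y)
    cross = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return cross / math.sqrt(sxx * syy)


def simple_linear_regression(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    intercept = my - slope * mx
    return slope, intercept


# Hypothetical data: hours of study and test scores for five students.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [52.0, 55.0, 61.0, 64.0, 68.0]

standardized = z_scores(scores)
r = pearson_r(hours, scores)
slope, intercept = simple_linear_regression(hours, scores)

print("z-scores:", [round(z, 3) for z in standardized])
print("Pearson r:", round(r, 4))
print("fitted line: score =", round(intercept, 2), "+", round(slope, 2), "* hours")
```

In practice a statistical package or a library such as those covered in the recommended bibliography would replace these hand-written functions; the sketch only makes the underlying arithmetic visible.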
Assessment Elements
- Tasks in class (TC)
  Tasks performed in class and aimed at developing students’ skills in data analysis
- Homework (HW)
- Control Work (CW)
  Two written works performed in class
- Examination Work (EW)
  The examination is held in written form on the MS Teams platform (https://www.microsoft.com/ru-ru/microsoft-365/microsoft-teams/group-chat-software). Students must connect 5 minutes before the exam starts. The student's computer must have a working camera and microphone and the MS Teams application installed. To take the exam, the student must join at the exact scheduled time and be ready to answer the instructor's questions with the microphone and camera switched on. During the exam, students are not allowed to receive help from other people. Students may ask the instructor clarifying questions if a task is unclear. A connection failure of less than 10 minutes counts as a short-term disruption; a failure of more than 10 minutes counts as a long-term disruption. After a long-term disruption the student cannot continue the exam. The retake procedure is the same as the exam procedure.
Interim Assessment
- Interim assessment (module 4)
  0.3 * Control Work (CW) + 0.3 * Examination Work (EW) + 0.2 * Homework (HW) + 0.2 * Tasks in class (TC)
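The weighted formula above can be sketched in Python as follows. The weights are taken from the formula; the function name, the example grades, and the absence of any rounding step are assumptions made for illustration, since the syllabus does not state the grading scale or a rounding rule.

```python
# Weights from the interim assessment formula.
WEIGHTS = {"CW": 0.3, "EW": 0.3, "HW": 0.2, "TC": 0.2}


def interim_grade(grades):
    """Return the weighted sum of the four assessment elements.

    `grades` maps each element code (CW, EW, HW, TC) to its grade.
    No rounding is applied because the syllabus does not specify one.
    """
    missing = set(WEIGHTS) - set(grades)
    if missing:
        raise ValueError(f"missing grades for: {sorted(missing)}")
    return sum(WEIGHTS[code] * grades[code] for code in WEIGHTS)


# Hypothetical grades, for illustration only.
example = {"CW": 7, "EW": 8, "HW": 9, "TC": 10}

# 0.3 * 7 + 0.3 * 8 + 0.2 * 9 + 0.2 * 10 = 2.1 + 2.4 + 1.8 + 2.0 = 8.3
print(round(interim_grade(example), 2))
```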
Bibliography
Recommended Core Bibliography
- Mirkin, B. (2011). Core concepts in data analysis: summarization, correlation and visualization.
- Dougherty, C. (2016). Introduction to econometrics.
Recommended Additional Bibliography
- Idris, I. (2016). Python Data Analysis Cookbook. Birmingham, UK: Packt Publishing. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1290098
- McKinney, W. (2018). Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython (2nd ed.). Sebastopol, CA: O’Reilly Media. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1605925