Bachelor's programme
2021/2022
Data Analysis in Sociology
Status:
Compulsory course (Sociology and Social Informatics)
Field of study:
39.03.01. Sociology
Delivered by:
Department of Sociology
Where:
St. Petersburg School of Social Sciences
When:
3rd year, modules 3 and 4
Mode of study:
with an online course
Online hours:
32
Open to:
all HSE University campuses
Language:
English
Credits:
4
Contact hours:
46
Course Syllabus
Abstract
The Data Analysis in Sociology course in the 4th year of the Program focuses on categorical data and covers special types of prediction and classification models (logistic regression and cluster analysis). The course finishes with a discussion of data culture and data acumen, from data management to inference and prediction. It is also the starting point for students interested in pursuing advanced training in research methods or planning to use quantitative methods with categorical outcomes in their own research.
Learning Objectives
- The course covers the foundations and popular techniques of quantitative data analysis with the goal of training students to be informed producers and consumers of quantitative research.
- Develop the skills needed to solve typical data analysis problems with social data in the R software environment.
Expected Learning Outcomes
- Students can apply a theoretical framework to define hypotheses and explain the results of a study; they can apply appropriate statistical models and generalize the results.
- Students can carry out statistical analyses of a data set, propose hypotheses and choose the methods needed to reach the goals, interpret the results and assess the quality of proposed solutions. Students provide reasons for their choice of techniques, interpret the outputs correctly, and assess the quality of their own and others’ models.
- Students can generalize and analyze the materials they read, assess them critically, express their own opinions and give their own interpretation.
- Students can set research goals, propose a research plan based on the results of previous research and social theory, carry out data analysis and report the results.
- Choose appropriate methods and techniques for certain types of variables and certain aims of the analysis
- Conduct statistical analyses in RStudio
- Create analytical reports describing all the stages of analysis and interpreting its results
- Give meaningful interpretation of statistical results: regression coefficients, tables, plots and diagrams (produced in R)
- Perform data transformations
- Represent graphically the results of the statistical analyses
Course Contents
- Topic 18. Binary logistic regression
- Topic 19. Cluster analysis
- Topic 20. Data culture and data acumen
- Topic 21. Data management. Revised variable types
- Topic 22. Understanding causality and prediction
Assessment Elements
- Project 1: Two individual projects are due, on binary logistic regression and on cluster analysis. Together, the two projects make up the student's portfolio. Specific project requirements are available in LMS. A student who has a valid reason to miss the project deadline should inform the instructor before the deadline and can submit later, as agreed with the instructor. Documents confirming the student's absence must be presented no later than two weeks after the initial deadline; otherwise, they will not be considered.
- Project 2: Two individual projects are due, on binary logistic regression and on cluster analysis. Together, the two projects make up the student's portfolio. Specific project requirements are available in LMS. A student who has a valid reason to miss the project deadline should inform the instructor before the deadline and can submit later, as agreed with the instructor. Documents confirming the student's absence must be presented no later than two weeks after the initial deadline; otherwise, they will not be considered.
- Written Exam: The exam consists of two problems involving the methods covered in this course.
- Test 1: A student who has a valid reason to miss the test should inform the instructor before the test. Documents confirming the student's absence must be presented no later than two weeks after the test; otherwise, they will not be considered.
- Test 2: A student who has a valid reason to miss the test should inform the instructor before the test. Documents confirming the student's absence must be presented no later than two weeks after the test; otherwise, they will not be considered.
- Practical tasks: After each seminar, students are assigned a practical task, which should be completed by Friday, 12 p.m.
- Project1: Three basic features are assessed: correct calculations and correct code (syntax); correct interpretations (students must describe trends properly, assess the significance of the results, and predict values of the dependent variable correctly); and correct graphics, with proper plot types and formatting applied.
- Exam: The exam is held in written form on the MS Teams platform: the exam assignment is sent to all students through the "Assignments" module, and the completed work must also be attached in the "Assignments" module of MS Teams. If MS Teams malfunctions, the student may also send the completed work to the instructor's corporate email address from their own corporate email account. Two days are allotted for the exam. You may start at any time, but plan your time and effort so that you finish before the deadline. The student's computer must meet the following requirements: an internet connection and a recent version of RStudio preinstalled. During the exam, students are forbidden to cooperate or to complete the assignment collectively. During the exam, students are allowed to use any sources, including textbooks and the internet. A long-term connection failure during the exam is defined as having no internet access for the entire duration of the exam or no access to a computer for the entire exam period. In the event of a long-term connection failure, the student cannot continue taking part in the exam. The retake procedure is the same as the exam procedure. The student must inform the instructor of any problems with the connection or with access to a computer immediately (as soon as this becomes possible). If the problem is reported promptly, each case of technical failure will be considered separately, and the decision on whether and in what form the exam can be taken will be made individually.
- Project2: Three basic features are assessed: correct calculations and correct code (syntax); correct interpretations (students must describe trends properly, assess the significance of the results, and predict values of the dependent variable correctly); and correct graphics, with proper plot types and formatting applied.
- Projects: Late submissions are not considered (try us). If you are ill at the time of a project submission, present a medical certificate to have the formula adjusted for you. If you miss more than one project, there might be a makeup assignment. When you submit a project in MS Teams, you must click the "Turn in" button to complete the submission. All projects are first posted to the dedicated channel, where they are peer-reviewed, and are then submitted in the Assignments section by each contributing student. If you have any questions about the project, sign up for a consultation.
- In-class activity
- Exam
- Short tests
- MOOC completion
- Mid-Term Test
Interim Assessment
- 2020/2021 4th module: 0.05 * MOOC completion + 0.1 * In-class activity + 0.15 * Mid-Term Test + 0.1 * Short tests + 0.4 * Projects
- 2021/2022 4th module: 0.2 * Exam + 0.2 * Project2 + 0.4 * Practical tasks + 0.2 * Project1
- 2022/2023 3rd module: 0.3 * Written Exam + 0.25 * Project 2 + 0.1 * Test 1 + 0.25 * Project 1 + 0.1 * Test 2
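Each interim formula is a plain weighted sum of the assessment elements. As a rough illustration only, the sketch below applies the 2021/2022 weights to a set of made-up scores. The function name `interim_grade`, the example scores, and the 10-point scale are assumptions for this example and are not part of the syllabus. The sketch applies no rounding, since the syllabus does not state a rounding rule.

```python
# Weights copied from the 2021/2022 4th-module formula in this syllabus.
WEIGHTS_2021_2022 = {
    "Exam": 0.2,
    "Project2": 0.2,
    "Practical tasks": 0.4,
    "Project1": 0.2,
}


def interim_grade(scores, weights):
    """Return the weighted sum of the element scores.

    Hypothetical helper: raises KeyError if a weighted element has no score.
    """
    return sum(weight * scores[name] for name, weight in weights.items())


# Made-up scores on an assumed 10-point scale (not real student data).
example_scores = {
    "Exam": 7,
    "Project2": 8,
    "Practical tasks": 9,
    "Project1": 6,
}

grade = interim_grade(example_scores, WEIGHTS_2021_2022)
print(round(grade, 2))
```

The same helper works for the other years by swapping in a different weights dictionary.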
Bibliography
Recommended Core Bibliography
- Ledolter, J. (2013). Data Mining and Business Analytics with R. Hoboken, New Jersey: Wiley. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=587979
- Upton, G. J. G. (2016). Categorical Data Analysis by Example. Hoboken, New Jersey: Wiley. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1402878
Recommended Additional Bibliography
- Mood, C. (2010). Logistic Regression: Why We Cannot Do What We Think We Can Do, and What We Can Do About It. European Sociological Review, 26(1), 67–82. https://doi.org/10.1093/esr/jcp006
- Amrhein, V., Trafimow, D., & Greenland, S. (2019). Inferential Statistics as Descriptive Statistics: There Is No Replication Crisis if We Don’t Expect Replication. The American Statistician, (S1), 262. https://doi.org/10.1080/00031305.2018.1543137