Bachelor's programme 2021/2022

Data Analysis in Sociology

Status: Compulsory course (Sociology and Social Informatics)
Field of study: 39.03.01. Sociology
When taught: 3rd year, modules 3 and 4
Mode of study: with an online course
Online hours: 32
Audience: all HSE University campuses
Language: English
Credits: 4
Contact hours: 46

Course Syllabus

Abstract

The Data Analysis in Sociology course in the 4th year of the Program focuses on categorical data and covers specific types of prediction and classification models (logistic regression and cluster analysis). The course finishes with a discussion of data culture and data acumen, from data management to inference and prediction. It is also a starting point for students who want to pursue advanced training in research methods or who plan to use quantitative methods with categorical outcomes in their own research.
Learning Objectives

  • The course covers the foundations and popular techniques of quantitative data analysis with the goal of training students to be informed producers and consumers of quantitative research.
  • Develop the skills needed to solve typical problems in analysing social data in the R software environment.
Expected Learning Outcomes

  • Students can apply a theoretical framework to define hypotheses and explain the results of a study; they can apply appropriate statistical models and generalize the results.
  • Students can carry out statistical analyses of a data set, propose hypotheses and choose the methods needed to reach the goals, interpret the results and assess the quality of proposed solutions. Students provide reasons for their choice of techniques, interpret the outputs correctly, and assess the quality of their own and others’ models.
  • Students can generalize and analyze the materials they read, assess them critically, express their own opinions and give their own interpretations.
  • Students can set research goals, propose a research plan based on the results of previous research and social theory, carry out data analysis and report the results.
  • Choose appropriate methods and techniques for certain types of variables and certain aims of the analysis
  • Conduct statistical analyses in RStudio
  • Create analytical reports describing all the stages of analysis and interpreting its results
  • Give meaningful interpretation of statistical results: regression coefficients, tables, plots and diagrams (produced in R)
  • Perform data transformations
  • Represent graphically the results of the statistical analyses
Course Contents

  • Topic 18. Binary logistic regression
  • Topic 19. Cluster analysis
  • Topic 20. Data culture and data acumen
  • Topic 21. Data management. Revised variable types
  • Topic 22. Understanding causality and prediction
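As an illustration of Topic 18, the sketch below shows how a binary logistic regression coefficient translates into an odds ratio and a predicted probability. It is written in Python purely for illustration (the course itself works in R), and the intercept and slope values are hypothetical, not taken from any course data set.

```python
import math

def predicted_probability(intercept, coefficients, values):
    """Inverse logit of the linear predictor b0 + sum(b_i * x_i)."""
    linear_predictor = intercept + sum(b * x for b, x in zip(coefficients, values))
    return 1.0 / (1.0 + math.exp(-linear_predictor))

def odds_ratio(coefficient):
    """exp(b): multiplicative change in the odds per one-unit increase in x."""
    return math.exp(coefficient)

# Hypothetical fitted model (assumption): logit(p) = -1.0 + 0.5 * x
p0 = predicted_probability(-1.0, [0.5], [0.0])  # probability at x = 0
p2 = predicted_probability(-1.0, [0.5], [2.0])  # probability at x = 2

print(round(p0, 3))               # 0.269
print(p2)                         # 0.5
print(round(odds_ratio(0.5), 3))  # 1.649
```

In this hypothetical model, each one-unit increase in x multiplies the odds of the outcome by about 1.65, while the change in probability depends on the starting value of x.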
Assessment Elements

  • non-blocking Project 1
    Two individual projects are due: one on binary logistic regression and one on cluster analysis. Together, the two projects make up the student's portfolio. Specific project requirements are available in LMS. If the student has a valid reason to miss the project deadline, the student should inform the instructor before the deadline and can submit later, as agreed with the instructor. Documents confirming the student's absence must be presented no later than two weeks after the initial deadline; otherwise, they will not be considered.
  • non-blocking Project 2
    Two individual projects are due: one on binary logistic regression and one on cluster analysis. Together, the two projects make up the student's portfolio. Specific project requirements are available in LMS. If the student has a valid reason to miss the project deadline, the student should inform the instructor before the deadline and can submit later, as agreed with the instructor. Documents confirming the student's absence must be presented no later than two weeks after the initial deadline; otherwise, they will not be considered.
  • non-blocking Written Exam
    The exam consists of two problems involving the methods covered in this course.
  • non-blocking Test 1
    If the student has a valid reason to miss the test, the student should inform the instructor before the test. Documents confirming the student's absence must be presented no later than two weeks after the test; otherwise, they will not be considered.
  • non-blocking Test 2
    If the student has a valid reason to miss the test, the student should inform the instructor before the test. Documents confirming the student's absence must be presented no later than two weeks after the test; otherwise, they will not be considered.
  • non-blocking Practical tasks
    After each seminar, students are assigned a practical task, which must be completed by Friday, 12 p.m.
  • non-blocking Project1
    Three basic features are assessed: correct calculations and code (syntax); correct interpretations (students must describe trends properly, assess the significance of the results, and predict values of the dependent variable correctly); and correct graphics, with appropriate plot types and formatting.
  • non-blocking Exam
    The exam is held in written form on the MS Teams platform: the exam task is sent to everyone through the "Assignments" module, and the completed work must be attached in the same "Assignments" module. If MS Teams malfunctions, the student may also send the completed work from their corporate email to the instructor's corporate email. Two days are allotted for the exam. You may start at any time, but plan your time and effort so that you finish before the deadline. The student's computer must meet the following requirements: an internet connection and a recent version of RStudio preinstalled. During the exam, students are prohibited from cooperating or completing the task collectively. Students are allowed to use any sources, such as textbooks and the internet. A long-term connection failure is defined as having no internet for the entire duration of the exam or no access to a computer for the entire exam period. In the event of a long-term connection failure, the student cannot continue taking part in the exam. The retake procedure is the same as the exam procedure. The student must inform the instructor of any connection problems or lack of computer access immediately (as soon as this becomes possible). If the problem is reported promptly, each case of technical failure will be considered separately, and the decision on whether and in what form the exam can be taken will be made individually.
  • non-blocking Project2
    Three basic features are assessed: correct calculations and code (syntax); correct interpretations (students must describe trends properly, assess the significance of the results, and predict values of the dependent variable correctly); and correct graphics, with appropriate plot types and formatting.
  • non-blocking Projects
    Late submissions are not considered (try us). If you are ill at the time of a project submission, present a medical certificate to have the grading formula adjusted for you. If you miss more than one project, there might be a makeup assignment. When you submit a project in MS Teams, you must click the "Turn in" button to complete the submission. All projects are first posted to the dedicated channel, where they are peer-reviewed, and then submitted in the Assignments section by each contributing student. If you have any questions about the project, sign up for a consultation.
  • non-blocking In-class activity
  • non-blocking Exam
  • non-blocking Short tests
  • non-blocking MOOC completion
  • non-blocking Mid-Term Test
Interim Assessment

  • 2020/2021 4th module
    0.05 * MOOC completion + 0.1 * In-class activity + 0.15 * Mid-Term Test + 0.1 * Short tests + 0.4 * Projects
  • 2021/2022 4th module
    0.2 * Exam + 0.2 * Project2 + 0.4 * Practical tasks + 0.2 * Project1
  • 2022/2023 3rd module
    0.3 * Written Exam + 0.25 * Project 2 + 0.1 * Test 1 + 0.25 * Project 1 + 0.1 * Test 2
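The formulas above are plain weighted sums, so they can be checked with a few lines of code. The sketch below copies the weights as listed and computes an interim grade for one set of scores. The function names, the example scores, and the 0 to 10 marking scale are assumptions made for illustration; exact fractions are used to avoid floating-point rounding when summing weights.

```python
from fractions import Fraction

# Weights copied from the Interim Assessment formulas listed above.
WEIGHTS = {
    "2020/2021 4th module": {
        "MOOC completion": "0.05",
        "In-class activity": "0.1",
        "Mid-Term Test": "0.15",
        "Short tests": "0.1",
        "Projects": "0.4",
    },
    "2021/2022 4th module": {
        "Exam": "0.2",
        "Project2": "0.2",
        "Practical tasks": "0.4",
        "Project1": "0.2",
    },
    "2022/2023 3rd module": {
        "Written Exam": "0.3",
        "Project 2": "0.25",
        "Test 1": "0.1",
        "Project 1": "0.25",
        "Test 2": "0.1",
    },
}

def weight_total(module):
    """Exact sum of the weights listed for one module."""
    return sum(Fraction(w) for w in WEIGHTS[module].values())

def interim_grade(module, scores):
    """Weighted sum of element scores for the given module."""
    weights = WEIGHTS[module]
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    total = sum(Fraction(w) * Fraction(str(scores[name]))
                for name, w in weights.items())
    return float(total)

# Hypothetical scores on an assumed 0-10 scale.
example_scores = {
    "Written Exam": 8, "Project 2": 7, "Test 1": 6,
    "Project 1": 9, "Test 2": 10,
}
print(interim_grade("2022/2023 3rd module", example_scores))  # 8.0
for module in WEIGHTS:
    print(module, float(weight_total(module)))
```

Run as written, the weights for 2021/2022 and 2022/2023 each sum to 1, while the weights listed for 2020/2021 sum to 0.8; the source does not say which element accounts for the remainder.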
Bibliography

Recommended Core Bibliography

  • Ledolter, J. (2013). Data Mining and Business Analytics with R. Hoboken, New Jersey: Wiley. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=587979
  • Upton, G. J. G. (2016). Categorical Data Analysis by Example. Hoboken, New Jersey: Wiley. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1402878

Recommended Additional Bibliography

  • Mood, C. (2010). Logistic Regression: Why We Cannot Do What We Think We Can Do, and What We Can Do About It. European Sociological Review, 26(1), 67–82. https://doi.org/10.1093/esr/jcp006
  • Amrhein, V., Trafimow, D., & Greenland, S. (2019). Inferential Statistics as Descriptive Statistics: There Is No Replication Crisis if We Don’t Expect Replication. The American Statistician, (S1), 262. https://doi.org/10.1080/00031305.2018.1543137

Authors

  • KORSUNOVA VIOLETTA IGOREVNA
  • SHIROKANOVA ANNA ALEKSANDROVNA
  • TITKOVA VERA VIKTOROVNA