Computational Methods for Text Analysis

Bachelor's programme 2024/2025

Status: Elective course (Sociology and Social Informatics)

Field of study: 39.03.01 Sociology

Taught by: Department of Sociology

Where: St. Petersburg School of Social Sciences

When: 3rd year, modules 1 and 2

Format: without an online course

Audience: open to students of all HSE University campuses

Instructors: Снарский Ярослав Александрович

Language: English

Credits: 5


Abstract

For social science research, written texts provide essential data for studying media and political discourse, ideology, conflict, sentiment, political affiliation, and much else. With the growing availability of large digital text collections, it is tempting to scale research up in terms of the population studied (e.g. "all social media users of a town"), the time span (e.g. "all of Post-Soviet history"), and the geographical scope (e.g. "all educational migration in Russia"). Computational methods for text analysis are expected to help where traditional content analysis is not feasible. The course covers basic word statistics, various exploratory methods, and supervised and unsupervised modeling of text phenomena. Data Culture level 0.2.2 (Basic level: Programming + Data Analysis) will be achieved through studying methods of preprocessing and transforming text data, as well as supervised and unsupervised methods of text analysis such as topic modeling, classification, and semantic analysis.

Learning Objectives

To provide a basic understanding of how to properly use collections of texts as quantitative evidence, and to make this knowledge practical.

Expected Learning Outcomes

Being able to adequately interpret and report the results of computational text analysis in research papers.
Being able to apply computational methods of text analysis (e.g. analysis of word frequency and co-occurrence, document classification, topic modeling) to collections of texts.
Being able to apply word embedding and clustering methods to downstream tasks such as sentiment analysis and ideological scaling.
Understanding the multidimensional representation of lexical meaning and the role of dimensionality reduction.
Understanding the possibilities of automated text analysis, as well as its pitfalls and important caveats about applying statistical tests to language data.

Course Contents

Text Preprocessing
Contrastive Analysis
Text Classification
Topic Modelling
Word Embedding
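To make the first two topics concrete, here is a minimal Python sketch of text preprocessing (lowercasing, tokenization, stop-word removal) followed by a simple contrastive comparison of two corpora using a smoothed log ratio of relative word frequencies. The regex tokenizer, the tiny stop-word list, the smoothing constant `alpha`, and the function names `preprocess` and `log_ratios` are illustrative assumptions, not course materials; the course may use other tools and other contrastive statistics (for example, log-odds ratios with an informative Dirichlet prior).

```python
import math
import re
from collections import Counter

# Illustrative stop-word list (an assumption; real analyses use fuller lists).
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "is", "was", "in", "it"}


def preprocess(text: str) -> list[str]:
    """Lowercase the text, split it into alphabetic tokens, drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def log_ratios(corpus_a: list[str], corpus_b: list[str],
               alpha: float = 0.5) -> dict[str, float]:
    """Smoothed log ratio of each word's relative frequency in A versus B.

    Positive scores mark words over-represented in corpus A, negative scores
    words over-represented in corpus B. `alpha` is added to every count so
    that a word missing from one corpus does not produce log(0).
    """
    counts_a = Counter(tok for doc in corpus_a for tok in preprocess(doc))
    counts_b = Counter(tok for doc in corpus_b for tok in preprocess(doc))
    vocab = set(counts_a) | set(counts_b)
    # Totals include the pseudo-counts so each distribution sums to 1.
    total_a = sum(counts_a.values()) + alpha * len(vocab)
    total_b = sum(counts_b.values()) + alpha * len(vocab)
    return {
        w: math.log((counts_a[w] + alpha) / total_a)
        - math.log((counts_b[w] + alpha) / total_b)
        for w in vocab
    }


if __name__ == "__main__":
    # Toy example with made-up sentences, for illustration only.
    positive = ["The food was delicious and the staff friendly.", "Delicious dessert!"]
    negative = ["The food was cold and the staff rude.", "Rude waiter, cold soup."]
    scores = log_ratios(positive, negative)
    for word, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{word:10s} {score:+.2f}")
```

Smoothing matters here because raw frequency ratios are dominated by rare words that happen to occur in only one corpus, one of the caveats about applying statistics to language data that the course mentions.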

Assessment Elements

Homework
Students complete homework assignments to ensure a more thorough understanding of the material discussed in class.
In-class assignment
Exam

Interim Assessment

2024/2025 2nd module
0.3 * Exam + 0.3 * Homework + 0.4 * In-class assignment
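The interim grade is a weighted sum of the three assessment elements. The minimal Python sketch below assumes, for illustration, that all three components are scored on the same 0 to 10 scale, and it leaves the result unrounded because the rounding rule is not stated here. The function name `interim_grade` and the range check are hypothetical.

```python
# Weights taken from the formula above; the 0-10 scale check is an assumption.
WEIGHTS = {"exam": 0.3, "homework": 0.3, "in_class": 0.4}


def interim_grade(exam: float, homework: float, in_class: float) -> float:
    """Return 0.3 * Exam + 0.3 * Homework + 0.4 * In-class assignment, unrounded."""
    scores = {"exam": exam, "homework": homework, "in_class": in_class}
    for name, value in scores.items():
        if not 0 <= value <= 10:
            raise ValueError(f"{name} score {value} is outside the assumed 0-10 scale")
    return sum(WEIGHTS[name] * value for name, value in scores.items())


print(interim_grade(exam=8, homework=7, in_class=9))  # approximately 8.1
```

Because the in-class assignment carries the largest weight (0.4), a one-point change there moves the interim grade more than the same change in the exam or homework.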

Bibliography

Recommended Core Bibliography

Bamman, D., Eisenstein, J., & Schnoebelen, T. (2012). Gender identity and lexical variation in social media. https://doi.org/10.1111/josl.12080
Jurafsky, D., Chahuneau, V., Routledge, B. R., & Smith, N. A. (2014). Narrative framing of consumer sentiment in online restaurant reviews. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsbas&AN=edsbas.18543C32

Recommended Additional Bibliography

Text analysis in R. (2017). Communication Methods and Measures, 11(4), 245–265. https://doi.org/10.1080/19312458.2017.1387238

Authors

Zubarev Nikita Sergeevich
Ильина Мария Ивановна
