Master 2020/2021

Linguistic Data: Quantitative Analysis and Visualisation

Category 'Best Course for Career Development'
Category 'Best Course for Broadening Horizons and Diversity of Knowledge and Skills'
Area of studies: Fundamental and Applied Linguistics
Delivered by: School of Linguistics
When: year 1, modules 3 and 4
Mode of studies: distance learning
Open to: students of one campus
Instructors: Olga Lyashevskaya, Ivan Pozdniakov, Ilya Schurov
Master’s programme: Linguistic Theory and Language Description
Language: English
ECTS credits: 3
Contact hours: 64

Course Syllabus

Abstract

The course is devoted to modern methods of data analysis as applied to linguistic data, including statistical inference and exploratory data analysis with visualizations. We begin with the theoretical background in mathematical statistics and discuss the limitations of statistical methods and their applicability to linguistic problems. On the practical side, we use R to analyse real datasets. We also discuss different visualization techniques using the popular ggplot2 library.
Learning Objectives

Within this course you will:

  • learn about the principal steps of quantitative research in linguistics;
  • learn about the possibilities and limitations of quantitative approaches as applied to different research questions;
  • learn to formulate research questions and develop them into testable hypotheses;
  • explore the possibilities of data collection and different approaches to sampling;
  • learn to evaluate the quality of a quantitative approach;
  • study the most common corpus, experimental, and mixed designs of linguistic studies, and learn to evaluate research plans and to discover and prevent the associated threats to data validity;
  • practise preparing your quantitative data for analysis, evaluating the quality of your data, and treating missing data;
  • learn about the possibilities and limitations of conventional statistical techniques and criteria, as well as some popular contemporary multivariate statistical methods;
  • learn to choose and apply in practice a set of appropriate statistical tests for your research question.
Expected Learning Outcomes

  • are able to account for basic types of data used in linguistic research
  • are able to apply basic quantitative methods for analysing linguistic data
  • are able to critically discuss the limitations of commonly used methods for answering research questions about language
  • are able to reason on how to interpret linguistic results, including how to evaluate what kind of information a given method can offer and how to estimate the potential range of variables that can affect results in linguistic research
  • are able to critically evaluate linguistic data presented in previous research
  • are able to apply different techniques for presenting both qualitative and quantitative linguistic data in scholarly writing
Course Contents

  • №1.
    Introduction to R. Types of data. Dataframe. Functions and arguments.
  • №2.
    Descriptive statistics. Basic visualizations.
  • №3.
    Dplyr style in R, pipes. Visualizing data with ggplot2.
  • №4.
    Hypothesis testing. Types of distributions. P-values. Exact binomial test, t-test, ANOVA. Confidence intervals. Chi-squared test and Fisher's exact test.
  • №5.
    Correlation.
  • №6.
    Regression: linear and polynomial.
  • №7.
    Logistic regression.
  • №8.
    Fixed and random effects. Mixed-effects models.
  • №9.
    Bootstrap. Decision trees. Decision forests.
  • №10.
    Distance matrices. Clustering.
  • №11.
    Dimension reduction, visualisations using MDS, PCA, CA, MCA.
  • №12.
    Bayesian statistics.
Assessment Elements

  • non-blocking homeworks
    Written assignments include theoretical tests and practical problem-solving. The assignments are published online and should be submitted via an electronic form.
  • non-blocking exam
Interim Assessment

  • Interim assessment (module 4)
    0.4 * exam + 0.6 * homeworks
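
As a worked illustration of the weighting above, here is a minimal sketch in Python. The function name `interim_grade` and the sample scores are hypothetical; the syllabus does not state the grading scale or any rounding rule, so none is applied here.

```python
def interim_grade(exam: float, homeworks: float) -> float:
    """Weighted interim grade as given in the syllabus: 0.4 * exam + 0.6 * homeworks."""
    return 0.4 * exam + 0.6 * homeworks


# Hypothetical scores for illustration only; the scale is an assumption.
grade = interim_grade(exam=7.0, homeworks=9.0)
print(round(grade, 2))  # 8.2
```

Because homeworks carry the larger weight, a strong homework average raises the result more than an equally strong exam score.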
Bibliography

Recommended Core Bibliography

  • Wickham H. ggplot2: elegant graphics for data analysis. Second edition. Cham: Springer, 2016. 260 p.

Recommended Additional Bibliography

  • Stowell S. Using R for Statistics. Apress, 2014. https://link.springer.com/book/10.1007%2F978-1-4842-0139-8