Bachelor 2023/2024

Computer Tools for Linguistic Research

Category 'Best Course for New Knowledge and Skills'
Type: Compulsory course (Fundamental and Applied Linguistics)
Area of studies: Fundamental and Applied Linguistics
Delivered by: School of Fundamental and Applied Linguistics
When: 2nd year, modules 3 and 4
Mode of studies: distance learning
Online hours: 20
Open to: students of all HSE University campuses
Language: English
ECTS credits: 3
Contact hours: 52

Course Syllabus

Abstract

The course aims to give students knowledge of modern computer tools and resources used in research in corpus, applied, and computational linguistics, and to teach them to apply these tools and resources to linguistic problems. The tools covered include concordancers, corpus managers, corpus-building (and bootstrapping) tools, lemmatizers, stemmers, morphological analyzers, part-of-speech taggers, syntactic and semantic taggers, regular expressions, and the text-processing capabilities of the Python programming language. Students conduct individual and group research and present the results to the class. Prerequisites: basic Python programming skills and general knowledge of linguistics.
Learning Objectives


  • The course aims to give students knowledge of current computer tools and resources that linguists use in corpus, applied, and computational linguistics research, together with practical skills in using these tools. The tools studied include concordancers, corpus managers, programs for automatic corpus creation, lemmatizers, stemmers, morphological analyzers, automatic text markup tools, regular expressions, and Python tools for processing text data.
Expected Learning Outcomes


  • Is familiar with corpora of the Russian language
  • Is familiar with the main stages of corpus preprocessing and is able to build a corpus (manually and automatically)
  • Has an understanding of Zipf's law; is able to visualize syntax trees, use regular expressions, work with the web interfaces of popular corpora, build corpora from the web, and explore existing corpora in AntConc
  • Has an understanding of the periods in the development of corpus linguistics and is familiar with the main corpora of English
  • Understands the basic concepts of corpus linguistics, knows the types and properties of corpora, and is able to obtain a concordance. Understands the idea of using the web as a corpus and is familiar with criticism of corpus linguistics
  • The student writes technical documentation
  • The student is able to correctly label data
  • The student is able to analyze data using modern computer tools
  • The student is able to write dialogue scripts for chatbots and to put them into practice
  • The student is able to work in a team and distribute the load among team members
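
Two of the outcomes above, obtaining a concordance and getting a feel for Zipf's law, can be illustrated with a short Python sketch. This is only an illustration under stated assumptions, not material from the course: the function names (`tokenize`, `kwic`, `rank_frequency`), the simple `\w+` tokenizer, and the sample sentence are all hypothetical, and real corpus work would normally rely on a dedicated tool such as AntConc or a corpus manager.

```python
import re
from collections import Counter

# Assumption: a token is any run of word characters; real tokenizers are more careful.
TOKEN_RE = re.compile(r"\w+")


def tokenize(text):
    """Lowercase the text and split it into word tokens with a regular expression."""
    return TOKEN_RE.findall(text.lower())


def kwic(tokens, keyword, window=3):
    """Return keyword-in-context lines as (left context, keyword, right context)."""
    keyword = keyword.lower()
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, tok, right))
    return lines


def rank_frequency(tokens):
    """Return (rank, word, frequency) triples, most frequent word first."""
    counts = Counter(tokens)
    return [
        (rank, word, freq)
        for rank, (word, freq) in enumerate(counts.most_common(), start=1)
    ]


# Hypothetical sample text, far too small to show Zipf's law convincingly.
sample = "The cat sat on the mat. The dog saw the cat."
tokens = tokenize(sample)

for left, word, right in kwic(tokens, "cat", window=2):
    print(f"{left:>10} [{word}] {right}")

for rank, word, freq in rank_frequency(tokens)[:3]:
    print(rank, word, freq)
```

On a large corpus, Zipf's law predicts that a word's frequency falls off roughly in inverse proportion to its rank, so plotting the output of `rank_frequency` on log-log axes should give an approximately straight line.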
Course Contents


  • Acquiring technical writing skills
  • Acquiring data labeling skills
  • Acquiring data analytics skills
  • Acquiring developer-linguist skills
  • Acquiring project manager skills
Assessment Elements


  • Class performance on the block "Corpus Studies" (non-blocking)
  • Control on the block "Scraping Research" (partially blocks the final grade calculation)
  • Exam (blocking)
  • Homework on the block "Corpus Studies" (non-blocking)
  • Test on the block "Corpus Studies" (non-blocking)
  • Project 1 on the block "Corpus Studies" (non-blocking)
  • Project 2 on the block "Corpus Studies" (non-blocking)
Interim Assessment


  • 2023/2024 4th module
    0.05 * Class performance on the block "Corpus Studies" + 0.4 * Control on the block "Scraping Research" + 0.2 * Exam + 0.05 * Homework on the block "Corpus Studies" + 0.1 * Project 1 on the block "Corpus Studies" + 0.1 * Project 2 on the block "Corpus Studies" + 0.1 * Test on the block "Corpus Studies"
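
The weighted sum above can be checked with a small Python sketch. The weights are copied from the formula; everything else is an assumption made for illustration: the shortened dictionary keys, the `interim_grade` function name, and the example scores are hypothetical, and the sketch models neither the blocking rules nor any rounding rules, since the syllabus does not spell them out.

```python
# Weights copied from the interim assessment formula; keys are shortened, hypothetical labels.
WEIGHTS = {
    "class_performance": 0.05,  # Class performance, block "Corpus Studies"
    "control_scraping": 0.4,    # Control, block "Scraping Research"
    "exam": 0.2,                # Exam
    "homework": 0.05,           # Homework, block "Corpus Studies"
    "project_1": 0.1,           # Project 1, block "Corpus Studies"
    "project_2": 0.1,           # Project 2, block "Corpus Studies"
    "test": 0.1,                # Test, block "Corpus Studies"
}


def interim_grade(scores):
    """Return the weighted sum of scores; blocking and rounding rules are NOT modeled."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores, for illustration only.
example_scores = {
    "class_performance": 8,
    "control_scraping": 7,
    "exam": 6,
    "homework": 10,
    "project_1": 9,
    "project_2": 8,
    "test": 7,
}

print(round(interim_grade(example_scores), 2))
```

The weights sum to 1, so the result stays on the same scale as the individual scores. The control on the block "Scraping Research" carries the largest single weight (0.4), twice that of the exam.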
Bibliography


Recommended Core Bibliography

  • Sarkar, D. (2019). Text Analytics with Python: A Practitioner's Guide to Natural Language Processing. Second edition. Apress.
  • Perkins, J. (2010). Python Text Processing with NLTK 2.0 Cookbook: Use Python NLTK Suite of Libraries to Maximize Your Natural Language Processing Capabilities [Electronic resource]. Birmingham: Packt Publishing Ltd. 336 p. DB ebrary.

Recommended Additional Bibliography

  • Heagney, J. (2012). Fundamentals of Project Management.
  • Heagney, J. (2016). Fundamentals of Project Management. Fifth edition. AMACOM.
  • Grudeva, E. V. (2017). Korpusnaya lingvistika [Corpus Linguistics]: textbook. 3rd edition. Moscow: FLINTA. 165 p. ISBN 978-5-9765-1497-3. Electronic text. Lan electronic library system. URL: https://e.lanbook.com/book/106859 (accessed: 00.00.0000). Access: authorized users only.

Authors

  • Malafeev Aleksei Iurevich
  • Khomenko Anna Iurevna
  • Klimova Margarita Andreevna