Master
2020/2021
Linguistic Data: Quantitative Analysis and Visualisation
Category 'Best Course for Career Development'
Category 'Best Course for Broadening Horizons and Diversity of Knowledge and Skills'
Category 'Best Course for New Knowledge and Skills'
Type:
Compulsory course (Linguistic Theory and Language Description)
Area of studies:
Fundamental and Applied Linguistics
Delivered by:
School of Linguistics
Where:
Faculty of Humanities
When:
Year 2, modules 1 and 2
Mode of studies:
offline
Instructors:
Daria Popova
Master’s programme:
Linguistic Theory and Language Description
Language:
English
ECTS credits:
3
Contact hours:
32
Course Syllabus
Abstract
Preprocessing of linguistic data in Python is designed to extend the students’ knowledge of natural language processing and to polish their programming skills. The course aims to give the students the programming and natural language processing knowledge and competencies they need to plan and conduct their own research projects, leading to the M.Sc. dissertation and scientific publications.
Learning Objectives
- to further the students’ programming skills
- to provide the students with the necessary skills to write programs for experiments and corpus studies
- to teach the students how to re-format data
- to teach the students how to retrieve data from the Internet
- to teach the students how to write their code so that it is readable by other linguists
- to teach the students how to present research that involves coding, in both written and oral form
- to provide an overview of some of the most exciting current computational projects
- to teach the students how to read and critically assess linguistic research that uses computational methods
- to teach the students how to formulate linguistic questions in a way that can be addressed computationally
- to teach the students to conduct independent computational studies
Expected Learning Outcomes
- is able to re-format data
- writes programs (code) for experiments and corpus studies
- writes their code so that it is readable by other linguists and programmers
- retrieves data from the Internet
- presents their research that involves coding, in both written and oral form
- reads and critically assesses linguistic research that uses computational methods
- formulates linguistic questions in a way that can be addressed computationally
- conducts independent natural language processing studies
Course Contents
- Datatypes and variables: variable assignment, basic datatypes, mutability.
- Control structures: grouping and indentation; if, for, while, break and continue.
- Input and output: command-line input, keyboard input, file input and output.
- Subroutines and modules: simple functions, functions that return values, functions that take arguments, recursive functions, modules, writing modules; classes.
- Regular expressions: matching, patterns, searching for patterns.
- Text manipulation: tokenization, stemming, parsing different data formats.
- Internet data: retrieving webpages, HTML, parsing HTML, web crawlers.
- Different data formats: csv, databases, json
- Basics of web design: creating a web site for a linguistic experiment
- Word2vec
- Graphs in Python
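As a rough illustration of the kind of task several of these topics combine into (regular expressions, tokenization, re-formatting data as JSON and CSV), here is a minimal sketch using only the Python standard library. It is not taken from the course materials: the token pattern, the function names and the sample sentence are all assumptions made for this example.

```python
import csv
import io
import json
import re
from collections import Counter

# Hypothetical token pattern: runs of word characters, optionally joined
# by an apostrophe or hyphen (so "cat's" and "well-worn" stay whole).
TOKEN_PATTERN = re.compile(r"\w+(?:['’-]\w+)*")


def tokenize(text):
    """Return the lowercased tokens found in text."""
    return [match.group(0).lower() for match in TOKEN_PATTERN.finditer(text)]


def count_tokens(text):
    """Count how often each token occurs in text."""
    return Counter(tokenize(text))


def to_json(counts):
    """Serialize token counts as a JSON object with sorted keys."""
    return json.dumps(dict(counts), ensure_ascii=False, sort_keys=True)


def to_csv(counts):
    """Serialize token counts as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["token", "count"])
    for token, count in sorted(counts.items()):
        writer.writerow([token, count])
    return buffer.getvalue()


# Made-up sample sentence, used only to exercise the functions.
sample = "The cat sat on the mat. The cat's mat is well-worn."
counts = count_tokens(sample)
print(to_json(counts))
print(to_csv(counts))
```

A regular-expression tokenizer like this is only a starting point; for real corpus work a dedicated tokenizer, such as those covered in the recommended NLTK cookbook, is usually the safer choice.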
Assessment Elements
- homework assignment 1
- homework assignment 2
- in-class presentation
- exam (final project)
Interim Assessment
- Interim assessment (module 2): 0.25 * homework assignment 1 + 0.25 * homework assignment 2 + 0.2 * in-class presentation + 0.3 * exam (final project)
Bibliography
Recommended Core Bibliography
- Perkins, J. (2014). Python 3 Text Processing with NLTK 3 Cookbook. Birmingham: Packt Publishing. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=836632
Recommended Additional Bibliography
- Romano, F. (2015). Learning Python. Birmingham: Packt Publishing. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=nlebk&AN=1133614