Master's programme
2020/2021
Linguistic Data Analysis: Quantitative Methods and Visualization
Best by the criterion "Usefulness of the course for your future career"
Best by the criterion "Usefulness of the course for broadening your horizons and all-round development"
Best by the criterion "Novelty of the knowledge gained"
Status:
Compulsory course (Linguistic Theory and Language Description)
Field of study:
45.04.03 Fundamental and Applied Linguistics
Delivered by:
School of Linguistics
Where:
Faculty of Humanities
When:
2nd year, modules 1 and 2
Mode of study:
without an online course
Instructors:
Попова Дарья Павловна
Degree programme:
Linguistic Theory and Language Description
Language:
English
Credits:
3
Contact hours:
32
Course Syllabus
Abstract
Preprocessing of linguistic data in Python is designed to further the students’ knowledge of natural language processing and to polish their programming skills. The course aims to provide the students with the programming and natural language processing knowledge and competencies necessary to plan and conduct research projects of their own leading to the M.Sc. dissertation and scientific publications.
Learning Objectives
- to further the students’ programming skills
- to provide the students with the necessary skills to write programs for experiments and corpus studies
- to teach the students how to re-format data
- to teach the students how to retrieve data from the Internet
- to teach the students how to write their code so that it is readable by other linguists
- to teach the students how to present research that involves coding, both in writing and orally
- to provide an overview of some of the most exciting current computational projects
- to teach the students how to read and to assess critically linguistic research that uses computational methods
- to teach the students how to formulate linguistic questions in a way that can be addressed computationally
- to teach the students to conduct independent computational studies
Expected Learning Outcomes
- is able to re-format data
- writes programs (code) for experiments and corpus studies
- writes their code so that it is readable by other linguists and programmers
- retrieves data from the Internet
- presents their research that involves coding, both in writing and orally
- reads and assesses critically linguistic research that uses computational methods
- formulates linguistic questions in a way that can be addressed computationally
- conducts independent natural language processing studies
Course Contents
- Datatypes and variables: Variable assignment, basic datatypes, mutability.
- Control structures: Grouping and indentation. If, for, while, break and continue.
- Input and output: Command-line input, keyboard input, file input and output.
- Subroutines and modules: Simple functions, functions that return values, functions that take arguments, recursive functions, modules, writing modules. Classes.
- Regular expressions: Matching, searching for patterns, patterns.
- Text manipulation: Tokenization, stemming, parsing different data formats.
- Internet data: Retrieving webpages, HTML, parsing HTML, web crawlers.
- Different data formats: CSV, databases, JSON
- Basics of web design: creating a web site for a linguistic experiment
- Word2vec
- Graphs in Python
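As a rough illustration of how several of the listed topics fit together (regular expressions, tokenization, and the JSON format), here is a minimal sketch using only the Python standard library. The sample text and the function name `tokenize` are hypothetical and are not taken from the course materials; the regex-based tokenizer is a deliberately naive stand-in for the tools the course may actually use.

```python
import json
import re
from collections import Counter

# Hypothetical sample text; not taken from the course materials.
TEXT = "The cat sat on the mat. The mat was flat."


def tokenize(text):
    """Lowercase the text and return every run of word characters as a token."""
    return re.findall(r"\w+", text.lower())


tokens = tokenize(TEXT)

# Count how often each token occurs.
counts = Counter(tokens)

# Serialize the three most frequent tokens as JSON, e.g. for sharing results.
print(json.dumps(counts.most_common(3)))
```

A real corpus study would usually replace the regular expression with a proper tokenizer, for example one from NLTK, which the core bibliography covers.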
Assessment Elements
- homework assignment 1
- homework assignment 2
- in-class presentation
- exam (final project)
Interim Assessment
- Interim assessment (module 2): 0.25 * homework assignment 1 + 0.25 * homework assignment 2 + 0.2 * in-class presentation + 0.3 * exam (final project)
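The weighted sum above can be sketched in Python as follows. The weights come from the formula in this syllabus; the component scores in `example_scores` are hypothetical, and the syllabus states neither the grading scale nor a rounding rule, so the sketch returns the unrounded value.

```python
# Weights taken from the interim assessment formula in this syllabus.
WEIGHTS = {
    "homework_1": 0.25,
    "homework_2": 0.25,
    "presentation": 0.20,
    "final_project": 0.30,
}


def interim_grade(scores):
    """Return the weighted sum of the component scores (no rounding applied)."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores for illustration only; the scale is an assumption.
example_scores = {
    "homework_1": 8,
    "homework_2": 6,
    "presentation": 10,
    "final_project": 7,
}

grade = interim_grade(example_scores)
print(f"Interim grade: {grade:.2f}")
```

The final project carries the largest single weight (0.3), while the two homework assignments together account for half of the grade.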
Bibliography
Recommended Core Bibliography
- Perkins, J. (2014). Python 3 Text Processing with NLTK 3 Cookbook. Birmingham: Packt Publishing. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=836632
Recommended Additional Bibliography
- Romano, F. (2015). Learning Python. Birmingham: Packt Publishing. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=nlebk&AN=1133614