Bachelor 2023/2024

Introduction to Text Mining with R

Area of studies: Fundamental and Applied Linguistics
Delivered by: School of Linguistics
When: 4th year, 3rd module
Mode of studies: distance learning
Online hours: 20
Open to: students of all HSE University campuses
Instructors: Ilya Makarchuk
Language: English
ECTS credits: 3
Contact hours: 6

Course Syllabus

Abstract

In this online course, you will learn about the next big thing in applied analytics: text analysis. The course is self-contained: you will learn everything from basic programming skills to advanced natural language modelling for topic discovery. It follows a problem-oriented approach. We will not spend much time on theoretical concepts and will focus on applying them to practical problems.

a. The goal of this online course is to equip students with the knowledge and skills needed to analyse text data with the R programming language.
b. We do not assume any specific prerequisites for this course. However, some knowledge of natural language processing or R programming may make it easier to get into the course materials.
c. Each week of the course is accompanied by tests, gradable and non-gradable programming assignments, and links to additional material for those who want to dig deeper. At the end of the course, you will complete a project and then review your peers' projects.
d. Software: R (programming language), RStudio.
e. This course is heavily tilted toward practical skills. During the course, students will learn the basics of R for text analysis, the tidy text approach, regular expressions, different algorithms for topic modelling, and text classification with machine learning and deep learning approaches, among other topics. Various synthetic and real-world datasets will help participants see how to apply these techniques to extract insights from user reviews, social media posts, and short product descriptions.

This distance learning opportunity is brought to you by HSE University, one of the top think tanks in Russia, and by instructors experienced in using text analysis for business-oriented projects. The online course consists of short pre-recorded lectures, 5 to 15 minutes in length. Each week has a graded test with 10 to 15 questions.
At the end of the last week, students will complete a project using the skills learned in the course, and then review and grade the projects of their peers. The course gives students an opportunity to learn natural language processing (NLP) methods and then apply them to problems in their own areas of interest.
Learning Objectives

  • The goal of this online course is to equip students with the knowledge and skills needed to analyse text data with the R programming language.
Expected Learning Outcomes

  • Students have the knowledge and skills needed to analyse text data with the R programming language
  • Students are familiar with the basics of R for text analysis, the tidy text approach, regular expressions, different algorithms for topic modelling, and text classification with machine learning and deep learning approaches
Course Contents

  • R and RStudio Basics
  • Working with Tidyverse
  • Supervised machine learning with the bag-of-words approach
  • Unsupervised machine learning
Assessment Elements

  • non-blocking Test
    Each week of the course is accompanied by tests and by gradable and non-gradable programming assignments.
  • non-blocking Final Project
    You will apply all the knowledge you've gained in this course to analyse real texts on your own. You will download data from the Project Gutenberg database, explore it, and then apply both supervised and unsupervised machine learning techniques. You will then review and grade the work of your peers.
Interim Assessment

  • 2023/2024 3rd module
    The final grade is the grade for the online course.
Bibliography

Recommended Core Bibliography

  • Derryberry, D. R. (2014). Basic Data Analysis for Time Series with R. Hoboken, NJ: Wiley. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=817454

Recommended Additional Bibliography

  • Bivand, R., Pebesma, E. J., & Gómez-Rubio, V. (2013). Applied Spatial Data Analysis with R (2nd ed.). New York, NY: Springer. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=601853

Authors

  • LANDER Iurii ALEKSANDROVICH