2024/2025

Introduction to Text Mining in R

Type: Mago-Lego
When: Module 3
Online hours: 40
Open to: students of one campus
Language: English
ECTS credits: 3
Contact hours: 12

Course Syllabus

Abstract

The course covers the process of exploring and analyzing large amounts of unstructured text data with R in order to identify attributes of the data (concepts, keywords, patterns).

Learning Objectives

  • The course gives students a foundation for designing and conducting their own research on language data using machine learning, statistics, and linguistics.

Expected Learning Outcomes

  • Be able to work with R modules and packages in RStudio and to combine formatted text and R code using R Markdown
  • Be able to prepare data (import texts, preprocess texts, convert texts into a document-feature matrix) with R
  • Be able to build, interpret, and evaluate supervised learning models for textual data
  • Be able to analyze data (count specific patterns, use machine learning models) with R
  • Be able to perform sentiment analysis of text data
  • Know a range of text mining methods (basic text preprocessing, selection of high-frequency words, thematic and sentiment analysis) for research in disciplines such as psychology, economics, education, and the political and social sciences
  • Be able to use R to collect text data from online sources

Course Contents

  • Introduction to basic R data types and structures; working with RStudio and R Markdown
  • R modules and packages
  • Supervised machine learning with the bag-of-words approach
  • Unsupervised machine learning
  • Sentiment analysis
  • Text data acquisition

Assessment Elements

  • non-blocking Quizzes
  • non-blocking Final project

Interim Assessment

  • 2024/2025 3rd module
    0.4 * Final project + 0.6 * Quizzes

Bibliography

Recommended Core Bibliography

  • An introduction to R : a programming environment for data analysis and graphics, Venables, W. N., 2009
  • Berry, M. W., & Kogan, J. (2010). Text Mining : Applications and Theory. Chichester, U.K.: Wiley. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=314553
  • Munzert, S. (2014). Automated Data Collection with R : A Practical Guide to Web Scraping and Text Mining. Chichester, West Sussex, United Kingdom: Wiley. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=878670
  • Silge, J., & Robinson, D. (2017). Text Mining with R : A Tidy Approach (Vol. First edition). Sebastopol, CA: O’Reilly Media. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=nlebk&AN=1533983
  • Text mining in practice with R, Kwartler, T., 2017
  • Text mining with R : a tidy approach, Silge, J., 2017
  • The text mining handbook : advanced approaches in analyzing unstructured data, Feldman, R., 2009
  • Wickham, H., & Grolemund, G. (2016). R for Data Science : Import, Tidy, Transform, Visualize, and Model Data (Vol. First edition). Sebastopol, CA: O’Reilly Media. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1440131

Recommended Additional Bibliography

  • Advanced positioning, flow, and sentiment analysis in commodity markets : bridging fundamental and technical analysis, Keenan, M. J. S., 2020
  • Best practices in data cleaning : a complete guide to everything you need to do before and after collecting your data, Osborne, J. W., 2013
  • Beysolow, T. (2018). Applied Natural Language Processing with Python : Implementing Machine Learning and Deep Learning Algorithms for Natural Language Processing. [Berkeley, CA]: Apress. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1892182
  • Deepti Gupta. (2018). Applied Analytics Through Case Studies Using SAS and R : Implementing Predictive Models and Machine Learning Techniques. Apress.
  • From Text Mining to Visual Classification: Rethinking Computational New Cinema History with Jean Desmet’s Digitised Business Archive. (2018). Tijdschrift Voor Mediageschiedenis, 21(2), 127–145.
  • Kao, A., Poteet, S. Natural Language Processing and Text Mining. - Springer, 2007. - Books 24x7 electronic library system.
  • Matt Wiley, & Joshua F. Wiley. (2019). Advanced R Statistical Programming and Data Models : Analysis, Machine Learning, and Visualization. Apress.
  • Neustein, A. (2014). Text Mining of Web-Based Medical Content. Berlin: De Gruyter. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=887115
  • Yang, Y. (2016). Temporal Data Mining Via Unsupervised Ensemble Learning. Elsevier.
  • Data Analysis Technologies: Data Mining, Visual Mining, Text Mining, OLAP : textbook, Barsegyan, A. A., 2008

Authors

  • Карташева Анна Александровна
  • Павлова Ирина Анатольевна