2024/2025
Introduction to Text Mining in R
Type: Mago-Lego
Delivered by: International Laboratory for Applied Network Research
When: Module 3
Online hours: 40
Open to: students of one campus
Instructors: Kartasheva Anna Aleksandrovna (Карташева Анна Александровна)
Language: English
ECTS credits: 3
Course Syllabus
Abstract
The course covers the process of exploring and analyzing large amounts of unstructured text data in R in order to identify attributes in the data such as concepts, keywords, and patterns.
Learning Objectives
- The course gives students a foundation for developing and conducting their own research on language data using machine learning, statistics, and linguistics.
Expected Learning Outcomes
- Be able to work with R modules and packages in RStudio and to combine formatted text and R code using RMarkdown.
- Be able to prepare data in R: import texts, preprocess them, and convert them into a document-feature matrix.
- Be able to build, interpret, and evaluate supervised learning models for textual data.
- Be able to analyze data in R: count specific patterns and apply machine learning models.
- Be able to perform sentiment analysis of text data.
- Know a group of text mining methods (basic text preprocessing, selection of high-frequency words, thematic and sentiment analysis) for research in a wide range of disciplines such as psychology, economics, education, and the political and social sciences.
- Be able to use R to collect text data from online sources.
Course Contents
- Introduction to basic R data types and structures; working with RStudio and RMarkdown
- R modules and packages
- Supervised machine learning with the bag-of-words approach
- Unsupervised machine learning
- Sentiment analysis
- Text data acquisition
Bibliography
Recommended Core Bibliography
- Silge, J., & Robinson, D. (2017). Text Mining with R: A Tidy Approach. O'Reilly Media. ISBN 9781491981627. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&db=nlebk&AN=1533983
- Venables, W. N. (2009). An Introduction to R: A Programming Environment for Data Analysis and Graphics.
- Berry, M. W., & Kogan, J. (2010). Text Mining: Applications and Theory. Chichester, U.K.: Wiley. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=314553
- Munzert, S. (2014). Automated Data Collection with R: A Practical Guide to Web Scraping and Text Mining. Chichester, West Sussex, United Kingdom: Wiley. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=878670
- Kwartler, T. (2017). Text Mining in Practice with R.
- Feldman, R. (2009). The Text Mining Handbook: Advanced Approaches in Analyzing Unstructured Data.
- Wickham, H., & Grolemund, G. (2016). R for Data Science: Import, Tidy, Transform, Visualize, and Model Data (1st ed.). Sebastopol, CA: O'Reilly Media. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1440131
Recommended Additional Bibliography
- Keenan, M. J. S. (2020). Advanced Positioning, Flow, and Sentiment Analysis in Commodity Markets: Bridging Fundamental and Technical Analysis.
- Osborne, J. W. (2013). Best Practices in Data Cleaning: A Complete Guide to Everything You Need to Do Before and After Collecting Your Data.
- Beysolow, T. (2018). Applied Natural Language Processing with Python: Implementing Machine Learning and Deep Learning Algorithms for Natural Language Processing. Berkeley, CA: Apress. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1892182
- Gupta, D. (2018). Applied Analytics Through Case Studies Using SAS and R: Implementing Predictive Models and Machine Learning Techniques. Apress.
- From Text Mining to Visual Classification: Rethinking Computational New Cinema History with Jean Desmet's Digitised Business Archive. (2018). Tijdschrift voor Mediageschiedenis, 21(2), 127–145.
- Kao, A., & Poteet, S. (2007). Natural Language Processing and Text Mining. Springer. Available in the Books24x7 electronic library.
- Wiley, M., & Wiley, J. F. (2019). Advanced R Statistical Programming and Data Models: Analysis, Machine Learning, and Visualization. Apress.
- Neustein, A. (2014). Text Mining of Web-Based Medical Content. Berlin: De Gruyter. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=887115
- Yang, Y. (2016). Temporal Data Mining via Unsupervised Ensemble Learning. Elsevier.
- Barsegyan, A. A. (2008). Технологии анализа данных: Data Mining, Visual Mining, Text Mining, OLAP [Data Analysis Technologies: Data Mining, Visual Mining, Text Mining, OLAP]: textbook. In Russian.