2023/2024
Data Mining
Type:
Mago-Lego
Delivered by:
International Laboratory for Applied Network Research
When:
3 module
Open to:
students of one campus
Instructors:
Ilia Karpov
Language:
English
ECTS credits:
3
Contact hours:
40
Course Syllabus
Abstract
This course provides a comprehensive introduction to Data Mining techniques and their application using Orange, an open-source data mining and visualization toolkit. Students will explore methods for visualizing and interpreting data, build predictive models, and apply classification and clustering algorithms to real-world problems. They will gain hands-on experience with advanced techniques such as hyperparameter tuning and dimensionality reduction. The course also covers text mining and the use of pre-trained deep learning models for complex data analysis tasks. By the end of the course, students will have a solid understanding of data mining principles and the practical skills to extract valuable insights from diverse datasets.
Learning Objectives
- The course gives students a foundation for designing and conducting their own research and for evaluating the research of others.
Expected Learning Outcomes
- Be able to compare methods for mining diverse patterns, including multi-level, multi-dimensional, and quantitative patterns.
- Be able to compare methods for mining negative correlations, compressed and redundancy-aware top-k patterns, and long (colossal) patterns.
- Be able to discuss pattern evaluation issues, in particular widely used interestingness measures such as lift, chi-square, cosine, Jaccard, and Kulczynski, and compare their strengths.
- Be able to recall important pattern discovery concepts, methods, and applications, in particular basic concepts such as frequent patterns, closed patterns, max-patterns, and association rules.
- Know constraint-based pattern mining, including methods for pushing different kinds of constraints, such as data-based and pattern-based constraints, and anti-monotone, monotone, succinct, convertible, and multiple constraints.
- Know efficient pattern mining methods, such as Apriori, ECLAT, and FP-growth.
- Know various pattern mining applications, such as mining spatiotemporal and trajectory patterns and mining quality phrases.
- Know well-known sequential pattern mining methods, such as GSP, SPADE, PrefixSpan, and CloSpan.
- Students will be able to create and interpret data visualizations, including Box Plots, Distribution Plots, Scatter Plots, Mosaic Displays, and Sieve Diagrams, using Orange for data exploration.
- Students will be able to understand and apply basic predictive modeling techniques such as Linear Regression, Induction Trees, and Classification Trees.
- Students will be able to compare and apply classification models, including Logistic Regression, SVM, and Naive Bayesian Classifiers, selecting appropriate models for specific data sets.
- Students will be able to apply advanced classification techniques such as non-linear SVMs, Random Forests, and k-Nearest Neighbors, and evaluate model performance using metrics such as Precision, Recall, F1 score, and ROC curves.
- Students will be able to implement hyperparameter tuning and regularization methods (Ridge and Lasso) to optimize model performance and prevent overfitting.
- Students will be able to implement and interpret clustering algorithms such as Hierarchical Clustering and k-Means, understanding their practical applications.
- Students will be able to analyze and interpret multidimensional data using MDS, PCA, FreeViz, and SOM, reducing dimensionality and uncovering patterns.
- Students will be able to understand Text Mining processes, including Preprocessing, Bag of Words, Word Enrichment, and Sentiment Analysis, applying these techniques to extract meaningful insights.
- Students will be able to work with pre-trained Deep Neural Network models for text mining and image analysis, learning to fine-tune these models for specific tasks.
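As an illustration of the pattern evaluation measures named above, the following minimal Python sketch computes lift, chi-square, cosine, Jaccard, and Kulczynski for a rule A ⇒ B from a 2×2 table of transaction counts. It is not course material: the function name `interestingness` and all counts are hypothetical, and it assumes every row and column total is non-zero. The two example tables differ only in the number of transactions containing neither A nor B; this changes lift and chi-square but leaves cosine, Jaccard, and Kulczynski unchanged, a property usually called null-invariance.

```python
import math


def interestingness(n_ab: int, n_a_not_b: int, n_not_a_b: int, n_neither: int) -> dict[str, float]:
    """Interestingness measures for A => B from a 2x2 table of transaction counts.

    Hypothetical helper; assumes all row and column totals are non-zero.
    """
    n = n_ab + n_a_not_b + n_not_a_b + n_neither
    sup_a = n_ab + n_a_not_b  # transactions containing A
    sup_b = n_ab + n_not_a_b  # transactions containing B
    # lift = P(A and B) / (P(A) * P(B)); depends on the total n
    lift = (n_ab / n) / ((sup_a / n) * (sup_b / n))
    # Null-invariant measures: none of them uses n_neither or n
    cosine = n_ab / math.sqrt(sup_a * sup_b)
    jaccard = n_ab / (sup_a + sup_b - n_ab)
    kulczynski = 0.5 * (n_ab / sup_a + n_ab / sup_b)  # mean of P(B|A) and P(A|B)
    # Pearson chi-square: sum over the four cells of (observed - expected)^2 / expected
    observed = [[n_ab, n_a_not_b], [n_not_a_b, n_neither]]
    row_totals = [sup_a, n - sup_a]  # A, not A
    col_totals = [sup_b, n - sup_b]  # B, not B
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return {"lift": lift, "chi2": chi2, "cosine": cosine,
            "jaccard": jaccard, "kulczynski": kulczynski}


if __name__ == "__main__":
    # Hypothetical counts: identical except for transactions containing neither A nor B
    for n_neither in (100_000, 100):
        measures = interestingness(10_000, 1_000, 1_000, n_neither)
        print(n_neither, {k: round(v, 3) for k, v in measures.items()})
```

Running it shows lift and chi-square swinging with the number of "null" transactions, while the three null-invariant measures print the same values for both tables.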
Course Contents
- 1. Visualizations (and Getting to Know Orange)
- 2. Introduction to predictive modelling
- 3. Models for Classification I
- 4. Models for Classification II
- 5. Evaluating Model Performance
- 6. Regularization and Hyperparameter Tuning
- 7. Models for Clustering
- 8. Dimensionality Reduction
- 9. Models for Text Mining
- 10. Pre-trained Neural Network Models
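For the topic Evaluating Model Performance, the metrics listed in the learning outcomes reduce to simple counting. The sketch below is a minimal, library-free Python illustration assuming binary labels with 1 as the positive class; the function names and the toy labels and scores are hypothetical, and in the course these scores are obtained from Orange's evaluation widgets rather than from code like this. AUC is computed as the probability that a randomly chosen positive example is scored above a randomly chosen negative one (ties count as one half), which equals the area under the ROC curve.

```python
from typing import Sequence


def precision_recall_f1(y_true: Sequence[int], y_pred: Sequence[int],
                        positive: int = 1) -> tuple[float, float, float]:
    """Precision, recall and F1 for one positive class (0.0 when a denominator is zero)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def roc_auc(y_true: Sequence[int], scores: Sequence[float], positive: int = 1) -> float:
    """Area under the ROC curve via pairwise comparison of positive and negative scores."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0  # positive ranked above negative
            elif sp == sn:
                wins += 0.5  # tie counts as half
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data; y_pred equals thresholding the scores at 0.5
    y_true = [1, 1, 1, 0, 0, 0, 1, 0]
    y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
    scores = [0.9, 0.4, 0.8, 0.1, 0.6, 0.2, 0.7, 0.3]
    print(precision_recall_f1(y_true, y_pred))  # (0.75, 0.75, 0.75)
    print(roc_auc(y_true, scores))  # 0.9375
```

The pairwise AUC is quadratic in the number of examples, which is fine for illustration; practical tools compute the same quantity from a sorted ranking.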
Bibliography
Recommended Core Bibliography
- ElAtia, S., Ipperciel, D., & Zaiane, O. R. (2017). Data Mining and Learning Analytics: Applications in Educational Research. Hoboken, New Jersey: Wiley. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1351385
- Han, J., & Kamber, M. (2011). Data Mining: Concepts and Techniques (3rd ed.). Burlington, MA: Morgan Kaufmann. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=377411
- Larose, D. T., & Larose, C. D. (2015). Data Mining and Predictive Analytics. Hoboken, New Jersey: Wiley. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=958471
- Mourya, S. K., & Gupta, S. (2013). Data Mining and Data Warehousing. [N.p.]: Alpha Science International Limited. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1688519
Recommended Additional Bibliography
- Brown, M. S. (2014). Data Mining For Dummies. Hoboken: For Dummies. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=842663
- Knobbe, A. J. (2006). Multi-relational Data Mining. Amsterdam: IOS Press. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=176061
- Motoda, H. (2002). Active Mining: New Directions of Data Mining. Amsterdam: IOS Press. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=87558