Мы используем файлы cookies для улучшения работы сайта НИУ ВШЭ и большего удобства его использования. Более подробную информацию об использовании файлов cookies можно найти здесь, наши правила обработки персональных данных – здесь. Продолжая пользоваться сайтом, вы подтверждаете, что были проинформированы об использовании файлов cookies сайтом НИУ ВШЭ и согласны с нашими правилами обработки персональных данных. Вы можете отключить файлы cookies в настройках Вашего браузера.

  • A
  • A
  • A
  • АБB
  • АБB
  • АБB
  • А
  • А
  • А
  • А
  • А
Обычная версия сайта
Магистратура 2024/2025

Упорядоченные множества в анализе данных

Статус: Курс обязательный (Науки о данных (Data Science))
Направление: 01.04.02. Прикладная математика и информатика
Когда читается: 1-й курс, 1, 2 модуль
Формат изучения: без онлайн-курса
Охват аудитории: для своего кампуса
Прогр. обучения: Науки о данных
Язык: английский
Кредиты: 6

Course Syllabus

Abstract

“Ordered sets for data analysis” is a first-semester master course within “Data Science” master program. Order relations are ubiquitous, we meet them when we consider numbers, Boolean algebras, partitions, multisets, graphs, logical formulas, and many other mathematical entities. On the one hand, order relations are used for representing data and knowledge, on the other hand they serve as important tools for describing models and methods of data analysis, such as decision trees, random forests, version spaces, association rules, etc. We start from the basic notions of binary relations, their representations with matrices and graphs, consider properties of relations and operations on them, giving examples on problems of data analysis. We come then to the notions of order, semilattice, and lattice. Very important data models like numbers, vectors of numbers, set algebras, partitions make lattices. We consider an important type of lattices, called concept lattices, which constitute the main subject of Formal Concept Analysis (FCA), a modern domain of applied lattice theory. Concept lattices provide very useful tools for data analysis and pattern mining. On the one hand, formal concepts, the elements of a concept lattice, describe sets of objects sharing sets of common attributes, thus presenting a nice model of (bi)clustering. Ordered sets of concepts give a hierarchy of classes of a subject domain, which makes a backbone of domain ontology. On the other hand, concept lattices represent dependencies on attributes, both precise (called implications), and imprecise, known in data mining as association rules. We consider most important algorithms and algorithmic problems related to applications of FCA. Concept lattices allow us to describe some fundamental models of data analysis and pattern mining such as decision trees, random forests, version spaces, and concept-based hypotheses. Since a serious limitation of many methods of pattern mining is computational complexity, complexity analysis makes an important leitmotif of the course. Throughout the course we consider applications of the introduced notions in various applied domains, such as text analysis, medicine, pharmacology, etc. Homework consists of both theoretical problems and practical problems in designing particular.
Learning Objectives

Learning Objectives

  • To help students to synthesize knowledge from undergraduate courses on discrete mathematics, algorithmic theory, data analysis to get a stereoscopic picture of the foundations of pattern mining.
  • Providing students with some important theoretical knowledge and practical skills that would help them to see the relationship of various methods of pattern mining and knowledge discovery.
Expected Learning Outcomes

Expected Learning Outcomes

  • Be capable of realizing and testing basic learning and classification model.
  • Be familiar with Association rules in Data Mining, their support and confidence.
  • Be familiar with complexity classes, (polynomial) reduction between problems, problem hardness and completeness.
  • Be familiar with concept-based (JSM) hypotheses.
  • Be familiar with implications, functional dependencies, Armstrong rules, and implication bases.
  • Be familiar with semilattices and lattices, and know their properties.
  • Be familiar with version space, minimal generators and minimal hypotheses.
  • Develop mathematical models based on order theory and Galois connections.
  • Know basic notions and terminology of algorithmic complexity.
  • Know basic notions and terminology of order theory and symbolic data analysis.
  • Know basic notions and terminology of relations, graphs, and matrices as data structures.
  • Know differense between Decision trees and hypothesis.
  • Understand basic notions of dependencies in data.
  • Understand clustering and biclustering, and usage of Formal concepts as biclusters.
  • Understand Formal Concept Analysis (FCA) and its Basic theorem.
  • Understand fundamental principles of learning and classification through Concept Lattices.
  • Understand fundamental principles of symbolic pattern mining.
  • Understand fundamental principles of working with complex data using pattern structures and pattern setups.
  • Understand Interestingness measures of patterns.
  • Understand the main algorithms for Pattern Mining with concept lattices.
Course Contents

Course Contents

  • Introduction. Relations, Graphs, and Matrices as Data Structures
  • Algorithmic Complexity for Data Science
  • Orders, Their Graphs, and Diagrams for Data and Knowledge Representation
  • Semilattices and Lattices
  • Galois Connections and Formal Concept Analysis (FCA)
  • Dependencies in Data: Implications and Functional Dependencies
  • Association Rules and Concept Lattices for Data Mining
  • Algorithms for Pattern Mining with Concept Lattices
  • Biclustering and Interestingness Measures of Patterns
  • Models of Learning and Classification Through Concept Lattices
  • Working with Complex Data: Pattern Structures and Pattern Setups
Assessment Elements

Assessment Elements

  • non-blocking Homework
  • non-blocking Research project
  • non-blocking Exam
    Students have to demonstrate knowledge of theory, but the most of tasks would evaluate their ability to solve practical examples, present straight operation, and recognition skills to solve them.
Interim Assessment

Interim Assessment

  • 2024/2025 1st module
    Grade for the 1st module = 0.5 *(average grade for the regular homework) + 0.5 * (grade for the midterm test, not blocking)
  • 2024/2025 2nd module
    0.5 * Exam + 0.5 * Research project
Bibliography

Bibliography

Recommended Core Bibliography

  • Cormen, T. H., Leiserson, C. E., Rivest, R. L., Stein, C. Introduction to Algorithms (3rd edition). – MIT Press, 2009. – 1292 pp.
  • Kuznetsov, S. O. Fitting pattern structures to knowledge discovery in big data // International conference on formal concept analysis. – Springer, Berlin, Heidelberg, 2013. – PP. 254-266.

Recommended Additional Bibliography

  • Kuznetsov, S. O. Pattern structures for analyzing complex data // International Workshop on Rough Sets, Fuzzy Sets, Data Mining, and Granular-Soft Computing. – Springer, Berlin, Heidelberg, 2009. – P. 33-44.
  • Kuznetsov, S. O. Scalable knowledge discovery in complex data with pattern structures // International Conference on Pattern Recognition and Machine Intelligence. – Springer, Berlin, Heidelberg, 2013. – P. 30-39.

Authors

  • KUZNETSOV SERGEY OLEGOVICH