Мы используем файлы cookies для улучшения работы сайта НИУ ВШЭ и большего удобства его использования. Более подробную информацию об использовании файлов cookies можно найти здесь, наши правила обработки персональных данных – здесь. Продолжая пользоваться сайтом, вы подтверждаете, что были проинформированы об использовании файлов cookies сайтом НИУ ВШЭ и согласны с нашими правилами обработки персональных данных. Вы можете отключить файлы cookies в настройках Вашего браузера.

  • A
  • A
  • A
  • АБB
  • АБB
  • АБB
  • А
  • А
  • А
  • А
  • А
Обычная версия сайта
Магистратура 2020/2021

Перспективные базы данных

Лучший по критерию «Полезность курса для расширения кругозора и разностороннего развития»
Лучший по критерию «Новизна полученных знаний»
Статус: Курс обязательный (Системная и программная инженерия)
Направление: 09.04.04. Программная инженерия
Когда читается: 1-й курс, 1, 2 модуль
Формат изучения: без онлайн-курса
Прогр. обучения: Системная и программная инженерия
Язык: английский
Кредиты: 5
Контактные часы: 56

Course Syllabus

Abstract

The objective of Advanced databases course delivery is to form professional competencies related to design and implementation of non-relational databases, including object-oriented, object-relational, deductive, multidimensional and semistructured database models. Students will get a grasp on strengths and weaknesses of wide spectrum of approaches to data storage, search and retrieval, resulting in informed choice of database model. The course is based on “Advanced databases” course of Eindhoven University of Technology (Eindhoven, Netherlands), Faculty of Math and Computer Science, author and principal lecturer – Prof. Dr. Toon Calders. While Databases course of bachelor curricula covered many of the core concepts behind database modeling, design and implementation, there are many other considerations that should be addressed for successful scientific or industry career in this field. The main objective of this course is to expand students’ view and introduce advanced topics, including object-oriented extensions, deductive databases and appropriate query languages, data warehouses and online analytical data processing, and XML. The additional topics covered in this course will help students become more proficient in choosing appropriate technologies for projects and will expand their knowledge base so that they have a better understanding of the field. By the end of the course, students should have a solid grasp on business intelligence tools and XML, which will prove to be invaluable as they progress further in computer science studies.
Learning Objectives

Learning Objectives

  • This course studies different database models and their properties. The models that will be discussed are: • Object-oriented databases; • Deductive databases (Datalog); • Data warehousing and online analytical processing (ROLAP, MOLAP and HOLAP); • The XML data model (query languages XQuery and XPath, DTDs). For these conceptual models the course will concentrate on the following points: Why was the database model introduced? Which of the shortcomings of other models does it address? What are the most important concepts and notions for the database model? How is the model implemented? Which are the main techniques? The importance of understanding the internals of a particular database model cannot be overemphasized as it is closely connected to its limitations.
Expected Learning Outcomes

Expected Learning Outcomes

  • Knows the shortcomings and restrictions of relational data model.
  • Can reason about expressibility of relational query languages using notion of locality.
  • Knows data storage methods usable for object-oriented program systems, including pure object database systems and object-relational mappers, its advantages and disadvantages.
  • Knows models and methods of organization of deductive databases using Datalog language.
  • Knows implementation and optimization techniques for Datalog translator.
  • Knows the reference architectures of data warehouses and is aware of the basic functionality offered by available commercial and free data warehousing systems.
  • Master methods and tools for creating analytical database solutions.
  • Can choose dimensions for multidimensional database, group them into hierarchies and define aggregates.
  • Knows principles of view choice for materialization
  • Knows XML document model, DTD and XML Schema.
  • Can write and understand XPath and XQuery expressions.
  • Can create and interpret XLST transformations.
  • Demonstrate capability of reading and writing queries in SQL and/or relational algebra.
Course Contents

Course Contents

  • Advanced topics in databases: introduction
    Structure of course.Restrictions of relational model. Object-oriented databases. Deductive databases. Data warehousing and OLAP. Semistructured data and XML. Project. Grading.
  • Shortcomings of relational model
    Why relational model is insufficient. Expressivenes of relational algebra. Limitations of relational query languages concerning inexpressibility of certain queries classes, e.g. transitive closure and other recursive queries. Gaifman locality. Historical overview of non-relational models. Nested relational algebra. Nesting in SQL dialects. Extending SQL with recursion.
  • Object-oriented databases
    Applications that require storage and manipulation of complex data. Object-oriented programming languages for complex objects manipulation. Mapping objects to tuples in relations. Impedance mismatch. Extending SQL with complex types: collections, structures, inheritance, references. Methods for complex types. Notion of persistence. Persistent programming languages. Persistent objects. Object identity. Query languages for object-oriented databases. Object-relational databases. Object-relational mapping. Language-integrated query
  • Deductive databases
    Logic programming and Prolog. Combining rules and facts in one database. Intensional and extensional relations. Datalog syntax. Semantics of the Datalog program. Non-recursive Datalog programs. Negation. Safety of rule. Model-theoretic semantics. Recursive Datalog programs. Fixpoint evaluation. Stratified programs and strata. Aggregation. Equivalence of relational algebra and safe Datalog with negation, without recursion and aggregation. Evaluation and optimization of Datalog programs: avoiding repeated and unnecessary inferences, filtering with “magic sets”, indexing, materialization.
  • Data warehousing and OLAP
    Requirements to data management from decision support systems. Historical, summarized, integrated data. Statistical and analytical queries. Business intelligence applications. Three-tier architecture. Extract-transform-load process. Data warehouse, data mart. Online analytical processing (OLAP). Conceptual models for decision support. Multidimensional view on the data. Cross-tabulation. Data cubes. Operations with data cubes: roll-up, drill-down, pivot, slice & dice, select. Query languages for supporting OLAP. SQL extensions: Group by cube, group by rollup. Multidimensional expressions (MDX). Data explosion problem. View materialization: optimal set of views. Partial order on views, cost model, greedy algorithm. Relational OLAP (ROLAP): Star schema, snowflake schema, snowflake constellation. Multi-dimensional OLAP (MOLAP): multicubes and hypercubes, sparse and dense dimensions. Indexing of dimensions: b-tree, bitmap, join indexes. Hybrid OLAP (HOLAP).
  • Semistructured data
    Semistructured data for human and machine consumption. XML: tags, elements, attributes, values. Well-formed and valid documents. Namespaces. XPath: axes, node-tests, predicates. XQuery: expressions, functions. FLWOR expressions. XQuery data model. DTD and XML Schema. Simple and complex types of XML Schema. Light XQuery. Extensible Stylesheet Language Transformations (XSLT): templates, parameters, variables. XML data management.
Assessment Elements

Assessment Elements

  • non-blocking Work on seminars
  • non-blocking Homework
  • non-blocking Exam
Interim Assessment

Interim Assessment

  • Interim assessment (2 module)
    0.4 * Exam + 0.3 * Homework + 0.3 * Work on seminars
Bibliography

Bibliography

Recommended Core Bibliography

  • Foster, E. C., & Godbole, S. (2016). Database Systems : A Pragmatic Approach (Vol. Second edition). [United States]: Apress. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1174505
  • Greco, S., & Molinaro, C. (2016). Datalog and Logic Databases. San Rafael, California: Morgan & Claypool Publishers. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1100404

Recommended Additional Bibliography

  • Celko, J. (2006). Joe Celko’s Analytics and OLAP in SQL. San Francisco, Calif: Morgan Kaufmann. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=195632
  • Elmasri, R., & Navathe, S. (2014). Fundamentals of Database Systems: Pearson New International Edition (Vol. Sixth edition). Harlow, Essex: Pearson. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1418149
  • Freeman, A., & Rattz, J. C. (2010). Pro LINQ : Language Integrated Query in C# 2010. [New York]: Apress. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=373589
  • Gang Gou, & Rada Chirkova. (2007). Efficiently querying large XML data repositories: a survey. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsbas&AN=edsbas.BD4CFD0F
  • Garcia-Molina, H., Ullman, J. D., Dawson Books, & Widom, J. (2014). Database Systems: Pearson New International Edition : The Complete Book (Vol. Second edition). Harlow, Essex: Pearson. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1418178
  • Haq, Q. M. R. U. (2016). Data Mapping for Data Warehouse Design. Amsterdam: Morgan Kaufmann. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1115852
  • Inderpal Singh Mumick, Sheldon J. Finkelstein, Hamid Pirahesh, & Raghu Ramakrishnan. (1990). Magic is Relevant. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsbas&AN=edsbas.DDFF5305
  • Inmon, W. H., Linst, D., & Levins, M. (2019). Data Architecture: A Primer for the Data Scientist : A Primer for the Data Scientist (Vol. Second Edition). London: Academic Press. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1951596
  • Inmon, W. H., Neushloss, G., & Strauss, D. (2008). DW 2.0: The Architecture for the Next Generation of Data Warehousing. Amsterdam: Morgan Kaufmann. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=239178
  • Jack Minker. (1996). Logic and Databases: a 20 Year Retrospective. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsbas&AN=edsbas.317CDD56
  • Jensen, C. S., Thomsen, C., & Pedersen, T. B. (2010). Multidimensional Databases and Data Warehousing. [San Rafael, Calif.]: Morgan & Claypool Publishers. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=440363
  • Jukic, N., Vrbsky, S., & Nestorov, S. (2017). Database Systems : Introduction to Databases and Data Warehouses. Burlington, Virginia: Prospect Press. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1562389
  • Kimball, R., & Ross, M. (2013). The Data Warehouse Toolkit : The Definitive Guide to Dimensional Modeling (Vol. 3rd edition). Hoboken, New Jersey: Wiley. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=605991
  • Lauri Hella, Leonid Libkin, & Juha Nurmonen. (1997). Notions of Locality and Their Logical Characterizations Over Finite Models. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsbas&AN=edsbas.2F7E6D58
  • Libkin, L. (2003). Expressive power of SQL. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsbas&AN=edsbas.47C7A56F
  • Linstedt, D., & Olschimke, M. (2015). Building a Scalable Data Warehouse with Data Vault 2.0. Amsterdam: Morgan Kaufmann. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1065504
  • Melton, J., & Buxton, S. (2006). Querying XML : XQuery, XPath, and SQL/XML in Context. San Francisco, Calif: Morgan Kaufmann. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=230831
  • Paterson, J. (2006). The Definitive Guide to Db4o. Berkeley, CA: Apress. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=182733
  • Rad, R. (2016). Mastering Hibernate. Birmingham, UK: Packt Publishing. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1239416
  • Raghu Ramakrishnan, & Jeffrey D. Ullman. (1993). A Survey of Research on Deductive Database Systems. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsbas&AN=edsbas.B54AD10B
  • Sun, A. (2016). Comparison of Java Persistence Layer Technologies. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsndl&AN=edsndl.oai.union.ndltd.org.UPSALLA1.oai.DiVA.org.lnu-57667
  • The Java Data Object (JDO) [1],[2],[3],[4] standard. (2003). Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsbas&AN=edsbas.EDFE9217
  • Thuraisingham, B. M. (2002). XML Databases and the Semantic Web. Boca Raton, FL: CRC Press. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=154600
  • Venky Harinarayan, Anand Rajaraman, & Jeffrey D. Ullman. (1996). Implementing Data Cubes Efficiently. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsbas&AN=edsbas.1C7F82F2