2024/2025
Data Bases and Data Warehouses
Type:
Mago-Lego
Delivered by:
Big Data and Information Retrieval School
When:
3, 4 module
Open to:
students of one campus
Language:
English
ECTS credits:
6
Course Syllabus
Abstract
Currently, data analysis plays a significant role in improving business efficiency. Most data sources are relational databases that can contain a large number of records. For data processing and analysis specialists, knowledge of how to extract such information, perform the necessary calculations, and save the results for later use becomes a priority.
This course consistently introduces students to database management systems, the theoretical basis of working with data, and writing simple and complex queries. The course covers the main aspects of data acquisition and processing. The second part of the course is devoted to the study of the methodology of creating data warehouses, software tools used in the development of data warehouses, as well as the formation of practical skills in designing databases.
Practical exercises will allow students to acquire new knowledge when writing queries to one of the most modern free databases, PostgreSQL, as well as to get acquainted with the features of organizing data flows in Airflow.
Learning Objectives
- Know possible ways to work with relational databases
- Know the features of the relational data model
- Know the syntax of the SQL language
- Understand the architecture of data warehouses and its components
- Be able to design data loading processes using SQL and Airflow
- Know the visualization tools for information obtained from data warehouses
Expected Learning Outcomes
- 1. Can extract data from relational databases using queries
- 2. Knows how to structure and process data in a relational database
- 3. Knows the syntax of the SQL language at a high level and can apply it to solve analytical problems.
- 4. Knows how to design a data warehouse
- 5. Applies DBMS tools in the implementation of the HD model
Course Contents
- Data models used in different layers of the data warehouse. The approach of Kimball and Inmon.
- Creating simple loading procedures using airflow
- Automation of data loading and processing processes using Airflow
- Data deduplication. Loading data using SQL.
- The ETL process. ETL and ELT. Methods for uploading data with a history of changes.
- Data management in the tasks of building analytical storefronts.
- An introduction to data warehouses. Prerequisites for creation, main components and users.
- Programming capabilities in databases. Users and roles.
- The use of nested queries in computations and complex environments.
- Nested queries. Using window functions.
- Query development with grouping and aggregating functions
- The use of aggregating functions. Query execution plan.
- Development of select project-join queries using built-in functions and operators.
- Operations of working with sets in SQL. Different connection options in requests.
- Reading, inserting, updating, and deleting in Postgres. Simple data operations.
- Theoretical foundations of query development. Expressions in the SQL language. Simple CRUD operations in SQL DML.
- Relational database development. Basics of working with Postgres.
- Basic concepts of relational databases. Description of the data structure using SQL DDL
- Incremental data loading with change history support.
- The Data Vault approach. Advantages and disadvantages of the models. Data lakes, lambda and kappa architectures.
- Designing a data warehouse using Data Vault
- Tools for building reports based on data warehouses
- Data visualization using BI tools
Assessment Elements
- HomeworkIssued for 1, 3, 4, 5 weeks. Each task is associated with a practice on the topic of the week.
- ProjectIt is issued after the 8th lecture. It is a team project in which students will try to create a small data warehouse based on open sources. The maximum number of participants is 4 students.
- Control workIt is conducted after the 6th lecture in the test format, which takes 60 minutes. You can use the course materials. Number of questions - 30
- ExamA written exam in the form of a test and one assignment with a detailed answer. The use of course materials is allowed.
Interim Assessment
- 2024/2025 4th module0.3 * Control work + 0.3 * Exam + 0.2 * Homework + 0.2 * Project
Bibliography
Recommended Core Bibliography
- Гордеев, С. И. Организация баз данных в 2 ч. Часть 1 : учебник для вузов / С. И. Гордеев, В. Н. Волошина. — 2-е изд., испр. и доп. — Москва : Издательство Юрайт, 2024. — 310 с. — (Высшее образование). — ISBN 978-5-534-04469-0. — Текст : электронный // Образовательная платформа Юрайт [сайт]. — URL: https://urait.ru/bcode/538593 (дата обращения: 27.08.2024).
- Гордеев, С. И. Организация баз данных в 2 ч. Часть 2 : учебник для вузов / С. И. Гордеев, В. Н. Волошина. — 2-е изд., испр. и доп. — Москва : Издательство Юрайт, 2024. — 513 с. — (Высшее образование). — ISBN 978-5-534-04470-6. — Текст : электронный // Образовательная платформа Юрайт [сайт]. — URL: https://urait.ru/bcode/539672 (дата обращения: 27.08.2024).
- Стружкин, Н. П. Базы данных: проектирование. Практикум : учебное пособие для вузов / Н. П. Стружкин, В. В. Годин. — Москва : Издательство Юрайт, 2024. — 291 с. — (Высшее образование). — ISBN 978-5-534-00739-8. — Текст : электронный // Образовательная платформа Юрайт [сайт]. — URL: https://urait.ru/bcode/537149 (дата обращения: 27.08.2024).
Recommended Additional Bibliography
- Beaulieu, A. (2009). Learning SQL : Master SQL Fundamentals: Vol. 2nd ed. O’Reilly Media.
- Perkins, L., Redmond, E., & Wilson, J. R. (2018). Seven Databases in Seven Weeks : A Guide to Modern Databases and the NoSQL Movement (Vol. Second edition). Raleigh, N. C: Pragmatic Bookshelf. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1806794