Master
2021/2022
Big Data: Advanced Level
Type:
Elective course (Master in International Management)
Area of studies:
Management
Delivered by:
Department of Business Informatics
Where:
Graduate School of Business
When:
Year 1, Module 4
Mode of studies:
distance learning
Online hours:
2
Open to:
students of one campus
Instructors:
Leonid Smelov
Master’s programme:
International Management
Language:
English
ECTS credits:
3
Contact hours:
2
Course Syllabus
Abstract
Program: International Management
Link: https://www.coursera.org/learn/big-data-integration-processing?specialization=big-data
Semester: 2
Level: Graduate
Year: 1
Study mode: MOOC
Type of course: Elective
ECTS: 3
Prerequisites
The course “Big Data Advanced Analytics” is an elective course. Prior knowledge of the following discipline is recommended before taking this course: Introduction to Data Science.
Learning outcomes
- to be able to retrieve data from example database and big data management systems
- to be able to describe the connections between data management operations and the big data processing patterns needed to utilize them in large-scale analytical applications
- to be able to identify when a big data problem needs data integration
- to be able to execute simple big data integration and processing on Hadoop and Spark platforms
Contents
The course covers the basic concepts of big data integration and processing, the various aspects of data retrieval for NoSQL data, as well as data aggregation and working with data frames. It also introduces big data pipelines and workflows, as well as the processing and analysis of big data using Apache Spark. The course also gives students practical hands-on experience analyzing Twitter data. This course covers the following topics:
- Retrieving Big Data
- Big Data Integration
- Processing Big Data
- Big Data Analytics using Spark
- Learn By Doing: Putting MongoDB and Spark to Work
Learning Objectives
Students will be able to:
- retrieve data from example database and big data management systems
- describe the connections between data management operations and the big data processing patterns needed to utilize them in large-scale analytical applications
- identify when a big data problem needs data integration
- execute simple big data integration and processing on Hadoop and Spark platforms
Expected Learning Outcomes
- Know the basic concepts of big data integration and processing.
- Be introduced to data integration tools, including Splunk and Datameer, and gain practical insight into how information integration processes are carried out.
- Be introduced to the Postgres database.
- Gain practical hands-on experience applying what was learned about Spark and MongoDB to analyze Twitter data.
- Learn the inner workings of Spark Core and be introduced to two key tools in the Spark toolkit: Spark MLlib and GraphX.
Course Contents
- Big Data Integration
- Retrieving Big Data
- Learn By Doing: Putting MongoDB and Spark to Work
- Big Data Analytics using Spark
- Processing Big Data
Interim Assessment
- 2021/2022, 4th module: the result will be evaluated upon submission of the certificate
Bibliography
Recommended Core Bibliography
- Goyal, A. (2020). A Self-Assessing Compilation Based Search Approach for Analytical Research and Data Retrieval.
- Hoger Khayrolla Omar, & Alaa Khalil Jumaa. (2019). Big Data Analysis Using Apache Spark MLlib and Hadoop HDFS with Scala and Java. https://doi.org/10.24017/science.2019.1.2
- Ilya Ganelin, Ema Orhian, Kai Sasaki, & Brennon York. (2016). Spark: Big Data Cluster Computing in Production. Wiley.
Recommended Additional Bibliography
- Edward, S. G., & Sabharwal, N. (2015). Practical MongoDB: Architecting, Developing, and Administering MongoDB. Berkeley, CA: Apress. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1124206
- Isaac Chun-Hai Fung, Jingjing Yin, Keisha D. Pressley, Carmen H. Duke, Chen Mo, Hai Liang, King-Wa Fu, Zion Tsz Ho Tse, & Su-I Hou. (2019). Pedagogical Demonstration of Twitter Data Analysis: A Case Study of World AIDS Day, 2014. https://doi.org/10.3390/data4020084
- Langewisch, R. P. (2016). A performance study of an implementation of the push-relabel maximum flow algorithm in Apache Spark’s GraphX.