2020/2021
Python for Data Collection and Analysis
Type:
Elective course (faculty-wide pool)
Delivered by:
International College of Economics and Finance
Delivered at:
International College of Economics and Finance
When:
Modules 3 and 4
Instructors:
Alla Andreevna Tambovtseva
Language:
Russian
Credits:
3
Contact hours:
32
Course Syllabus
Abstract
This is an optional one-module course. Students learn the basics of programming, methods for processing and visualizing qualitative and quantitative data, and approaches to retrieving information from the Internet through web scraping and API requests. The ultimate goal of the course is to equip students with techniques for data collection, data visualization, and exploratory data analysis. The course uses two languages: classes are taught in Russian, and materials are available in both Russian and English. Prerequisites: none.
Learning Objectives
- The course aims to develop basic programming skills and to teach methods of data processing and visualization with Python libraries, as well as methods of collecting data from the Web.
Expected Learning Outcomes
- Apply knowledge of different data structures to solve practical problems
- Use conditional structures, loops and functions to work with real data
- Apply methods of data processing and exploratory analysis using Pandas
- Visualize qualitative and quantitative data using graphical Python libraries
- Collect data from the Internet via web scraping and API requests
- Work in Jupyter Notebook, using interactive widgets and other user-interaction elements
Course Contents
- Introduction to Python. Variables and basic data types in Python.
- Data structures in Python: lists, tuples, dictionaries. Mutability and immutability in programming. Lists and list methods. Tuples and tuple methods. Lists vs. tuples. Dictionaries and dictionary methods. Dictionaries and JSON files.
- Control structures and functions in Python. If-else conditional structures. For loops and while loops. User-defined functions. Local and global variables. Lambda functions. Code debugging.
- Working with data frames in Python with the NumPy and Pandas libraries. NumPy arrays for data analysis. Basic data handling with Pandas methods. Grouping and aggregating data. Merging and melting data. Gathering descriptive statistics for exploratory analysis.
- Data visualization with graphical Python libraries. Visualizing mathematical functions in Python. Visualization of qualitative and quantitative data with Matplotlib. Visualization of data with Seaborn.
- Collecting data from the Web using Python. Introduction to HTML and web design. Parsing HTML files in Python with the requests and BeautifulSoup libraries. Introduction to CSS selectors. Browser automation with Selenium. APIs as a source of data. Working with the APIs of social networks.
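As an illustration of the data-structures topic in the list above, the following minimal sketch contrasts a mutable list with an immutable tuple and round-trips a dictionary through JSON. It is not taken from the course materials; the variable names and sample values are hypothetical and chosen only for demonstration.

```python
import json

# A tuple is immutable: its items cannot be reassigned after creation.
point = (55.75, 37.62)  # hypothetical coordinate pair

# A list is mutable: it can be extended and sorted in place.
readings = [3, 1, 2]  # hypothetical measurements
readings.append(5)
readings.sort()

# Attempting to change a tuple item raises TypeError.
try:
    point[0] = 0.0
except TypeError:
    tuple_is_immutable = True
else:
    tuple_is_immutable = False

# A dictionary maps keys to values and converts naturally to JSON text.
course = {"title": "Python for Data Collection and Analysis", "credits": 3}
course["contact_hours"] = 32  # dictionaries are mutable as well

text = json.dumps(course)    # dictionary -> JSON string
restored = json.loads(text)  # JSON string -> dictionary
```

Only the standard-library `json` module is used here, so the sketch runs without installing NumPy, Pandas, or BeautifulSoup.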
Assessment Elements
- homeworks
- online practice. Online practice consists of tasks on the online platform DataCamp (https://www.datacamp.com/home); free access is provided to students. Online practice should be completed before the deadline (usually the next class). Late submissions will not be graded.
- final project. For the final project, students are expected to write a program of practical use that requests some input from a user, retrieves data from the Internet, and processes these data. Students should submit two files: a file with Python code (an ipynb file or a py file) and a file with documentation for this code that describes its aims, usage, and limitations. The project can be done individually or in groups of up to three people.
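The three stages named in the final project description (user input, retrieval from the Internet, processing) could be organized roughly as in the sketch below. This is only one possible layout, not a required template: the function names `fetch_json`, `summarize`, and `run` are hypothetical, and the sketch assumes the data source returns a JSON list of records.

```python
import json
from urllib.request import urlopen


def fetch_json(url, timeout=10):
    """Download a URL and decode the response body as JSON."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def summarize(records, field):
    """Return count, min, max and mean of a numeric field across records."""
    values = [r[field] for r in records
              if isinstance(r.get(field), (int, float))]
    if not values:
        return {"count": 0}
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": sum(values) / len(values),
    }


def run(url, field, fetch=fetch_json):
    """Retrieve records from url and summarize one field.

    The fetch argument can be replaced with a stub, which allows the
    processing step to be checked without network access.
    """
    return summarize(fetch(url), field)
```

In an interactive session, the user input stage could be as simple as `run(input("URL: "), input("Field: "))`. Keeping retrieval and processing in separate functions also makes it easier to describe usage and limitations in the accompanying documentation file.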
Interim Assessment
- Interim assessment (module 4): 0.4 * final project + 0.4 * homeworks + 0.2 * online practice
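The weighted formula above can be checked with a short calculation. The sketch below uses the weights exactly as stated; the component scores are invented for illustration, and the 10-point scale is an assumption, since the syllabus does not state the scale.

```python
# Weights taken from the interim assessment formula.
WEIGHTS = {
    "final project": 0.4,
    "homeworks": 0.4,
    "online practice": 0.2,
}


def final_grade(scores):
    """Return the weighted sum of component scores."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores, assuming a 10-point scale.
example_scores = {"final project": 8, "homeworks": 7, "online practice": 10}
grade = final_grade(example_scores)
```

For these invented scores the arithmetic is 0.4 * 8 + 0.4 * 7 + 0.2 * 10 = 3.2 + 2.8 + 2.0 = 8.0, up to floating-point rounding. Any rounding rule applied to the final grade is not specified in the syllabus and is left out here.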
Bibliography
Recommended Core Bibliography
- Nelli, F. (2015). Python Data Analytics: Data Analysis and Science Using Pandas, Matplotlib and the Python Programming Language. [Berkeley, CA]: Apress. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1056488
Recommended Additional Bibliography
- Nair, V. G. (2014). Getting Started with Beautiful Soup. Birmingham, UK: Packt Publishing. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=691839