ANR-Lab Summer School 'Web-scraping and API for social scientific research'
The International Laboratory for Applied Network Research announces enrollment for the summer school 'Web-scraping and API for social scientific research' and invites applications from external participants.
The school will be held from June 10 to July 17 in a hybrid synchronous format: in a classroom on Pokrovsky Boulevard and online via Zoom.
What skills participants will gain: participants will learn to independently collect and process data from web pages and through APIs using Python. The course is as practice-oriented as possible: each lesson is a small project on collecting and processing data. Data from open sources and APIs is needed in many areas, so after completing the course you will be able to apply these skills to collect data for research projects and business tasks. The course also has a methodological focus: in addition to practicing in Python, each lesson includes a discussion of what social science research could be carried out with the data collected in that lesson.
Language: Russian.
Assessment format: presentation of the project concept and defense of the final project.
Classes are taught by Laboratory member Lika Kapustina. Lika joined the Laboratory in 2022 and immediately began working on collecting and processing data through APIs. Since then she has co-authored and published several articles with other Laboratory members and registered an RIA for “Bib-Elib”, a methodology for processing eLibrary data. She specializes in data collection and processing, teaches Python programming courses, and independently developed and taught this original course at the Faculty of Social Sciences in 2023-2024. In 2023, Lika took 2nd place in the HSE University Research competition in political science with a thesis based on data she collected and processed herself from the Moscow City Court.
Prerequisites for participation, format for selection of external participants: external participants must have Python skills at the level of the 'Introduction to Python Programming' courses, that is, be able to work with basic data types and collections in Python and write their own functions. Selection is based on a resume and a motivation letter. In your motivation letter, describe how you rate your Python skills, how you have used Python over the past year for work and research tasks, and how you could apply the skills from the school in the future. External participants will carry out a project using data and/or research objectives provided by the International Laboratory for Applied Network Research. If you have long wanted to try your hand at real scientific problems, this is a great opportunity. Send your motivation letter (300-600 words) and resume to lkapustina@hse.ru by June 7, with the subject line “Motivation_letter_school”. The number of places is limited, and selection is competitive.
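As a rough self-check against the prerequisite level described above, here is a minimal sketch that uses only basic data types, collections, and a user-defined function. The function name and the sample records are hypothetical and invented for illustration; they are not taken from the course materials.

```python
# Hypothetical self-check: the function name and sample data are invented for illustration.

def count_posts_by_author(posts):
    """Return a dict mapping each author to the number of posts they wrote."""
    counts = {}
    for post in posts:
        author = post["author"]
        # dict.get returns 0 the first time an author is seen
        counts[author] = counts.get(author, 0) + 1
    return counts


# A small list of dictionaries, similar in shape to what collected records might look like.
sample_posts = [
    {"author": "anna", "text": "First post"},
    {"author": "boris", "text": "Reply"},
    {"author": "anna", "text": "Second post"},
]

post_counts = count_posts_by_author(sample_posts)
print(post_counts)
```

If reading and writing code of this kind feels comfortable, your skills are likely close to the level the prerequisites describe.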
Class schedule*:
- June 10, Mon, 18:30-21:30 – introduction to web-scraping, working with HTML;
- June 19, Wed, 18:30-21:30 – introduction to requests, BeautifulSoup;
- June 21, Fri, 18:30-21:30 – continuation of work with requests, BeautifulSoup;
- June 24, Mon, 18:30-21:30 – introduction to APIs. Working with the VKontakte API;
- June 26, Wed, 18:30-21:30 – introduction to working with third-party libraries. Retrieving Telegram data via Pyrogram;
- July 1, Mon, 18:30-21:30 – introduction to Selenium. Open judicial data in Russia;
- July 3, Wed, 18:30-21:30 – continuation of work with Selenium;
- July 8, Mon, 18:30-21:30 – work with Scrapy and other frameworks for data collection;
- July 10, Wed, 18:30-20:00 – ethics and regulation of working with open data. Presentation of project plans;
- July 17, Wed, 18:30-21:30 – presentation of projects in groups.
*The class schedule may be adjusted slightly. All classes take place in the evening on weekdays.