Getting data from the web: scraping


Date
Nov 18, 2021 9:30 AM
Location
Room 295, 1155 E 60th St

Overview

  • Define HTML and CSS selectors
  • Introduce the rvest package
  • Demonstrate how to extract information from HTML pages
  • Demonstrate how to extract tables and convert to data frames
  • Practice scraping data

Before class

Class materials

  • Web scraping
  • rvest
    • Load the library (library(rvest))
    • demo("tripadvisor") - scraping a Tripadvisor page
    • demo("united") - scraping a web page that requires a login
    • Scraping IMDb
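
As a preview of the workflow these materials cover, here is a minimal sketch that selects elements with CSS selectors and converts an HTML table to a data frame. The HTML snippet, its class and id names, and the film data are hypothetical, made up for illustration; the snippet is parsed from a string so the example runs without network access. It assumes rvest 1.0.0 or later, where html_element() and html_text2() are available.

```r
library(rvest)

# Hypothetical HTML, supplied as a string instead of a live URL
html <- '<html><body>
  <h1 class="title">Sample films</h1>
  <p class="note">Ratings are made up for illustration.</p>
  <table id="films">
    <tr><th>Title</th><th>Rating</th></tr>
    <tr><td>Film A</td><td>8.1</td></tr>
    <tr><td>Film B</td><td>7.4</td></tr>
  </table>
</body></html>'

# read_html() accepts a URL, a file path, or a literal HTML string
page <- read_html(html)

# "h1.title" selects <h1> elements with class "title"
heading <- page %>%
  html_element("h1.title") %>%
  html_text2()

# "#films" selects the element whose id is "films";
# html_table() converts that <table> into a tibble (a data frame)
films <- page %>%
  html_element("#films") %>%
  html_table()

print(heading)
print(films)
```

For a real page, the string would be replaced with the page's URL, and the selectors would come from inspecting that page's HTML.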

What you need to do after class

Benjamin Soltoff
Assistant Senior Instructional Professor in Computational Social Science