Exercise

Reading HTML

The first step in web scraping is reading the HTML in. This can be done with read_html(), a function from xml2 that rvest imports and makes available when you load it. It accepts a single URL and returns a parsed XML document that we can query in later steps.
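As a rough illustration of what read_html() gives back, here is a minimal sketch. It assumes you do not want to hit the network, so it parses a small inline HTML string instead of a URL (read_html() accepts literal markup as well). The names sample_html, sample_xml and page_title are made up for this example and are not part of the exercise.

```r
library(rvest)

# Hypothetical stand-in for a real page: a tiny HTML document as a string.
# With a real page you would pass the URL string to read_html() instead.
sample_html <- "<html><head><title>Sample page</title></head><body><h1>Hello</h1><p>First paragraph.</p></body></html>"

# Parse the markup into an XML document object.
sample_xml <- read_html(sample_html)

# Printing shows the top-level structure (the <head> and <body> nodes).
print(sample_xml)

# The result is an xml_document, which rvest's selector functions can query.
print(class(sample_xml))

# Pull out the <title> text as a quick check that parsing worked.
page_title <- html_text(html_node(sample_xml, "title"))
print(page_title)
```

The printed document is a summary of the parsed tree, not the raw source text, which is why later steps use selector functions to pull out individual pieces.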

We're going to experiment with that by using rvest to grab Hadley Wickham's Wikipedia page, and then printing the result to see what its structure looks like.

Instructions
  • Load the rvest package.
  • Use read_html() to read the URL stored in test_url. Store the result as test_xml.
  • Print test_xml.