Do it the httr way
Here's some rvest code that I used to find out the elevation of a beautiful place where I recently spent my vacation:
library(rvest)
# Get the HTML document from Wikipedia
wikipedia_page <- read_html('https://en.wikipedia.org/wiki/Varigotti')
# Parse the document and extract the elevation from it
# (the 9th table row holds the elevation in the page layout at the time of writing)
wikipedia_page %>%
  html_elements('table tr:nth-child(9) > td') %>%
  html_text()
As you learned in the video, read_html() actually issues an HTTP GET request when it is given a URL, as in this case.
The goal of this exercise is to replicate the same query without read_html(), using httr methods instead.
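A minimal sketch of one possible httr version of the query above, assuming the httr and rvest packages are installed and the machine has network access. GET(), stop_for_status() and content() are httr functions; the stop_for_status() check is an addition that the original rvest code does not have. The row index 9 is taken from the rvest code above and only works as long as the Wikipedia table layout stays the same.

```r
library(httr)
library(rvest)

# Issue the HTTP GET request explicitly (read_html() does this for us when given a URL)
wikipedia_response <- GET("https://en.wikipedia.org/wiki/Varigotti")

# Stop with an error on 4xx/5xx responses instead of parsing an error page
stop_for_status(wikipedia_response)

# Parse the response body; for a text/html response this yields an xml2 HTML document
wikipedia_page <- content(wikipedia_response)

# From here on, the rvest pipeline is unchanged
wikipedia_page %>%
  html_elements("table tr:nth-child(9) > td") %>%
  html_text()
```

content() chooses its parser from the response's Content-Type header; for text/html it hands the body to xml2::read_html(), which is why the result can be piped into html_elements() just like the page returned by read_html().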
Note: Usually rvest does the job, but if you want to customize requests, as you'll be shown later in this chapter, you'll need to know the httr way.
As a bit of review, you'll also translate the CSS selector used in html_elements() into an XPath query.
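To check a CSS-to-XPath translation without touching the network, the sketch below runs both selectors against a small hand-written HTML table (a hypothetical stand-in, not the real Wikipedia markup) and compares the results. The XPath //table//tr[9]/td is one possible translation, assumed here rather than taken from the course.

```r
library(rvest)

# Hypothetical fixture: a 10-row table whose 9th row holds "value 9"
rows <- paste0(
  "<tr><th>Field ", 1:10, "</th><td>value ", 1:10, "</td></tr>",
  collapse = ""
)
fixture <- read_html(paste0("<html><body><table>", rows, "</table></body></html>"))

# The CSS selector from the exercise
via_css <- fixture %>%
  html_elements("table tr:nth-child(9) > td") %>%
  html_text()

# Candidate XPath: the 9th <tr> child of its parent, somewhere inside a <table>,
# then its direct <td> children (CSS "> td" becomes XPath "/td")
via_xpath <- fixture %>%
  html_elements(xpath = "//table//tr[9]/td") %>%
  html_text()

identical(via_css, via_xpath)
```

tr[9] counts only tr siblings, whereas :nth-child(9) counts all element siblings; the two agree whenever a row's parent contains nothing but tr elements. A stricter translation is //table//tr[count(preceding-sibling::*) = 8]/td.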
This exercise is part of the course Web Scraping in R.
Have a go at this exercise by completing this sample code.
# Get the HTML document from Wikipedia using httr
wikipedia_response <- ___('https://en.wikipedia.org/wiki/Varigotti')
# Parse the response into an HTML doc
wikipedia_page <- ___(___)