Do it the httr way

Here's some rvest code that I used to find out the elevation of a beautiful place where I recently spent my vacation.

library(rvest)

# Get the HTML document from Wikipedia
wikipedia_page <- read_html('https://en.wikipedia.org/wiki/Varigotti')
# Select the cell in the table's ninth row and extract the elevation as text
wikipedia_page %>% 
  html_elements('table tr:nth-child(9) > td') %>% 
  html_text()

As you have learned in the video, read_html() actually issues an HTTP GET request if provided with a URL, like in this case.

The goal of this exercise is to replicate the same query without read_html(), but with httr methods instead.
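As a rough sketch of what such a replacement can look like, the helper below issues the GET request with httr and parses the response body into an HTML document. The function name fetch_html is hypothetical, and the explicit type and encoding arguments are assumptions made for robustness rather than something the exercise requires.

```r
library(httr)

# Hypothetical helper: fetch a URL with httr and return a parsed HTML document
fetch_html <- function(url) {
  # Issue the HTTP GET request explicitly, as read_html() would do for a URL
  response <- GET(url)
  # Turn HTTP errors (4xx, 5xx) into R errors instead of parsing an error page
  stop_for_status(response)
  # Parse the response body as HTML; the result is an xml2 document
  content(response, as = "parsed", type = "text/html", encoding = "UTF-8")
}
```

Calling fetch_html('https://en.wikipedia.org/wiki/Varigotti') should return a document that can be passed to html_elements() just like the result of read_html().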

Note: rvest usually does the job, but if you want to customize requests, as you'll be shown later in this chapter, you'll need to know the httr way.

As a refresher, you'll also translate the CSS selector used in html_elements() into an XPath query.
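One possible translation can be checked offline on a small stand-in table. The nine-row table below is invented for illustration and is not the Wikipedia page. Note that tr[position() = 9] selects the ninth tr among its tr siblings, which matches tr:nth-child(9) only when the rows' parent contains nothing but tr elements, as is assumed here.

```r
library(rvest)

# Hypothetical stand-in for the page: one table with nine single-cell rows
rows <- paste0("<tr><td>row ", 1:9, "</td></tr>", collapse = "")
page <- read_html(paste0("<table>", rows, "</table>"))

# Original CSS selector: cells that are direct children of the ninth row
css_result <- page %>%
  html_elements("table tr:nth-child(9) > td") %>%
  html_text()

# XPath counterpart: // for the descendant steps, / for the child step
xpath_result <- page %>%
  html_elements(xpath = "//table//tr[position() = 9]/td") %>%
  html_text()

# Both queries should select the same cell
identical(css_result, xpath_result)
```

The space in the CSS selector (descendant combinator) maps to // in XPath, while > (child combinator) maps to a single /.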

This exercise is part of the course

Web Scraping in R

Hands-on interactive exercise

Try this exercise by completing this sample code.

# Get the HTML document from Wikipedia using httr
wikipedia_response <- ___('https://en.wikipedia.org/wiki/Varigotti')
# Parse the response into an HTML doc
wikipedia_page <- ___(___)