
Another extractor

In the previous exercise, we built a function that was able to extract the text content from H2 headers.

We'll try something else here: we want to extract all the links that exist on a specific page. To do this, we will need to call two rvest functions: html_nodes(), with the css argument set to "a" (a is the HTML tag for links), and html_attr(), which extracts a given attribute from a node. In our case, this attribute will be "href", which holds the link address.
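To make the two calls concrete, here is a minimal sketch that runs them on a small inline HTML snippet instead of a live page. The snippet and the names page, links and hrefs are assumptions made up for illustration; they are not part of the exercise workspace.

```r
library(rvest)

# Hypothetical HTML snippet standing in for a real page (illustration only)
page <- read_html('<html><body>
  <h2>Title</h2>
  <a href="https://example.com/a">First</a>
  <a href="/b">Second</a>
</body></html>')

# Select every <a> node, then pull the href attribute out of each one
links <- html_nodes(page, css = "a")
hrefs <- html_attr(links, name = "href")
hrefs
```

In rvest 1.0.0 and later, html_nodes() is superseded by html_elements(), but the older name still works and is the one this exercise uses.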

purrr and rvest have been loaded for you. You can still find the urls vector in your workspace.

This exercise is part of the course

Intermediate Functional Programming with purrr


Exercise instructions

  • Create the get_a() function by prefilling html_nodes() with the css argument set to "a".

  • Create the href() function, which will be a prefilled version of html_attr().

  • Compose a new combination of href(), get_a() and read_html().

  • Map this new function on the urls vector.
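The steps above can be sketched on toy data as follows. This is one possible arrangement, not necessarily the course's reference solution: it assumes purrr's partial() for prefilling arguments and compose() for chaining, and it uses a hypothetical pages vector of inline HTML strings in place of urls so that it runs without network access.

```r
library(purrr)
library(rvest)

# Hypothetical inline HTML documents standing in for the urls vector
pages <- c(
  '<html><body><a href="/one">1</a><a href="/two">2</a></body></html>',
  '<html><body><a href="/three">3</a></body></html>'
)

# Prefill html_nodes() so that it always selects <a> tags
get_a <- partial(html_nodes, css = "a")

# Prefill html_attr() so that it always extracts the href attribute
href <- partial(html_attr, name = "href")

# compose() applies its functions from right to left by default:
# read_html() runs first, then get_a(), then href()
get_links <- compose(href, get_a, read_html)

# Apply the composed function to each page and name the results
res <- map(pages, get_links) %>%
  set_names(c("page_one", "page_two"))
res
```

Because compose() works from right to left, the argument order mirrors the nested call href(get_a(read_html(x))).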

Hands-on interactive exercise

Try this exercise by completing this sample code.

# Create a partial version of html_nodes(), with the css param set to "a"
get_a <- ___(html_nodes, ___)

# Create href(), a partial version of html_attr()
href <- ___(___, name = "href")

# Combine href(), get_a(), and read_html()
get_links <- ___(___, ___, ___)

# Map get_links() over the urls vector
res <- ___(urls, ___) %>%
  set_names(urls)

# See the result
res