Another extractor
In the previous exercise, we built a function that extracts the text content from H2 headers.
We'll try something else here: we want to extract all the links that exist on a specific page. To do this, we will need to call two rvest functions: html_nodes(), with the css argument set to "a" (a is the HTML tag for links), and html_attr(), which extracts a given attribute from a node. In our case, this attribute will be "href", which holds the link address.
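For reference, here is how those two rvest calls chain together on a single page. This is a sketch, not part of the exercise; it assumes a placeholder url pointing at a real page:

```r
library(rvest)

# Parse the page, select all <a> nodes, then pull their href attributes.
# `url` is a hypothetical placeholder for a real page address.
page <- read_html(url)
links <- html_attr(html_nodes(page, css = "a"), name = "href")
```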
purrr and rvest have been loaded for you. You can still find the urls vector in your workspace.
This exercise is part of the course
Intermediate Functional Programming with purrr

Exercise instructions
- Prefill html_nodes() with the css argument set to "a".
- Create the href() function, which will be a prefilled version of html_attr().
- Compose a new combination of href(), get_a(), and read_html().
- Map this new function on the urls vector.
Interactive exercise
Complete the sample code to finish this exercise.
# Create a partial version of html_nodes(), with the css param set to "a"
get_a <- ___(html_nodes, ___)
# Create href(), a partial version of html_attr()
href <- ___(___, name = "href")
# Combine href(), get_a(), and read_html()
get_links <- ___(___, ___, ___)
# Map get_links() to the urls list
res <- ___(urls, ___) %>%
  set_names(urls)
# See the result
res
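If you get stuck, the blanks can be filled in as follows. This is one possible solution sketch using purrr::partial() and purrr::compose(); it assumes the urls vector from your workspace, so the final map() call only runs where those pages are reachable:

```r
library(purrr)
library(rvest)

# Partial version of html_nodes() with the css param preset to "a"
get_a <- partial(html_nodes, css = "a")

# Partial version of html_attr() with the name param preset to "href"
href <- partial(html_attr, name = "href")

# compose() applies its functions right to left:
# read_html() first, then get_a(), then href()
get_links <- compose(href, get_a, read_html)

# Map get_links() over the urls vector, naming each result by its URL
res <- map(urls, get_links) %>%
  set_names(urls)
```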