Parse hyperlinks into a data frame
Have a look at the following <ul> list of "helpful links". It consists of three <li> elements that in turn contain <a> elements with the links:
Helpful links
Compiled with help from Google.
The corresponding HTML code is available as a string in hyperlink_raw_html.
In this exercise, you'll parse these links into an R data frame by selecting only <a> elements that are within <li> elements.
PS: You'll use tibble(), a function from the Tidyverse, for that. tibble() is basically a trimmed-down version of data.frame(), which you certainly already know. Just like with data.frame(), you specify the columns as pairs of column names and values, like so:
my_tibble <- tibble(
  column_name_1 = value_1,
  column_name_2 = value_2,
  ...
)
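To make the template above concrete, here is a minimal sketch with made-up values. The object name example_tibble, the column names name and url, and the example.com addresses are all hypothetical and not part of the exercise:

```r
library(tibble)

# Hypothetical data: each argument is a column name paired with a vector of values
example_tibble <- tibble(
  name = c("Example A", "Example B"),
  url = c("https://example.com/a", "https://example.com/b")
)

example_tibble
```

Each vector becomes one column, so all vectors should have the same length (or length one).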
This exercise is part of the course Web Scraping in R.
Have a go at this exercise by completing this sample code.
# Extract all the a nodes from the bulleted list
links <- hyperlink_raw_html %>%
  read_html() %>%
  html_elements('li ___')
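As a rough illustration of the overall idea, the sketch below runs the same kind of pipeline on a small made-up HTML string rather than on hyperlink_raw_html. The string example_html, the object names, and the column names name and url are assumptions for illustration only. It uses a descendant selector to keep only links inside list items, then pulls the link text and the href attribute into a tibble:

```r
library(rvest)
library(tibble)

# Hypothetical HTML standing in for hyperlink_raw_html (not the course's data)
example_html <- '
<h3>Example links</h3>
<ul>
  <li><a href="https://example.com/a">First</a></li>
  <li><a href="https://example.com/b">Second</a></li>
</ul>
<p>See also <a href="https://example.com/c">elsewhere</a>.</p>'

# "li a" is a descendant selector: it matches a elements nested inside li elements,
# so the link in the paragraph is not selected
example_links <- example_html %>%
  read_html() %>%
  html_elements("li a")

# One row per link: visible text and target address
link_df <- tibble(
  name = example_links %>% html_text(),
  url = example_links %>% html_attr("href")
)

link_df
```

Note that the third link sits in a paragraph, not in a list item, so it does not end up in the data frame.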