Turn a table into a data frame with html_table()
If a table has a header row (with `th` elements) and no gaps, scraping it is straightforward, as with the following table (with the ID "clean"):
Mountain | Height | First ascent | Country |
---|---|---|---|
Mount Everest | 8848 | 1953 | Nepal, China |
... |
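Here is a minimal, self-contained sketch of the clean case. It assumes rvest 1.0.0 or later, which provides `html_element()` and the `header = NA` default of `html_table()`. The HTML string is a hypothetical stand-in for the "clean" table, reduced to the Mount Everest row shown above. Its first row consists of `th` cells, so `html_table()` uses that row as column names without any extra arguments.

```r
library(rvest)

# Hypothetical stand-in for the "clean" table: the header row is made of <th> cells
clean_html <- read_html('
  <table id="clean">
    <tr><th>Mountain</th><th>Height</th><th>First ascent</th><th>Country</th></tr>
    <tr><td>Mount Everest</td><td>8848</td><td>1953</td><td>Nepal, China</td></tr>
  </table>')

# The <th> row becomes the column names; numeric-looking columns are converted
clean <- clean_html %>%
  html_element("table#clean") %>%
  html_table()

clean
```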
Here's the same table (with the ID "dirty") without a designated header row and with a missing cell in the Mount Everest row:
Mountain | Height | First ascent | Country |
Mount Everest | 8848 | 1953 | |
... |
For such cases, `html_table()` has an extra argument you can use to parse the table correctly, as shown in the video. Missing cells are automatically recognized and replaced with `NA` values.
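The following sketch shows the dirty case under the same assumption (rvest 1.0.0 or later). The stand-in table is hypothetical and mirrors the Mount Everest row above. It has no `th` cells, and its data row lacks the Country cell. With the default `header = NA`, `html_table()` uses the first row as a header only when that row consists of `th` cells. Passing `header = TRUE` is one way to force the first row to become the column names. The absent cell is padded with `NA`.

```r
library(rvest)

# Hypothetical stand-in for the "dirty" table: no <th> cells, and the
# Mount Everest row has only three cells (the Country cell is absent)
dirty_html <- read_html('
  <table id="dirty">
    <tr><td>Mountain</td><td>Height</td><td>First ascent</td><td>Country</td></tr>
    <tr><td>Mount Everest</td><td>8848</td><td>1953</td></tr>
  </table>')

dirty_table <- html_element(dirty_html, "table#dirty")

# Default (header = NA): there is no <th> row, so the first row stays as data
# and the columns get generic names
html_table(dirty_table)

# header = TRUE: use the first row as column names; the missing cell becomes NA
dirty <- html_table(dirty_table, header = TRUE)
dirty
```

In this one-row stand-in the Country column holds only `NA`, so the default `convert = TRUE` turns it into a logical column. If other rows contained country names, it would remain a character column.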
Both tables are contained within the `mountains_html` document.
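Because both tables live in one document, they can also be extracted together. The sketch below builds a hypothetical stand-in for `mountains_html`; the real document comes from the course and is not reproduced here. Applying `html_table()` to the node set returned by `html_elements()` gives a list with one tibble per table. Any `header` value you pass then applies to every table in the set, so selecting each table by its ID is the more precise option when the tables need different settings.

```r
library(rvest)

# Hypothetical stand-in for mountains_html holding both tables
mountains_html <- read_html('
  <table id="clean">
    <tr><th>Mountain</th><th>Height</th><th>First ascent</th><th>Country</th></tr>
    <tr><td>Mount Everest</td><td>8848</td><td>1953</td><td>Nepal, China</td></tr>
  </table>
  <table id="dirty">
    <tr><td>Mountain</td><td>Height</td><td>First ascent</td><td>Country</td></tr>
    <tr><td>Mount Everest</td><td>8848</td><td>1953</td></tr>
  </table>')

# html_elements() returns a node set, so html_table() returns a list of tibbles;
# with the default header = NA, only the <th> table gets proper column names
all_tables <- mountains_html %>%
  html_elements("table") %>%
  html_table()

length(all_tables)
```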
This exercise is part of the course Web Scraping in R.
Complete the sample code to finish this exercise.
```r
library(rvest)

# Extract the "clean" table into a data frame
mountains <- mountains_html %>%
  html_element("table#clean") %>%
  ___

mountains
```