Select directly from a parent element with XPath's text()
In this exercise, you'll work with the same table again. This time, you'll extract the function information given in parentheses into its own column, so the resulting data frame has not two but three columns: actors, roles, and functions.
To do this, you'll need to apply the specific XPath function introduced in the video instead of html_table(), which often does not work in practice when the HTML table element is not well structured, as is the case here.
For reference, here is an excerpt of the table HTML again:
<table>
<tr>
<th>Actor</th>
<th>Role</th>
</tr>
<tr>
<td class = 'actor'>Jayden Carpenter</td>
<td class = 'role'><em>Mickey Mouse</em> (Voice)</td>
</tr>
...
</table>
In this exercise, the roles_html variable contains the HTML document with its table element.
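To illustrate the general idea of selecting a child element versus selecting bare text nodes with text(), here is a minimal sketch on a different, hypothetical HTML snippet (a list rather than the exercise's table). The names snippet, item_names, and item_notes are made up for this illustration, and the sketch assumes rvest 1.0 or later is installed.

```r
library(rvest)

# Hypothetical HTML (not the exercise's table): each <li> contains a <b>
# child element followed by a bare text node in parentheses.
snippet <- minimal_html('
  <ul>
    <li class="item"><b>Alpha</b> (first)</li>
    <li class="item"><b>Beta</b> (second)</li>
  </ul>')

# Select the <b> child elements and read their text
item_names <- snippet %>%
  html_elements(xpath = '//li[@class = "item"]/b') %>%
  html_text()

# Select only the text nodes that are direct children of <li>;
# the text inside <b> is not matched by text()
item_notes <- snippet %>%
  html_elements(xpath = '//li[@class = "item"]/text()') %>%
  html_text(trim = TRUE)

# Combine both vectors into a two-column data frame
items <- data.frame(name = item_names, note = item_notes)
items
```

Here trim = TRUE removes the leading space that separates the closing </b> tag from the parenthesized text.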
This exercise is part of the course Web Scraping in R.
Hands-on interactive exercise: complete the sample code below to finish this exercise.
# Extract the actors in the cells having class "actor"
actors <- roles_html %>%
  html_elements(xpath = '//table//td[@class = "actor"]') %>%
  html_text()
actors
# Extract the roles in the cells having class "role"
roles <- roles_html %>%
  html_elements(xpath = '//table//td[@class = "___"]/___') %>%
  ___()
roles