Select directly from a parent element with XPATH's text()

In this exercise, you'll work with the same table again. This time, you'll extract the function information given in parentheses into its own column, so the resulting data frame has three columns instead of two: actors, roles, and functions.

To do this, you'll need to apply the XPATH function text() that was introduced in the video instead of html_table(), which often does not work in practice when the HTML table element is not well structured, as is the case here.

For reference, here is an excerpt of the table's HTML again:

<table>
 <tr>
  <th>Actor</th>
  <th>Role</th>
 </tr>
 <tr>
  <td class = 'actor'>Jayden Carpenter</td>
  <td class = 'role'><em>Mickey Mouse</em> (Voice)</td>
 </tr>
 ...
</table>

In this exercise, the roles_html variable contains the HTML document with its table element.
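
As a rough illustration of what text() selects, the sketch below rebuilds the excerpt above as a small stand-in document. The name demo_html is hypothetical and is only a substitute for roles_html, which is not available outside the exercise. It assumes rvest 1.0 or later, which provides minimal_html() and html_elements().

```r
library(rvest)

# Hypothetical stand-in for roles_html, built from the excerpt above
demo_html <- minimal_html("
<table>
 <tr><th>Actor</th><th>Role</th></tr>
 <tr>
  <td class='actor'>Jayden Carpenter</td>
  <td class='role'><em>Mickey Mouse</em> (Voice)</td>
 </tr>
</table>")

# text() selects only the text nodes that are direct children of the td,
# so the text wrapped in <em> is not part of the result
function_nodes <- demo_html %>%
  html_elements(xpath = '//table//td[@class = "role"]/text()')

# Convert the selected text nodes to a character vector
functions_raw <- html_text(function_nodes)
functions_raw
```

The result still carries the parentheses and surrounding whitespace, so it would need some cleaning before it goes into a column.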

This exercise is part of the course Web Scraping in R.

Hands-on interactive exercise

Try this exercise by completing the sample code.

# Extract the actors in the cells having class "actor"
actors <- roles_html %>% 
  html_elements(xpath = '//table//td[@class = "actor"]') %>%
  html_text()
actors

# Extract the roles in the cells having class "role"
roles <- roles_html %>% 
  html_elements(xpath = '//table//td[@class = "___"]/___') %>% 
  ___()
roles
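
One possible way to end up with the three columns is sketched below on a hypothetical stand-in document (demo_html, built from the excerpt rather than the real roles_html). The helper cell_text() and the base R cleanup with gsub() and trimws() are illustrative choices, not necessarily the approach the course expects. The sketch also assumes that every role cell contains exactly one em element and one direct text node, so that the three vectors line up row by row.

```r
library(rvest)

# Hypothetical stand-in for roles_html, built from the excerpt
demo_html <- minimal_html("
<table>
 <tr><th>Actor</th><th>Role</th></tr>
 <tr>
  <td class='actor'>Jayden Carpenter</td>
  <td class='role'><em>Mickey Mouse</em> (Voice)</td>
 </tr>
</table>")

# Hypothetical helper: run an XPATH query and return the matched text
cell_text <- function(xpath) {
  demo_html %>%
    html_elements(xpath = xpath) %>%
    html_text()
}

actors <- cell_text('//table//td[@class = "actor"]')
# The role name sits inside the <em> child of each role cell
roles <- cell_text('//table//td[@class = "role"]/em')
# The function is the text node directly inside the role cell
functions_raw <- cell_text('//table//td[@class = "role"]/text()')

# Drop the parentheses and the surrounding whitespace
functions <- trimws(gsub("[()]", "", functions_raw))

cast <- data.frame(
  actors = actors,
  roles = roles,
  functions = functions,
  stringsAsFactors = FALSE
)
cast
```

If some role cells had no function in parentheses, the vectors would differ in length and data.frame() would fail, so a real table may need a per-row extraction instead.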