Select directly from a parent element with XPATH's text()

In this exercise, you'll work with the same table. This time, you'll extract the function information in parentheses into its own column, so you need to extract a data frame with three columns instead of two: actors, roles, and functions.

To do this, you'll need to apply the specific XPATH function that was introduced in the video instead of html_table(), which often does not work in practice if the HTML table element is not well structured, as is the case here.
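As a rough illustration of selecting text directly from a parent element, here is a minimal sketch on a made-up HTML snippet (the list, its class name, and the variable names are assumptions for illustration, not part of the exercise). It contrasts html_text() applied to the parent element with XPath's text(), which selects only the text nodes that are direct children of the parent.

```r
library(rvest)

# Hypothetical snippet for illustration only; this is not the exercise's table
snippet <- minimal_html('
  <ul>
    <li class="item"><b>Apple</b> (red)</li>
    <li class="item"><b>Banana</b> (yellow)</li>
  </ul>')

# html_text() on the parent returns all nested text, including the <b> child
all_text <- snippet %>%
  html_elements(xpath = '//li[@class = "item"]') %>%
  html_text()

# text() selects only the text nodes sitting directly inside <li>,
# so the content of the <b> child is left out
own_text <- snippet %>%
  html_elements(xpath = '//li[@class = "item"]/text()') %>%
  html_text()

# Surrounding whitespace may remain, so trim before using the values
own_text <- trimws(own_text)
```

Here all_text should still contain the names from the <b> elements, while own_text should hold only the parenthesized parts.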

For your reference, here is an excerpt of the table HTML again:

<table>
 <tr>
  <th>Actor</th>
  <th>Role</th>
 </tr>
 <tr>
  <td class = 'actor'>Jayden Carpenter</td>
  <td class = 'role'><em>Mickey Mouse</em> (Voice)</td>
 </tr>
 ...
</table>

In this exercise, the roles_html variable contains the HTML document with its table element.
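The roles_html object is provided by the exercise environment. To experiment outside of it, a stand-in can be sketched from the excerpt above with rvest's minimal_html(). This is an assumption-laden stand-in: it contains only the single row shown in the excerpt, whereas the real document has more rows (elided as "..."), and the counter variable names are made up for illustration.

```r
library(rvest)

# Stand-in for roles_html, built only from the row shown in the excerpt;
# the real exercise document contains further rows that are not reproduced here
roles_html <- minimal_html("
<table>
 <tr>
  <th>Actor</th>
  <th>Role</th>
 </tr>
 <tr>
  <td class = 'actor'>Jayden Carpenter</td>
  <td class = 'role'><em>Mickey Mouse</em> (Voice)</td>
 </tr>
</table>")

# Count the cells of each class to confirm the structure parsed as expected
n_actor_cells <- length(html_elements(roles_html, xpath = '//td[@class = "actor"]'))
n_role_cells  <- length(html_elements(roles_html, xpath = '//td[@class = "role"]'))
```

With this stand-in, each count should be 1, one cell per class for the single data row.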

This exercise is part of the course

Web Scraping in R

Hands-on interactive exercise

Try this exercise by completing this sample code.

# Extract the actors in the cells having class "actor"
actors <- roles_html %>% 
  html_elements(xpath = '//table//td[@class = "actor"]') %>%
  html_text()
actors

# Extract the roles in the cells having class "role"
roles <- roles_html %>% 
  html_elements(xpath = '//table//td[@class = "___"]/___') %>% 
  ___()
roles