
Select directly from a parent element with XPATH's text()

In this exercise, you'll work with the same table. This time, you'll extract the function information in parentheses into its own column, so you need a data frame with three columns instead of two: actors, roles, and functions.

To do this, you'll need to apply the specific XPATH function that was introduced in the video instead of html_table(), which often does not work in practice when the HTML table element is not well structured, as is the case here.
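As a minimal sketch of the function named in the exercise title, assume a hypothetical snippet (not the exercise table) in which each parent element holds a child element followed by plain text of its own. The names snippet, labels, and notes are made up for illustration. Selecting the child returns the child's text, while text() returns only the text nodes that sit directly inside the parent:

```r
library(rvest)

# Hypothetical snippet (not the exercise table): each <p> holds a <b> child
# followed by plain text that belongs directly to the <p>.
snippet <- read_html(
  "<p class='note'><b>Alpha</b> (one)</p><p class='note'><b>Beta</b> (two)</p>"
)

# Select the <b> children, then read their text.
labels <- snippet %>%
  html_elements(xpath = '//p[@class = "note"]/b') %>%
  html_text()

# Select only the text nodes that are direct children of each <p>.
# trim = TRUE drops the leading space left over after the <b> element.
notes <- snippet %>%
  html_elements(xpath = '//p[@class = "note"]/text()') %>%
  html_text(trim = TRUE)

# Combine both vectors into a two-column data frame.
data.frame(labels, notes)
```

Because text() skips the text inside child elements, the two selections do not overlap and can be combined column by column.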

For reference, here is an excerpt of the table HTML again:

<table>
 <tr>
  <th>Actor</th>
  <th>Role</th>
 </tr>
 <tr>
  <td class = 'actor'>Jayden Carpenter</td>
  <td class = 'role'><em>Mickey Mouse</em> (Voice)</td>
 </tr>
 ...
</table>

In this exercise, the roles_html variable contains the HTML document with its table element.
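To see why html_table() alone is not enough here, the following sketch parses a one-row version of the excerpt above. The names excerpt_html and excerpt_table are hypothetical and stand in for roles_html, whose full content is not shown. Under that assumption, html_table() returns the role and the function together in a single Role cell rather than in separate columns:

```r
library(rvest)

# A one-row copy of the excerpt, used here in place of the real roles_html.
excerpt_html <- read_html(
  "<table>
    <tr><th>Actor</th><th>Role</th></tr>
    <tr>
     <td class = 'actor'>Jayden Carpenter</td>
     <td class = 'role'><em>Mickey Mouse</em> (Voice)</td>
    </tr>
   </table>"
)

# html_table() flattens each cell to its full text content.
excerpt_table <- excerpt_html %>%
  html_element("table") %>%
  html_table()

# Only two columns come back; the function stays glued to the role.
names(excerpt_table)
excerpt_table$Role
```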

This exercise is part of the course

Web Scraping in R


Hands-on interactive exercise

Try this exercise by completing this sample code.

# Extract the actors in the cells having class "actor"
actors <- roles_html %>% 
  html_elements(xpath = '//table//td[@class = "actor"]') %>%
  html_text()
actors

# Extract the roles in the cells having class "role"
roles <- roles_html %>% 
  html_elements(xpath = '//table//td[@class = "___"]/___') %>% 
  ___()
roles