
Use predicates to select nodes based on their children

Here's almost the same HTML as before, except that the third div now has an additional p child with the class third.

<html>
  <body>
    <div id = 'first'>
      <h1 class = 'big'>Berlin Weather Station</h1>
      <p class = 'first'>Temperature: 20°C</p>
      <p class = 'second'>Humidity: 45%</p>
    </div>
    <div id = 'second'>...</div>
    <div id = 'third'>
      <p class = 'first'>Sunshine: 5hrs</p>
      <p class = 'second'>Precipitation: 0mm</p>
      <p class = 'third'>Snowfall: 0mm</p>
    </div>
  </body>
</html>

With XPath, something that's not possible with CSS can be done: selecting elements based on the properties of their descendants. Predicates make this possible. Here, your eventual goal is to select only the div elements that enclose a p element with the class third. To get there, you'll select only the divs that match a certain predicate, namely having such a descendant (it needn't be a direct child). You'll do this step by step.

Again, the HTML above is provided as weather_html.
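
To see what a predicate looks like in practice, here is a minimal sketch rather than the exercise's official solution. It assumes rvest 1.0.0 or later, where html_elements() is available (older versions call it html_nodes()), and it rebuilds weather_html from the HTML above so that it runs on its own. The helper names all_div_ids, divs_with_p and third_div_id are made up for illustration.

```r
library(rvest)

# Assumption: weather_html is rebuilt here from the HTML shown above;
# in the exercise itself it is already provided.
weather_html <- read_html("<html>
  <body>
    <div id = 'first'>
      <h1 class = 'big'>Berlin Weather Station</h1>
      <p class = 'first'>Temperature: 20°C</p>
      <p class = 'second'>Humidity: 45%</p>
    </div>
    <div id = 'second'>...</div>
    <div id = 'third'>
      <p class = 'first'>Sunshine: 5hrs</p>
      <p class = 'second'>Precipitation: 0mm</p>
      <p class = 'third'>Snowfall: 0mm</p>
    </div>
  </body>
</html>")

# No predicate: every div in the document
all_div_ids <- weather_html %>%
  html_elements(xpath = "//div") %>%
  html_attr("id")
all_div_ids   # "first" "second" "third"

# Predicate in square brackets: only divs that have at least one p child
divs_with_p <- weather_html %>%
  html_elements(xpath = "//div[p]") %>%
  html_attr("id")
divs_with_p   # "first" "third"

# Nested predicate: only divs with a p child whose class is 'third'
third_div_id <- weather_html %>%
  html_elements(xpath = "//div[p[@class = 'third']]") %>%
  html_attr("id")
third_div_id  # "third"
```

The predicate filters the div elements but does not change what is returned: the result is still the div, not the p inside it. To match a p at any depth rather than only a direct child, the inner step can be written as .//p, as in //div[.//p[@class = 'third']].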

This exercise is part of the course

Web Scraping in R


Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Select all divs
weather_html %>% 
  ___(xpath = ___)