Use predicates to select nodes based on their children
Here's almost the same HTML as before, except that the third div now has a p child with the class third.
<html>
<body>
<div id = 'first'>
<h1 class = 'big'>Berlin Weather Station</h1>
<p class = 'first'>Temperature: 20°C</p>
<p class = 'second'>Humidity: 45%</p>
</div>
<div id = 'second'>...</div>
<div id = 'third'>
<p class = 'first'>Sunshine: 5hrs</p>
<p class = 'second'>Precipitation: 0mm</p>
<p class = 'third'>Snowfall: 0mm</p>
</div>
</body>
</html>
With XPath, you can do something that's not possible with CSS: select elements based on the properties of their descendants. Predicates make this possible. Here, your eventual goal is to select only div elements that enclose a p element with the third class. For that, you'll need to select only the div that matches a certain predicate: having the respective descendant (it needn't be a direct child). You'll do this step by step.
Again, the HTML above is provided as weather_html.
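To make the idea of a predicate concrete before you start, here is a minimal sketch on a separate, hypothetical mini-document (demo_html, with made-up ids and an alert class, not the exercise's weather_html). It assumes rvest 1.0 or later, where html_elements() is available; older versions use html_nodes() instead. The part in square brackets is the predicate: a div is kept only if the path inside the brackets finds at least one match.

```r
library(rvest)

# Hypothetical mini-document for illustration only (not weather_html)
demo_html <- read_html("<html><body>
  <div id = 'a'><p class = 'note'>plain</p></div>
  <div id = 'b'><section><p class = 'alert'>nested</p></section></div>
  <div id = 'c'><p class = 'alert'>direct</p></div>
</body></html>")

# Predicate on a direct child: keep divs that have a p child with class 'alert'
child_ids <- demo_html %>%
  html_elements(xpath = "//div[p[@class = 'alert']]") %>%
  html_attr("id")

# Predicate on any descendant: .// searches at any depth below each div
descendant_ids <- demo_html %>%
  html_elements(xpath = "//div[.//p[@class = 'alert']]") %>%
  html_attr("id")

print(child_ids)       # "c"
print(descendant_ids)  # "b" "c"
```

The div with id b only matches the second expression, because its p sits inside a section rather than directly under the div.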
This exercise is part of the course Web Scraping in R.
Have a go at this exercise by completing this sample code.
# Select all divs
weather_html %>%
___(xpath = ___)