
Extract nodes based on the number of their children

As shown in the video, the XPath count() function can be used within a predicate to narrow a selection down to those nodes that have a certain number of children. This is especially helpful if your scraper depends on some nodes having a minimum number of children.
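To make the idea concrete, here is a minimal sketch of count() inside a predicate. The toy document and the names toy_html and long_lists are hypothetical and have nothing to do with the exercise page; minimal_html() is used only to build a small document to query.

```r
library(rvest)

# Hypothetical toy document (an assumption for illustration only)
toy_html <- minimal_html('
  <ul id="a"><li>one</li></ul>
  <ul id="b"><li>one</li><li>two</li><li>three</li></ul>
  <ol id="c"><li>one</li><li>two</li></ol>
')

# Keep only <ul> elements that have at least two <li> children
long_lists <- toy_html %>%
  html_elements(xpath = '//ul[count(li) >= 2]')

html_attr(long_lists, "id")
# "b"
```

The expression inside the square brackets is evaluated once per candidate node, so count(li) counts the li children of each ul separately.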

Here's an excerpt from a page (without any classes or IDs…) that you might be scraping:

...
<div>
  <h1>Tomorrow</h1>
</div>
<div>
  <h2>Berlin</h2>
  <p>Temperature: 20°C</p>
  <p>Humidity: 50%</p>
</div>
<div>
  <h2>London</h2>
  <p>Temperature: 15°C</p>
</div>
<div>
  <h2>Zurich</h2>
  <p>Temperature: 22°C</p>
  <p>Humidity: 60%</p>
</div>
...

You're only interested in divs that have exactly one h2 header and at least two paragraphs, because your application can't really deal with incomplete weather forecasts.

The above HTML is available to you via forecast_html.
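If you want to experiment outside the exercise environment, something like the following could stand in for forecast_html. This is only a sketch: it assumes the object is a parsed rvest document, it includes just the four divs shown above (the elided parts are left out), and child_counts is a hypothetical helper variable.

```r
library(rvest)

# Assumed local stand-in for the forecast_html object provided by the exercise
forecast_html <- minimal_html('
  <div>
    <h1>Tomorrow</h1>
  </div>
  <div>
    <h2>Berlin</h2>
    <p>Temperature: 20°C</p>
    <p>Humidity: 50%</p>
  </div>
  <div>
    <h2>London</h2>
    <p>Temperature: 15°C</p>
  </div>
  <div>
    <h2>Zurich</h2>
    <p>Temperature: 22°C</p>
    <p>Humidity: 60%</p>
  </div>
')

# Inspect how many element children each div has
divs <- html_elements(forecast_html, "div")
child_counts <- vapply(divs, function(d) length(html_children(d)), integer(1))
child_counts
# 1 3 2 3
```

Looking at the child counts first makes it easier to see which divs a count()-based predicate should keep.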

This exercise is part of the course Web Scraping in R.

Exercise instructions

  • Select the desired divs with the appropriate XPath selector, making use of the count() function.
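Two count() conditions can be combined in a single predicate with the XPath and operator. The sketch below shows the pattern on a hypothetical document (shop_html and complete_groups are made-up names), so it illustrates the technique rather than the exercise solution.

```r
library(rvest)

# Hypothetical product listing (an assumption for illustration only)
shop_html <- minimal_html('
  <div><h3>Pens</h3><span>blue</span><span>red</span></div>
  <div><h3>Paper</h3><span>A4</span></div>
  <div><h3>Ink</h3><h3>Refills</h3><span>black</span><span>cyan</span></div>
')

# Keep divs with exactly one <h3> child and at least two <span> children
complete_groups <- shop_html %>%
  html_elements(xpath = '//div[count(h3) = 1 and count(span) >= 2]')

html_text(html_element(complete_groups, "h3"))
# "Pens"
```

Note that XPath uses a single = for equality, and that both conditions must hold for a node to be selected.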

Hands-on interactive exercise

Complete this sample code to finish the exercise.

# Select only divs with one header and at least two paragraphs
forecast_html %>%
  html_elements(xpath = '___')