
Apply throttling to a multi-page crawler

The goal of this exercise is to get the names and coordinates of Earth's three highest mountain peaks.

You'll get this information from the corresponding Wikipedia pages in real time. To avoid putting too much load on Wikipedia, you'll apply throttling using the slowly() function: after each request to a Wikipedia page, your program should wait a short time. Three pages might not be much, but the principle holds for any amount of scraping: be gentle and add wait time between requests.
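As a rough illustration of the throttling idea, the sketch below wraps a function with purrr's slowly() and rate_delay(). To keep it runnable offline, it uses a hypothetical stand-in function, fake_request(), and made-up page names instead of real Wikipedia requests; neither is part of the exercise.

```r
library(purrr)

# Hypothetical stand-in for a network request, so the sketch runs offline
fake_request <- function(url) {
  paste("fetched", url)
}

# Wrap the function so that successive calls are spaced by a 0.5 s pause
fake_request_delayed <- slowly(fake_request, rate = rate_delay(0.5))

# Made-up page identifiers, not real URLs
urls <- c("page-1", "page-2", "page-3")

start <- Sys.time()
results <- map_chr(urls, fake_request_delayed)
elapsed <- as.numeric(difftime(Sys.time(), start, units = "secs"))

print(results)
print(elapsed)  # noticeably longer than three unthrottled calls would take
```

The wrapped function takes the same arguments as the original, so it can be dropped into an existing map() call without other changes.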

You'll find the name of the peak within an element with the ID "firstHeading", while the coordinates are inside an element with class "geo-dms", which is a descendant of an element with ID "coordinates".
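The selectors described above can be tried out on a small inline document first. The following is a minimal sketch using rvest; the HTML snippet is a hypothetical, heavily simplified stand-in for a Wikipedia page, and its peak name and coordinate text are placeholders rather than real data.

```r
library(rvest)

# Hypothetical, simplified page; real Wikipedia markup is much larger
page <- minimal_html('
  <h1 id="firstHeading">Example Peak</h1>
  <span id="coordinates">
    <span class="geo-dms">placeholder coordinates</span>
  </span>
')

# "#firstHeading" selects the element with that ID
peak <- page %>%
  html_element("#firstHeading") %>%
  html_text()

# "#coordinates .geo-dms" selects a .geo-dms descendant of #coordinates
coords <- page %>%
  html_element("#coordinates .geo-dms") %>%
  html_text()

print(peak)
print(coords)
```

The space in "#coordinates .geo-dms" is the CSS descendant combinator, so the match does not require .geo-dms to be a direct child of the #coordinates element.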

purrr has been preloaded and the URLs are contained in mountain_wiki_pages.

This exercise is part of the course Web Scraping in R.

Hands-on interactive exercise

Complete the sample code below to finish this exercise.

# Define a throttled read_html() function with a delay of 0.5s
read_html_delayed <- ___(___, 
                         rate = ___(___))