
Back to the office

You are still working as a data analyst for a web agency, and you've been asked to do web scraping. You have been given a list of URLs to analyze; you already started this analysis in the previous chapter.

You expect this task to recur: no doubt you'll be asked to do it again in a few weeks. To make that future work easier, you've decided to write clean code today, so that it will be easier to come back to it later.

We'll start by combining the two httr functions we saw in the previous chapter, GET() (which retrieves a webpage) and status_code() (which extracts its status code), to build a status code extractor.

The urls vector is still available in your workspace. We have kept only the URLs that are reachable.
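
If you haven't used it before, purrr::compose() takes several functions and returns a new function that chains them, applying them from right to left by default. Here is a minimal sketch of how it behaves; the round_mean name and the example vector are only for illustration:

library(purrr)

# compose(round, mean) builds a function equivalent to round(mean(x))
round_mean <- compose(round, mean)
round_mean(c(1.2, 2.7, 3.9))
#> [1] 3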

This exercise is part of the course Intermediate Functional Programming with purrr.

Exercise instructions

  • Launch purrr and httr.

  • Compose a status extractor with GET() and status_code().

  • Try this new function on "https://www.thinkr.fr" and "https://en.wikipedia.org".

  • Map this function directly on the vector urls.

Hands-on interactive exercise

Complete the sample code below to finish this exercise.

# Launch purrr and httr



# Compose a status extractor 
status_extract <- ___(___, ___)

# Try with "https://thinkr.fr" & "https://en.wikipedia.org"
___("https://thinkr.fr")
___("https://en.wikipedia.org")

# Map it on the urls vector, return a vector of numbers
___(urls, ___)
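
For reference, here is one possible way the blanks could be filled in. This is a sketch of the intended answer, assuming it relies on purrr::compose() and a typed map variant (map_dbl() here, though map_int() would also work, since status codes are integers):

# Launch purrr and httr
library(purrr)
library(httr)

# Compose a status extractor: status_extract(url) is status_code(GET(url))
status_extract <- compose(status_code, GET)

# Try with "https://thinkr.fr" & "https://en.wikipedia.org"
status_extract("https://thinkr.fr")
status_extract("https://en.wikipedia.org")

# Map it on the urls vector, returning a numeric vector of status codes
map_dbl(urls, status_extract)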