
Analyzing URLs

We are still working on our exploration of the #RStudioConf dataset. In this exercise, we'll focus on analyzing the URLs contained in the tweets.

The URLs are stored in an element called "urls_url". Each "urls_url" element contains either NULL, if there was no URL in the tweet, or a list of one or more URLs.
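To make that structure concrete, here is a minimal sketch on made-up data. The `toy_tweets` list and its URLs are hypothetical stand-ins for `rstudioconf`, which is not reproduced here; only the element name "urls_url" comes from the exercise.

```r
library(purrr)

# Hypothetical toy data standing in for the rstudioconf dataset
toy_tweets <- list(
  list(text = "No link here", urls_url = NULL),
  list(text = "One link",     urls_url = list("github.com/tidyverse/purrr")),
  list(text = "Two links",    urls_url = list("rstudio.com", "github.com/r-lib/rlang"))
)

# Passing a string to map() extracts the element with that name from each tweet
urls <- map(toy_tweets, "urls_url")

# One entry per tweet: NULL, a list of one URL, or a list of several URLs
str(urls)
lengths(urls)
```

Because each tweet contributes its own list (or NULL), the result is nested one level deeper than a plain list of URLs, which is why the exercise flattens it before counting.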

We'll start by extracting all the "urls_url" elements from the dataset, and then we'll combine purrr and stringr to count how many of the links point to a GitHub-related URL. Since GitHub is a popular website among developers, a high share of GitHub links would suggest a strong developer community in our dataset.

purrr and stringr have been loaded for you, and the rstudioconf dataset is still available in your workspace.

This exercise is part of the course Intermediate Functional Programming with purrr.

Exercise instructions

  • Extract all the "urls_url" elements, and pass the result into flatten() to remove a level of hierarchy.

  • Remove the NULL from the results.

  • Create a mapper called has_github, which detects if a character string contains "github".

  • Use the map_*() variant that returns a logical vector with has_github, and pass the result to sum() to count the number of links containing "github".
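The four steps above can be sketched end to end on a small made-up list. This is an illustration under stated assumptions rather than the reference solution: `toy_tweets` and its URLs are hypothetical, and the functions chosen (`map()`, `flatten()`, `compact()`, `as_mapper()`, `map_lgl()`) are one reasonable reading of the instructions.

```r
library(purrr)
library(stringr)

# Hypothetical toy data standing in for the rstudioconf dataset
toy_tweets <- list(
  list(text = "No link here", urls_url = NULL),
  list(text = "One link",     urls_url = list("github.com/tidyverse/purrr")),
  list(text = "Two links",    urls_url = list("rstudio.com", "github.com/r-lib/rlang"))
)

# Step 1: extract the "urls_url" elements and remove one level of nesting
urls_clean <- map(toy_tweets, "urls_url") %>%
  flatten()

# Step 2: drop any NULL (or empty) entries that remain
compact_urls <- compact(urls_clean)

# Step 3: build a reusable mapper from a one-sided formula
has_github <- as_mapper(~ str_detect(.x, "github"))

# Step 4: map_lgl() returns a logical vector; sum() counts the TRUEs
n_github <- map_lgl(compact_urls, has_github) %>%
  sum()

n_github  # 2 of the 3 toy URLs contain "github"
```

Summing a logical vector works because R treats TRUE as 1 and FALSE as 0, so the total is the number of matching links.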

Hands-on interactive exercise

Complete the sample code below to finish this exercise.

# Extract the "urls_url" elements, and flatten() the result
urls_clean <- ___(rstudioconf, ___) %>%
  ___()

# Remove the NULL
compact_urls <- ___(___)

# Create a mapper that detects the pattern "github"
has_github <- ___(~ str_detect(.x, "github"))

# Look for the "github" pattern, and sum the result
___( compact_urls, has_github ) %>%
  ___()