
Analyzing URLs

We are continuing our exploration of the #RStudioConf dataset. In this exercise, we'll focus on analyzing the URLs contained in the tweets.

The URLs can be found in an element called "urls_url". Each "urls_url" element contains either NULL, if the tweet had no URL, or a list of one or more URLs.

We'll start by extracting all the "urls_url" elements from the dataset, and then we'll combine purrr and stringr to count how many tweets contain a link to a GitHub-related URL. Since GitHub is a popular website among developers, a high prevalence of GitHub links would indicate a strong community of developers in our dataset.

purrr and stringr have been loaded for you, and the rstudioconf dataset is still available in your workspace.
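To make the pipeline concrete, here is a minimal sketch on made-up toy data (not the actual rstudioconf tweets), assuming each tweet is a list whose "urls_url" element is a list holding either NULL or one or more URL strings:

library(purrr)
library(stringr)

# Hypothetical toy data mimicking the assumed shape of the tweets
toy_tweets <- list(
  list(urls_url = list(NULL)),
  list(urls_url = list("https://github.com/tidyverse/purrr")),
  list(urls_url = list("https://www.rstudio.com", "https://github.com/rstudio"))
)

map(toy_tweets, "urls_url") %>%            # extract the "urls_url" element of each tweet
  flatten() %>%                            # remove one level of hierarchy
  compact() %>%                            # drop the NULLs
  map_lgl(~ str_detect(.x, "github")) %>%  # TRUE when a URL contains "github"
  sum()                                    # count the matching URLs
#> [1] 2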

This exercise is part of the course

Intermediate Functional Programming with purrr


Exercise instructions

  • Extract all the "urls_url" elements, and pass the result into flatten() to remove a level of hierarchy.

  • Remove the NULLs from the result.

  • Create a mapper called has_github, which detects if a character string contains "github".

  • Use the map_*() variant for logical vectors with has_github, and pass the result to sum() to count the number of links containing "github".

Hands-on interactive exercise

Try this exercise by completing the sample code below.

# Extract the "urls_url" elements, and flatten() the result
urls_clean <- ___(rstudioconf, ___) %>%
  ___()

# Remove the NULL
compact_urls <- ___(___)

# Create a mapper that detects the pattern "github"
has_github <- ___(~ str_detect(.x, "github"))

# Look for the "github" pattern, and sum the result
___(compact_urls, has_github) %>%
  ___()
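For reference, here is one way the blanks could be filled in. It is a sketch under the same assumptions about the shape of rstudioconf, not necessarily the official course solution:

# Extract the "urls_url" elements, and flatten() the result
urls_clean <- map(rstudioconf, "urls_url") %>%
  flatten()

# Remove the NULLs
compact_urls <- compact(urls_clean)

# Create a mapper that detects the pattern "github"
has_github <- as_mapper(~ str_detect(.x, "github"))

# Look for the "github" pattern, and sum the result
map_lgl(compact_urls, has_github) %>%
  sum()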