
Analyzing URLs

We are still working on our exploration of the #RStudioConf dataset. In this exercise, we'll focus on analyzing the URLs contained in the tweets.

The URLs are found in an element called "urls_url". Each "urls_url" element contains either NULL, if the tweet had no URL, or a list of one or more URLs.
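To make the structure concrete, here is a small illustrative sketch (the data below is invented, not taken from the actual rstudioconf dataset) of what a few "urls_url" elements might look like once extracted:

```r
# Hypothetical "urls_url" elements: NULL for a tweet with no link,
# otherwise a character vector of one or more URLs
example_urls <- list(
  NULL,
  "https://github.com/tidyverse/purrr",
  c("https://www.rstudio.com", "https://github.com/rstudio")
)
```

This mix of NULL and non-NULL elements is why the exercise later removes the NULL entries before pattern matching.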

We'll start by extracting all the "urls_url" elements from the dataset, and then we'll combine purrr and stringr to count how many tweets contain a link to a GitHub-related URL. Since GitHub is a popular website among developers, a high prevalence of GitHub links would suggest a strong community of developers in our dataset.

purrr and stringr have been loaded for you, and the rstudioconf dataset is still available in your workspace.

This exercise is part of the course Intermediate Functional Programming with purrr.

Exercise instructions

  • Extract all the "urls_url" elements, and pass the result into flatten() to remove a level of hierarchy.

  • Remove the NULL from the results.

  • Create a mapper called has_github, which detects if a character string contains "github".

  • Use the map_*() variant that returns a logical vector with has_github, and pass the result to sum() to count the number of links containing "github".

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Extract the "urls_url" elements, and flatten() the result
urls_clean <- ___(rstudioconf, ___) %>%
  ___()

# Remove the NULL
compact_urls <- ___(___)

# Create a mapper that detects the pattern "github"
has_github <- ___(~ str_detect(.x, "github"))

# Look for the "github" pattern, and sum the result
___( compact_urls, has_github ) %>%
  ___()
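If you get stuck, one possible way to fill in the blanks is sketched below. It assumes rstudioconf is a list of parsed tweets (so each element can be subset by name), and it uses purrr's map(), flatten(), compact(), as_mapper(), and map_lgl() together with stringr's str_detect():

```r
library(purrr)
library(stringr)

# Extract the "urls_url" element from each tweet, then flatten()
# to drop one level of list nesting
urls_clean <- map(rstudioconf, "urls_url") %>%
  flatten()

# compact() removes the NULL elements (tweets without URLs)
compact_urls <- compact(urls_clean)

# as_mapper() turns the formula into a reusable predicate function
has_github <- as_mapper(~ str_detect(.x, "github"))

# map_lgl() returns TRUE/FALSE per URL; sum() counts the TRUEs
map_lgl(compact_urls, has_github) %>%
  sum()
```

The final sum() works because summing a logical vector in R counts its TRUE values.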