Analyzing URLs
We are still working on our exploration of the #RStudioConf dataset. In this exercise, we'll focus on analyzing the URLs contained in the tweets.
The URLs are stored in an element called "urls_url". Each "urls_url" element contains either NULL, if the tweet had no URL, or a list of one or more URLs.
We'll start by extracting all the "urls_url" elements from the dataset, and then we'll combine purrr and stringr to count how many tweets contain a link to a GitHub-related URL. Since GitHub is a popular website among developers, a high prevalence of GitHub links indicates a strong community of developers in our dataset.
purrr and stringr have been loaded for you, and the rstudioconf dataset is still available in your workspace.
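To make the NULL-or-URLs structure concrete, here is a minimal sketch (with made-up URLs, not the real dataset) showing how purrr's compact() drops the empty entries:

```r
library(purrr)

# Hypothetical urls_url values: NULL when the tweet had no URL,
# otherwise a URL (made-up examples for illustration)
urls <- list(NULL, "https://github.com/tidyverse/purrr", NULL)

# compact() drops the NULL entries, keeping only actual URLs
compact(urls)
# a list holding just the one GitHub URL
```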
This exercise is part of the course Intermediate Functional Programming with purrr.
Exercise instructions
- Extract all the "urls_url" elements, and pass the result into flatten() to remove a level of hierarchy.
- Remove the NULL from the results.
- Create a mapper called has_github, which detects if a character string contains "github".
- Use the map_*() variant for logical with has_github, and pass the result to sum() to count the number of links containing "github".
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Extract the "urls_url" elements, and flatten() the result
urls_clean <- ___(rstudioconf, ___) %>%
___()
# Remove the NULL
compact_urls <- ___(___)
# Create a mapper that detects the pattern "github"
has_github <- ___(~ str_detect(.x, "github"))
# Look for the "github" pattern, and sum the result
___( compact_urls, has_github ) %>%
___()
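For reference, one way the completed pipeline might look is sketched below, run against a tiny made-up stand-in for rstudioconf (the real dataset is not reproduced here, so the URLs and the final count are illustrative only):

```r
library(purrr)    # map(), flatten(), compact(), as_mapper(), and the %>% pipe
library(stringr)  # str_detect()

# Miniature stand-in for the rstudioconf dataset (hypothetical data):
# each tweet's urls_url element is either NULL or a list of URLs
rstudioconf <- list(
  list(urls_url = list(NULL)),
  list(urls_url = list("https://github.com/tidyverse/purrr")),
  list(urls_url = list("https://www.rstudio.com",
                       "https://github.com/rstudio"))
)

# Extract the "urls_url" elements, and flatten() the result
urls_clean <- map(rstudioconf, "urls_url") %>%
  flatten()

# Remove the NULL
compact_urls <- compact(urls_clean)

# Create a mapper that detects the pattern "github"
has_github <- as_mapper(~ str_detect(.x, "github"))

# Look for the "github" pattern, and sum the result
n_github <- map_lgl(compact_urls, has_github) %>%
  sum()
n_github
# 2 of the links in this toy dataset point at GitHub
```

Note that summing a logical vector works because TRUE is coerced to 1 and FALSE to 0, so the sum is simply the number of matching links.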