
Analyzing URLs

We are still working on our exploration of the #RStudioConf dataset. In this exercise, we'll focus on analyzing the URLs contained in the tweets.

The URLs are found in an element called "urls_url". Each "urls_url" element contains either NULL, if the tweet had no URL, or a list of one or more URLs.
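To make that structure concrete, here is a minimal sketch on a small hypothetical list called `tweets`. It stands in for `rstudioconf` and assumes each tweet is a list with a "urls_url" element. The tweet texts and URLs are invented for illustration. It extracts the element by name and flags the tweets that have no URL:

```r
library(purrr)

# Hypothetical stand-in for rstudioconf: each tweet is a list whose
# "urls_url" element is NULL (no link) or a list of URL strings
tweets <- list(
  list(text = "Hello #rstudioconf", urls_url = NULL),
  list(text = "New package!", urls_url = list("github.com/tidyverse/purrr")),
  list(text = "Slides and code", urls_url = list("rstudio.com", "github.com/r-lib/rlang"))
)

# Extract the "urls_url" element of every tweet by name
urls_by_tweet <- map(tweets, "urls_url")

# Flag the tweets that carry no URL at all
no_url <- map_lgl(urls_by_tweet, is.null)
no_url
# [1]  TRUE FALSE FALSE
```

Passing a string to `map()` extracts the element with that name from each tweet, so the result keeps one entry per tweet, NULLs included.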

We'll start by extracting all the "urls_url" elements from the dataset, and then we'll combine purrr and stringr to count how many of the shared links point to a GitHub URL. Since GitHub is a popular website for developers, a high prevalence of GitHub links would suggest a strong community of developers in our dataset.

purrr and stringr have been loaded for you, and the rstudioconf dataset is still available in your workspace.

Instructions
  • Extract all the "urls_url" elements, and pass the result into flatten() to remove a level of hierarchy.

  • Remove the NULL elements from the result.

  • Create a mapper called has_github that detects whether a character string contains "github".

  • Use the map_*() variant that returns a logical vector (map_lgl()) with has_github, and pass the result to sum() to count the number of links containing "github".
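Put together, the four steps above might look like the following minimal sketch. It runs on a small hypothetical `tweets` list that stands in for `rstudioconf`, with invented URLs, so it is an illustration rather than the exercise's official solution. Here compact() is used to drop NULLs; discard(is.null) would work as well.

```r
library(purrr)
library(stringr)

# Hypothetical stand-in for rstudioconf (assumed to have the same shape)
tweets <- list(
  list(text = "Hello #rstudioconf", urls_url = NULL),
  list(text = "New package!", urls_url = list("github.com/tidyverse/purrr")),
  list(text = "Slides and code", urls_url = list("rstudio.com", "github.com/r-lib/rlang"))
)

# 1. Extract every "urls_url" element and remove one level of nesting
urls <- flatten(map(tweets, "urls_url"))

# 2. Drop any NULL entries left in the list
urls <- compact(urls)

# 3. Mapper that returns TRUE when a string contains "github"
has_github <- as_mapper(~ str_detect(.x, "github"))

# 4. Apply the mapper to every URL and count the TRUE values
n_github <- sum(map_lgl(urls, has_github))
n_github
# [1] 2
```

Because the list is flattened first, each element is a single URL, so the sum counts GitHub links rather than tweets. In purrr 1.0.0 and later, flatten() is superseded by list_flatten(), but it still works.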