
Playing with tweets, round 1

Remember that in the previous chapters of this course you've been working as a data analyst for a web agency? Well, you've been doing great, and now you've been given another project ;) In this chapter, you'll analyze a new kind of data: JSON output.

Your engineering team has given you the output of a data collection that contains tweets gathered during RStudio Conf 2018. Because this dataset is in JSON, you've read it into R as a nested list.
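To make the "nested list" shape concrete, here is a minimal R sketch. It assumes that each tweet becomes a named list inside one unnamed top-level list. The tweets and the screen_name field are hypothetical; only is_retweet and favorite_count come from this exercise. It also shows a few ways to peek at a large list without printing all of it.

```r
# Hypothetical, trimmed-down stand-in for rstudioconf. A JSON array of tweet
# objects such as
#   [{"screen_name": "user_a", "is_retweet": false, "favorite_count": 12}, ...]
# maps to an unnamed list whose elements are named lists (one per tweet).
# screen_name is an illustrative field, not taken from the exercise.
tweets <- list(
  list(screen_name = "user_a", is_retweet = FALSE, favorite_count = 12L),
  list(screen_name = "user_b", is_retweet = TRUE,  favorite_count = 0L)
)

# Number of tweets at the top level
length(tweets)

# Field names available in a single tweet
names(tweets[[1]])

# Access one field of one tweet
tweets[[1]]$favorite_count

# Compact overview that stops at the first level instead of printing everything
str(tweets, max.level = 1)
```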

First, you want to do some basic exploration of this dataset, and purrr will come to the rescue for that. The package has been loaded for you, and the rstudioconf dataset is available in your workspace.

Note: don't try to print the entire dataset; it's too big to be printed in the DataCamp console.

Be aware that this is real data from Twitter, so it may contain profanity or other offensive content. This applies to this exercise and to any following exercises that use real Twitter data.

Instructions
  • Print the first element of the list to get an overview of its content and structure.

  • Because you want to focus on original tweets (not retweets), create a sublist of non-retweets using the logical element "is_retweet" contained in each sublist.

  • Extract the "favorite_count" element from each element of this new sublist using the map_*() variant for integers.

  • Get the median of the previous result.
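The four steps above can be sketched with purrr as follows. This is a minimal sketch, assuming a small hypothetical stand-in named tweets instead of the real rstudioconf data (the text field and all values are made up). It uses discard() with an anonymous function to drop retweets; other ways of writing the predicate would work too.

```r
library(purrr)

# Hypothetical stand-in for rstudioconf: a list of tweets, each a named list.
# Only is_retweet and favorite_count come from the exercise; text is made up.
tweets <- list(
  list(text = "Excited for #rstudioconf!", is_retweet = FALSE, favorite_count = 12L),
  list(text = "RT: slides are up",         is_retweet = TRUE,  favorite_count = 0L),
  list(text = "Great purrr talk",          is_retweet = FALSE, favorite_count = 30L),
  list(text = "Day 2 starts now",          is_retweet = FALSE, favorite_count = 4L)
)

# 1. Inspect only the first element, never the whole list
str(tweets[[1]])

# 2. Keep original tweets: drop every element whose is_retweet is TRUE
non_rt <- discard(tweets, function(tweet) tweet$is_retweet)

# 3. Extract favorite_count as an integer vector; passing a string to
#    map_int() turns it into an extractor for that named element
fav <- map_int(non_rt, "favorite_count")

# 4. Median number of favorites among original tweets
median(fav)
```

On the real data you would use rstudioconf in place of tweets. map_int() stops with an error if any extracted value is not a single integer-compatible value, rather than silently returning a list.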