
Where can you observe Zipf's law?

Although Zipf observed a steep and predictable decline in word usage, you may not buy into Zipf's law. You may be thinking, "I know plenty of words and have a distinctive vocabulary." That may be the case, but the same can't be said for most people! To see this, let's construct a visual from 3 million tweets mentioning "#sb". Keep in mind that the visual doesn't follow Zipf's law perfectly: the tweets all mention the same hashtag, so the data is a bit skewed. That said, the visual you will make shows a steep decline, indicating low lexical diversity among the millions of tweets. So there is some science behind using lexicons for natural language analysis!

In this exercise, you will use the metricsgraphics package. Although the author suggests using the pipe operator, %>%, you will construct the graphic step-by-step to learn about the various aspects of the plot. The package's main function is mjs_plot(), which is the first step in creating a JavaScript plot. Once you have that, you can add other layers on top of the plot.

An example metricsgraphics workflow without using the %>% operator is below:

metro_plot <- mjs_plot(data, x = x_axis_name, y = y_axis_name, show_rollover_text = FALSE)
metro_plot <- mjs_line(metro_plot)
metro_plot <- mjs_add_line(metro_plot, line_one_values)
metro_plot <- mjs_add_legend(metro_plot, legend = c('names', 'more_names'))
metro_plot
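
For comparison, here is a hedged sketch of the same kind of workflow written with the %>% operator. The toy data frame and its column names are made up for illustration and are not the course's data. The plotting code runs only if metricsgraphics and magrittr are installed.

```r
# Toy data (assumption: invented values, not the course's sb_words)
toy <- data.frame(rank = 1:5, freq = c(100, 48, 35, 22, 21))
toy$expectations <- toy$freq[1] / toy$rank

# Build the plot only when the required packages are available
if (requireNamespace("metricsgraphics", quietly = TRUE) &&
    requireNamespace("magrittr", quietly = TRUE)) {
  library(magrittr)         # provides %>%
  library(metricsgraphics)

  # Each step receives the plot from the previous step as its first argument
  toy_plot <- toy %>%
    mjs_plot(x = rank, y = freq, show_rollover_text = FALSE) %>%
    mjs_line() %>%
    mjs_add_line(expectations) %>%
    mjs_add_legend(legend = c("Frequency", "Expectation"))

  # In an interactive session, type toy_plot to display the widget
}
```

The piped version avoids reassigning the plot object at every step, while the step-by-step form makes it easier to inspect what each function adds.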

This exercise is part of the course

Sentiment Analysis in R


Exercise instructions

  • Use head() on sb_words to review top words.
  • Create a new column expectations by dividing the largest word frequency, freq[1], by the rank column.
  • Start sb_plot using mjs_plot().
    • Pass in sb_words with x = rank and y = freq.
    • Within mjs_plot() set show_rollover_text to FALSE.
  • Overwrite sb_plot using mjs_line() and pass in sb_plot.
  • Add to sb_plot with mjs_add_line().
    • Pass in the previous sb_plot object and the vector, expectations.
  • Place a legend on a new sb_plot object using mjs_add_legend().
    • Pass in the previous sb_plot object.
    • The legend labels should consist of "Frequency" and "Expectation".
  • Call sb_plot to display the plot. Mouse over a point to simultaneously highlight the corresponding Frequency and Expectation points. The magic of JavaScript!
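
To make the expectations step concrete, here is a minimal sketch on made-up word counts (the toy_words data frame is hypothetical and is not the course's sb_words). It uses base R's with() rather than the %$% operator that appears in the sample code; the idea is the same: divide the top frequency by each word's rank.

```r
# Toy word counts (assumption: invented for illustration)
toy_words <- data.frame(
  word = c("sb", "game", "halftime", "ad", "team"),
  freq = c(1200, 610, 390, 310, 230),
  stringsAsFactors = FALSE
)

# Rank words from most to least frequent
toy_words <- toy_words[order(-toy_words$freq), ]
toy_words$rank <- seq_len(nrow(toy_words))

# Zipf-style expectation: the largest frequency divided by each word's rank
toy_words$expectations <- with(toy_words, freq[1] / rank)

toy_words
```

Under this idealized form of Zipf's law, the second-ranked word is expected to appear about half as often as the first, the third about a third as often, and so on. Comparing the freq and expectations columns shows how closely the observed counts track that curve.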

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Examine sb_words
___

# Create expectations
sb_words$expectations <- sb_words %$% 
  {___[___] / ___}

# Create metrics plot
sb_plot <- ___(___, x = ___, y = ___, ___ = ___)

# Add 1st line
sb_plot <- ___(___)

# Add 2nd line
sb_plot <- ___(___, ___)

# Add legend
sb_plot <- ___(___, legend = c("___", "___"))

# Display plot
sb_plot