Where can you observe Zipf's law?
Although Zipf observed a steep and predictable decline in word usage, you may not buy into Zipf's law. You may be thinking, "I know plenty of words and have a distinctive vocabulary." That may be the case, but the same can't be said for most people! To see this, let's construct a visual from 3 million tweets mentioning "#sb". Keep in mind that the visual doesn't follow Zipf's law perfectly: the tweets all mention the same hashtag, so the distribution is a bit skewed. That said, the visual you will make shows a steep decline, indicating low lexical diversity among the millions of tweets. So there is some science behind using lexicons for natural language analysis!
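To make the idea concrete, here is a minimal base R sketch of the comparison behind the plot. Zipf's law says a word's frequency is roughly inversely proportional to its rank, so the expected count at rank r is about the top count divided by r. The `tokens` vector and the `toy_words` data frame are made up for illustration and are not the course's tweet data.

```r
# Toy corpus (hypothetical; not the course's tweet data)
tokens <- c("the", "the", "the", "the", "the", "the",
            "game", "game", "game",
            "ad", "ad",
            "win")

# Count each word and sort from most to least frequent
counts <- sort(table(tokens), decreasing = TRUE)

# One row per word, with its observed frequency and its rank
toy_words <- data.frame(
  word = names(counts),
  freq = as.integer(counts),
  rank = seq_along(counts)
)

# Zipf's law: the expected frequency at rank r is roughly freq[1] / r
toy_words$expectations <- toy_words$freq[1] / toy_words$rank

print(toy_words)
```

Plotting `freq` and `expectations` against `rank` gives two curves whose closeness shows how well the toy data follows Zipf's law.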
In this exercise, you will use the package metricsgraphics. Although the author suggests using the pipe %>% operator, you will construct the graphic step-by-step to learn about the various aspects of the plot. The main function of the package metricsgraphics is the mjs_plot() function, which is the first step in creating a JavaScript plot. Once you have that, you can add other layers on top of the plot.
An example metricsgraphics workflow without using the %>% operator is below:
metro_plot <- mjs_plot(data, x = x_axis_name, y = y_axis_name, show_rollover_text = FALSE)
metro_plot <- mjs_line(metro_plot)
metro_plot <- mjs_add_line(metro_plot, line_one_values)
metro_plot <- mjs_add_legend(metro_plot, legend = c('names', 'more_names'))
metro_plot
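For comparison, the same kind of workflow can be written as a single %>% chain, where each step receives the plot built so far as its first argument. This is only a sketch under stated assumptions: the `toy` data frame and its column names are hypothetical, and the code assumes the metricsgraphics and magrittr packages are installed (the guard skips the plot if they are not).

```r
# Hypothetical toy data: an observed series and a second series to overlay
toy <- data.frame(
  rank = 1:4,
  freq = c(6, 3, 2, 1),
  expectations = 6 / (1:4)
)

# Only build the plot when both packages are available (assumption: they
# may not be installed on every machine)
if (requireNamespace("metricsgraphics", quietly = TRUE) &&
    requireNamespace("magrittr", quietly = TRUE)) {
  library(metricsgraphics)
  library(magrittr)

  toy_plot <- toy %>%
    mjs_plot(x = rank, y = freq, show_rollover_text = FALSE) %>%  # base plot
    mjs_line() %>%                                                # first line
    mjs_add_line(expectations) %>%                                # second line
    mjs_add_legend(legend = c("observed", "expected"))            # legend labels

  toy_plot
}
```

The step-by-step form makes each intermediate object visible, which is why it is used for learning here; the piped form is shorter once the pieces are familiar.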
This exercise is part of the course Sentiment Analysis in R.
Exercise instructions
- Use head() on sb_words to review top words.
- Create a new column expectations by dividing the largest word frequency, freq[1], by the rank column.
- Start sb_plot using mjs_plot().
  - Pass in sb_words with x = rank and y = freq.
  - Within mjs_plot() set show_rollover_text to FALSE.
- Overwrite sb_plot using mjs_line() and pass in sb_plot.
- Add to sb_plot with mjs_add_line().
  - Pass in the previous sb_plot object and the vector, expectations.
- Place a legend on a new sb_plot object using mjs_add_legend().
  - Pass in the previous sb_plot object.
  - The legend labels should consist of "Frequency" and "Expectation".
- Call sb_plot to display the plot. Mouseover a point to simultaneously highlight a freq and Expectation point. The magic of JavaScript!
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Examine sb_words
___
# Create expectations
sb_words$expectations <- sb_words %$%
{___[___] / ___}
# Create metrics plot
sb_plot <- ___(___, x = ___, y = ___, ___ = ___)
# Add 1st line
sb_plot <- ___(___)
# Add 2nd line
sb_plot <- ___(___, ___)
# Add legend
sb_plot <- ___(___, legend = c("___", "___"))
# Display plot
sb_plot