Exercise 1. Distributions - 1

You may have noticed that numerical data is often summarized with the average value. For example, the quality of a high school is sometimes summarized with one number: the average score on a standardized test. Occasionally, a second number is reported: the standard deviation. So, for example, you might read a report stating that scores were 680 plus or minus 50 (the standard deviation). The report has summarized an entire vector of scores with just two numbers. Is this appropriate? Is there any important piece of information that we are missing by looking only at this summary rather than at the entire list? We are going to learn when these two numbers are enough and when we need more elaborate summaries and plots to describe the data.
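To see why two numbers may not be enough, here is a minimal sketch in Python. The two score lists are made up purely for illustration (they are not real test data). Both have a mean of 680 and a population standard deviation of 50, yet one list has no scores at 680 while the other has most of its scores exactly there. The helper name `summarize` is hypothetical. `statistics.pstdev` computes the population standard deviation; `statistics.stdev` would give the sample version instead.

```python
from statistics import mean, pstdev

# Two hypothetical lists of test scores (illustrative only, not real data).
# Both have mean 680 and population standard deviation 50.
scores_a = [630, 730] * 4                 # split evenly into two clusters
scores_b = [580] + [680] * 6 + [780]      # mostly at 680, with two spread-out values


def summarize(scores):
    """Return the two-number summary: (mean, population standard deviation)."""
    return mean(scores), pstdev(scores)


for name, scores in [("A", scores_a), ("B", scores_b)]:
    avg, sd = summarize(scores)
    # The two-number summaries match, but the lists look very different.
    print(name, f"mean={avg}, sd={sd:.1f}", "scores equal to 680:", scores.count(680))
```

The matching summaries hide the fact that the lists are shaped completely differently, which is exactly the kind of information a look at the full distribution reveals.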

Our first data visualization building block is learning to summarize lists of factors or numeric vectors. The most basic statistical summary of a list of objects or numbers is its distribution. Once a vector has been summarized as a distribution, there are several data visualization techniques to effectively relay this information. In later assessments we will practice writing code for data visualization. Here we start with some multiple-choice questions to test your understanding of distributions and related basic plots.
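For a list of categories (a factor), the distribution is simply the proportion of entries that fall into each category. A minimal Python sketch, assuming a small hypothetical list of region labels (not taken from any real dataset); the function name `distribution` is also hypothetical:

```python
from collections import Counter


def distribution(values):
    """Return the proportion of the list taken up by each distinct value."""
    counts = Counter(values)          # how many times each category appears
    n = len(values)
    return {value: count / n for value, count in counts.items()}


# Hypothetical list of factor levels (illustrative only).
regions = ["South", "West", "South", "Northeast", "West", "South"]
print(distribution(regions))
```

These proportions always add up to 1, and they are exactly what a bar plot of a categorical variable displays.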

In the murders dataset, region is a categorical variable, and its distribution is shown on the right. To the closest 5%, what proportion of the states are in the North Central region?

Possible answers

75%

50%

25%
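Answering the question comes down to dividing one category's count by the total and rounding to the closest multiple of 5%. A minimal Python sketch of that arithmetic, using made-up counts (7 out of 50) rather than the murders data, so read the real values off the plot. The helper names are hypothetical.

```python
def proportion_percent(count, total):
    """Share of `total` that `count` represents, as a percentage."""
    return 100 * count / total


def round_to_nearest(pct, step=5):
    """Round a percentage to the closest multiple of `step`.

    Note: Python's round() sends exact ties (e.g. 12.5%) to the even multiple.
    """
    return step * round(pct / step)


# Hypothetical counts (illustrative only, not the murders dataset).
pct = proportion_percent(7, 50)
print(f"{pct:.1f}% -> about {round_to_nearest(pct)}%")
```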
