Exercise

Using the qqPlot() function to see many details in data

A practical limitation of both histograms and density estimates is that they make it difficult to judge whether the Gaussian distribution assumption is reasonable for our data.

The quantile-quantile plot, or QQ-plot, is a useful alternative: we sort our data, plot it against a specially designed x-axis built from the quantiles of our reference distribution (e.g., the Gaussian "bell curve"), and look to see whether the points lie approximately on a straight line. In R, several QQ-plot implementations are available, but the most convenient one is the qqPlot() function in the car package.
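To make the construction concrete, here is a minimal sketch that builds a Gaussian QQ-plot by hand in base R. The sample x and the seed are hypothetical stand-ins rather than exercise data, and ppoints() is used for the plotting positions because that is what base R's qqnorm() uses.

```r
# Minimal sketch of how a Gaussian QQ-plot is constructed, base R only.
set.seed(1)                           # illustrative seed (assumption)
x <- rnorm(50, mean = 100, sd = 15)   # hypothetical sample, not exercise data

n <- length(x)
sorted_x <- sort(x)                   # step 1: sort the data
theo_q <- qnorm(ppoints(n))           # step 2: Gaussian quantiles for the x-axis

# Step 3: plot sorted data against the theoretical quantiles
plot(theo_q, sorted_x,
     xlab = "Theoretical Gaussian quantiles",
     ylab = "Sorted data")

# Reference line through the first and third quartiles
qqline(x)
```

If the data are approximately Gaussian, the points should fall close to the reference line.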

The first part of this exercise applies this function to the 16-day chick weight data considered in the last exercise (Time in the ChickWeight data frame is measured in days), to show that the Gaussian distribution appears to be reasonable here. The second part of the exercise applies this function to another variable where the Gaussian distribution is obviously a poor fit. There, the plot also reveals repeated values (flat stretches in the plot) and portions of the data range where there are no observations (vertical "jumps" in the plot).
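As a rough illustration of the flat stretches and jumps described above, the sketch below uses a small made-up vector y (hypothetical values, not the exercise data) that contains both repeated values and a gap, and then checks for the same features numerically.

```r
# Hypothetical values with repeats and a gap (not the exercise data)
y <- c(rep(10, 5), 11, 12, 13, rep(14, 4), 30, 31, 32)

sorted_y <- sort(y)
theo_q <- qnorm(ppoints(length(y)))

# type = "b" joins the points so the shape is easier to see
plot(theo_q, sorted_y, type = "b",
     xlab = "Theoretical Gaussian quantiles",
     ylab = "Sorted data")

# Repeated values: runs of identical sorted values plot as flat stretches
runs <- rle(sorted_y)
repeated_values <- runs$values[runs$lengths > 1]

# Gaps: a large difference between neighbouring sorted values plots as a jump
largest_gap <- max(diff(sorted_y))
```

Here repeated_values picks out 10 and 14, the two flat stretches, and largest_gap is 16, the jump from 14 to 30.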

Instructions 1/2
  • 1
    • Load the car package to make the qqPlot() function available for use.
    • Create the variable index16, using the which() function to select the records from the ChickWeight data frame with Time equal to 16.
    • Create the variable weights that gives the weights of the 16-day-old chicks.
    • Apply the qqPlot() function to the weights data, noting that almost all of the points fall within the confidence envelope around the reference line, indicating reasonable conformance with the Gaussian distribution for this data sequence.
  • 2
    • Apply the qqPlot() function to the tax variable from the Boston data frame in the MASS package.
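One possible way to carry out both parts is sketched below. It assumes that the car and MASS packages are installed, and it is one reasonable reading of the instructions rather than the only accepted solution. Note that the weight column in ChickWeight is lowercase, while Time is capitalized.

```r
library(car)    # provides qqPlot()
library(MASS)   # provides the Boston data frame

# Part 1: select the records with Time equal to 16 and extract their weights
index16 <- which(ChickWeight$Time == 16)
weights <- ChickWeight$weight[index16]

# Gaussian QQ-plot with a reference line and confidence envelope
qqPlot(weights)

# Part 2: the same plot for the tax variable in Boston
qqPlot(Boston$tax)
```

By default, qqPlot() compares the data against the Gaussian distribution, so no distribution argument is needed here.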