Exercise

Preparing Data For Analysis: Exploration

If we want to predict arrival delay from day of the week, a sensible first step is to compute the average arrival delay for each day of the week explicitly. We can do this with the rxSummary() function.

rxSummary() computes summary statistics for variables. In the simplest case, it produces univariate summaries of individual variables. In this example, we will use it to summarize a numeric variable at each level of a second, categorical variable.

Instructions

Use rxSummary() to compute the average arrival delay for each day of the week in the myAirlineXdf dataset.

The basic syntax is: rxSummary(formula, data)

  • formula - A formula indicating the variables for which you want a summary. If there is a variable on the left-hand side of the formula, rxSummary() computes summary statistics of that variable for each level of the categorical variable on the right-hand side. If there are no variables on the left-hand side, it computes summaries of the variables on the right-hand side across the entire dataset.
  • data - The dataset in which to look for variables specified in formula.
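
The two forms of the formula can be sketched as follows. This is a minimal illustration, not a definitive solution: it assumes that myAirlineXdf already exists in the session and that its columns are named ArrDelay (numeric) and DayOfWeek (factor), as in the RevoScaleR airline sample data. Those column names are assumptions and may differ in your dataset.

```r
# Assumes the RevoScaleR package is installed and myAirlineXdf is already defined.
library(RevoScaleR)

# Nothing on the left-hand side: summarize ArrDelay across the entire dataset.
# (ArrDelay is an assumed column name.)
rxSummary(~ ArrDelay, data = myAirlineXdf)

# Variable on the left-hand side: summarize ArrDelay separately for each
# level of DayOfWeek (assumed to be a factor column).
rxSummary(ArrDelay ~ DayOfWeek, data = myAirlineXdf)
```

The second call reports per-category statistics, including the mean, for each day of the week.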

Go ahead and use rxSummary() to compute the mean arrival delay for each day of the week.

After you have viewed the average arrival delay for each day of the week, you might also want to view the distribution of arrival delays. We can do this using rxHistogram(). Go ahead and use rxHistogram() to visualize the distribution.
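
As a rough sketch of the histogram step, again assuming that myAirlineXdf exists in the session and that the delay column is named ArrDelay (an assumption, not confirmed by the exercise text):

```r
# Assumes the RevoScaleR package is installed and myAirlineXdf is already defined.
library(RevoScaleR)

# One-sided formula: plot the distribution of the assumed ArrDelay column.
rxHistogram(~ ArrDelay, data = myAirlineXdf)
```

Like rxSummary(), rxHistogram() takes a formula and a data argument, so the same one-sided formula style applies here.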