Relationship between trip duration and total fare
We would expect the total cab fare to be related to the duration of the trip. Since there are too many data points for a scatterplot to be readable, let's use a hexagon-binned plot to investigate this relationship.
tx is available for you in your workspace.
This exercise is part of the course
Visualizing Big Data with Trelliscope in R
Instructions
- Use hexagon bins to visualize the bivariate distribution of total_amount (y-axis) vs. trip_duration (x-axis).
- Set the bins argument of geom_hex() to 75.
- Since both variables are highly skewed, rescale both the x and y axes to log base 10. Note that these transformations will generate some warnings about a relatively small number of records with zero trip duration or fare amount.
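The warnings mentioned in the last instruction come from taking the logarithm of zero. A minimal illustration in base R, using made-up values rather than the tx data:

```r
# Made-up example values; not taken from tx
vals <- log10(c(0, 1, 100))
print(vals)  # -Inf 0 2

# A zero maps to -Inf, which cannot be placed on a log-scaled axis
print(is.finite(vals))  # FALSE TRUE TRUE
```

Records with a zero duration or fare therefore become non-finite after the transformation, and ggplot2 leaves them out of the plot with a warning.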
Hands-on interactive exercise
Try this exercise by completing this sample code.
library(ggplot2)
# Create a hexagon-binned plot of total_amount vs. trip_duration
ggplot(tx, aes(___, ___)) +
___ +
___ +
___
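
One possible way to fill in the blanks is sketched below. Because tx only exists in the course workspace, this sketch builds a simulated stand-in called tx_demo (a hypothetical name, with invented values) so that it can run on its own; in the exercise itself you would pass tx to ggplot() instead. It assumes the hexbin package is installed, which geom_hex() requires.

```r
library(ggplot2)  # geom_hex() also needs the hexbin package installed

# Hypothetical stand-in for tx: simulated, right-skewed durations and fares
set.seed(1)
n <- 10000
tx_demo <- data.frame(trip_duration = rlnorm(n, meanlog = 6.5, sdlog = 0.8))
tx_demo$total_amount <- 3 + 0.01 * tx_demo$trip_duration * rlnorm(n, 0, 0.3)

# Hexagon-binned plot of total_amount (y) vs. trip_duration (x),
# with 75 bins and both axes on a log base 10 scale
p <- ggplot(tx_demo, aes(trip_duration, total_amount)) +
  geom_hex(bins = 75) +
  scale_x_log10() +
  scale_y_log10()

print(p)
```

The simulated values are all strictly positive, so this stand-in will not reproduce the zero-value warnings that the real data generates.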