Data for survival analysis
In the following exercises you will practice survival analysis on data about the customers of an online shop. This time the event of interest is not churn but the second order, so you model the time until a customer orders again.
The data is stored in the object dataNextOrder. The variable boughtAgain takes the value 0 for customers with only one order and 1 for customers who have already placed a second order. If a person has ordered a second time, the variable daysSinceFirstPurch contains the number of days between the first and second order. For customers without a second order, daysSinceFirstPurch contains the time since their first (and most recent) order.
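To make the meaning of the two variables concrete, here is a minimal sketch on a small, made-up data frame. The object toyOrders, the column customerId, and all values are invented for illustration and are not part of dataNextOrder; only the column names boughtAgain and daysSinceFirstPurch follow the description above. Rows with boughtAgain equal to 1 hold an observed time to the second order, while rows with boughtAgain equal to 0 hold a right-censored time, because the second order has not happened yet.

```r
# Hypothetical toy data; all values are invented for illustration only
toyOrders <- data.frame(
  customerId          = 1:6,                       # assumed helper column
  boughtAgain         = c(1, 0, 1, 0, 0, 1),
  daysSinceFirstPurch = c(12, 140, 45, 30, 201, 7)
)

# boughtAgain == 1: the second order was observed,
# so the duration is the time between first and second order
eventTimes <- toyOrders$daysSinceFirstPurch[toyOrders$boughtAgain == 1]

# boughtAgain == 0: no second order yet,
# so the duration is only the time elapsed so far (right-censored)
censoredTimes <- toyOrders$daysSinceFirstPurch[toyOrders$boughtAgain == 0]

# Number of customers in each group
table(toyOrders$boughtAgain)

# Compare the two sets of durations
summary(eventTimes)
summary(censoredTimes)
```

Keeping the two groups apart matters because a censored duration is only a lower bound on the true time until the second order, which is why the exercise asks for separate histograms.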
The ggplot2 package is already loaded in your workspace.
This exercise is part of the course Machine Learning for Marketing Analytics in R.
Exercise instructions
- Take a look at the data using head().
- Plot a histogram of the days since the first purchase separately for customers with vs. without a second order. (If you're not used to ggplot2 code, don't worry: you just have to use daysSinceFirstPurch as the x variable and boughtAgain as the fill and facet variable.)
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Look at the head of the data
___(dataNextOrder)

# Plot a histogram
ggplot(dataNextOrder) +
  geom_histogram(aes(x = ___,
                     fill = factor(___))) +
  facet_grid( ~ boughtAgain) + # Separate plots for boughtAgain = 1 vs. 0
  theme(legend.position = "none") # Don't show legend