1. Enriching events
Finally, in this chapter, we will discuss how to enrich event data.
Enriching event data is nothing more than adding calculated variables to the data. We can do this in two ways.
2. Mutate new variables
The first is the traditional way of adding new variables to any dataset, and here dplyr’s mutate function is very helpful. The new variables can be defined using any function you can think of.
3. Mutate new variables
For instance, suppose that some of the events have a cost attribute, but we would like to know the total cost of each case. This can be achieved by grouping by case and then using mutate to sum all the cost values, ignoring missing values where needed.
Note that grouping by case is done directly with the group_by_case function.
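The course does this in R with bupaR and dplyr. As a rough illustration, the sketch below reproduces the same idea in plain Python: it computes each case's total cost while skipping missing values, then attaches that total to every event of the case, much like a mutate after group_by_case. The event log, the field names and the helper add_total_case_cost are hypothetical.

```python
from collections import defaultdict

# Toy event log (hypothetical data): one dict per event.
# cost is None for events that have no cost attribute.
events = [
    {"case_id": "c1", "activity": "Register Claim", "cost": 20.0},
    {"case_id": "c1", "activity": "Check Claim",    "cost": None},
    {"case_id": "c1", "activity": "Pay Claim",      "cost": 150.0},
    {"case_id": "c2", "activity": "Register Claim", "cost": 20.0},
    {"case_id": "c2", "activity": "Reject Claim",   "cost": 5.0},
]

def add_total_case_cost(events):
    """Hypothetical helper: return new event dicts carrying a total_cost per case.

    Missing costs are skipped, similar in spirit to sum(..., na.rm = TRUE) in R.
    """
    totals = defaultdict(float)
    for e in events:
        if e["cost"] is not None:
            totals[e["case_id"]] += e["cost"]
    # Like a grouped mutate, every event keeps its row and gets its case's total.
    return [{**e, "total_cost": totals[e["case_id"]]} for e in events]

enriched = add_total_case_cost(events)
for e in enriched:
    print(e["case_id"], e["activity"], e["total_cost"])
```

Because the total is attached to each event rather than collapsed into one row per case, the log keeps its event-level structure, which is the point of enriching rather than summarizing.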
4. Mutate new variables
Next, we can use this total cost to create various cost, or impact, categories. For this, we can use case_when, ifelse, or any other function you are familiar with.
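A minimal Python sketch of such a categorization follows. The thresholds (50 and 200), the category labels and the cost_category helper are hypothetical. The if/elif chain checks its conditions in order and returns the first match, which mirrors how case_when evaluates its branches.

```python
def cost_category(total_cost):
    """Map a case's total cost to an impact category (hypothetical thresholds).

    Conditions are checked top to bottom; the first one that holds wins.
    """
    if total_cost < 50:
        return "low"
    elif total_cost < 200:
        return "medium"
    else:
        return "high"

# Hypothetical events that already carry a total_cost for their case.
enriched = [
    {"case_id": "c1", "activity": "Register Claim", "total_cost": 170.0},
    {"case_id": "c1", "activity": "Pay Claim",      "total_cost": 170.0},
    {"case_id": "c2", "activity": "Register Claim", "total_cost": 25.0},
    {"case_id": "c3", "activity": "Register Claim", "total_cost": 480.0},
]

categorized = [{**e, "cost_impact": cost_category(e["total_cost"])} for e in enriched]
for e in categorized:
    print(e["case_id"], e["cost_impact"])
```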
5. Mutate new variables
Instead of a sum function, we can also use more advanced functions. For instance, consider the str_detect function of the stringr package. This function detects strings or regular expressions in a character vector and returns a logical vector indicating which elements contain the pattern.
We can use it to detect whether certain activities occur in a case and create a logical variable that describes this aspect of the case. For instance, by detecting the Pay Claim activity in each case, we can create a logical variable indicating whether a refund was made.
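One plausible way to express this, sketched here in plain Python rather than R, is to detect the pattern at event level and then mark a case as refunded if any of its events matched. The detect helper is a hypothetical stand-in for str_detect built on re.search, and the event log is made up.

```python
import re
from collections import defaultdict

def detect(strings, pattern):
    """Hypothetical analogue of stringr's str_detect: one boolean per element,
    True if the regular expression matches anywhere in the string."""
    return [re.search(pattern, s) is not None for s in strings]

# Toy event log (hypothetical data).
events = [
    {"case_id": "c1", "activity": "Register Claim"},
    {"case_id": "c1", "activity": "Check Claim"},
    {"case_id": "c1", "activity": "Pay Claim"},
    {"case_id": "c2", "activity": "Register Claim"},
    {"case_id": "c2", "activity": "Reject Claim"},
]

# Event-level detection: which events are payments?
is_payment = detect([e["activity"] for e in events], "Pay Claim")

# Case-level aggregation: a case is refunded if any of its events matched.
refunded = defaultdict(bool)
for e, hit in zip(events, is_payment):
    refunded[e["case_id"]] = refunded[e["case_id"]] or hit

# Attach the case-level flag to every event of that case.
enriched = [{**e, "refund_made": refunded[e["case_id"]]} for e in events]
```

The two-step structure matters: the event-level vector alone says which rows are payments, while the per-case "any" turns it into an attribute of the whole case.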
6. Adding process metrics
The flexibility to create calculated variables in this way is practically unlimited. However, some calculations are very common in a process context. We might want to add the processing time of each case as an attribute to the data, or the case length, the frequency of each activity, or any other metric we saw in the previous chapter.
Instead of doing these calculations ourselves, or performing cumbersome joins with the metric output, we can use the metric functions directly.
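For comparison, the manual route that the metric functions spare us can be sketched as follows: compute each metric separately, then look it up for every event by its case or activity key, which is the join step. This is a hypothetical Python illustration; case length is simplified here to the number of events per case, which may differ from how the course's metric defines it.

```python
from collections import Counter

# Toy event log (hypothetical data).
events = [
    {"case_id": "c1", "activity": "Register Claim"},
    {"case_id": "c1", "activity": "Check Claim"},
    {"case_id": "c1", "activity": "Pay Claim"},
    {"case_id": "c2", "activity": "Register Claim"},
    {"case_id": "c2", "activity": "Reject Claim"},
]

# Step 1: compute each metric as a separate lookup table.
case_length = Counter(e["case_id"] for e in events)     # events per case (simplified)
activity_freq = Counter(e["activity"] for e in events)  # occurrences per activity

# Step 2: "join" the metric tables back onto the events by key.
enriched = [
    {**e,
     "case_length": case_length[e["case_id"]],
     "activity_frequency": activity_freq[e["activity"]]}
    for e in events
]
```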
7. Adding process metrics
Each metric has an append argument which, if set to TRUE, adds the metric to the original data instead of creating the familiar output list. The append argument can be used at the case, activity, resource and resource-activity levels. This way, any predefined metric can be added as a calculated variable without interrupting the workflow. These continuous metrics can then also be used to define categorical variables, for instance, whether or not the case was finished within one week.
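The append behavior can be illustrated with a hypothetical Python function for case throughput time: with append=False it returns one row per case, and with append=True it attaches the value to every event instead. The function, field names and timestamps are assumptions for illustration, not the actual R implementation. The last step derives the one-week categorical variable from the continuous metric.

```python
from datetime import datetime, timedelta

# Toy event log with timestamps (hypothetical data).
events = [
    {"case_id": "c1", "activity": "Register Claim", "timestamp": datetime(2024, 1, 1, 9)},
    {"case_id": "c1", "activity": "Pay Claim",      "timestamp": datetime(2024, 1, 4, 17)},
    {"case_id": "c2", "activity": "Register Claim", "timestamp": datetime(2024, 1, 2, 10)},
    {"case_id": "c2", "activity": "Pay Claim",      "timestamp": datetime(2024, 1, 15, 12)},
]

def throughput_time(events, append=False):
    """Hypothetical case-level metric: last timestamp minus first timestamp.

    append=False -> one summary row per case (the "output list").
    append=True  -> the original events, each enriched with its case's value.
    """
    first, last = {}, {}
    for e in events:
        c, t = e["case_id"], e["timestamp"]
        first[c] = min(first.get(c, t), t)
        last[c] = max(last.get(c, t), t)
    durations = {c: last[c] - first[c] for c in first}
    if not append:
        return [{"case_id": c, "throughput_time": d} for c, d in durations.items()]
    return [{**e, "throughput_time": durations[e["case_id"]]} for e in events]

enriched = throughput_time(events, append=True)

# Turn the continuous metric into a categorical variable.
enriched = [
    {**e, "within_one_week": e["throughput_time"] <= timedelta(weeks=1)}
    for e in enriched
]
```

Keeping both modes in one function is what lets the metric slot into an enrichment pipeline without a separate join step.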
8. Let's practice!
Now let's try some examples.