1. Activities as cornerstones of processes
Let's start by looking at some basic information about a process.
2. Example: Online learning
In this lesson, we will look at data from an online learning environment. While students learn, their activity on the platform is captured and can be used to identify different learning styles.
3. A first glimpse of the event log
The first thing to do is typically to look at a few basic counts.
The number of cases will identify how many students there are.
The number of different activities shows us the different actions a student can perform.
The number of events gives us an idea about the total number of learning activities performed by the students, while the time period tells us when they happened.
4. A first glimpse of the event log
To analyse process data, we will use the bupaR package. The `learning` object is a predefined event log object which contains the event data. Later on, we will see how you can create these event data objects yourself!
Basic process statistics can be inspected by printing the summary of an event log, or by using the count functions displayed here.
We can see that we have information on 498 students, who performed 3645 learning actions of 10 different types.
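To make concrete what these counts measure, here is a minimal sketch in plain Python on a tiny, made-up event log. It does not use bupaR; the data, the tuple layout, and the variable names are assumptions chosen purely for illustration.

```python
from datetime import datetime

# Hypothetical event log: one tuple per event (case id, activity, timestamp).
events = [
    ("student_1", "Exercise 1", datetime(2024, 1, 8, 9, 0)),
    ("student_1", "Exercise 2", datetime(2024, 1, 8, 9, 20)),
    ("student_1", "Assessment", datetime(2024, 1, 9, 14, 0)),
    ("student_2", "Theory", datetime(2024, 1, 8, 10, 0)),
    ("student_2", "Exercise 1", datetime(2024, 1, 8, 10, 30)),
    ("student_2", "Exercise 1", datetime(2024, 1, 8, 10, 45)),
]

# Number of cases: distinct students.
n_cases = len({case for case, _, _ in events})
# Number of activities: distinct activity labels.
n_activities = len({activity for _, activity, _ in events})
# Number of events: every recorded action, repeats included.
n_events = len(events)
# Time period: earliest and latest timestamp in the log.
timestamps = [ts for _, _, ts in events]
period = (min(timestamps), max(timestamps))

print(n_cases, n_activities, n_events)
print(period[0], "to", period[1])
```

Note that the second student's repeated exercise adds to the number of events but not to the number of activities.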
5. Activities
Once we have an idea of the dimensions, we can look at the data in more detail. The activities are a good place to start. They are one of the most important characteristics of a process, as they describe both the actions which are executed and the order in which they happen. More than anything else, activities define the process.
6. Exploring activities
In our example, there are 10 different activities. We can retrieve the different types using the `activity_labels` function. We can see that there are 7 exercises and 1 assessment. Furthermore, students can also consult a dictionary and some theory pages.
7. Exploring activities
If we want more information on the activities, we can use the `activities` function. This will show us the number of times each of them occurs.
In this example, exercise 1 has been done the most. In fact, its frequency is even higher than the number of students, which indicates that some students have performed this exercise more than once. On the other hand, consulting the dictionary or the theory pages has been done the least.
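The reasoning about repeated exercises can be sketched in plain Python. This is an illustration on invented data, not bupaR code: it counts how often each activity occurs and flags any activity whose count exceeds the number of cases, which can only happen if at least one case performed it more than once.

```python
from collections import Counter

# Hypothetical event log: (case id, activity) pairs for three students.
events = [
    ("student_1", "Exercise 1"),
    ("student_1", "Exercise 1"),
    ("student_1", "Exercise 2"),
    ("student_1", "Assessment"),
    ("student_2", "Theory"),
    ("student_2", "Exercise 1"),
    ("student_2", "Exercise 1"),
    ("student_3", "Assessment"),
]

# Absolute frequency of each activity.
activity_counts = Counter(activity for _, activity in events)
n_cases = len({case for case, _ in events})
total_events = sum(activity_counts.values())

# Print activities from most to least frequent, with their share of all events.
for activity, count in activity_counts.most_common():
    print(f"{activity:<12} {count:>3}  {count / total_events:.2f}")

# An activity that occurs more often than there are cases
# must have been repeated within at least one case.
repeated = [a for a, c in activity_counts.items() if c > n_cases]
print("Repeated by at least one student:", repeated)
```

The reverse does not hold: an activity can be repeated by one student while its total count stays below the number of cases, so this check only catches the obvious repeats.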
8. Exploring sequences of activities
A process is defined not only by the activities themselves, but also by the order in which these activities occur. Indeed, each case is described by a sequence of activities. This is also called the trace of a case, as it describes the trace a process instance leaves behind in our data.
We can have a look at a few example traces of students. For student 1, we see a very structured path, progressing through the different exercises, and finally doing the assessment. However, student 2 starts by looking at the theory pages, before doing exercise 1, which he executes two times in a row. Student 3 seems to be very confident, as he immediately proceeds to the assessment without looking at the exercises. Finally, student 4 tries exercise 1 two times, but consults the dictionary in between.
Certainly, not all traces are equally frequent or equally desirable. Some will lead to better results than others, which in this case means better or worse grades on the assessment.
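As a rough sketch of what a trace is, the plain-Python snippet below rebuilds the trace of each case from an unordered list of events by sorting on case and timestamp. The four students loosely mirror the examples just described; the integer timestamps and the shortened exercise path of student 1 are assumptions for illustration, and bupaR is not used.

```python
from collections import defaultdict

# Hypothetical events in arbitrary order: (case id, activity, timestamp).
# Timestamps are simple integer ticks, meaningful only within a case.
events = [
    ("student_2", "Exercise 1", 3),
    ("student_1", "Exercise 1", 1),
    ("student_3", "Assessment", 1),
    ("student_2", "Theory", 1),
    ("student_1", "Assessment", 3),
    ("student_4", "Dictionary", 2),
    ("student_1", "Exercise 2", 2),
    ("student_4", "Exercise 1", 1),
    ("student_2", "Exercise 1", 2),
    ("student_4", "Exercise 1", 3),
]

# Sort by case, then by time, and collect the activities of each case in order.
traces = defaultdict(list)
for case, activity, _ in sorted(events, key=lambda e: (e[0], e[2])):
    traces[case].append(activity)

for case, trace in traces.items():
    print(case, ":", " -> ".join(trace))
```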
9. Exploring sequences of activities
A list of the traces can be retrieved with the `traces` function, or they can be visualized using the `trace_explorer`.
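Conceptually, a list of traces groups cases that share the same sequence of activities and reports how often each sequence occurs. The snippet below is a plain-Python sketch of that idea on invented data; it is not the bupaR implementation and does not reproduce the output format of those functions.

```python
from collections import Counter

# Hypothetical traces, one per case, stored as tuples so they can be counted.
case_traces = {
    "student_1": ("Exercise 1", "Exercise 2", "Assessment"),
    "student_2": ("Theory", "Exercise 1", "Exercise 1"),
    "student_3": ("Assessment",),
    "student_4": ("Exercise 1", "Exercise 2", "Assessment"),
    "student_5": ("Assessment",),
    "student_6": ("Exercise 1", "Exercise 2", "Assessment"),
}

# Count how many cases follow each distinct trace.
variant_counts = Counter(case_traces.values())
n_cases = len(case_traces)

# List the distinct traces from most to least frequent,
# with absolute and relative frequency.
for variant, count in variant_counts.most_common():
    print(f"{count} ({count / n_cases:.0%})  {' -> '.join(variant)}")
```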
10. Let's practice!
In the following exercises, we will have a look at our first process data, which describes the journey of patients in a hospital.