Let's see what's in the basket
1. Let's see what's in the basket
Until now, we haven't produced many graphical visualizations, right? Given the potentially large number of items and extracted rules, we need to be able to visualize items and rules in a convenient and efficient way.
2. Visualizing items
Let's go back to the transactions we saw in chapter 1. The first obvious visualization one could think of is a simple barplot. In Market Basket terminology, this is referred to as the "Item Frequency plot". It shows the number of times each item occurs across all transactions, which lets you compare how often the different items appear. For instance, Butter appears in 6 of the 7 transactions, while Bread appears in only 3.
3. Visualizing items in R
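The counts quoted for Butter and Bread can be checked directly in R. The following is a minimal sketch, assuming the arules package is installed. The seven baskets are made up so that Butter appears in 6 of them and Bread in 3, and "Cheese" and "Wine" are hypothetical stand-ins for the other two items; only the name "data_trx" comes from this lesson.

```r
library(arules)

# Hypothetical baskets: 7 transactions over 4 items,
# built so that Butter occurs 6 times and Bread 3 times.
baskets <- list(
  c("Bread", "Butter"),
  c("Bread", "Butter", "Cheese"),
  c("Bread", "Butter", "Wine"),
  c("Butter", "Cheese"),
  c("Butter", "Cheese", "Wine"),
  c("Butter", "Wine"),
  c("Cheese", "Wine")
)

# Convert the list into the transactions class used by arules
data_trx <- as(baskets, "transactions")

# Number of transactions in which each item occurs
counts <- itemFrequency(data_trx, type = "absolute")
print(counts)
```

These counts are exactly the bar heights that an item frequency plot displays.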
How does it work in R? The function to use is explicit enough: "itemFrequencyPlot". The main argument is the transactional dataset, denoted as "data_trx" here. To display the absolute number of occurrences of each item, set the type argument to "absolute". This will yield the traditional bar plot you are used to. If, however, you want relative counts (proportions) on the y-axis, set the type argument to "relative". As you see, both graphs look the same except for the scale of the y-axis. The item "Butter" appears in 6 of the 7 transactions, which is a proportion of about 86%. On the other hand, "Bread" appears in 3 transactions, a proportion of about 43%.
4. Top items
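Before scaling up, here is a minimal sketch of the two "type" settings described above, assuming the arules package. The baskets are the same made-up ones used for illustration (Butter in 6 of 7 transactions, Bread in 3; "Cheese" and "Wine" are hypothetical), not the course's actual data.

```r
library(arules)

# Hypothetical baskets: Butter in 6 of 7 transactions, Bread in 3
baskets <- list(
  c("Bread", "Butter"),
  c("Bread", "Butter", "Cheese"),
  c("Bread", "Butter", "Wine"),
  c("Butter", "Cheese"),
  c("Butter", "Cheese", "Wine"),
  c("Butter", "Wine"),
  c("Cheese", "Wine")
)
data_trx <- as(baskets, "transactions")

# Bar heights are raw counts of transactions per item
itemFrequencyPlot(data_trx, type = "absolute")

# Bar heights are proportions of all transactions
itemFrequencyPlot(data_trx, type = "relative")

# The same proportions as numbers: Butter is 6/7, Bread is 3/7
rel <- itemFrequency(data_trx, type = "relative")
print(round(rel, 3))
```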
So far, this has been very easy with only 4 items. But what if you have thousands of items? The plot can become too cluttered to extract any information from. Fortunately, the "topN" argument of the itemFrequencyPlot function does the trick. It does two things: it keeps only the N most frequent items, and it reorders the bars from the most frequent item to the least frequent one. In this example, "topN" equal to 4 sorts the items accordingly but still displays all of them, given that there are only 4 possible items. The "topN" argument is very useful when you would like to display, say, the 10 most frequent items among thousands or millions of items.
5. Further customization
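The effect of "topN" can be sketched on the same small example, assuming the arules package. The baskets are made up for illustration ("Cheese" and "Wine" are hypothetical items), and 3 is used instead of 4 so that one item is actually dropped.

```r
library(arules)

# Hypothetical baskets: Butter 6, Cheese 4, Wine 4, Bread 3
baskets <- list(
  c("Bread", "Butter"),
  c("Bread", "Butter", "Cheese"),
  c("Bread", "Butter", "Wine"),
  c("Butter", "Cheese"),
  c("Butter", "Cheese", "Wine"),
  c("Butter", "Wine"),
  c("Cheese", "Wine")
)
data_trx <- as(baskets, "transactions")

# Keep only the 3 most frequent items, sorted from most to least frequent
itemFrequencyPlot(data_trx, type = "absolute", topN = 3)

# The same selection done by hand, to see which items survive
counts <- itemFrequency(data_trx, type = "absolute")
top3 <- head(sort(counts, decreasing = TRUE), 3)
print(top3)
```

With these baskets, Bread is the least frequent item, so it is the one left out of the plot.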
Finally, let's polish and further customize the plot. This is an important step in visualization which you should not neglect. We can control the title of the plot with the argument "main". The colours of the bars are controlled via the "col" argument; in this case we create a vector of 4 colours from the rainbow palette. As with the usual R base plots, x and y labels are controlled via "xlab" and "ylab" respectively. For readability, you can also increase the font size of the item labels with "cex.names". To draw the bars horizontally, set the "horiz" argument to TRUE. If we apply this function, this is the output of our customized plot!
6. Let's plot items!
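The customization options described above can be combined in a single call. This is a minimal sketch, assuming the arules package; the baskets, the title, the axis labels and the font size are illustrative choices, not the values used on the slide.

```r
library(arules)

# Hypothetical baskets: Butter in 6 of 7 transactions, Bread in 3
baskets <- list(
  c("Bread", "Butter"),
  c("Bread", "Butter", "Cheese"),
  c("Bread", "Butter", "Wine"),
  c("Butter", "Cheese"),
  c("Butter", "Cheese", "Wine"),
  c("Butter", "Wine"),
  c("Cheese", "Wine")
)
data_trx <- as(baskets, "transactions")

# One colour per bar, taken from the rainbow palette
bar_colors <- rainbow(4)

itemFrequencyPlot(
  data_trx,
  type = "relative",
  topN = 4,                      # sort the bars by frequency
  main = "Item frequency",       # plot title (illustrative)
  col = bar_colors,              # bar colours
  xlab = "Relative frequency",   # axis labels (illustrative)
  ylab = "Item",
  cex.names = 1.2,               # larger item labels
  horiz = TRUE                   # horizontal bars
)
```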
Now it's your turn to plot items from the Online Retail dataset. The main difference here is that you will be working with more than just 4 items!