1. Reducing data volume
Let's explore a few optimization examples in Alteryx.
Consider a workflow that merges several key tables from our Vertical Inc. relational data model, focusing on order item details. This workflow generates a comprehensive inventory report, including customer, order, and product information, but only for customers in California.
For this exercise, our goal is to enhance the performance of this workflow. When we run the workflow, it completes in under 1 second. However, this is with a small dataset, making performance gains hard to gauge. To create a more realistic scenario, I'll expand the size of our order_items table by a factor of 1000 using the Generate Rows tool and run the workflow again.
The Generate Rows tool duplicates each record in our order_items table a thousand times. So, instead of joining to just over 4,700 records, we are now joining to more than 4.7 million records, a significantly more intensive task!
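To see what this row expansion does outside of Alteryx, here is a minimal Python sketch. The data is a made-up stand-in, not the actual order_items table; the point is only that emitting each record 1000 times turns roughly 4,700 join inputs into 4.7 million:

```python
# Stand-in for the order_items table: ~4,700 hypothetical records.
order_items = [{"order_id": i, "qty": 1} for i in range(4700)]

# Conceptually what Generate Rows does here: emit each record 1000 times.
expanded = [dict(row) for row in order_items for _ in range(1000)]

print(len(order_items))  # 4700
print(len(expanded))     # 4700000
```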
Once finished, we see the workflow takes close to 20 seconds to complete.
There are a few ways we can improve this workflow. One key bottleneck in this specific example is data volume, as we are joining all dimension tables to a rather large number of order_items records.
We'll start by eliminating any unnecessary or redundant fields. For example, after using the Text to Columns tool or Join tools, we often end up with redundant fields. By deselecting these fields, we reduce the number of fields in our output.

We go through each tool and deselect all redundant fields. Depending on the amount of data and the complexity of your workflow, it might make sense to deselect all the redundant fields at the end, in the last Join tool. However, since this example is a rather short workflow, I deselect the fields in each tool.
We run the workflow again and notice that this change cuts the runtime by a few seconds. While this might not seem significant, eliminating just a few fields can lead to substantial performance improvements as data volume increases.
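Deselecting fields in a tool is, conceptually, column pruning: every downstream tool carries less data per row. Here is a minimal Python sketch of the idea; the field names are hypothetical, not the actual Vertical Inc. schema:

```python
# Hypothetical customer records with a wide, rarely needed free-text field.
customers = [
    {"customer_id": 1, "name": "Ada", "state": "CA", "notes": "long free text"},
    {"customer_id": 2, "name": "Bob", "state": "NY", "notes": "long free text"},
]

# Keep only the fields the report actually uses (like deselecting in a tool).
needed = ("customer_id", "state")
slim = [{k: row[k] for k in needed} for row in customers]

print(slim[0])  # {'customer_id': 1, 'state': 'CA'}
```

Every row that flows through later joins now carries two fields instead of four, and the saving multiplies across millions of rows.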
Next, we want to reduce the vertical size of the data, that is, the number of rows.
Currently, we filter out non-CA customers at the end of the workflow, which means all data is processed through each tool until it's filtered. A more efficient approach is to apply the filter at the beginning of the workflow. By doing this, we only process and join the data we need.
For simplicity, let's move the filter from the end of the workflow to just after the Select tool. I right-click on the filter and select the cut and connect around option. Then I right-click on the connection between the Select tool and the Join and select the paste in connection option to add the filter at this location.
Since we're using inner joins, this change ensures we only join with CA customers, resulting in smaller joins and less processing time.
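Filtering early is the same idea as predicate pushdown in databases: discard rows before the expensive join instead of after it. A small Python sketch with made-up tables (not the actual schema) shows that both orderings give the same result, while the early filter joins far fewer customer rows:

```python
# Hypothetical tables: 2 customers, 12 orders split evenly between them.
customers = [
    {"customer_id": 1, "state": "CA"},
    {"customer_id": 2, "state": "NY"},
]
orders = [{"order_id": o, "customer_id": 1 + o % 2} for o in range(12)]

# Filter late: join everything, then keep only CA rows.
joined_all = [
    {**c, **o} for c in customers for o in orders
    if c["customer_id"] == o["customer_id"]
]
late = [r for r in joined_all if r["state"] == "CA"]

# Filter early: keep CA customers first, then join far fewer rows.
ca_only = [c for c in customers if c["state"] == "CA"]
early = [
    {**c, **o} for c in ca_only for o in orders
    if c["customer_id"] == o["customer_id"]
]

print(len(late), len(early))  # 6 6 — same result either way
```

With an inner join, the filter commutes with the join, so pushing it upstream changes the cost but never the answer.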
We run the workflow again and see that this adjustment saves us close to 5 seconds of processing time!
Overall, we see an improvement of over 30% in processing time. Not bad for two simple improvements!
Now let’s see if you can help optimize our workflow even further!
2. Let's practice!