1. Unions and appends in Alteryx
Let's look at how to use the Union and Append tools in Alteryx.
In this example, we want to create a comprehensive list of orders for the store with the highest count of unique orders. We also want to append all store-related information to the final output to share the data with the Finance department.
Let’s load the orders tables. We have three distinct tables separated by order year. We load all three tables dragging them to the canvas.
Next, we will use a union tool, which is found in the Join section. We’ll connect all three tables to the input anchor of the union tool.
Once I have connected all three tables to my union tool I will select the tool. There we see three different options. First I see the Auto Config by Name. This means that we will stack the fields that have matching field names. We also see that we have the option to stack by position or manually select how to stack our fields which could be useful in other cases.
The next two options are regarding how we want to deal with errors, and whether we want to output all fields or only common fields between the inputs. Since we expect our orders tables to look the same, we will keep the default options as they are.
To figure out what store has the most unique number of orders, we will add a summarize tool after our union where we group by store_id and count distinct order_ids. We run the workflow and sort the results in the preview window. Store two is by far the busiest store. Let’s add a filter and only include sales from store two.
We now have a complete list of all orders that came from our busiest store. However, we are still missing the store information. We want to include the store name, phone number, email, and address in our final output.
We will load the stores table. Looking at the preview window, we see that the stores table contains all the store-related data we are looking for. We know we are only looking at store 2, so we will also filter our stores table.
This is a very important step as the append tool in Alteryx by default, performs a cartesian join. A cartesian join combines every row from two tables, resulting in a dataset where each row from the first table is paired with every row from the second table. This is not what we want in this example as it would result in a table three times as big, so we have to purposely filter the stores table before appending to our orders table. We will cover joins more closely in the next chapter.
Next, we’ll add an append tool and attach the orders stream to the target anchor, and the stores stream to the source anchor. A best practice is to connect your large input, in this case the orders, to the target anchor, while your small input to the source anchor.
When selecting the append tool we can see that one field has been renamed to Source_store_id. This is because both tables contain the store_id field. Alteryx is smart enough to rename the duplicate field. However, since this field is redundant, I will go ahead and press the options drop-down and deselect all duplicate fields.
Lastly, I add a browse tool and run my workflow. We can see that we now have a complete list of all orders from store two with the store information appended to every single order.
Now it’s your turn to start using unions and appends in alteryx!
2. Let's practice!