1. The preparation toolset
Now that you are more familiar with the Alteryx Designer interface, let’s get back to our workflow.
We used the Select tool to control which columns we wanted to keep in our workflow. However, the tool offers more configuration options for our data preparation process.
In the configuration settings, the Type column shows that every column in the dataset is set to V_String, a variable-length string. This data type adjusts to accommodate text of different lengths.
Looking at our dataset, this data type works for columns like Country and Region, but most other columns contain numbers and should use more appropriate data types.
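To see why text is a poor fit for numbers, the plain-Python sketch below compares the same hypothetical population values stored as strings and as integers. It is not Alteryx code; it only illustrates the general rule that text is compared character by character, so the "largest" string is not the largest number.

```python
# Hypothetical population values, stored as text the way a V_String column holds them.
populations_as_text = ["9000000", "10000000", "850000"]

# Text comparison goes character by character, so "9000000" beats "10000000".
largest_as_text = max(populations_as_text)

# Converting to integers makes comparisons (and therefore sorting) numeric.
populations_as_int = [int(value) for value in populations_as_text]
largest_as_int = max(populations_as_int)

print(largest_as_text)  # 9000000
print(largest_as_int)   # 10000000
```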
Clicking on a column's data type shows all the available options. A country's population is a large whole number, often in the millions and, for the most populous countries, over a billion. So, let’s use an integer type for this column.
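Integer types differ in how large a value they can hold. The sketch below assumes the standard signed ranges for 16-, 32- and 64-bit integers, named here after Alteryx's Int16, Int32 and Int64 types, and picks the narrowest one that fits a list of hypothetical population values. The helper `smallest_fitting_type` is made up for illustration and is not part of Alteryx.

```python
# Assumed signed (two's complement) ranges for 16-, 32- and 64-bit integers.
INTEGER_RANGES = {
    "Int16": (-2**15, 2**15 - 1),
    "Int32": (-2**31, 2**31 - 1),
    "Int64": (-2**63, 2**63 - 1),
}

def smallest_fitting_type(values):
    """Return the name of the narrowest integer type that holds every value."""
    low, high = min(values), max(values)
    # Dictionaries keep insertion order, so types are checked narrowest first.
    for name, (type_min, type_max) in INTEGER_RANGES.items():
        if type_min <= low and high <= type_max:
            return name
    raise ValueError("values exceed a 64-bit signed integer")

# Hypothetical populations; the largest is over one billion.
example_populations = [1_300_000_000, 28_000_000, 50_000]
print(smallest_fitting_type(example_populations))  # Int32
```

A 16-bit integer tops out at 32,767, far too small for populations, while a 32-bit integer reaches about 2.1 billion.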
We will pick the same data type for the Area column. Next, we will make the population density column easier to read. The configuration window also lets us rename any column we choose, so let’s change the name from population density per square mile to Pop. density (per sq. mi.).
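The Select tool settings used so far, converting column types and renaming a column, can be sketched in plain Python as below. This is only an analogy for what the tool does to each row, not how Alteryx implements it. The rows, the exact column names and the `select` helper are hypothetical.

```python
# Hypothetical raw rows: every field arrives as text, like V_String columns.
raw_rows = [
    {"Country": "Country A", "Population": "5000", "Area": "200",
     "population density per square mile": "25.0"},
]

# Select-style configuration: a target type per column, plus one rename.
column_types = {"Population": int, "Area": int}
renames = {"population density per square mile": "Pop. density (per sq. mi.)"}

def select(rows, types, new_names):
    """Convert the listed column types and rename columns, keeping every column."""
    result = []
    for row in rows:
        new_row = {}
        for column, value in row.items():
            converted = types[column](value) if column in types else value
            new_row[new_names.get(column, column)] = converted
        result.append(new_row)
    return result

prepared = select(raw_rows, column_types, renames)
print(prepared[0])
```

The density column keeps its text type here because, as in the lesson, it is only renamed.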
Now, let’s add a Browse tool and click Run to see the effects of these changes. It is generally better to use as few Browse tools as possible, since each one adds to the workflow's processing time, but while we are still learning the different tools, they are a good way to review the data tasks performed in our workflow.
We can see that the column name change has been applied. We can also see that next to the Population and Area columns, a new icon indicates that these columns hold numeric values. The Population column profile now shows statistics for numeric values rather than strings.
The average population per country is just over 28 million! Let’s see whether the metadata of our dataset has been affected.
We can see the changes we made to the data types of the two columns.
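Statistics such as the average population only make sense once the values are numbers. The sketch below computes a few numeric profile values for hypothetical integer populations; the Browse tool's profile reports more than this, and the code is only an illustration of the idea.

```python
# Hypothetical Population values after conversion to integers.
populations = [1_000_000, 4_000_000, 10_000_000]

# A few of the numeric statistics a column profile can show.
profile = {
    "min": min(populations),
    "max": max(populations),
    "mean": sum(populations) / len(populations),
}
print(profile)  # {'min': 1000000, 'max': 10000000, 'mean': 5000000.0}
```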
Now, let’s move on to sorting our dataset. The data was stored and exported in alphabetical order by country name, but we want to sort it by population! In the preparation toolset, the Sort tool lets us change the sort order and criteria.
Once the tool is connected, open its configuration window. Here we can choose the column to sort by and the sort order. Choose the Population column and Descending order so that the country with the highest population appears at the top. The same tool also lets us add further sort criteria, which are applied in turn to break ties left by the earlier ones. We only want to sort by population, so let’s run the workflow.
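A descending sort, and the effect of a second sort criterion, can be sketched in Python with `sorted`. The rows are hypothetical, and the sketch relies on Python's sort being stable; it makes no claim about how the Sort tool orders ties internally.

```python
# Hypothetical rows, initially in alphabetical order of Country.
rows = [
    {"Country": "Country A", "Region": "Region 1", "Population": 300},
    {"Country": "Country B", "Region": "Region 2", "Population": 900},
    {"Country": "Country C", "Region": "Region 1", "Population": 900},
    {"Country": "Country D", "Region": "Region 2", "Population": 100},
]

# Single criterion: Population, descending. Ties keep their original order.
by_population = sorted(rows, key=lambda row: row["Population"], reverse=True)
print([row["Country"] for row in by_population])
# ['Country B', 'Country C', 'Country A', 'Country D']

# Two criteria: Population descending, then Region ascending to break ties.
by_population_then_region = sorted(
    rows, key=lambda row: (-row["Population"], row["Region"])
)
print([row["Country"] for row in by_population_then_region])
# ['Country C', 'Country B', 'Country A', 'Country D']
```

Countries B and C are tied on population, so only the second criterion changes their order.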
If we look at the output of the Sort tool, we see that China is the first country listed, as it had the highest population in the world when the data was collected.
The final tool in this lesson, the Sample tool, makes large datasets easier to examine by creating manageable samples. This helps identify issues with values and makes it easier to run statistical analysis on a subset of the data.
Let’s add the tool to the workflow and connect it to the Sort tool.
In the configuration window, we have different options for selecting our sample of data. N controls the size of the sample. Because we have sorted our data by population, taking the first N rows with N set to 10 gives us the top 10 countries by population. If we run this workflow, the output is the top 10 countries by population, with China first and Japan tenth.
One optional setting is to group the sample by specific columns. For example, if we select the Region column to group by and click Run, then, because the data is still sorted by population, we get the top 10 countries by population in each region!
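Taking the first N rows of already-sorted data is what turns a sample into a "top N" list. The Python sketch below shows that idea with hypothetical rows and N set to 2 instead of 10 to keep the data short; `first_n` is a made-up helper, not an Alteryx function.

```python
from itertools import islice

def first_n(rows, n):
    """Keep only the first n rows, in their current order."""
    return list(islice(rows, n))

# Hypothetical rows, already sorted by Population, descending.
sorted_rows = [
    {"Country": "Country B", "Population": 900},
    {"Country": "Country C", "Population": 700},
    {"Country": "Country A", "Population": 300},
    {"Country": "Country D", "Population": 100},
]
top_two = first_n(sorted_rows, 2)
print([row["Country"] for row in top_two])  # ['Country B', 'Country C']
```

If the rows were still in alphabetical order, the same helper would return the first two countries alphabetically, which is why the sort must come first.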
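Grouping applies the same "first N" rule separately within each group. The sketch below assumes rows already sorted by population and keeps the first N rows per Region while preserving the incoming row order; the data and the `first_n_per_group` helper are hypothetical, and the Sample tool's own output ordering may differ.

```python
from collections import defaultdict

def first_n_per_group(rows, n, group_column):
    """Keep the first n rows of each group, preserving the incoming order."""
    kept_per_group = defaultdict(int)  # rows kept so far, per group value
    result = []
    for row in rows:
        key = row[group_column]
        if kept_per_group[key] < n:
            result.append(row)
            kept_per_group[key] += 1
    return result

# Hypothetical rows, already sorted by Population, descending.
sorted_rows = [
    {"Country": "Country B", "Region": "Region 2", "Population": 900},
    {"Country": "Country C", "Region": "Region 1", "Population": 700},
    {"Country": "Country E", "Region": "Region 1", "Population": 500},
    {"Country": "Country A", "Region": "Region 1", "Population": 300},
    {"Country": "Country D", "Region": "Region 2", "Population": 100},
]
top_two_per_region = first_n_per_group(sorted_rows, 2, "Region")
print([row["Country"] for row in top_two_per_region])
# ['Country B', 'Country C', 'Country E', 'Country D']
```

Country A is dropped because Region 1 already has its two most populous countries.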
That’s pretty useful when preparing and examining data from large datasets. Now, it’s time to try out these new tools.
2. Let's practice!