Grouping insights and creating output
1. Grouping insights and creating output
Let’s return to the filter of Attendance greater than or equal to 25,000 to have a larger dataset. We want to find the stadiums with the highest attendance, so we will first sort Attendance by descending order. Now we can obtain a First N Percent of rows. Let’s add a Sample tool and analyze the top 33% of rows based on attendance, grouped by Location. Clicking the output anchor shows 229 records out of the original 693, approximately one-third of the dataset. Sort on the Location field using the Results window. To do this, hover over the Location fieldname, and a vertical ellipse appears on the right. Click the ellipse, and different options will appear. Select Sort, then Sort Descending, then Apply. You can now see there are three records for Barcelona. To add the Sort to the workflow, you can click the green check mark. Another feature is a 1 in N Chance. Let’s do a 1 in 10 chance to include each row. With this technique, the records will be different with each run. Note the RecordIDs. Clicking Run again, you get a different random sample. Next, let’s use the Summarize tool to obtain counts of the home teams playing in each stadium. Group first by the Stadium. Click Add, and these are the options available for string fields. Click Group By; next, group by Home Team Name. We want a unique count, so use RecordID, a numeric data type. Note how the actions change slightly for numeric data. Click Count. We quickly see the Stadium, Home Team Name, and Count. The Summarize tool will output only the fields being utilized, and works hierarchically, applying actions in order from top to bottom. Sort the count in descending order in the Results window. Brazil has played most often as the home team in the Jalisco Stadium and England in the Wembley Stadium. Now, we want the most frequent home team playing at the stadium, which is the highest count per stadium. Add a Sort and sort the Count by descending order, and Stadium by ascending order. In this dataset, Estadio Azteca hosted Mexico five times and Argentina three times. Add a Sample tool and select First N rows where N equals 1, then Group By Stadium. Sort ascending in the Results window, and for Estadio Azteca, the first record is Mexico, having played most often at that location five times. Next, we want to review the possibility of home team advantage, find the Max DateTime, the most recent game, and then add the Average Attendance per stadium. Connect another Summarize tool to the main data, then group by Location and Stadium. Add the Max DateTime. Next, Sum Home Team Goals. For comparison, Sum Away Team Goals. Since Attendance is a numeric format, let’s use the average Attendance. Sort on Home Team Goals, and now we have the results; the highest home team goal count in this dataset was at the Estadio Azteca stadium in Mexico City. This stadium had an average attendance of over 100,000. We can export this, share it with others, and use it in another workflow. Add an Output Data tool, and Select a File. Save as an Alteryx database .yxdb format called “Summarize Dataset”. Start a new workflow and connect to the file. Click run, and you can see the input dataset. Now that you’ve learned to summarize data quickly, let’s begin with some exercises!2. Let's practice!
Create Your Free Account
or
By continuing, you accept our Terms of Use, our Privacy Policy and that your data is stored in the USA.