Get startedGet started for free

Sampling and Summarizing

1. Sampling and Summarizing

Now that you've mastered the Filter and Formula tools, let's explore the very useful Sample and Summarize tools!

2. Sampling data is like . . .

Sampling data can be thought of as sampling fluids in a chemistry lab. You can take the top layer of fluid (for example, when oil floats on water), the top percent of fluid, or even a random sample.

3. Sample tool options

The Sample tool provides multiple options for taking a sample of your data. These include First or Last N rows, Skip the first N rows, 1 of every N rows, 1 in N chance, and First N% of rows.

4. Top N and Bottom N Records

Combining a Sort tool before the Sample tool is an efficient way to sample the Top N records. For example, sorting total sales in descending order and applying a sample tool where N equals 10 returns the top 10 sales.

5. Skip 1st N rows

The Skip 1st N rows will provide all rows after the first number of rows specified is skipped. This is one method of skipping rows that contain extraneous information before the rows containing data, such as a company description and logo in a spreadsheet.

6. 1 of every N rows

The 1 of every N rows option returns the first row of every N rows in the dataset. For example, 1 of every 10 rows would return the first record in each group of 10 records, in order. This can be used to view patterns in a running total, and 100 initial records would return 10 records. However, this is not a random sample process.

7. 1 in N chance to include each row

The 1 in N chance option is a random dataset sample, with a 1 in N probability of being included in the output. For example, a 1 in 25 chance out of 1000 records creates a random sample of the entire dataset. Note, this option will produce a new randomly sampled dataset with each workflow run. If your dataset has unique identifiers, you will see different IDs with each run.

8. First N% of rows

The First N% of rows option allows you to obtain the first percentage specified of rows, in order from first to last, such as "Top 25% of sales". In a dataset of 1000 records, the first 25% of rows would return rows 1 to 250.

9. Grouping with samples

The Sample tool also allows grouping by column, and one or more fields can be selected in the grouping option. An example would be the Top 10 sales by region.

10. Using the Summarize tool

The Summarize tool in the Transform toolset provides many features for grouping and summarizing data. All actions depend on the data type of each field, such as string and numeric functions. Only the fields with actions applied are output, and multiple actions can be applied to the same field, such as Min and Max Sales Revenue.

11. Summarizing string data

When summarizing string data, there are many options available. A few are Group By (such as Group By State), Count, Count Non Null, Min, Max (such as Max Date), Mode, First, Last, and Concatenate.

12. Summarizing numeric data

Numeric fields have many action options. A few of these are Group By and some of the previous string data type actions, plus Sum (such as Total Sales), Average, Median and Mode (for example, median income), Percentile, Standard Deviation, and many Finance formulas such as IRR and NPV.

13. Order does matter

The Summarize tool actions are applied hierarchically, such as with the Group By function. If you are grouping first by Region, then by Sales Team, and then by Salesperson, the data will be grouped in that order and output with the columns listed left to right.

14. Alteryx file types

There are different Alteryx file types, and three of the main ones utilized are: Alteryx database .yxdb. Data can be output in this native format and input into workflows. Alteryx Designer workflow .yxmd standard format, and an Alteryx packaged workflow, .yxzp, which can zip input and output datasets within the package for ease in sharing.

15. Let's practice!

Now, let's apply your new knowledge in some exercises!

Create Your Free Account

or

By continuing, you accept our Terms of Use, our Privacy Policy and that your data is stored in the USA.