1. Filtering and sorting
Great work so far!
2. Exploring data
In this lesson, we'll look at a few other handy built-in functions for data exploration. Let's dive in!
3. Finding unique values
When dealing with categorical data, where a cell can only have one of a finite number of values, like countries or blood groups, it's helpful to know what unique values we have in the column.
The built-in UNIQUE function can do just that!
Let's find the unique countries in this column, which contains over a hundred rows.
4. Finding unique values
Like the other built-in functions we've seen, we select a cell, start our formula, and type the function name.
The UNIQUE function takes a range of values to search across as an argument.
5. Finding unique values
The range starts at cell A2, and we want to go to the end of the column. There's a neat shortcut to prevent us scrolling to find the address of the final cell in the range:
6. Finding unique values
we can pass the column index A, with no row index, which tells UNIQUE that we want to search over every cell in column A, starting at A2.
The beauty of using this shortcut is that the formula automatically updates if new rows are added to the spreadsheet.
7. Finding unique values
The function populates the selected cell and those beneath it with the 17 unique countries in the column.
8. Filtering
Filtering data is also useful for data exploration, because we can extract subsets of our original dataset, which we can then explore in more detail.
In this dataset containing the number of backers for kickstarter games, we may only want to explore games with more than 1000 backers.
9. FILTER()
The FILTER function filters the cell range based upon at least one condition. From the square brackets, we can see that the function only requires one condition. Additional conditions can be specified but they're optional.
In our kickstarter dataset, we want to filter the games stored in column A, from cell A2 downwards. We begin our formula with the function name, and like before,
10. FILTER()
specify the range A2-colon-A, so we're filtering over all game names.
For the condition, we want to filter rows where the Backers column is over 1000,
11. FILTER()
so we use the range B2-colon-B to reference the Backers column,
12. FILTER()
apply the greater-than comparison operator,
13. FILTER()
and finally the value, 1000.
14. FILTER()
This populates the selected cell and those below it with the games with over 1000 backers.
15. Sorting
The final data exploration tool we'll discuss is sorting.
Sorting our data allows us to identify the largest or smallest values, and how many there are. This is useful when trying to determine if there are outliers in the dataset.
16. SORT()
The SORT function has a few different arguments: range, like the other functions we've seen, is the cell range to sort. This can be a single column or multiple columns. sort_column is the column used to sort the range. is_ascending lets us specify whether we want to sort in ascending or descending order.
17. Sorting one column
To sort the Backers column, we can use SORT and pass the range B2-colon-B.
18. Sorting one column
By default, cells are ordered in ascending order.
19. Sorting multiple columns
If we wanted to sort both the name and backers columns based on the number of backers,
20. Sorting multiple columns
we'll need to start by specifying the cell range to sort, A2-colon-B, which spans both columns.
21. Sorting multiple columns
Then, we pass two to the sort_column argument, as we want to sort using the second column in the range.
22. Sorting multiple columns
Finally, we can pass FALSE to the is_ascending argument, to sort in descending order.
23. Sorting multiple columns
Looking at the results, we can see that game name and backers have both been sorted by Backers in descending order.
24. Let's practice!
Head on over to the exercises to try out these new functions!