Filtering arrays

1. Filtering arrays

We've learned how to select data based on location, but what about selecting based on whether data meets a condition? Enter filtering!

2. Two ways to filter

There are two main ways to filter in NumPy: Boolean masks combined with fancy indexing, and np.where(). Each is useful in different situations. Let's start with masks and fancy indexing.

3. Boolean masks

The code to create a mask checks whether a condition is true for each element in an array. The mask itself is an array of Booleans with the same shape as the evaluated array. Here we have an array which holds the numbers one to five. To filter the array so that it includes only even numbers, first create a Boolean mask of True and False values based on whether the element is evenly divisible by two.
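As a minimal sketch of the mask step described above (the variable names `one_to_five` and `even_mask` are chosen here for illustration):

```python
import numpy as np

# The numbers one to five
one_to_five = np.arange(1, 6)

# Element-wise check: is each value evenly divisible by two?
even_mask = one_to_five % 2 == 0

print(even_mask.tolist())   # [False, True, False, True, False]
print(even_mask.shape == one_to_five.shape)   # True: same shape as the evaluated array
```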

4. Filtering with fancy indexing

Once we have a Boolean mask indicating which elements the condition holds true for, we can index the array using the mask. This is called fancy indexing (NumPy's own documentation refers to indexing with a mask as Boolean array indexing), and it's useful when we are only interested in the elements that meet a condition. Think of the mask as providing the indices of all elements where the condition is true. In the case of numbers from one to five, only two and four are evenly divisible by two.
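Indexing with the mask might look something like this; again, the variable names are only illustrative:

```python
import numpy as np

one_to_five = np.arange(1, 6)
even_mask = one_to_five % 2 == 0

# Index the array with the Boolean mask: only elements where the mask is True are kept
evens = one_to_five[even_mask]

print(evens.tolist())   # [2, 4]
```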

5. 2D fancy indexing

We may want to filter based on a condition in one row or column but return data from another. Let's say we are assigning partners in a school, and we want to know which classes have an even number of students. Class ids are in the left column and class sizes are in the right. First, create a mask which checks which values in the second column are evenly divisible by two.

6. 2D fancy indexing

Then, index the first column using that mask so that we return class ids for rows where the class size in the second column meets the condition.
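Here is a rough sketch of both steps together. The array contents and the name `classroom_ids_and_sizes` are assumed sample data, not values confirmed by the slides:

```python
import numpy as np

# Assumed sample data: column 0 holds class ids, column 1 holds class sizes
classroom_ids_and_sizes = np.array([[1, 22],
                                    [2, 21],
                                    [3, 27],
                                    [4, 26]])

# Step 1: mask built from the second column (class sizes)
even_size_mask = classroom_ids_and_sizes[:, 1] % 2 == 0

# Step 2: apply that mask to the first column (class ids)
even_class_ids = classroom_ids_and_sizes[:, 0][even_size_mask]

print(even_size_mask.tolist())   # [True, False, False, True]
print(even_class_ids.tolist())   # [1, 4]
```

The mask and the column it filters only need to have the same length, which is why a condition on one column can select values from another.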

7. Fancy indexing vs. np.where()

We've seen that fancy indexing returns a filtered array of elements which meet a condition. np.where() returns an array of indices of elements which meet the condition. This can be useful when indices are needed later to direct NumPy where to apply code. np.where() can also be used for combining data as well as filtering arrays: it can pull different elements into a new array based on whether a condition is met. More on that later!

8. Filtering with np.where()

Using np.where() in the classroom example returns indices indicating that the classrooms at indices zero and three have even numbers of students. Notice the array of indices is enclosed in parentheses: np.where() actually returns a tuple of arrays, one per dimension of the condition. Why? Because when the filtered array is multi-dimensional, each element can only be located by including an index for every dimension.
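A small sketch of that call, using assumed sample data chosen so that the rows at indices zero and three have even class sizes:

```python
import numpy as np

# Assumed sample data: column 0 holds class ids, column 1 holds class sizes
classroom_ids_and_sizes = np.array([[1, 22],
                                    [2, 21],
                                    [3, 27],
                                    [4, 26]])

# With only a condition, np.where() returns a tuple of index arrays
even_size_indices = np.where(classroom_ids_and_sizes[:, 1] % 2 == 0)

print(type(even_size_indices).__name__)   # tuple
print(len(even_size_indices))             # 1, because the condition is one-dimensional
print(even_size_indices[0].tolist())      # [0, 3]
```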

9. np.where() element retrieval

Let's look at using np.where() to return the indices of zeros in our sudoku game.

10. A tuple of indices

Now, np.where() returns two sets of indices, one for row indices and one for column indices, since identifying each individual zero requires both a row and column index. Because of this, it's helpful to unpack the results of np.where() into different variables.
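To illustrate the unpacking, here is a sketch on a small 3x3 grid. The grid values are made up to keep the output short (a real sudoku board would be 9x9), and `row_ind` and `column_ind` are names chosen for this example:

```python
import numpy as np

# Hypothetical 3x3 excerpt of a sudoku board; zeros mark empty cells
sudoku_game = np.array([[0, 0, 4],
                        [3, 0, 0],
                        [0, 0, 5]])

# Two-dimensional condition, so the tuple holds two arrays: rows and columns
row_ind, column_ind = np.where(sudoku_game == 0)

print(row_ind.tolist())      # [0, 0, 1, 1, 2, 2]
print(column_ind.tolist())   # [0, 1, 1, 2, 0, 1]

# Pairing the two arrays gives the (row, column) position of each zero
zero_positions = [(int(r), int(c)) for r, c in zip(row_ind, column_ind)]
print(zero_positions[:2])    # [(0, 0), (0, 1)]
```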

11. Find and replace

The real power of np.where() is its ability to check whether rows, columns, or elements meet a condition and then pull one element if the condition is met and another if not. The first argument is the condition. To replace all zeros in sudoku_game with empty strings, pass an empty string as the second argument to np.where(). The third argument specifies what to use for elements that do not meet the condition. Here, we want non-zero elements to remain unchanged, so we pass the original array to signify that.
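A sketch of the find-and-replace call on a made-up 3x3 grid (the values are illustrative, not the course's actual board):

```python
import numpy as np

# Hypothetical 3x3 excerpt of a sudoku board; zeros mark empty cells
sudoku_game = np.array([[0, 0, 4],
                        [3, 0, 0],
                        [0, 0, 5]])

# condition, value where True, value where False
sudoku_display = np.where(sudoku_game == 0, "", sudoku_game)

print(sudoku_display.tolist())
# [['', '', '4'], ['3', '', ''], ['', '', '5']]
```

One thing to keep in mind: because a NumPy array holds a single data type, mixing an empty string with integers produces an array of strings, so the kept numbers come back as text such as '4' rather than 4.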

12. Let's practice!

Okay, let's get coding!