Filtering data
1. Filtering data
Restricting analyses to subsets of your data is important for many data analysis tasks. In this lesson, we are going to learn how to combine comparison operators with boolean indexing to filter data in NumPy arrays and pandas DataFrames.
2. Comparison operators and NumPy arrays
You've learned how to use comparison operators on single values, but, as in MATLAB, Python's comparison operators can also be applied to many values at once in NumPy arrays and pandas DataFrames. You can compare every element of an array against a single value, or compare two arrays against one another element by element. In this example, "data" is an array with four elements. When we use the greater-than operator to compare this array with the "threshold" value, each element is compared against that value. The result, assigned to the "meets_criteria" variable, is a NumPy array of booleans: each element is True if the corresponding element of "data" is greater than "threshold", and False otherwise.
3. Filtering NumPy arrays
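A minimal sketch of the comparisons described above. The lesson does not show the actual contents of "data" or "threshold", so the values below, and the second array "other", are hypothetical.

```python
import numpy as np

# Hypothetical values; the lesson's actual array contents are not shown
data = np.array([10, 25, 3, 42])
threshold = 20

# Compare every element against one scalar; the result is a boolean array
meets_criteria = data > threshold
# elementwise: False, True, False, True

# Compare two arrays of the same shape, element by element
other = np.array([12, 20, 3, 50])  # hypothetical second array
at_least_other = data >= other
# elementwise: False, True, True, False
```

Both comparisons return a new array with the same shape as "data"; neither modifies the original array.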
This is useful for filtering out data that does not meet your criteria for analysis. In this example, we expect all data to be positive, and our data source marks invalid samples with a value of -1. We can use a comparison operator to create a boolean array, "is_valid", that is True for elements containing valid data and False otherwise. We can then use "is_valid" as a boolean index to create a new array that contains only the valid data, ready for further analysis.
4. Filtering DataFrames
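Here is a small sketch of that filtering step. The readings are made up for illustration; only the -1 sentinel and the name "is_valid" come from the lesson. The name "valid_data" is hypothetical.

```python
import numpy as np

# Hypothetical readings; -1 marks an invalid sample, as in the lesson
data = np.array([4.2, -1, 3.7, 5.1, -1, 2.9])

# True where the sample is not the -1 sentinel
is_valid = data != -1

# Boolean indexing keeps only the elements where is_valid is True
valid_data = data[is_valid]

# valid_data is ready for further analysis, e.g. averaging
average = valid_data.mean()
```

Because all valid data is expected to be positive, "data > 0" would build an equivalent mask here and would also catch any other non-positive error codes.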
Similarly, you can use comparison operators to create boolean indices for pandas DataFrames. This is especially convenient for filtering values in one column based on the values in another. In this example, we use the "equal to" operator to create the boolean indices "monkeys" and "bears", which mark the rows where the "animal" column equals "monkey" and "bear", respectively. We can then use them to filter the DataFrame and compute the average weight of the monkeys and of the bears in this dataset.
5. Let's practice
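The DataFrame example can be sketched as follows. The lesson names the "animal" column and the masks "monkeys" and "bears"; the "weight" column name and all of the rows below are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical dataset; the "weight" column name is an assumption
df = pd.DataFrame({
    "animal": ["monkey", "bear", "monkey", "bear", "bear"],
    "weight": [12.0, 250.0, 8.0, 310.0, 190.0],
})

# Boolean Series: True on rows where the "animal" column matches
monkeys = df["animal"] == "monkey"
bears = df["animal"] == "bear"

# Select the matching rows with each mask, then average their "weight" values
monkey_mean = df.loc[monkeys, "weight"].mean()
bear_mean = df.loc[bears, "weight"].mean()
```

Using ".loc" with a mask and a column label selects rows and the column in one step; "df[monkeys]["weight"]" gives the same values but chains two indexing operations.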
Now that you've learned how comparison operators work with NumPy arrays and DataFrames, let's practice filtering data.