Basic Column Expectations
1. Basic Column Expectations
In this final chapter, we'll be exploring different types of column-level Expectations, which apply to specific columns. This is in contrast to the shape and schema Expectations we learned about previously, which concern the dataset as a whole.
2. The Shein Footwear Dataset
We'll be using Kaggle's Shein Footwear Dataset, which contains footwear price data scraped from the SHEIN US website.
3. Row-level Expectations
Some column Expectations are row-level, meaning that they are applied to each row of the column independently. Such an Expectation succeeds if and only if it holds true for every row of the column.
4. Row-level Expectations
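The row-level behavior described on this slide can be sketched in plain Python as a per-row predicate combined with `all()`. This is a minimal illustration, not Great Expectations itself: the helper names (`expect_values_not_null`, `expect_values_of_type`) and the sample rows are hypothetical, and here `type_` is a Python type rather than the type name the library expects. In the Great Expectations 1.x API, the corresponding classes are, we assume, `ExpectColumnValuesToNotBeNull` and `ExpectColumnValuesToBeOfType`.

```python
# Hypothetical, made-up rows for illustration; not taken from the Shein dataset.
rows = [
    {"colour": "Black", "review_count": "100+"},
    {"colour": "White", "review_count": "25"},
    {"colour": None, "review_count": "3"},
]

def column_values(rows, column):
    """Return the values of `column`, one per row."""
    return [row[column] for row in rows]

def expect_values_not_null(rows, column):
    # Row-level: the predicate is evaluated on each row independently,
    # and the check succeeds only if it holds for every row.
    return all(value is not None for value in column_values(rows, column))

def expect_values_of_type(rows, column, type_):
    # `type_` mirrors the extra required parameter; here it is a Python type.
    return all(isinstance(value, type_) for value in column_values(rows, column))

print(expect_values_not_null(rows, "colour"))             # False: the third row is None
print(expect_values_of_type(rows, "review_count", str))   # True
```

A single failing row is enough to make a row-level check fail, which is exactly the "if and only if it holds for every row" rule.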
For example, an Expectation that the values of the `"colour"` column not be null would succeed if and only if each row of that column is non-null. Similarly, an Expectation that the values of the `"review_count"` column be of the string data type would succeed if and only if each row of that column contains a string value. Notice that both of these Expectations require the `column` parameter, which specifies which column to assess. The type Expectation also has a second required parameter, `type_`, which specifies the data type that each column value is checked against.
5. Aggregate-level Expectations: distinct values
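The distinct-values check on this slide is a two-step aggregate: collapse the column into a set, then compare that set with `value_set`. The plain-Python sketch below shows only that logic; the function name and the seller names are hypothetical placeholders, and in Great Expectations 1.x the matching class is, we assume, `ExpectColumnDistinctValuesToEqualSet`.

```python
# Hypothetical seller names for illustration only.
seller_names = ["shein_store", "partner_a", "shein_store", "partner_b"]

def expect_distinct_values_to_equal_set(values, value_set):
    # Aggregate step: collapse the whole column into its distinct values.
    distinct = set(values)
    # Comparison step: succeed only if the two sets are identical,
    # so both missing and unexpected values cause a failure.
    return distinct == set(value_set)

print(expect_distinct_values_to_equal_set(seller_names, {"shein_store", "partner_a", "partner_b"}))  # True
print(expect_distinct_values_to_equal_set(seller_names, {"shein_store", "partner_a"}))               # False
```

Because the comparison is set equality, repeated values in the column do not matter; only which values appear does.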
Some other column Expectations operate at the aggregate level, meaning that they perform some aggregation on the entire column before assessing the success status of the Expectation. For instance, take an Expectation that the distinct values of the `"seller_name"` column must match a given set of values. This Expectation will first aggregate all of the distinct values in the column into a set, and then compare that set to the provided `value_set` parameter.
6. Aggregate-level Expectations: unique value count
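Counting distinct values instead of listing them can be sketched as follows. This is a plain-Python illustration under stated assumptions: the function name and sample values are hypothetical, and the `min_value`/`max_value` names follow what we assume `ExpectColumnUniqueValueCountToBeBetween` uses in Great Expectations 1.x. An exact count is expressed by setting both bounds to the same number.

```python
# Hypothetical review_count values for illustration only.
review_counts = ["100+", "25", "3", "25", "100+"]

def expect_unique_value_count_between(values, min_value=None, max_value=None):
    # Aggregate step: the size of the set of distinct values.
    count = len(set(values))
    # Comparison step: a bound of None is simply not checked.
    if min_value is not None and count < min_value:
        return False
    if max_value is not None and count > max_value:
        return False
    return True

print(expect_unique_value_count_between(review_counts, min_value=3, max_value=3))  # True: exactly 3 distinct values
print(expect_unique_value_count_between(review_counts, min_value=1, max_value=2))  # False
```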
For another column, such as `"review_count"`, we may not know the expected distinct values in advance, but we may know how many distinct values (or what range of counts) the column should contain. This Expectation again aggregates all of the distinct values in the column into a set, and then compares the size of that set to the provided value or value range.
7. Aggregate-level Expectations: uniqueness
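Following the aggregate description on this slide, a uniqueness check can be sketched by counting every value and looking for any that occur more than once. The helper names and SKU IDs below are hypothetical; in Great Expectations 1.x the corresponding class is, we assume, `ExpectColumnValuesToBeUnique`. Returning the duplicates as well as a pass/fail result makes a failure easier to investigate.

```python
from collections import Counter

def duplicated_values(values):
    """Return the values that appear more than once, sorted."""
    counts = Counter(values)
    return sorted(value for value, n in counts.items() if n > 1)

def expect_values_to_be_unique(values):
    # Aggregate step: count all values; the check succeeds
    # only if no value occurs more than once.
    return not duplicated_values(values)

# Hypothetical SKU IDs for illustration only.
print(expect_values_to_be_unique(["sku_001", "sku_002", "sku_003"]))  # True
print(duplicated_values(["sku_001", "sku_002", "sku_001"]))           # ['sku_001']
```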
Another potential Expectation is that all of the values of a column be unique. For example, the SKU (Stock Keeping Unit) ID is a unique product identifier, so such an Expectation would be appropriate for the `"sku_id"` column. This Expectation will first aggregate all of the values of the column and then ensure that they are all unique.
8. Aggregate-level Expectations: mode
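The mode check on this slide can be sketched as "extract the most common value, then test membership". Ties need a rule, so this sketch makes a simplifying assumption: every value tied for most common must be in `value_set`. The real `ExpectColumnMostCommonValueToBeInSet` in Great Expectations has its own tie-handling rules, which this sketch does not try to replicate. Function names and colours are hypothetical; a single expected value is passed as a one-element set.

```python
from collections import Counter

# Hypothetical colour values for illustration only.
colours = ["Black", "White", "Black", "Beige", "Black"]

def most_common_values(values):
    """Return the set of values tied for the highest frequency."""
    counts = Counter(values)
    if not counts:
        return set()
    top = max(counts.values())
    return {value for value, n in counts.items() if n == top}

def expect_most_common_value_in_set(values, value_set):
    # Aggregate step: extract the mode(s) of the whole column.
    modes = most_common_values(values)
    # Comparison step: succeed only if a mode exists and every tied
    # mode belongs to value_set (a simplifying choice for ties).
    return bool(modes) and modes <= set(value_set)

print(expect_most_common_value_in_set(colours, {"Black"}))           # True
print(expect_most_common_value_in_set(colours, {"White", "Beige"}))  # False
```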
Finally, we may know that the most common value of a column should be equal to a particular value or fall within a particular set of values. We could write an Expectation that first extracts the most common value of the column and then compares it to a given value or value set. Notice how the aggregate-level column Expectations, just like the row-level ones, all require the `column` parameter, and some of them also have other required parameters.
9. Cheat sheet
Here is a list of the row- and aggregate-level Expectations we discussed in this video. Feel free to use this slide as you work through the exercises.
10. Let's practice!
Now it's your turn to practice establishing some basic column Expectations!