1. Schema Expectations
Now that we've learned how to create and test Expectations, let's explore some other types of Expectations.
2. Shape and schema Expectations
Here, we'll focus on Expectations at the dataset level, regarding the shape or schema of the dataset. As we discussed, the schema is the blueprint of a dataset, including column names and data types. Later in the course, we'll dive into column-level Expectations.
3. Row count
Recall that in the last video we created an Expectation for the row count of the dataset. The Expectation failed because the table's row count is not exactly equal to 118,000, although it is close. We usually have an idea of how many rows are in a dataset we're working with, but it's unlikely that we'll know the exact row count.
4. Row count range
Suppose instead we say that the row count should be near 118,000, plus or minus 1,000. We can write this as an Expectation using the `ExpectTableRowCountToBeBetween` class of the `gx.expectations` submodule, passing in integer values for the `min_value` and `max_value` parameters. Now the `.success` attribute returns a value of `True`.
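To make the "plus or minus 1,000" arithmetic concrete, here is a minimal plain-Python sketch of how the tolerance becomes the two bounds we would pass as `min_value` and `max_value`. It is not Great Expectations code: `tolerance_bounds` and `row_count_in_range` are hypothetical helper names, and the row count of 118,342 is a made-up value chosen only because it is close to 118,000 without being equal to it.

```python
def tolerance_bounds(target: int, tolerance: int) -> tuple[int, int]:
    """Turn 'target plus or minus tolerance' into (min_value, max_value)."""
    return target - tolerance, target + tolerance


def row_count_in_range(row_count: int, min_value: int, max_value: int) -> bool:
    """Model of a range check on the row count, with both bounds included."""
    return min_value <= row_count <= max_value


min_value, max_value = tolerance_bounds(target=118_000, tolerance=1_000)

# Hypothetical row count: close to 118,000 but not exactly equal to it.
actual_row_count = 118_342

# The exact-match check fails, while the range check passes.
exact_match = actual_row_count == 118_000
range_match = row_count_in_range(actual_row_count, min_value, max_value)
print(min_value, max_value, exact_match, range_match)
```

The same two integers, 117,000 and 119,000, are what we would hand to `ExpectTableRowCountToBeBetween`.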
5. Column count
As we saw in the last couple of exercises, we can create similar Expectations for the column count, just replacing `Row` with `Column` in the class names. In this example, we expect that the column count of the dataset will be equal to 15. We see that the Expectation fails, because the actual column count is 18.
6. Column count range
As with row count, we can create an Expectation for the column count to be within a specific range. Note that all range Expectations take `min_value` and `max_value` parameters. Here, we expect the dataset to have between 14 and 18 columns. Looking at the Validation Results, we see that this Expectation succeeds: the range is inclusive, so the actual column count of 18 falls within it.
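To see why inclusivity matters here, consider this plain-Python sketch of the comparison. It only models the logic and is not how the library is implemented; `column_count_between` is a hypothetical helper name, and the 14 to 17 range is an extra case added purely for contrast.

```python
def column_count_between(column_count: int, min_value: int, max_value: int) -> bool:
    """Inclusive range check: a count equal to either bound passes."""
    return min_value <= column_count <= max_value


actual_column_count = 18  # the column count reported for the dataset in this lesson

# Exact-equality check against 15 fails.
equals_15 = actual_column_count == 15

# 18 sits exactly on the upper bound, so the inclusive check passes.
between_14_and_18 = column_count_between(actual_column_count, 14, 18)

# Contrast case (an assumption for illustration): a tighter upper bound fails.
between_14_and_17 = column_count_between(actual_column_count, 14, 17)

print(equals_15, between_14_and_18, between_14_and_17)
```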
7. Column name sets
As far as column Expectations go, we can create Expectations not just about the column count, but also about the column names. For instance, we can expect the DataFrame's columns to equal a particular set, using the `ExpectTableColumnsToMatchSet` class of the `gx.expectations` submodule. Note that because we are comparing to a set, 1) the order of the columns doesn't matter, and 2) duplicate column names (such as a repeated `'Time'` column) are not an issue.
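Both properties follow from how sets behave in Python, which we can check with a short sketch. This models the comparison only and is not the library's code. The column list is hypothetical: `"Time"` is repeated to mirror the duplicate mentioned above, and `"Temperature"` is a placeholder name.

```python
# Hypothetical column list with a repeated "Time" column.
actual_columns = ["Time", "GHI", "Temperature", "Time"]

# The expected names, in a different order and without the duplicate.
expected_columns = ["Temperature", "Time", "GHI"]

# Converting to sets discards both order and duplicates.
matches_as_set = set(actual_columns) == set(expected_columns)

# A list comparison, by contrast, is sensitive to both.
matches_as_list = actual_columns == expected_columns

# Set differences show which names are missing or unexpected.
missing = set(expected_columns) - set(actual_columns)
unexpected = set(actual_columns) - set(expected_columns)

print(matches_as_set, matches_as_list, missing, unexpected)
```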
8. Individual column names
We can also create an Expectation that a particular column name is present in the dataset columns, using the `ExpectColumnToExist` class. The dataset does not have a column called "not_a_column", so an Expectation for that column fails and its `.success` attribute is `False`. It does have a column called "GHI", so an Expectation for that column succeeds and its `.success` attribute is `True`.
Take note that this Expectation is for checking that the dataset has one or a few crucial columns, whereas the Expectation on the previous slide, `ExpectTableColumnsToMatchSet`, is for when we want to check all of the columns in the dataset.
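The difference between the two checks can be sketched in plain Python as a membership test versus a full set comparison. This is a model of the logic under stated assumptions, not Great Expectations code: `column_exists` and `columns_match_set` are hypothetical helper names, and every column name other than "GHI" and "Time" is a placeholder.

```python
def column_exists(columns: list[str], name: str) -> bool:
    """Model of a single-column check: is this one name present?"""
    return name in columns


def columns_match_set(columns: list[str], column_set: set[str]) -> bool:
    """Model of a whole-table check: do all names match the expected set?"""
    return set(columns) == column_set


# Hypothetical column list for illustration.
columns = ["Time", "GHI", "Temperature"]

has_ghi = column_exists(columns, "GHI")
has_missing = column_exists(columns, "not_a_column")

# Naming only the crucial column is not enough for the set-based check,
# because that check accounts for every column in the table.
only_ghi_matches = columns_match_set(columns, {"GHI"})
all_names_match = columns_match_set(columns, {"Time", "GHI", "Temperature"})

print(has_ghi, has_missing, only_ghi_matches, all_names_match)
```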
9. Cheat sheet
Here is a list of the Expectations we've learned about so far. You can refer back to this as you work on the exercises.
10. Let's practice!
By now, you've got several Expectations under your belt. Time to practice!