
Conditional Expectations

1. Conditional Expectations

So far, we've learned how to implement a few different types of Expectations. Now let's learn about a special class called Conditional Expectations.

2. What are Conditional Expectations?

Conditional Expectations are Expectations for a subset of the data, rather than the entire dataset. These are useful when some variables in the dataset depend on the values of other variables. For example, the Shein Footwear Dataset has a column for the star rating of a product, from 0 to 5, which is 0 if the product has no reviews. Therefore, we might have an Expectation that the value of the `star_rating` column should be 0 if the value of the `review_count` column is 0.
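As a rough illustration of the idea, here is a plain Pandas sketch of the star rating example. It does not use GX, and the rows are made up; only the column names `star_rating` and `review_count` come from the description above.

```python
import pandas as pd

# Hypothetical rows; only the column names come from the transcript.
products = pd.DataFrame(
    {
        "review_count": [0, 12, 0, 3],
        "star_rating": [0.0, 4.5, 0.0, 3.8],
    }
)

# The condition picks out the subset: products with no reviews.
no_reviews = products[products["review_count"] == 0]

# The expectation is checked on that subset only.
subset_ok = bool((no_reviews["star_rating"] == 0).all())

# Applied to every row, the same expectation would not hold.
whole_ok = bool((products["star_rating"] == 0).all())

print("subset:", subset_ok, "whole dataset:", whole_ok)
```

The expectation "star rating is 0" only makes sense for the rows where the review count is 0, which is what the condition expresses.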

3. Syntax for Conditional Expectations

A dataset Expectation can be converted into a Conditional Expectation with two additional arguments: `row_condition`, which defines the subset of rows that the Expectation applies to, and `condition_parser`, which defines the syntax of the row condition. Let's take a closer look at these two arguments.
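To make the role of the two arguments concrete, here is a minimal Pandas-only sketch. The function `values_between` is a hypothetical stand-in written for this example, not a GX function, and it assumes that a "pandas" row condition behaves like a `DataFrame.query` string. Only the argument names `row_condition` and `condition_parser` come from the transcript.

```python
import pandas as pd


def values_between(df, column, min_value, max_value,
                   row_condition=None, condition_parser=None):
    """Hypothetical stand-in for a 'values between' check (not GX code)."""
    if row_condition is not None:
        # This sketch only understands Pandas-style conditions.
        if condition_parser != "pandas":
            raise ValueError('this sketch only supports condition_parser="pandas"')
        # Keep only the rows that satisfy the condition.
        # engine="python" avoids depending on the optional numexpr package.
        df = df.query(row_condition, engine="python")
    # Series.between is inclusive on both ends by default.
    return bool(df[column].between(min_value, max_value).all())


# Made-up data for illustration.
products = pd.DataFrame(
    {"review_count": [0, 12, 0], "star_rating": [0.0, 4.5, 0.0]}
)

# Without a condition, the check runs on every row.
unconditional = values_between(products, "star_rating", 0, 0)

# With the two extra arguments, it runs only on products with no reviews.
conditional = values_between(
    products, "star_rating", 0, 0,
    row_condition="review_count == 0",
    condition_parser="pandas",
)

print(unconditional, conditional)
```

The check itself is unchanged; the two extra arguments only narrow down which rows it sees.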

4. The condition parser

We'll start with the condition parser. When implementing Conditional Expectations with Pandas, the `condition_parser` argument must be set to "pandas". Since we're using Pandas as our only execution engine in this course, we can just keep "pandas" as the value for this argument.

5. The row condition

We write the `row_condition` argument much as we would subset data in Pandas. For instance, if the row condition is that the value of a column 'foo' must be 'Two Two', we simply write `foo == "Two Two"`, with a double equal sign. The row condition could also be that the column 'foo' must not be null, or must be earlier than March 13, 2023. We can also use the boolean "and" and "or" operations like we would in Pandas, but with some more flexibility: we can write out "and" instead of using the ampersand, and we don't need the parentheses that Pandas requires. We can use vectorized string functions similarly to how we would in Pandas, too.
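The kinds of condition described above can be tried out directly in Pandas. The sketch below assumes the condition strings behave like `DataFrame.query` expressions; the data, the extra column names (`bar`, `created`, `name`), and the helper `subset` are invented for illustration.

```python
import pandas as pd

# Made-up data; 'foo' is the column name used in the transcript.
df = pd.DataFrame(
    {
        "foo": ["Two Two", "One", None, "Two Two"],
        "bar": [1, 7, 3, 9],
        "created": pd.to_datetime(
            ["2023-01-05", "2023-03-20", "2023-02-11", "2023-04-01"]
        ),
        "name": ["sandal", "sneaker", "boot", "slipper"],
    }
)


def subset(condition):
    # engine="python" avoids depending on the optional numexpr package.
    return df.query(condition, engine="python")


# Equality uses a double equal sign and double quotes around the value.
equals = subset('foo == "Two Two"')

# Rows where 'foo' is not null.
not_null = subset("foo.notnull()")

# Rows dated earlier than March 13, 2023.
early = subset('created < "2023-03-13"')

# "and" written out, with no parentheses around the comparisons.
both = subset('foo == "Two Two" and bar > 5')

# A vectorized string function.
starts_with_s = subset('name.str.startswith("s")')

# The same filter as 'both', written as an ordinary Pandas boolean mask,
# which needs the ampersand and the parentheses.
masked = df[(df["foo"] == "Two Two") & (df["bar"] > 5)]

print(both.equals(masked))
```

Comparing `both` with `masked` shows the flexibility mentioned above: the query string reads more like plain English than the equivalent boolean mask.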

6. The row condition

There are two additional rules we need to follow for the row condition. First, we can't use single quotes inside the row condition -- we should only use double quotes. Second, we can't use line breaks inside the row condition -- all row condition strings should be written on a single line.
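These two rules are easy to check mechanically. The helper below, `follows_row_condition_rules`, is a hypothetical pre-check written for this example and is not part of GX; it only tests for the two things the rules forbid.

```python
def follows_row_condition_rules(condition: str) -> bool:
    """Hypothetical pre-check for the two rules above (not part of GX)."""
    has_single_quote = "'" in condition
    has_line_break = "\n" in condition or "\r" in condition
    return not (has_single_quote or has_line_break)


# Double quotes inside the condition, all on one line: follows both rules.
ok = 'foo == "Two Two" and bar > 5'

# Single quotes inside the condition: breaks the first rule.
bad_quotes = "foo == 'Two Two'"

# A line break inside the condition: breaks the second rule.
bad_breaks = 'foo == "Two Two"\nand bar > 5'

for condition in (ok, bad_quotes, bad_breaks):
    print(repr(condition), follows_row_condition_rules(condition))
```

Wrapping the whole condition in single quotes in Python, as in `ok`, is a convenient way to keep double quotes inside it; the outer quotes are not part of the condition string itself.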

7. Example Expectation: star rating

Let's go back to our Shein Footwear Dataset and take a look at an example Conditional Expectation. We can set an Expectation for product price to be less than $10, which fails when applied to the whole dataset. But if we condition the Expectation on only the products with a mark price of less than $10, it succeeds. This is just one example of how Conditional Expectations can extend the power of Expectations in GX.
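The behavior described here can be reproduced in plain Pandas. This is a sketch under assumptions: the column names `price` and `mark_price` are guesses at how the dataset labels these fields, the rows are made up, and the condition is applied with `DataFrame.query` rather than through GX.

```python
import pandas as pd

# Made-up rows; 'price' and 'mark_price' are assumed column names.
shoes = pd.DataFrame(
    {
        "price": [8.5, 24.0, 6.0, 31.0],
        "mark_price": [9.0, 30.0, 9.5, 40.0],
    }
)

# Checked against every product, "price is less than 10" does not hold.
whole_ok = bool((shoes["price"] < 10).all())

# Condition: only products whose mark price is less than 10.
# engine="python" avoids depending on the optional numexpr package.
cheap = shoes.query("mark_price < 10", engine="python")

# Checked against that subset, the same expectation holds.
conditional_ok = bool((cheap["price"] < 10).all())

print("whole dataset:", whole_ok, "conditioned:", conditional_ok)
```

In terms of the two arguments introduced earlier, the string passed to `query` plays the role of `row_condition`, with `condition_parser` set to "pandas".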

8. Let's practice!

Let's try out some more examples with a few exercises!
