Sneak Peek into GX
Nice job creating your Data Context! This is a powerful first step into the world of Great Expectations. Let's take a sneak peek at all of the cool things you'll be able to do by the end of the course.
The code below uses the Data Context to create a pandas Data Source and Data Asset, which define the format of the data. Then, it creates a Batch Definition and uses it to read in a Batch of data. Finally, it creates an Expectation Suite (a container for Expectations) and a Validation Definition (which pairs the Batch Definition with the Suite so the Suite can be run against the data), and then evaluates a single Expectation directly against the Batch. Don't worry if these terms are unfamiliar right now; they'll all be clear by the end of the course!
Great Expectations has already been imported for you as gx.
This exercise is part of the course Introduction to Data Quality with Great Expectations.
Exercise instructions
- Press Run Code to see the code output.
# Assumes gx (Great Expectations) and a pandas DataFrame named dataframe
# are already loaded, as in the exercise environment
# Create Data Context
context = gx.get_context()
# Create pandas Data Source, Data Asset, and Batch Definition
data_source = context.data_sources.add_pandas(
    name="my_pandas_datasource"
)
data_asset = data_source.add_dataframe_asset(
    name="my_data_asset"
)
batch_definition = data_asset.add_batch_definition_whole_dataframe(
    name="my_batch_definition"
)
# Retrieve a Batch by passing the in-memory DataFrame as a batch parameter
batch = batch_definition.get_batch(
    batch_parameters={"dataframe": dataframe}
)
# Create Expectation Suite and Validation Definition
suite = context.suites.add(
    gx.ExpectationSuite(name="my_suite", suite_parameters={})
)
validation_definition = gx.ValidationDefinition(
    data=batch_definition, suite=suite, name="validation"
)
# Establish an Expectation and evaluate it directly against the Batch
expectation = gx.expectations.ExpectTableRowCountToBeBetween(
    min_value=50000, max_value=100000
)
validation_results = batch.validate(expect=expectation)
# True if the row count is between 50,000 and 100,000 (inclusive)
print(validation_results.success)
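To make the final Expectation concrete, here is a minimal plain-Python sketch of the check that ExpectTableRowCountToBeBetween performs: it passes when the number of rows falls between min_value and max_value, with both bounds inclusive and a missing bound meaning no limit on that side. The helper name row_count_is_between is hypothetical and not part of the GX API; GX itself returns a full validation result object rather than a bare boolean.

```python
def row_count_is_between(rows, min_value=None, max_value=None):
    """Hypothetical helper mimicking a row-count range check.

    `rows` can be anything that supports len(), such as a list of records
    or a pandas DataFrame (len() of a DataFrame is its number of rows).
    Both bounds are inclusive; a bound set to None is not checked.
    """
    n_rows = len(rows)
    if min_value is not None and n_rows < min_value:
        return False
    if max_value is not None and n_rows > max_value:
        return False
    return True


# Usage: a small list of records stands in for the DataFrame
small_table = [{"id": i} for i in range(10)]
print(row_count_is_between(small_table, min_value=50000, max_value=100000))  # False
print(row_count_is_between(small_table, min_value=5, max_value=10))  # True
```

With the exercise's bounds of 50,000 and 100,000, a 10-row table fails the check, while a table of exactly 50,000 or 100,000 rows would pass because the bounds are inclusive.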