Create a Data Context
1. Create a Data Context
Hello, and welcome to this introductory course on data quality with Great Expectations! My name is Davina, and I'll be your instructor throughout this course. I'm a data scientist by trade, having worked at companies like Allstate and American Family Insurance. Now, I have my own startup, where I work as the CTO. I use Great Expectations regularly as a data scientist, and I'm excited to share some of my knowledge with you.
2. What is data quality?
In this course, we'll be dealing with data quality. Data quality is a measure of how fit a set of data is for its intended purpose. It covers several dimensions, including completeness (few or no missing values), accuracy (the values themselves are correct), and many others.
3. Why is data quality important?
Why is data quality important? Well, if we put garbage into a model, we'll get garbage out. No matter how advanced or well-trained a model is, if the input data is poor, the model's results will ultimately be poor, too. A model can only be as good as the data going in! That's why data quality is so important: it affects everything downstream of it.
4. What is Great Expectations?
Great Expectations (abbreviated as GX) is a platform for managing data quality. Specifically, it's a framework for describing data with expressive tests and then validating that the data meets the test criteria. It's available as a web-based UI, called GX Cloud, and as a Python package, called GX Core. In this course, we'll use GX Core from Python, which lets us leverage the power and flexibility of Python for data-quality tasks.
5. Expectations
Great Expectations revolves around Expectations: explicit assumptions about our data that can be verified. What do we expect of our data? Think intuitively: What should the dataset's shape be? Should any columns have nulls or duplicates? What value ranges, string formats, or distributions do we expect? Are there known quality issues to guard against? The answer to each of these questions can be expressed as an Expectation.
6. Data Contexts
Before establishing Expectations, we need to create a Data Context. This is the main entry point for using Great Expectations, similar to a SQL context, which manages and executes SQL queries. Data Contexts provide an API to access and update GX projects. They define where metadata is stored for components like Data Sources, Expectation Suites, Checkpoints, and Data Docs, and they hold outputs such as Validation Results and their associated metrics. We'll explore all of these components in this course, but for now, remember that creating a Data Context is the first step to connecting to GX and establishing Expectations. Now, let's create one in Python!
7. Importing GX
First, after installing the Great Expectations Python package (`great_expectations`), we import it in Python with the alias `gx`.
8. Creating a Data Context
From there, creating a Data Context is straightforward. We call the `gx` module's `get_context()` function to create our Context and assign it to a variable called `context`. And that's it! Printing the Context object also displays its configuration and metadata. With our Data Context in hand, we're ready to take off into the world of GX.
9. Let's practice!
Now it's your turn. Time to practice creating your own Data Context in Great Expectations.