Connect to Data

1. Connect to Data

Now that we've learned how to get started with Great Expectations, let's explore how to connect to our data. In this video, we'll focus on two main GX components: Data Sources and Data Assets.

2. Components

First, let's talk about components. In GX Core, components are Python classes representing our data and data validation entities. GX components include the Data Context, which we learned about in the previous video; Data Sources and Data Assets, which we'll discuss now; Batch Definitions and Batches; of course, the Expectations themselves; and others we'll cover throughout the course.

3. Data Sources

A Data Source is a core component that tells GX how to connect to a specific source of external data. It provides a standard API for accessing and interacting with data from various source systems, such as SQL databases, Spark, or pandas. Creating a Data Source is the first step in connecting to data in Great Expectations.

4. Data Sources

In this course, we'll be focusing on pandas Data Sources.

5. Creating a Data Source

We manage our Data Sources through the Data Context's `.data_sources` attribute. To create a pandas Data Source, we call that attribute's `.add_pandas()` method and pass the name we want to give the Data Source as the `name` parameter. Note that this GX `name` is separate from the name of the Python variable we assign the Data Source to. In this case, the Python object is stored in a variable called `data_source`, while its GX `name` is `"my_pandas_data_source"`. The same distinction applies to every other GX object that takes a `name` parameter, as we'll see throughout the course.

6. Data Assets

Data Sources contain Data Assets, which are collections of records within a Data Source, usually grouped based on the underlying data system (in this case, pandas). You can think of a Data Source as a database and its Data Assets as the tables in that database. There are several ways to create a Data Asset. To create one that can load in-memory pandas DataFrames, we use the Data Source's `.add_dataframe_asset()` method, passing the desired name of the Data Asset as the `name` parameter. Again, notice the difference between the Python variable `data_asset` and the GX `name` `"my_dataframe_asset"`.

7. Cheat sheet

To summarize, GX is built from components, which are Python classes that represent our data and data validation entities. A Data Source is a component that provides a standard API for accessing and interacting with data from a variety of source systems, such as pandas. Data Sources contain Data Assets, which are collections of records; a DataFrame Asset reads its data from an in-memory pandas DataFrame. Refer back to this overview during the following exercises.

8. Let's practice!

Now it's your turn to create your own Data Sources and Data Assets in Great Expectations.