Building a data pipeline
Using factory methods not only makes code easier to read, it also makes it easier to use. In this exercise, you'll practice creating a data pipeline that extracts data from a database. The DataPipeline class, shown below, implements the factory method design pattern. Also defined for you are two concrete products of the Database class: Postgres and Redshift.
class DataPipeline:
    def _get_database(self, provider):
        if provider == "Postgres":
            return Postgres()
        elif provider == "Redshift":
            return Redshift()

    def extract_data(self, provider, query):
        database = self._get_database(provider)
        dataset = database.query_data(query)
        print(f"Extracted dataset from {provider} database")
        return dataset
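The Postgres and Redshift products themselves aren't shown in the listing above. A minimal sketch of what they might look like is below; the class names and the query_data method match the code above, but the bodies are hypothetical stand-ins (a real implementation would execute the query against an actual database connection):

```python
class Database:
    """Abstract product: a queryable database."""
    def query_data(self, query):
        raise NotImplementedError

class Postgres(Database):
    """Concrete product for a Postgres database."""
    def query_data(self, query):
        # Placeholder result; a real version would run the
        # query over a Postgres connection and return rows.
        return f"Postgres rows for: {query}"

class Redshift(Database):
    """Concrete product for a Redshift database."""
    def query_data(self, query):
        # Placeholder result, as above.
        return f"Redshift rows for: {query}"
```

Because both products share the query_data interface, DataPipeline.extract_data can work with either one without knowing which concrete class the factory method returned.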
This exercise is part of the course Intermediate Object-Oriented Programming in Python.
Exercise instructions
- Create an items_pipeline using the DataPipeline class; extract a dataset from a "Redshift" database with the query "SELECT * FROM items;".
- Update the items_pipeline to pull from a "Postgres" database instead, using the same query as before.
- Create an etl_pipeline that extracts data from "Redshift".
Hands-on interactive exercise
Have a go at this exercise by completing the sample code below.
# Create an ETL DataPipeline, query using Redshift
items_pipeline = ____()
____.extract_data("____", "SELECT * FROM items;")
# Now, switch the pipeline to Postgres
____
# Finally, create an etl_pipeline with Redshift
____ = ____()
____.____("____", "SELECT * FROM sales;")