Get Started

The database schema

By now, you know that SQL databases always have a database schema. In the video on databases, you saw the following diagram:

Database Schema for Customer and Order

A PostgreSQL database is set up in your local environment, which contains this database schema. It's been filled with some example data. You can use pandas to query the database using the read_sql() function. You'll have to pass it a database engine, which has been defined for you and is called db_engine.

The pandas package imported as pd will store the query result into a DataFrame object, so you can use any DataFrame functionality on it after fetching the results from the database.

This is a part of the course

“Introduction to Data Engineering”

View Course

Exercise instructions

  • Complete the SELECT statement so it selects the first_name and the last_name in the "Customer" table. Make sure to order by the last name first and the first name second.
  • Use the .head() method to show the first 3 rows of data.
  • Use .info() to show some general information about data.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Complete the SELECT statement
data = pd.read_sql("""
SELECT first_name, ____ FROM "____"
ORDER BY ____, ____
""", db_engine)

# Show the first 3 rows of the DataFrame
print(data.head(____))

# Show the info of the DataFrame
print(data.____())
Edit and Run Code

This exercise is part of the course

Introduction to Data Engineering

IntermediateSkill Level
4.5+
28 reviews

Learn about the world of data engineering in this short course, covering tools and topics like ETL and cloud computing.

Now that you know the primary differences between a data engineer and a data scientist, get ready to explore the data engineer's toolbox! Learn in detail about different types of databases data engineers use, how parallel computing is a cornerstone of the data engineer's toolkit, and how to schedule data processing jobs using scheduling frameworks.

Exercise 1: DatabasesExercise 2: SQL vs NoSQLExercise 3: The database schema
Exercise 4: Joining on relationsExercise 5: Star schema diagramExercise 6: What is parallel computingExercise 7: Why parallel computing?Exercise 8: From task to subtasksExercise 9: Using a DataFrameExercise 10: Parallel computation frameworksExercise 11: Spark, Hadoop and HiveExercise 12: A PySpark groupbyExercise 13: Running PySpark filesExercise 14: Workflow scheduling frameworksExercise 15: Airflow, Luigi and cronExercise 16: Airflow DAGs

What is DataCamp?

Learn the data skills you need online at your own pace—from non-coding essentials to data science and machine learning.

Start Learning for Free