Extracting data from Parquet files
One of the most common ways to ingest data from a source system is to read it from a file, such as a CSV file. As datasets have grown larger, the need for more efficient file formats has led to column-oriented formats such as Apache Parquet.
In this exercise, you'll practice extracting data from a Parquet file.
This exercise is part of the course ETL and ELT in Python.
Exercise instructions
- Read the Parquet file at the path "sales_data.parquet" into a pandas DataFrame.
- Check the data types of the DataFrame's columns by printing them.
- Output the shape of the DataFrame, as well as its head.
Have a go at this exercise by completing this sample code.
import pandas as pd
# Read the sales data into a DataFrame
sales_data = pd.____("____", engine="fastparquet")
# Check the data type of each column in the DataFrame
print(sales_data.____)
# Print the shape of the DataFrame, as well as the head
print(sales_data.____)
print(sales_data.____())