Inspecting data in PySpark DataFrame
Inspecting data is crucial before any analysis such as plotting, modeling, or training. In this exercise, you'll use basic DataFrame operators to inspect the data in the people_df DataFrame that you created in the previous exercise.
Remember, you already have a SparkSession spark and a DataFrame people_df available in your workspace.
This exercise is part of the course Big Data Fundamentals with PySpark.
Exercise instructions
- Print the first 10 observations in the people_df DataFrame.
- Count the number of rows in the people_df DataFrame.
- How many columns does the people_df DataFrame have, and what are their names?
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Print the first 10 observations
people_df.____(10)
# Count the number of rows
print("There are {} rows in the people_df DataFrame.".format(people_df.____()))
# Count the number of columns and print their names
print("There are {} columns in the people_df DataFrame and their names are {}".format(len(people_df.____), people_df.____))
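For reference, the three inspection steps can be sketched outside the course workspace as follows. This is a minimal sketch, not the course's official solution: the helper names `describe_dataframe` and `demo`, the local SparkSession settings, and the sample rows are assumptions made up for illustration. It relies on `show(n)` to print a preview, `count()` to return the number of rows, and the `columns` attribute to list the column names.

```python
def describe_dataframe(df, n=10):
    """Print the first n rows of df, then return (row_count, column_names)."""
    df.show(n)                  # prints a tabular preview of up to n rows
    row_count = df.count()      # an action: Spark scans the data to count rows
    column_names = df.columns   # an attribute (no parentheses): list of names
    return row_count, column_names


def demo():
    # Assumption: pyspark is installed locally. The rows and column names
    # below are hypothetical and do not come from the course dataset.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("local[1]")
        .appName("inspect-demo")
        .getOrCreate()
    )
    people_df = spark.createDataFrame(
        [(1, "Ada", "female"), (2, "Linus", "male")],
        ["person_id", "name", "sex"],
    )

    rows, cols = describe_dataframe(people_df)
    print("There are {} rows in the people_df DataFrame.".format(rows))
    print(
        "There are {} columns in the people_df DataFrame "
        "and their names are {}".format(len(cols), cols)
    )
    spark.stop()
```

Calling `demo()` runs the sketch against a throwaway local session. Note that `columns` is accessed without parentheses, while `show` and `count` are method calls.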