PySpark DataFrame visualization
Visualizing data graphically is essential for understanding and interpreting it. In this simple data visualization exercise, you'll first print the column names of the names_df DataFrame that you created earlier, then convert names_df to a Pandas DataFrame, and finally plot its contents as a horizontal bar plot, passing the names of the people as x and their ages as y. In a horizontal bar plot, the x column supplies the labels along the vertical axis and the y column sets the length of each bar.
Remember, you already have a SparkSession spark and a DataFrame names_df available in your workspace.
This exercise is part of the course Big Data Fundamentals with PySpark.
Exercise instructions
- Print the names of the columns in the names_df DataFrame.
- Convert the names_df DataFrame to a Pandas DataFrame named df_pandas.
- Use the plot() method of df_pandas, which draws with matplotlib, to create a horizontal bar plot with 'Name' as x and 'Age' as y.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
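# plt may already be loaded in the workspace; importing it again is harmless
import matplotlib.pyplot as plt

# Check the column names of names_df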
print("The column names of names_df are", names_df.____)
# Convert to Pandas DataFrame
df_pandas = names_df.____()
# Create a horizontal bar plot
____.plot(kind='barh', x='____', y='____', colormap='winter_r')
plt.show()
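
To see how the plotting step assigns columns to axes, here is a minimal sketch that runs without a Spark session. It assumes a small, made-up Pandas DataFrame standing in for the result of converting names_df; the rows are hypothetical and are not the course data. The Agg backend is also an assumption, chosen only so the sketch runs where no display is available.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so no display is needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the Pandas DataFrame obtained from names_df.
# These rows are invented for illustration only.
df_pandas = pd.DataFrame({"Name": ["Alice", "Bob", "Carol"], "Age": [34, 27, 45]})

# A Pandas DataFrame exposes its column names through .columns
print("The column names are", list(df_pandas.columns))

# kind='barh' draws one horizontal bar per row: the x column provides the
# category labels on the vertical axis, the y column sets each bar's length.
ax = df_pandas.plot(kind="barh", x="Name", y="Age", colormap="winter_r")
ax.set_xlabel("Age")  # label the horizontal axis with the measured quantity

plt.show()  # under the Agg backend this opens no window
```

The plot() call returns a matplotlib Axes object, so labels and titles can be adjusted afterwards, as the set_xlabel call does here.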