Get startedGet started for free

PySpark DataFrame visualization

Graphical representations or visualization of data is imperative for understanding as well as interpreting the data. In this simple data visualization exercise, you'll first print the column names of names_df DataFrame that you have created earlier, then convert the names_df to Pandas DataFrame, and finally plot the contents as horizontal bar plot with names of the people on the x-axis and their age on the y-axis.

Remember, you already have a SparkSession spark and a DataFrame names_df available in your workspace.

This exercise is part of the course

Big Data Fundamentals with PySpark

View Course

Exercise instructions

  • Print the names of the columns in names_df DataFrame.
  • Convert names_df DataFrame to df_pandas Pandas DataFrame.
  • Use matplotlib's plot() method to create a horizontal bar plot with 'Name' on x-axis and 'Age' on y-axis.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Check the column names of names_df
print("The column names of names_df are", names_df.____)

# Convert to Pandas DataFrame  
df_pandas = names_df.____()

# Create a horizontal bar plot
____.plot(kind='barh', x='____', y='____', colormap='winter_r')
plt.show()
Edit and Run Code