PySpark DataFrame visualization
Visualizing data graphically is essential for understanding and interpreting it. In this simple data visualization exercise, you'll first print the column names of the names_df DataFrame that you created earlier, then convert names_df to a Pandas DataFrame, and finally plot its contents as a horizontal bar plot, with the people's names passed as x and their ages as y.
Remember, you already have a SparkSession spark and a DataFrame names_df available in your workspace.
This exercise is part of the course Big Data Fundamentals with PySpark.
Exercise instructions
- Print the names of the columns in the names_df DataFrame.
- Convert the names_df DataFrame to a Pandas DataFrame named df_pandas.
- Use matplotlib's plot() method to create a horizontal bar plot with 'Name' on x-axis and 'Age' on y-axis.
Hands-on interactive exercise
Complete the sample code below to finish this exercise.
# Check the column names of names_df
print("The column names of names_df are", names_df.____)
# Convert to Pandas DataFrame
df_pandas = names_df.____()
# Create a horizontal bar plot
____.plot(kind='barh', x='____', y='____', colormap='winter_r')
plt.show()
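To try the plotting step outside the course workspace, the following is a minimal sketch that assumes only pandas and matplotlib are installed. The small df_pandas table is a hypothetical stand-in with made-up values for what names_df.toPandas() would return, and the output filename is likewise an assumption. In PySpark itself, the column names are available through the DataFrame's columns attribute.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for names_df.toPandas(); names and ages are made up
df_pandas = pd.DataFrame(
    {"Name": ["Alice", "Bob", "Carol"], "Age": [34, 27, 41]}
)

# Check the column names (on a Spark DataFrame: names_df.columns)
print("The column names are", list(df_pandas.columns))

# Create a horizontal bar plot; pandas returns the matplotlib Axes it drew on
ax = df_pandas.plot(kind="barh", x="Name", y="Age", colormap="winter_r")
ax.set_xlabel("Age")

# Save to a file (hypothetical name); use plt.show() in an interactive session
plt.savefig("names_age_barh.png")
```

Note that with kind='barh', pandas places the categories given as x ('Name') along the vertical axis and draws the y values ('Age') as bar lengths along the horizontal axis.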