
Part 3: Data visualization

Data visualization is an important part of exploratory data analysis (EDA). PySpark DataFrames are better suited to visualization than RDDs because they carry an inherent structure and schema.

In this third part, you'll create a density plot of the ages of all the players from Germany, using the DataFrame that you created in the previous exercise. To do this, you'll first convert the PySpark DataFrame into a Pandas DataFrame and then use its plot() method, which draws with matplotlib, to create the density plot.

Remember, you already have a SparkSession spark, a temporary table fifa_df_table, and a DataFrame fifa_df_germany_age available in your workspace.

This exercise is part of the course Big Data Fundamentals with PySpark.

Exercise instructions

  • Convert fifa_df_germany_age to a Pandas DataFrame named fifa_df_germany_age_pandas.
  • Generate a density plot of the 'Age' column from the fifa_df_germany_age_pandas Pandas DataFrame.

Hands-on interactive exercise

Complete this sample code to finish the exercise.

# Convert fifa_df_germany_age to a Pandas DataFrame
fifa_df_germany_age_pandas = fifa_df_germany_age.____()

# Plot the 'Age' density of players from Germany
____.plot(kind='density')
plt.show()
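
The plotting step can be tried without a Spark cluster. The sketch below is a minimal illustration under stated assumptions, not the course's reference solution: the ages are made up and stand in for the Pandas DataFrame that PySpark's toPandas() method would return for fifa_df_germany_age, and the names ages_pandas and germany_age_density.png are hypothetical. It draws a histogram and a density plot side by side to show how the two differ. Note that kind='density' needs SciPy to be installed.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch also runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical ages, standing in for the result of fifa_df_germany_age.toPandas()
ages_pandas = pd.DataFrame({"Age": [19, 21, 22, 24, 24, 25, 27, 28, 30, 33]})

fig, (ax_hist, ax_kde) = plt.subplots(1, 2, figsize=(8, 3))

# Histogram: number of players falling into each age bin
ages_pandas.plot(kind="hist", bins=5, ax=ax_hist, title="Histogram")

# Density plot: a smoothed kernel density estimate of the same values
ages_pandas.plot(kind="density", ax=ax_kde, title="Density")

fig.tight_layout()
fig.savefig("germany_age_density.png")  # hypothetical output file name
```

In an interactive session, replacing the savefig call with plt.show() and dropping the Agg line displays the figure instead of writing it to disk.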