Exercise

Part 1: Create a DataFrame from a CSV file

Every four years, soccer fans around the world celebrate the FIFA World Cup, and with it, everything seems to change in many countries. In this three-part exercise, you'll do some exploratory data analysis (EDA) on the "FIFA 2018 World Cup Player" dataset using PySpark SQL, which involves DataFrame operations, SQL queries, and visualization.

In the first part, you'll load the FIFA 2018 World Cup Players dataset (Fifa2018_dataset.csv), which is in CSV format, into a PySpark DataFrame and inspect the data using basic DataFrame operations.

Remember, you already have a SparkSession spark and a variable file_path available in your workspace.

Instructions

  • Create a PySpark DataFrame from file_path (which is the path to the Fifa2018_dataset.csv file).
  • Print the schema of the DataFrame.
  • Print the first 10 observations.
  • How many rows are there in the DataFrame?
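
One possible way to work through the four instructions above is sketched below. It is not an official solution. The function name inspect_players and the variable names fifa_df and row_count are hypothetical, and the options header=True and inferSchema=True assume that the CSV file has a header row and that letting Spark infer the column types is acceptable.

```python
def inspect_players(spark, file_path):
    """Load the CSV at file_path and run the basic inspections from the instructions.

    `spark` is the SparkSession and `file_path` the path to the CSV file;
    the exercise says both already exist in the workspace.
    """
    # Assumption: the first row holds column names (header=True) and Spark
    # may infer column types instead of reading everything as strings.
    fifa_df = spark.read.csv(file_path, header=True, inferSchema=True)

    # Print the column names and their inferred types
    fifa_df.printSchema()

    # Print the first 10 rows
    fifa_df.show(10)

    # Count the rows; this is an action, so it scans the whole dataset
    row_count = fifa_df.count()
    print("There are {} rows in the DataFrame".format(row_count))

    return fifa_df, row_count
```

In the exercise workspace this would be called as inspect_players(spark, file_path). The same four calls can also be run one by one at the top level without wrapping them in a function.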