Load in the data

Reading in data is the first step to using PySpark for data science! Let's use Parquet files, a columnar storage format that has become a common industry standard.

This exercise is part of the course

Feature Engineering with PySpark

Exercise instructions

  • Use the parquet() file reader to read in 'Real_Estate.parq' as described in the video exercise.
  • Print out the list of columns using the dataframe's columns attribute.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Read the file into a dataframe
df = spark.read.____(____)
# Print columns in dataframe
____(df.____)