Dropping the middle man

Now you know how to put data into Spark via pandas, but you're probably wondering why you should deal with pandas at all. Wouldn't it be easier to read a text file straight into Spark? Of course it would!

Luckily, your SparkSession has a .read attribute with several methods for reading different data sources into Spark DataFrames. Using these, you can create a DataFrame from a .csv file, just as you would with regular pandas DataFrames!

The variable file_path is a string with the path to the file airports.csv. This file contains information about different airports all over the world.

A SparkSession named spark is available in your workspace.

This exercise is part of the course

Foundations of PySpark

Exercise instructions

  • Use the .read.csv() method to create a Spark DataFrame called airports.
    • The first argument is file_path.
    • Pass the argument header=True so that Spark knows to take the column names from the first line of the file.
  • Print out this DataFrame by calling .show().

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Don't change this file path
file_path = "/usr/local/share/datasets/airports.csv"

# Read in the airports data
airports = ____.____.____(____, ____=____)

# Show the data
____.____()