
Dropping the middle man

Now you know how to put data into Spark via pandas, but you're probably wondering why deal with pandas at all? Wouldn't it be easier to just read a text file straight into Spark? Of course it would!

Luckily, your SparkSession has a .read attribute which has several methods for reading different data sources into Spark DataFrames. Using these you can create a DataFrame from a .csv file just like with regular pandas DataFrames!

The variable file_path is a string with the path to the file airports.csv. This file contains information about different airports all over the world.

A SparkSession named spark is available in your workspace.

This exercise is part of the course

Foundations of PySpark


Exercise instructions

  • Use the .read.csv() method to create a Spark DataFrame called airports.
    • The first argument is file_path.
    • Pass the argument header=True so that Spark knows to take the column names from the first line of the file.
  • Print out this DataFrame by calling .show().

Interactive practice exercise

Try to solve this exercise by completing the sample code.

# Don't change this file path
file_path = "/usr/local/share/datasets/airports.csv"

# Read in the airports data
airports = ____.____.____(____, ____=____)

# Show the data
____.____()