
Loading census data

Let's start creating your first PySpark DataFrame! The file adult_reduced.csv contains groupings of adults by a variety of demographic categories. The data have been adapted from the US Census and contain 32562 groupings in total.

Load the CSV and inspect the resulting schema.

Data dictionary:

Variable         Description
age              Individual age
education_num    Education by degree
marital_status   Marital status
occupation       Occupation
income           Categorical income

This exercise is part of the course

Introduction to PySpark


Exercise instructions

  • Create a PySpark DataFrame from the "adult_reduced.csv" file using the spark.read.csv() method.
  • Show the resulting DataFrame.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Read in the CSV
census_adult = ____.____.____(____)

# Show the DataFrame
census_adult.____
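
For reference, here is a minimal sketch of one way the template above could be completed, written as a small reusable helper. It is not the course's official solution. The helper name load_census, the app name "census_demo", and the header=True / inferSchema=True options are assumptions for illustration: the exercise does not say whether the file has a header row, so adjust those flags to match the actual file.

```python
def load_census(spark, path="adult_reduced.csv", header=True, infer_schema=True):
    """Read a CSV file into a PySpark DataFrame.

    `spark` is an existing SparkSession. The header and infer_schema
    defaults are assumptions about the file, not facts from the exercise.
    """
    return spark.read.csv(path, header=header, inferSchema=infer_schema)


def main():
    # Imported here so load_census can be defined without PySpark installed.
    from pyspark.sql import SparkSession

    # Hypothetical app name; any string works.
    spark = SparkSession.builder.appName("census_demo").getOrCreate()

    census_adult = load_census(spark)

    # Print the column names and types, then the first few rows.
    census_adult.printSchema()
    census_adult.show(5)

    spark.stop()
```

Calling main() in an environment with PySpark installed and adult_reduced.csv in the working directory should print the schema followed by the first five rows. Without inferSchema, Spark reads every column as a string, which is why the sketch turns it on before inspecting the schema.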