Loading census data
Let's create your first PySpark DataFrame! The file adult_reduced.csv groups adults by a variety of demographic categories. These data have been adapted from the US Census. There are a total of 32562 groupings of adults.
Load the CSV and inspect the resulting schema.
Data dictionary:
Variable | Description |
---|---|
age | Individual age |
education_num | Education by degree |
marital_status | Marital status |
occupation | Occupation |
income | Categorical income |
This exercise is part of the course
Introduction to PySpark
Exercise instructions
- Create a PySpark DataFrame from the "adult_reduced.csv" file using the spark.read.csv() method.
- Show the resulting DataFrame.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Read in the CSV
census_adult = ____.____.____(____)
# Show the DataFrame
census_adult.____