Loading census data
Let's start by creating your first PySpark DataFrame! The file `adult_reduced.csv` groups adults by a variety of demographic categories. The data have been adapted from the US Census and contain a total of 32562 groupings of adults.
Load the CSV file and inspect the resulting schema.
Data dictionary:
| Variable | Description |
|---|---|
| age | Individual age |
| education_num | Education level (by degree), encoded as a number |
| marital_status | Marital status |
| occupation | Occupation |
| income | Categorical income |
This exercise is part of the course Introduction to PySpark.
Exercise instructions
- Create a PySpark DataFrame from the `"adult_reduced.csv"` file using the `spark.read.csv()` method.
- Show the resulting DataFrame.
Have a go at this exercise by completing this sample code.
```python
# Read in the CSV
census_adult = ____.____.____(____)

# Show the DataFrame
census_adult.____
```