Loading census data
Let's create your first PySpark DataFrame! The file adult_reduced.csv contains groupings of adults based on a variety of demographic categories. The data have been adapted from the US Census, and there are 32562 groupings of adults in total.
Load the CSV file and inspect the resulting schema.
Data dictionary:
| Variable | Description |
|---|---|
| age | Individual age |
| education_num | Education by degree |
| marital_status | Marital status |
| occupation | Occupation |
| income | Categorical income |
This exercise is part of the course
Introduction to PySpark
Instructions
- Create a PySpark DataFrame from the `"adult_reduced.csv"` file using the `spark.read.csv()` method.
- Show the resulting DataFrame.
Hands-on interactive exercise
Try this exercise by completing this sample code.
# Read in the CSV
census_adult = ____.____.____(____)
# Show the DataFrame
census_adult.____