Loading census data
Let's start by creating your first PySpark DataFrame! The file adult_reduced.csv groups adults by a variety of demographic categories. The data have been adapted from the US Census and contain a total of 32562 groupings of adults.
Load the CSV and inspect the resulting schema.
Data dictionary:
Variable | Description |
---|---|
age | Individual age |
education_num | Education by degree |
marital_status | Marital status |
occupation | Occupation |
income | Categorical income |
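The data dictionary above can also be written down as a schema. The sketch below builds a DDL-formatted schema string from the column names; the column types (INT and STRING) and the column order are assumptions, since the source lists only names and descriptions, and `CENSUS_COLUMNS` and `census_schema_ddl` are hypothetical names used for illustration.

```python
# Column names come from the data dictionary; the types are assumptions.
CENSUS_COLUMNS = {
    "age": "INT",
    "education_num": "INT",
    "marital_status": "STRING",
    "occupation": "STRING",
    "income": "STRING",
}

# Join "name TYPE" pairs into one DDL-formatted schema string.
census_schema_ddl = ", ".join(
    f"{name} {dtype}" for name, dtype in CENSUS_COLUMNS.items()
)
print(census_schema_ddl)
```

If the assumed types and order match the file, a string like this could be passed as the `schema` argument of `spark.read.csv()` instead of letting Spark infer the types.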
This exercise is part of the course Introduction to PySpark
Instructions
- Create a PySpark DataFrame from the "adult_reduced.csv" file using the spark.read.csv() method.
- Show the resulting DataFrame.
Hands-on interactive exercise
Try this exercise by completing this sample code.
# Read in the CSV
census_adult = ____.____.____(____)
# Show the DataFrame
census_adult.____
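For reference, here is one possible way the two steps could look outside the exercise template. It is a minimal sketch, not the official solution: the helper name `load_census` is hypothetical, and the `header=True` and `inferSchema=True` options assume the file has a header row and that type inference is wanted.

```python
def load_census(spark, path="adult_reduced.csv"):
    """Read the census CSV into a PySpark DataFrame.

    `spark` is expected to be an existing SparkSession.
    header=True and inferSchema=True are assumptions about the file:
    a header row is present and column types should be inferred.
    """
    return spark.read.csv(path, header=True, inferSchema=True)
```

With a SparkSession available as `spark`, `census_adult = load_census(spark)` followed by `census_adult.show()` would display the first rows, and `census_adult.printSchema()` would print the resulting schema.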