Loading census data
Let's start by creating your first PySpark DataFrame! The file adult_reduced.csv contains groupings of adults based on a variety of demographic categories. These data have been adapted from the US Census. There are a total of 32562 groupings of adults.
We'll load the CSV file and inspect the resulting schema.
Data dictionary:
| Variable | Description |
|---|---|
| age | Individual age |
| education_num | Education by degree |
| marital_status | Marital status |
| occupation | Occupation |
| income | Categorical income |
This exercise is part of the course Introduction to PySpark.
Exercise instructions
- Create a PySpark DataFrame from the `"adult_reduced.csv"` file using the `spark.read.csv()` method.
- Show the resulting DataFrame.
Try this exercise by completing the sample code:
```python
# Read in the CSV
census_adult = ____.____.____(____)
# Show the DataFrame
census_adult.____
```