Exercise

Schema writeout

We've loaded schemas in several ways now, so let's define one directly. We'll use the following data dictionary:

Variable        Description
age             Individual age
education_num   Education by degree
marital_status  Marital status
occupation      Occupation
income          Categorical income

Instructions

  • Specify the data schema by giving the column names (age, education_num, marital_status, occupation, and income) and their column types, and set the sep= argument to a comma.
  • Read data from a comma-delimited file called adult_reduced_100.csv.
  • Print the schema for the resulting DataFrame.