Introduction to PySpark

Exercise

Loading census data

Let's create your first PySpark DataFrame! The file adult_reduced.csv contains groupings of adults based on a variety of demographic categories. The data have been adapted from the US Census, and there are 32562 groupings in total.

Load the CSV file and inspect the resulting schema.

Data dictionary:

Variable        Description
age             Individual age
education_num   Education by degree
marital_status  Marital status
occupation      Occupation
income          Categorical income

Instructions

  • Create a PySpark DataFrame from the "adult_reduced.csv" file using the spark.read.csv() method.
  • Show the resulting DataFrame.
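
One way the two steps above might look is sketched below. It is not an official solution. It assumes the file has a header row and that letting Spark infer column types is acceptable, since the exercise states neither. The names load_census, main, CSV_OPTIONS, and the app name "census" are hypothetical and used only for illustration.

```python
# Assumed reader options: the exercise does not say whether the file has a
# header row or how column types should be determined.
CSV_OPTIONS = {"header": True, "inferSchema": True}


def load_census(spark, path="adult_reduced.csv"):
    """Read the census CSV into a PySpark DataFrame using spark.read.csv()."""
    return spark.read.csv(path, **CSV_OPTIONS)


def main():
    # Imported here so load_census stays importable without a Spark install.
    from pyspark.sql import SparkSession

    # Hypothetical session setup; an interactive course environment may
    # already provide a ready-made `spark` object.
    spark = SparkSession.builder.appName("census").getOrCreate()

    census_df = load_census(spark)
    census_df.printSchema()  # inspect the resulting schema
    census_df.show(5)        # show the first five rows

    spark.stop()
```

Calling main() from the directory that holds adult_reduced.csv runs both steps. If a spark session already exists, load_census(spark).show() is enough.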