
Defining a schema

Creating a defined schema helps with data quality and import performance. As mentioned during the lesson, we'll create a simple schema to read in the following columns:

  • Name
  • Age
  • City

The Name and City columns are StringType() and the Age column is an IntegerType().

This exercise is part of the course

Cleaning Data with PySpark


Exercise instructions

  • Import * from the pyspark.sql.types module.
  • Define a new schema using the StructType method.
  • Define a StructField for name, age, and city. Each field should correspond to the correct datatype and not be nullable.

Hands-on interactive exercise

Try this exercise by completing this sample code.

# Import the pyspark.sql.types library
____

# Define a new schema using the StructType method
people_schema = ____([
  # Define a StructField for each field
  StructField('name', ____, False),
  ____('____', IntegerType(), ____),
  ____
])