
Dog parsing

You've done a considerable amount of cleanup on the initial dataset, but now you need to analyze the data more deeply. Several questions have come up about the types of dogs seen in an image and about details of the images themselves. To answer them, you need to process the data into a specific type. Before you can do that, you'll need to create a schema (a type) that represents the dog details.

The joined_df DataFrame is as you last defined it, and all of the types in pyspark.sql.types have been imported.

This exercise is part of the course

Cleaning Data with PySpark


Exercise instructions

  • Select the column representing the dog details from the DataFrame and show the first 10 un-truncated rows.
  • Create a new schema as you've done before, using breed, start_x, start_y, end_x, and end_y as the names. Make sure to specify the proper data types for each field in the schema (any number value is an integer).

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Select the dog details and show 10 untruncated rows
# show() prints the rows itself and returns None, so no print() wrapper is needed
joined_df.____.show(____, truncate=____)

# Define a schema type for the details in the dog list
DogType = ____([
    StructField("breed", ____, False),
    StructField("start_x", ____, False),
    ____,
    ____,
    ____
])