
Split and explode a text column

A DataFrame clauses_df with 100 rows is provided. It has a clause column and an id column that identifies each row. Each clause is a string containing one or more words separated by spaces.

This exercise is part of the course

Introduction to Spark SQL in Python


Exercise instructions

  • Split the clause column into a column called words, containing an array of individual words.
  • Explode the words column into a column called word.
  • Count the resulting number of rows.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Split the clause column into a column called words 
split_df = clauses_df.select(____('clause', ' ').____('words'))
split_df.show(5, truncate=False)

# Explode the words column into a column called word 
exploded_df = split_df.____(____('____').____('word'))
exploded_df.show(10)

# Count the resulting number of rows in exploded_df
print("\nNumber of rows: ", ____)