Split the data

Now that you've done all your manipulations, the last step before modeling is to split the data!

This exercise is part of the course Foundations of PySpark.

Exercise instructions

  • Use the DataFrame method .randomSplit() to split piped_data into two pieces: training, with 60% of the data, and test, with 40%. To do this, pass the list [.6, .4] to .randomSplit().
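
To build intuition for what a weighted random split does, here is a minimal plain-Python sketch. It is not Spark's implementation, and random_split is a hypothetical helper written only for illustration. It assigns each row to a piece independently, in proportion to the weights, which shows why the resulting pieces are only approximately 60% and 40% of the data rather than exactly so.

```python
import random
from itertools import accumulate


def random_split(rows, weights, seed=None):
    """Hypothetical helper: put each row into one piece at random,
    with probability proportional to that piece's weight."""
    total = sum(weights)
    # Turn the weights into cumulative upper bounds, e.g. [0.6, 1.0].
    bounds = [c / total for c in accumulate(weights)]
    rng = random.Random(seed)
    pieces = [[] for _ in weights]
    for row in rows:
        x = rng.random()  # uniform draw in [0, 1)
        for i, upper in enumerate(bounds):
            if x < upper:
                pieces[i].append(row)
                break
        else:
            # Guard against floating-point rounding in the last bound.
            pieces[-1].append(row)
    return pieces


rows = list(range(1000))
training, test = random_split(rows, [0.6, 0.4], seed=42)

# The sizes are close to 600 and 400, but not guaranteed to match exactly.
print(len(training), len(test))
```

In PySpark itself, .randomSplit() also accepts an optional seed argument, which is worth setting when you need the same split on every run.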

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Split the data into training and test sets
training, test = piped_data.randomSplit(____)