Split the data
Now that you've finished all your manipulations, the last step before modeling is to split the data into a training set and a test set!
This exercise is part of the course Foundations of PySpark.
Exercise instructions
- Use the DataFrame method .randomSplit() to split piped_data into two pieces: training with 60% of the data and test with 40% of the data. Do this by passing the list [.6, .4] as the weights argument of .randomSplit().
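Conceptually, the weights act as probabilities. Each row is independently assigned to exactly one piece, so the 60/40 proportions hold only approximately, and Spark normalizes weights that don't sum to 1. The pure-Python sketch below models that idea only. `random_split` is a hypothetical helper, not Spark's implementation, which samples partition by partition across the cluster.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def random_split(rows: Sequence[T], weights: Sequence[float], seed: int = 0) -> List[List[T]]:
    """Hypothetical helper: assign each row to exactly one split,
    with probability proportional to that split's weight."""
    if not weights or any(w < 0 for w in weights):
        raise ValueError("weights must be non-empty and non-negative")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")

    # Normalize so the weights need not sum to 1 (e.g. [3, 2] behaves like [.6, .4]).
    normalized = [w / total for w in weights]

    # Cumulative upper bounds, e.g. [.6, .4] -> [0.6, 1.0].
    bounds = []
    running = 0.0
    for w in normalized:
        running += w
        bounds.append(running)
    bounds[-1] = 1.0  # guard against floating-point round-off

    rng = random.Random(seed)  # seeded, so the split is reproducible
    splits: List[List[T]] = [[] for _ in weights]
    for row in rows:
        u = rng.random()  # uniform draw in [0, 1)
        for i, upper in enumerate(bounds):
            if u < upper:  # first bucket whose range contains u
                splits[i].append(row)
                break
    return splits


training, test = random_split(range(1000), [.6, .4], seed=42)
print(len(training), len(test))  # roughly 600 and 400, not exactly
```

Because every row draws a single random number and lands in one bucket, the pieces never overlap and together cover all rows; only their sizes vary from run to run (or from seed to seed).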
Sample code for this step:
# Split the data into training and test sets
# (about 60% of rows go to training, about 40% to test)
training, test = piped_data.randomSplit([.6, .4])