Split the data
Now that you've done all your manipulations, the last step before modeling is to split the data!
This exercise is part of the course Foundations of PySpark.
Exercise instructions
- Use the DataFrame method .randomSplit() to split piped_data into two pieces, training with 60% of the data and test with 40% of the data, by passing the list [.6, .4] to the .randomSplit() method.
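Note that .randomSplit() does not cut the DataFrame at an exact 60/40 boundary. Spark normalizes the weights if they do not sum to 1.0 and assigns rows at random, so the two pieces are only approximately 60% and 40% of the rows, and they can change from run to run unless a seed is passed. The pure-Python sketch below is a hypothetical illustration of that idea, not Spark's implementation. random_split is an invented helper that gives each row one uniform random draw and compares it against cumulative, normalized weights.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def random_split(rows: Sequence[T], weights: Sequence[float], seed: int = 42) -> List[List[T]]:
    """Hypothetical, conceptual stand-in for DataFrame.randomSplit (not Spark code).

    Each row is assigned independently to one split, so split sizes are
    only approximately proportional to the weights.
    """
    if not weights or any(w < 0 for w in weights):
        raise ValueError("weights must be non-empty and non-negative")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    # Normalize weights into cumulative upper bounds, e.g. [.6, .4] -> [0.6, 1.0]
    bounds: List[float] = []
    running = 0.0
    for w in weights:
        running += w / total
        bounds.append(running)
    bounds[-1] = 1.0  # guard against floating-point drift in the last bound

    rng = random.Random(seed)  # fixed seed -> reproducible assignment
    splits: List[List[T]] = [[] for _ in weights]
    for row in rows:
        u = rng.random()  # uniform draw in [0.0, 1.0)
        for i, upper in enumerate(bounds):
            if u < upper:  # first bucket whose upper bound exceeds the draw
                splits[i].append(row)
                break
    return splits


# Usage: split 10,000 row ids roughly 60/40
rows = list(range(10_000))
training, test = random_split(rows, [.6, .4], seed=0)
print(len(training), len(test))  # roughly 6,000 and 4,000, not exact
```

Because every row gets its own draw, the split sizes fluctuate around the requested proportions; passing weights such as [3, 2] gives the same result as [.6, .4] after normalization, and reusing the same seed reproduces the same split.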
Sample code, with the blank completed:
# Split piped_data into a training set (60%) and a test set (40%)
training, test = piped_data.randomSplit([.6, .4])