Split the data
Now that you've done all your manipulations, the last step before modeling is to split the data!
This exercise is part of the course
Foundations of PySpark
Exercise instructions
- Use the DataFrame method .randomSplit() to split piped_data into two pieces: training, with 60% of the data, and test, with 40% of the data. Do this by passing the list [.6, .4] to the .randomSplit() method.
Hands-on interactive exercise
Try this exercise by completing the sample code below.
# Split the data into training and test sets
training, test = piped_data.randomSplit(____)
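To build intuition for what a weighted random split does, here is a minimal pure-Python sketch of the idea. It is not Spark's implementation: the helper name random_split, the sample rows, and the seed value are all assumptions made for illustration. Each row gets one uniform random draw and lands in the split whose cumulative weight range contains that draw.

```python
import random
from itertools import accumulate


def random_split(rows, weights, seed=None):
    """Hypothetical helper: assign each row to exactly one split at random."""
    total = sum(weights)
    # Normalize the weights so they sum to 1, then build cumulative upper bounds.
    bounds = list(accumulate(w / total for w in weights))
    bounds[-1] = 1.0  # guard against floating-point rounding in the last bound
    rng = random.Random(seed)
    splits = [[] for _ in weights]
    for row in rows:
        x = rng.random()  # uniform draw in [0, 1)
        for i, upper in enumerate(bounds):
            if x < upper:
                splits[i].append(row)
                break
    return splits


# Illustrative data standing in for the rows of a DataFrame (assumption).
rows = list(range(1000))
training, test = random_split(rows, [.6, .4], seed=42)
print(len(training), len(test))
```

Because every row is assigned independently, the resulting sizes are only approximately 60% and 40%. The same holds for .randomSplit() in PySpark, which also accepts an optional seed argument if you want a reproducible split.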