
Split the data

Now that you've done all your manipulations, the last step before modeling is to split the data!

This exercise is part of the course

Foundations of PySpark


Exercise instructions

  • Use the DataFrame method .randomSplit() to split piped_data into two pieces, training with 60% of the data and test with 40%, by passing the list [.6, .4] as its argument.

Hands-on interactive exercise

Try this exercise by completing the sample code below.

# Split the data into training and test sets
training, test = piped_data.randomSplit(____)
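
As a rough illustration of what a weighted random split does, the plain-Python sketch below assigns each row to a split using one uniform random draw per row. The helper random_split, the sample rows, and the seed value are hypothetical and are not part of the course or of PySpark; Spark's own .randomSplit() operates on distributed DataFrames and also accepts an optional seed. The sketch is only meant to show why the resulting pieces are approximately, not exactly, 60% and 40% of the data.

```python
import random


def random_split(rows, weights, seed=None):
    """Illustrative weighted random split (not Spark's actual implementation)."""
    total = sum(weights)

    # Normalize the weights so they sum to 1, then build cumulative upper bounds.
    bounds = []
    running = 0.0
    for w in weights:
        running += w / total
        bounds.append(running)
    bounds[-1] = 1.0  # guard against floating-point drift in the last bound

    rng = random.Random(seed)
    splits = [[] for _ in weights]
    for row in rows:
        draw = rng.random()  # uniform in [0, 1)
        # Place the row in the first split whose upper bound exceeds the draw.
        for i, upper in enumerate(bounds):
            if draw < upper:
                splits[i].append(row)
                break
    return splits


# Hypothetical stand-in for piped_data: 1000 plain integers.
rows = list(range(1000))
training, test = random_split(rows, [.6, .4], seed=42)
print(len(training), len(test))
```

Every row lands in exactly one piece, and fixing the seed makes the split repeatable, which is useful when comparing models on the same training and test sets.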