
Train/test split

To assess a Machine Learning model objectively, you need to test it on an independent set of data. You can't use the same data that you used to train the model: the model will naturally perform (relatively) well on data it has already seen, so the result says little about how it handles new records.

You will split the data into two components:

  • training data (used to train the model) and
  • testing data (used to test the model).

Note: From here on you'll be working with a smaller subset of the flights data, which just makes the exercises run more quickly.

This exercise is part of the course

Machine Learning with PySpark

Lihat Kursus

Exercise instructions

  • Randomly split the flights data into two sets with 80:20 proportions. For repeatability, set the random number seed to 43 for the split.
  • Check that the training data has roughly 80% of the records from the original data.
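
Why only "roughly" 80%? A random split of this kind typically assigns each record to a set by an independent random draw, so the proportions are approximate rather than exact. The sketch below illustrates the idea in plain Python, not PySpark. The helper `random_split` and the stand-in `records` list are hypothetical and are not part of the course code. In PySpark itself, the DataFrame methods `randomSplit(weights, seed)` and `count()` do the equivalent work on a DataFrame, and the internal details may differ from this sketch.

```python
import random


def random_split(records, weights, seed):
    """Assign each record to one split using an independent random draw.

    Hypothetical helper for illustration only; not the PySpark API.
    """
    total = sum(weights)

    # Cumulative upper bounds, e.g. roughly [0.8, 1.0] for weights [0.8, 0.2].
    bounds = []
    running = 0.0
    for w in weights:
        running += w / total
        bounds.append(running)
    bounds[-1] = 1.0  # guard against floating-point round-off

    rng = random.Random(seed)  # a fixed seed makes the split repeatable
    splits = [[] for _ in weights]
    for record in records:
        draw = rng.random()  # uniform in [0, 1)
        for i, upper in enumerate(bounds):
            if draw < upper:
                splits[i].append(record)
                break
    return splits


records = list(range(10_000))  # stand-in for the flights records (assumption)
train, test = random_split(records, [0.8, 0.2], seed=43)

# The ratio is close to 0.8 but usually not exactly 0.8.
training_ratio = len(train) / len(records)
print(training_ratio)
```

Running the split again with the same seed reproduces the same two sets, which is why the instructions fix the seed. A different seed would give a slightly different ratio.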

Hands-on interactive exercise

Try this exercise by completing the sample code below.

# Split into training and testing sets in an 80:20 ratio
flights_train, flights_test = flights.____(____, ____)

# Check that training set has around 80% of records
training_ratio = flights_train.____() / ____.____()
print(training_ratio)