
Split data into training and testing sets

You are now ready to build an end-to-end machine learning model by following a few simple steps! You will explore modeling nuances in much more detail in the next chapters, but for now you will practice and understand the key steps.

The independent features have been loaded for you as a pandas DataFrame named X, and the dependent values as a pandas Series named Y.

The train_test_split function has also been loaded from sklearn.model_selection. You will now create training and testing datasets, and then check that the data was split correctly.

This exercise is part of the course

Machine Learning for Marketing in Python


Exercise instructions

  • Split X and Y into training and testing sets, with 25% of the data assigned to testing.
  • Verify that the training dataset contains 75% of the original data.
  • Verify that the testing dataset contains 25% of the original data.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Split X and Y into training and testing datasets
train_X, test_X, train_Y, test_Y = ___(___, ___, test_size=0.___)

# Ensure training dataset has only 75% of original X data
print(___.shape[0] / X.shape[0])

# Ensure testing dataset has only 25% of original X data
print(___.shape[0] / ___.shape[0])
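
For reference, the same steps can be sketched outside the exercise environment. This is only an illustration under stated assumptions: the X and Y built here are small synthetic stand-ins (the column names feature_a and feature_b are hypothetical), not the course dataset, and random_state is added purely to make the shuffle reproducible.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-ins for the preloaded X (DataFrame) and Y (Series).
# These values are made up for illustration only.
X = pd.DataFrame({
    "feature_a": range(1000),
    "feature_b": [i % 7 for i in range(1000)],
})
Y = pd.Series([i % 2 for i in range(1000)], name="target")

# Hold out 25% of the rows for testing; the rest is used for training.
# random_state is an assumption added here for reproducibility.
train_X, test_X, train_Y, test_Y = train_test_split(
    X, Y, test_size=0.25, random_state=42
)

# Share of the original rows that ended up in each split.
print(train_X.shape[0] / X.shape[0])
print(test_X.shape[0] / X.shape[0])
```

With 1,000 rows the two shares come out at 0.75 and 0.25. When the row count is not divisible by four, expect the printed values to be close to, rather than exactly, those proportions.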