1. Learn
  2. /
  3. Courses
  4. /
  5. Introduction to Spark SQL in Python

Connected

Exercise

Split the data

A dataframe df_examples is available having columns endword: string, features: vector, outvec: vector, and label: int. You're going to split it to obtain training and testing set, which you will use to train and test a classifier.

Instructions

100 XP
  • Split the examples into train and test using a 80/20 split.
  • Print the number of training examples.
  • Print the number of test examples.