Split the data
A dataframe df_examples is available with the columns endword: string, features: vector, outvec: vector, and label: int. You're going to split it into a training set and a test set, which you will use to train and test a classifier.
This exercise is part of the course
Introduction to Spark SQL in Python
Exercise instructions
- Split the examples into train and test using an 80/20 split.
- Print the number of training examples.
- Print the number of test examples.
Hands-on interactive exercise
Try this exercise by completing the sample code.
# Split the examples into train and test, use 80/20 split
df_trainset, df_testset = df_examples.____((____), 42)
# Print the number of training examples
print("Number training: ", ____.____)
# Print the number of test examples
print("Number test: ", ____.____)
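
For intuition about what a seeded 80/20 random split does, here is a minimal sketch in plain Python. It is not Spark code and not the exercise solution: the helper random_split and the toy examples list are hypothetical, invented for illustration. It assumes each row is assigned to a split by an independent weighted random draw, which is why the resulting sizes are only approximately 80% and 20%, and why reusing the same seed reproduces the same split. In Spark, the DataFrame methods randomSplit(weights, seed) and count() play the corresponding roles.

```python
import random


def random_split(rows, weights, seed):
    """Hypothetical helper: assign each row to one split by a weighted random draw."""
    total = sum(weights)
    # Cumulative boundaries in [0, 1], e.g. 0.8 and 1.0 for weights (0.8, 0.2).
    bounds = []
    running = 0.0
    for w in weights:
        running += w / total
        bounds.append(running)

    rng = random.Random(seed)  # fixed seed makes the split reproducible
    splits = [[] for _ in weights]
    for row in rows:
        draw = rng.random()
        for i, bound in enumerate(bounds):
            # The last split catches any draw not claimed by an earlier one.
            if draw < bound or i == len(bounds) - 1:
                splits[i].append(row)
                break
    return splits


# Toy stand-in for df_examples: 1000 (endword, label) pairs.
examples = [("word%d" % i, i % 2) for i in range(1000)]

trainset, testset = random_split(examples, (0.8, 0.2), 42)
print("Number training: ", len(trainset))
print("Number test: ", len(testset))
```

Because the assignment is random per row, the two counts always sum to the total but will rarely be exactly 800 and 200.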