Selecting the ideal dataset

Now to get rid of some of the unnecessary features in the ufo dataset. Because the country column has been encoded as country_enc, you can select it and drop the other columns related to location: city, country, lat, long, and state.

You've engineered the month and year columns, so you no longer need the date or recorded columns. You also standardized the seconds column as seconds_log, so you can drop seconds and minutes.

You vectorized desc, so it can be removed. For now you'll keep type.

You can also get rid of the length_of_time column, which is unnecessary after extracting minutes.

Cet exercice fait partie du cours

Preprocessing for Machine Learning in Python

Afficher le cours

Instructions

Make a list of all the columns to drop, to_drop.
Drop these columns from ufo.
Use the words_to_filter() function you created previously; pass in vocab, vec.vocabulary_, desc_tfidf, and keep the top 4 words as the last parameter.

Exercice interactif pratique

Essayez cet exercice en complétant cet exemple de code.

# Make a list of features to drop
to_drop = [____]

# Drop those features
ufo_dropped = ufo.____

# Let's also filter some words out of the text vector we created
filtered_words = ____(____, ____, ____, ____)

Modifier et exécuter le code