MulaiMulai sekarang secara gratis

Selecting the ideal dataset

Now to get rid of some of the unnecessary features in the ufo dataset. Because the country column has been encoded as country_enc, you can select it and drop the other columns related to location: city, country, lat, long, and state.

You've engineered the month and year columns, so you no longer need the date or recorded columns. You also standardized the seconds column as seconds_log, so you can drop seconds and minutes.

You vectorized desc, so it can be removed. For now you'll keep type.

You can also get rid of the length_of_time column, which is unnecessary after extracting minutes.

Latihan ini adalah bagian dari kursus

Preprocessing for Machine Learning in Python

Lihat Kursus

Petunjuk latihan

  • Make a list of all the columns to drop, to_drop.
  • Drop these columns from ufo.
  • Use the words_to_filter() function you created previously; pass in vocab, vec.vocabulary_, desc_tfidf, and keep the top 4 words as the last parameter.

Latihan interaktif praktis

Cobalah latihan ini dengan menyelesaikan kode contoh berikut.

# Make a list of features to drop
to_drop = [____]

# Drop those features
ufo_dropped = ufo.____

# Let's also filter some words out of the text vector we created
filtered_words = ____(____, ____, ____, ____)
Edit dan Jalankan Kode