CommencerCommencer gratuitement

Exploring text vectors, part 2

Using the return_weights() function you wrote in the previous exercise, you're now going to extract the top words from each document in the text vector, return a list of the word indices, and use that list to filter the text vector down to those top words.

Cet exercice fait partie du cours

Preprocessing for Machine Learning in Python

Afficher le cours

Instructions

  • Call return_weights() to return the top weighted words for that document.
  • Call set() on the returned filter_list to remove duplicated numbers.
  • Call words_to_filter, passing in the following parameters: vocab for the vocab parameter, tfidf_vec.vocabulary_ for the original_vocab parameter, text_tfidf for the vector parameter, and 3 to grab the top_n 3 weighted words from each document.
  • Finally, pass that filtered_words set into a list to use as a filter for the text vector.

Exercice interactif pratique

Essayez cet exercice en complétant cet exemple de code.

def words_to_filter(vocab, original_vocab, vector, top_n):
    filter_list = []
    for i in range(0, vector.shape[0]):
    
        # Call the return_weights function and extend filter_list
        filtered = ____(vocab, original_vocab, vector, i, top_n)
        filter_list.extend(filtered)
        
    # Return the list in a set, so we don't get duplicate word indices
    return ____(filter_list)

# Call the function to get the list of word indices
filtered_words = ____(____, ____, ____, ____)

# Filter the columns in text_tfidf to only those in filtered_words
filtered_text = text_tfidf[:, list(____)]
Modifier et exécuter le code