Cosine similarity matrix of a corpus
In this exercise, you are given a list named corpus that contains five sentences. The corpus is printed in the console. Your task is to compute the cosine similarity matrix, which contains the pairwise cosine similarity score for every pair of sentences (vectorized using tf-idf).
Remember, the value in the ith row and jth column of a similarity matrix is the similarity score between the ith and jth vectors.
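To make the row and column convention concrete, here is a minimal sketch that builds a similarity matrix by hand for three small vectors. The vectors and the helper name cosine are made up for illustration and are not part of the exercise; the exercise itself uses tf-idf vectors and a library function instead.

```python
import math

def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical toy vectors (not the exercise's tf-idf vectors)
vectors = [
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 1.0],
    [2.0, 0.0, 2.0],
]

# sim[i][j] holds the similarity between vectors[i] and vectors[j]
sim = [[cosine(u, v) for v in vectors] for u in vectors]

for row in sim:
    print([round(x, 3) for x in row])
```

In this sketch the diagonal entries are 1 (each vector compared with itself) and the matrix is symmetric, since the similarity between vectors i and j does not depend on their order.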
This exercise is part of the course
Feature Engineering for NLP in Python
Exercise instructions
- Initialize an instance of TfidfVectorizer. Name it tfidf_vectorizer.
- Using fit_transform(), generate the tf-idf vectors for corpus. Name the result tfidf_matrix.
- Use cosine_similarity() and pass tfidf_matrix to compute the cosine similarity matrix cosine_sim.
Hands-on interactive exercise
Try this exercise by completing the sample code.
# Initialize an instance of tf-idf Vectorizer
tfidf_vectorizer = ____
# Generate the tf-idf vectors for the corpus
tfidf_matrix = tfidf_vectorizer.fit_transform(____)
# Compute and print the cosine similarity matrix
cosine_sim = ____(____, tfidf_matrix)
print(cosine_sim)
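One way the blanks might be filled in is sketched below. It assumes scikit-learn is installed, and the five sentences are a hypothetical stand-in, since the exercise's actual corpus is not shown here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical stand-in corpus; the exercise supplies its own five sentences
corpus = [
    "The sun rises in the east",
    "The moon orbits the earth",
    "The earth orbits the sun",
    "Cats sleep for most of the day",
    "Dogs enjoy long walks in the park",
]

# Initialize an instance of TfidfVectorizer
tfidf_vectorizer = TfidfVectorizer()

# Generate the tf-idf vectors for the corpus (one row per sentence)
tfidf_matrix = tfidf_vectorizer.fit_transform(corpus)

# Compute the pairwise cosine similarity between all rows
cosine_sim = cosine_similarity(tfidf_matrix, tfidf_matrix)
print(cosine_sim)
```

With five sentences, cosine_sim should be a 5 x 5 array. Passing tfidf_matrix for both arguments compares every sentence with every other sentence, including itself.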