Word vectors with spaCy
In this exercise you'll get your first experience with word vectors! You're going to use the ATIS dataset, which contains thousands of sentences from real people interacting with a flight booking system.
The user utterances are available in the list sentences, and the corresponding intents in labels.
Your job is to create a 2D array X with as many rows as there are sentences in the dataset, where each row is a vector describing that sentence.
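To make "a vector describing that sentence" concrete, here is a minimal sketch of one common way to get such a vector: averaging the vectors of the words in the sentence, which is also how spaCy's Doc.vector behaves by default. The tiny vocabulary, the numbers, and the names toy_vectors, sentence_vector, toy_sentences, and toy_X are all made up for illustration and are not part of the exercise.

```python
# Toy 3-dimensional word vectors; the words and values are invented for illustration.
toy_vectors = {
    "book": [1.0, 0.0, 2.0],
    "a": [0.0, 0.0, 0.0],
    "flight": [3.0, 2.0, 0.0],
    "cheap": [0.0, 4.0, 2.0],
}


def sentence_vector(sentence, vectors, dim=3):
    """Average the vectors of the known words in a sentence."""
    known = [vectors[w] for w in sentence.lower().split() if w in vectors]
    if not known:
        # No known words: fall back to an all-zero vector.
        return [0.0] * dim
    # zip(*known) groups the values component by component.
    return [sum(component) / len(known) for component in zip(*known)]


toy_sentences = ["book a flight", "cheap flight"]

# One row per sentence, one column per vector dimension.
toy_X = [sentence_vector(s, toy_vectors) for s in toy_sentences]
```

Here toy_X has two rows (one per sentence) and three columns (one per dimension), which is the same layout the exercise asks for in X, just at a much smaller scale.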
This exercise is part of the course Building Chatbots in Python.
Exercise instructions
- Load the spaCy English model by calling spacy.load() with argument 'en'.
- Calculate the length of sentences using len() and the dimensionality of the word vectors using nlp.vocab.vectors_length.
- For each sentence, call the nlp object with the sentence as the sole argument. Store the result as doc.
- Use the .vector attribute of doc to get the vector representation of each sentence, and store this vector in the appropriate row of X.
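The pattern in these steps (call an object on each sentence, read a .vector attribute, write it into a row) can be tried out without downloading a model. The sketch below uses a hypothetical stand-in, ToyNLP, that only mimics the shape of that interface; its "vectors" are meaningless placeholders, and ToyDoc, ToyNLP, toy_nlp, toy_sentences, and X_toy are assumptions introduced purely for illustration.

```python
import numpy as np


class ToyDoc:
    """Hypothetical stand-in for a spaCy Doc: it only exposes .vector."""

    def __init__(self, vector):
        self.vector = vector


class ToyNLP:
    """Hypothetical stand-in for a loaded spaCy pipeline."""

    def __init__(self, dim):
        self.dim = dim

    def __call__(self, text):
        # Placeholder "embedding": every component is the word count of the text.
        return ToyDoc(np.full(self.dim, float(len(text.split()))))


toy_nlp = ToyNLP(dim=4)
toy_sentences = ["i want to fly to boston", "show me fares"]

# Pre-allocate one row per sentence and one column per dimension.
X_toy = np.zeros((len(toy_sentences), toy_nlp.dim))

for idx, sentence in enumerate(toy_sentences):
    doc = toy_nlp(sentence)      # process the sentence
    X_toy[idx, :] = doc.vector   # copy its vector into row idx
```

With a real model, toy_nlp would be replaced by the object returned by spacy.load(). Note that newer spaCy releases expect a full package name such as 'en_core_web_md' rather than the 'en' shortcut.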
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
import numpy as np
import spacy

# Load the spaCy model: nlp
nlp = ____
# Calculate the length of sentences
n_sentences = ____
# Calculate the dimensionality of nlp
embedding_dim = ____
# Initialize the array with zeros: X
X = np.zeros((n_sentences, embedding_dim))
# Iterate over the sentences
for idx, sentence in enumerate(sentences):
    # Pass each sentence to the nlp object to create a document
    doc = ____
    # Save the document's .vector attribute to the corresponding row in X
    X[idx, :] = ____