Adding special tokens
You will now learn to add sos (marks the start) and eos (marks the end) tokens to the sentences. As already discussed, this step is optional for the model you have right now, but these will be required for a model that you'll be implementing in a later chapter.
To add these special tokens, you will use the Python string.join() function, a string method that is called on the delimiter. string.join() joins a list of strings into a single string using that delimiter. For example, if you want to convert ['datacamp', 'is', 'awesome'] to 'datacamp is awesome', you can use " ".join(['datacamp', 'is', 'awesome']), where the " " (i.e. a space character) is the delimiter.
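The behaviour described above can be checked with a short sketch. The variable names `words`, `sentence`, and `dashed` are illustrative only and are not part of the exercise.

```python
# Join a list of words into one sentence, using a space as the delimiter.
words = ['datacamp', 'is', 'awesome']
sentence = " ".join(words)
print(sentence)

# The delimiter is simply the string that join() is called on,
# so any other string works the same way.
dashed = "-".join(words)
print(dashed)
```

Note that every element of the list must itself be a string; join() raises a TypeError otherwise.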
For this exercise, a small sample of 10 French sentences has already been imported.
This exercise is part of the course Machine Translation with Keras.
Exercise instructions
- Loop through the list of French sentences (fr_text).
- Add a "sos" token to denote the beginning and an "eos" token to denote the ending of each sentence using the string.join() function.
- Append the modified sentence to fr_text_new.
- Print the modified sentence sent_new.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
fr_text_new = []

# Loop through all sentences in fr_text
for sent in ____:
    print("Before adding tokens: ", sent)
    # Add sos and eos tokens using string.join
    sent_new = " ".____([____, sent, ____])
    # Append the modified sentence to fr_text_new
    ____.____(____)
    # Print sentence after adding tokens
    print("After adding tokens: ", ____, '\n')
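For reference, here is a minimal sketch of how the completed loop might look when run outside the exercise environment. The two sentences in `fr_text` are a hypothetical stand-in for the preloaded sample of 10 French sentences, which is not reproduced here.

```python
# Hypothetical stand-in for the preloaded fr_text list (not the course data).
fr_text = ["le chat dort", "il fait beau"]

fr_text_new = []

# Loop through all sentences in fr_text
for sent in fr_text:
    print("Before adding tokens: ", sent)
    # Wrap the sentence with the start and end tokens, separated by spaces
    sent_new = " ".join(["sos", sent, "eos"])
    # Append the modified sentence to fr_text_new
    fr_text_new.append(sent_new)
    # Print sentence after adding tokens
    print("After adding tokens: ", sent_new, '\n')
```

Because the sentence is passed to join() as a single list element, the words inside it are left untouched; only the two tokens and the separating spaces are added.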