
Adding special tokens

You will now learn to add an sos token (marking the start) and an eos token (marking the end) to each sentence. As already discussed, this step is optional for the model you have right now, but the tokens will be required for a model that you'll implement in a later chapter.

To add these special tokens, you will use Python's str.join() method (referred to here as string.join()). It joins a list of strings into a single string, using the string it is called on as the delimiter. For example, to convert ['datacamp', 'is', 'awesome'] to 'datacamp is awesome', you can use " ".join(['datacamp', 'is', 'awesome']), where " " (i.e. a space character) is the delimiter.
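As a quick illustration of how the delimiter works, the short sketch below joins the same list of words twice, once with a space and once with a hyphen. The variable names are made up for this example only.

```python
# The string that join() is called on is placed between the list items
words = ['datacamp', 'is', 'awesome']

spaced = " ".join(words)   # space as the delimiter
dashed = "-".join(words)   # hyphen as the delimiter

print(spaced)  # datacamp is awesome
print(dashed)  # datacamp-is-awesome
```

Note that join() only accepts strings, so any non-string items would need to be converted first.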

For this exercise, a small sample of 10 French sentences has already been imported.

This exercise is part of the course

Machine Translation with Keras


Exercise instructions

  • Loop through the list of French sentences (fr_text).
  • Add a "sos" token to denote the beginning and an "eos" token to denote the ending of each sentence using the string.join() function.
  • Append the modified sentence to fr_text_new.
  • Print the modified sentence sent_new.
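The steps above can be sketched as a small helper. This is only an illustration under stated assumptions: the function name add_special_tokens and the two sample sentences are hypothetical, and are not part of the exercise data.

```python
def add_special_tokens(sentences, start="sos", end="eos"):
    """Return a new list with a start and an end token around each sentence."""
    wrapped = []
    for sent in sentences:
        # Join the start token, the sentence and the end token with spaces
        sent_new = " ".join([start, sent, end])
        wrapped.append(sent_new)
    return wrapped

# Hypothetical sample sentences, used here only for illustration
sample = ["bonjour le monde", "merci beaucoup"]
result = add_special_tokens(sample)
print(result)  # ['sos bonjour le monde eos', 'sos merci beaucoup eos']
```

Building a new list rather than modifying the input keeps the original sentences available for comparison.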

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

fr_text_new = []

# Loop through all sentences in fr_text
for sent in ____:
  
  print("Before adding tokens: ", sent)
  
  # Add sos and eos tokens using string.join
  sent_new = " ".____([____, sent, ____])
  # Append the modified sentence to fr_text_new
  ____.____(____)
  
  # Print sentence after adding tokens
  print("After adding tokens: ", ____, '\n')