
Tokenizing the Gettysburg Address

In this exercise, you will tokenize one of the most famous speeches of all time: the Gettysburg Address, delivered by U.S. President Abraham Lincoln during the American Civil War.

The entire speech is available as a string named gettysburg.

This exercise is part of the course

Feature Engineering for NLP in Python

Exercise instructions

  • Load the en_core_web_sm model using spacy.load().
  • Create a Doc object doc for the gettysburg string.
  • Using a list comprehension, loop over doc to collect the text of each token.

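Have a go at this exercise by completing the sample code. One possible completion:

import spacy

# Load the en_core_web_sm model
# (assumes it is installed, e.g. via: python -m spacy download en_core_web_sm)
nlp = spacy.load('en_core_web_sm')

# Create a Doc object; `gettysburg` is the speech string preloaded by the exercise
doc = nlp(gettysburg)

# Generate the tokens: the text of each Token in the Doc
tokens = [token.text for token in doc]
print(tokens)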
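As an illustration outside the exercise itself, the sketch below compares spaCy's tokens with a plain whitespace split. It makes two assumptions. First, a short excerpt (the speech's opening sentence) stands in for the full gettysburg string. Second, it uses spacy.blank("en") in place of en_core_web_sm so it runs without downloading a model, on the assumption that the blank English pipeline applies the same rule-based tokenizer.

```python
import spacy

# Assumption: the opening sentence stands in for the full `gettysburg` string.
excerpt = (
    "Four score and seven years ago our fathers brought forth on this "
    "continent, a new nation, conceived in Liberty, and dedicated to the "
    "proposition that all men are created equal."
)

# Assumption: a blank English pipeline (tokenizer only) replaces en_core_web_sm,
# so no model download is required.
nlp = spacy.blank("en")
doc = nlp(excerpt)

# spaCy tokens: punctuation such as "," and "." becomes separate tokens
tokens = [token.text for token in doc]

# Naive whitespace split for comparison: punctuation stays attached to words
naive = excerpt.split()

print(tokens)
print(naive)
```

In the spaCy output, "continent," becomes the two tokens "continent" and ",", and the final "equal." is split into "equal" and ".". str.split() keeps each punctuation mark glued to the preceding word.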