Tokenizing the Gettysburg Address
In this exercise, you will tokenize one of the most famous speeches of all time: the Gettysburg Address, delivered by U.S. President Abraham Lincoln during the American Civil War.
The entire speech is available as a string named gettysburg.
This exercise is part of the course
Feature Engineering for NLP in Python
Exercise instructions
- Load the en_core_web_sm model using spacy.load().
- Create a Doc object doc for the gettysburg string.
- Using a list comprehension, loop over doc to generate the token texts.
Interactive exercise
Complete the sample code to finish this exercise.
import spacy
# Load the en_core_web_sm model
nlp = ____.____(____)
# Create a Doc object
doc = ____(____)
# Generate the tokens
tokens = [token.____ for token in ____]
print(tokens)