Tokenizing the Gettysburg Address

In this exercise, you will be tokenizing one of the most famous speeches of all time: the Gettysburg Address delivered by American President Abraham Lincoln during the American Civil War.

The entire speech is available as a string named gettysburg.

This exercise is part of the course

Feature Engineering for NLP in Python


Instructions

  • Load the en_core_web_sm model using spacy.load().
  • Create a Doc object doc for the gettysburg string.
  • Using a list comprehension, loop over doc to collect the text of each token.

Hands-on interactive exercise

Try this exercise by completing this sample code.

import spacy

# Load the en_core_web_sm model
nlp = ____.____(____)

# Create a Doc object
doc = ____(____)

# Generate the tokens
tokens = [token.____ for token in ____]
print(tokens)
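
For reference, one way the blanks above could be filled in is sketched below. It assumes spaCy is installed. Because the full gettysburg string is supplied only by the exercise environment, a short excerpt of the speech stands in for it here, and the fallback to spacy.blank("en") is an addition for machines where the en_core_web_sm package is missing; neither is part of the original exercise.

```python
import spacy

# Assumption: a short excerpt stands in for the full `gettysburg` string,
# which the exercise environment normally provides.
gettysburg = (
    "Four score and seven years ago our fathers brought forth on this "
    "continent, a new nation, conceived in Liberty, and dedicated to the "
    "proposition that all men are created equal."
)

# Load the en_core_web_sm model. If the model package is not installed,
# fall back to a blank English pipeline, which contains only the tokenizer.
try:
    nlp = spacy.load("en_core_web_sm")
except OSError:
    nlp = spacy.blank("en")

# Create a Doc object
doc = nlp(gettysburg)

# Generate the tokens
tokens = [token.text for token in doc]
print(tokens)
```

Punctuation marks such as commas and the final period come out as separate tokens, so the list is longer than a simple split on whitespace would give.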