BaşlayınÜcretsiz Başlayın

Cleaning a blog post

In this exercise, you have been given an excerpt from a blog post. Your task is to clean this text into a more machine friendly format. This will involve converting to lowercase, lemmatization and removing stopwords, punctuations and non-alphabetic characters.

The excerpt is available as a string blog and has been printed to the console. The list of stopwords are available as stopwords.

Bu egzersiz

Feature Engineering for NLP in Python

kursunun bir parçasıdır
Kursu Görüntüle

Egzersiz talimatları

  • Using list comprehension, loop through doc to extract the lemma_ of each token.
  • Remove stopwords and non-alphabetic tokens using stopwords and isalpha().

Uygulamalı interaktif egzersiz

Bu örnek kodu tamamlayarak bu egzersizi bitirin.

# Load model and create Doc object
nlp = spacy.load('en_core_web_sm')
doc = nlp(blog)

# Generate lemmatized tokens
lemmas = [token.____ for token in ____]

# Remove stopwords and non-alphabetic tokens
a_lemmas = [lemma for lemma in lemmas 
            if lemma.____ and lemma not in ____]

# Print string after text cleaning
print(' '.join(a_lemmas))
Kodu Düzenle ve Çalıştır