
Create dictionary and corpus

To run an LDA topic model, you first need to define a dictionary and a corpus, because both are inputs to the model. You'll continue working with the text data you cleaned in the previous exercises: text_clean is already available, and you'll use it to create your dictionary and corpus.
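Here is a plain-Python sketch that shows what the two objects contain. It is not gensim's implementation. It assumes text_clean is a list of documents, where each document is a list of cleaned tokens. The helpers build_dictionary and to_bow and the toy documents are hypothetical. gensim's corpora.Dictionary also tracks document frequencies and may number the tokens differently.

```python
from collections import Counter

# Hypothetical stand-in for text_clean: each document is a list of cleaned tokens.
text_clean = [
    ["fraud", "claim", "account", "claim"],
    ["account", "transfer", "fraud"],
]

def build_dictionary(docs):
    """Map every distinct token to an integer id, in order of first appearance."""
    token2id = {}
    for doc in docs:
        for token in doc:
            if token not in token2id:
                token2id[token] = len(token2id)
    return token2id

def to_bow(doc, token2id):
    """Return a sorted list of (token_id, count) pairs, skipping unknown tokens."""
    counts = Counter(token2id[t] for t in doc if t in token2id)
    return sorted(counts.items())

dictionary = build_dictionary(text_clean)
corpus = [to_bow(doc, dictionary) for doc in text_clean]

print(dictionary)
print(corpus)
```

Each corpus entry is a bag of words. Word order is discarded, and only (token_id, count) pairs are kept, which is the input format an LDA model expects.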

This exercise will take a little longer to execute than usual.

This exercise is part of the course

Fraud Detection in Python


Exercise instructions

  • Import the gensim package and corpora from gensim separately.
  • Define your dictionary by running the correct function on your clean data text_clean.
  • Define the corpus by running doc2bow on each piece of text in text_clean.
  • Print your results so you can see what the dictionary and corpus look like.
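A printed corpus shows only integer ids, so it can help to map the ids back to tokens when checking the result. With a gensim Dictionary, dictionary[token_id] returns the token. The sketch below uses a plain dict called id2token and toy data instead, so both names and values are hypothetical.

```python
# Hypothetical toy outputs in the same shape gensim produces:
# an id -> token mapping and a bag-of-words corpus of (token_id, count) pairs.
id2token = {0: "fraud", 1: "claim", 2: "account", 3: "transfer"}
corpus = [[(0, 1), (1, 2), (2, 1)], [(0, 1), (2, 1), (3, 1)]]

def readable(bow, id2token):
    """Translate one bag-of-words document back into (token, count) pairs."""
    return [(id2token[token_id], count) for token_id, count in bow]

readable_corpus = [readable(doc, id2token) for doc in corpus]
for doc in readable_corpus:
    print(doc)
```

With the real gensim objects, the same idea is `[[(dictionary[i], n) for i, n in doc] for doc in corpus]`.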

Interactive practice exercise

Try to solve this exercise by completing the sample code.

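# Import the packages
import gensim
from gensim import corpora

# Define the dictionary: map each unique token in text_clean to an integer id
dictionary = corpora.Dictionary(text_clean)

# Define the corpus: convert each document into a list of (token_id, count) pairs
corpus = [dictionary.doc2bow(text) for text in text_clean]

# Print corpus and dictionary
print(corpus)
print(dictionary)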