
Create dictionary and corpus

To run an LDA topic model, you first need to define a dictionary and a corpus, because both are inputs to the model. You will continue working with the cleaned text data from the previous exercises: text_clean is already available, and you will use it to create your dictionary and corpus.

This exercise will take a little longer to execute than usual.

This exercise is part of the course

Fraud Detection in Python


Exercise instructions

  • Import the gensim package and corpora from gensim separately.
  • Define your dictionary by running the correct function on your clean data text_clean.
  • Define the corpus by running doc2bow on each piece of text in text_clean.
  • Print your results so you can see what the dictionary and corpus look like.
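To make the two objects concrete, here is a minimal plain-Python sketch of the idea behind a dictionary (a token-to-id mapping) and a bag-of-words corpus (per-document lists of (token id, count) pairs). It does not use gensim and is not the exercise solution. The names sample_docs, build_dictionary and to_bow are hypothetical, the sample documents are made up, and the ids here are assigned by first appearance, which may differ from the ids gensim assigns.

```python
from collections import Counter

# Hypothetical stand-in for text_clean: a list of tokenized documents.
sample_docs = [
    ["energy", "trade", "meeting"],
    ["trade", "trade", "deal"],
]

def build_dictionary(docs):
    """Assign each distinct token an integer id in order of first appearance."""
    token2id = {}
    for doc in docs:
        for token in doc:
            if token not in token2id:
                token2id[token] = len(token2id)
    return token2id

def to_bow(doc, token2id):
    """Return sorted (token_id, count) pairs for one document, skipping unknown tokens."""
    counts = Counter(token2id[t] for t in doc if t in token2id)
    return sorted(counts.items())

token2id = build_dictionary(sample_docs)
bow_corpus = [to_bow(doc, token2id) for doc in sample_docs]

print(token2id)    # mapping from token to integer id
print(bow_corpus)  # one list of (token_id, count) pairs per document
```

In the second sample document the token "trade" occurs twice, so its pair carries a count of 2. This is the kind of structure to expect when you print the corpus in this exercise.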

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Import the packages
import ____
from ____ import ____

# Define the dictionary
dictionary = ____.____(____)

# Define the corpus 
corpus = [dictionary.____(text) for ____ in ____]

# Print corpus and dictionary
print(____)
print(____)