Create dictionary and corpus
In order to run an LDA topic model, you first need to define your dictionary and corpus, because both are inputs to the model. You're going to continue working with the text data you cleaned in the previous exercises. That means text_clean is already available for you to work with, and you'll use it to create your dictionary and corpus.
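To make the two terms concrete: the dictionary maps every distinct word to an integer id, and the corpus rewrites each document as a list of (id, count) pairs, a so-called bag of words. The plain-Python sketch below illustrates that idea without gensim. The toy documents and the helper names build_vocab and to_bow are hypothetical, and the ids it assigns are not guaranteed to match the ones gensim assigns.

```python
from collections import Counter

# Hypothetical stand-in for text_clean: a list of tokenized documents
toy_docs = [
    ["fraud", "email", "account"],
    ["account", "password", "account"],
]

def build_vocab(docs):
    """Map each distinct token to an integer id (illustration only, not gensim's code)."""
    token2id = {}
    for doc in docs:
        for token in sorted(set(doc)):
            # setdefault keeps the first id a token received
            token2id.setdefault(token, len(token2id))
    return token2id

def to_bow(doc, token2id):
    """Return sorted (token_id, count) pairs, skipping tokens not in the vocabulary."""
    counts = Counter(token2id[t] for t in doc if t in token2id)
    return sorted(counts.items())

token2id = build_vocab(toy_docs)
bow_corpus = [to_bow(doc, token2id) for doc in toy_docs]
print(token2id)
print(bow_corpus)
```

Word order inside a document is discarded; only which words occur and how often survives, which is all LDA uses.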
This exercise will take a little longer to execute than usual.
This exercise is part of the course Fraud Detection in Python.
Exercise instructions
- Import the gensim package, and import corpora from gensim separately.
- Define your dictionary by running the correct function on your clean data, text_clean.
- Define the corpus by running doc2bow on each piece of text in text_clean.
- Print your results so you can see what dictionary and corpus look like.
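Have a go at this exercise by completing the sample code. With the blanks filled in, it reads as follows (text_clean must be a list of tokenized documents, i.e. a list of lists of words):
# Import the packages
import gensim
from gensim import corpora
# Define the dictionary: map every unique word in text_clean to an integer id
dictionary = corpora.Dictionary(text_clean)
# Define the corpus: convert each document into (word id, count) pairs
corpus = [dictionary.doc2bow(text) for text in text_clean]
# Print the dictionary and the corpus
print(dictionary)
print(corpus)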