LDA model
Now it's time to build the LDA model. Using the dictionary and corpus, you are ready to discover which topics are present in the Enron emails. With a quick print of the words assigned to each topic, you can do a first exploration of whether any obvious topics jump out. Be mindful that the topic model is computationally heavy, so it will take a while to run. Let's give it a try!
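To make the two inputs concrete, here is a minimal pure-Python sketch of what a dictionary and a bag-of-words corpus represent. The toy documents and the names `docs` and `token2id` are hypothetical and are not part of the exercise. gensim builds these objects for you and may assign token ids in a different order, so treat this only as an illustration of the data shape.

```python
from collections import Counter

# Hypothetical toy documents, already tokenized (not the Enron data).
docs = [
    ["meeting", "schedule", "meeting"],
    ["stock", "price", "schedule"],
]

# "Dictionary": map each distinct token to an integer id,
# here simply in order of first appearance (an assumption of this sketch).
token2id = {}
for doc in docs:
    for token in doc:
        token2id.setdefault(token, len(token2id))

# "Corpus": one bag-of-words per document, as sorted (token_id, count) pairs.
corpus = [
    sorted(Counter(token2id[t] for t in doc).items())
    for doc in docs
]

print(token2id)
print(corpus)
```

Each document becomes a list of (token id, count) pairs, and the dictionary is what lets the model translate ids back into readable words when the topics are printed.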
This exercise is part of the course
Fraud Detection in Python
Exercise instructions
- Build the LDA model from gensim models by inserting the corpus and dictionary.
- Save the 5 topics by running print_topics on the model results, and select the top 5 words.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Define the LDA model
ldamodel = gensim.models.____.____(____, num_topics=5, id2word=____, passes=5)
# Save the topics and top 5 words
topics = ____.____(num_words=____)
# Print the results
for topic in topics:
    print(topic)
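Once the loop runs, each printed topic is a pair of a topic id and a formatted string of weighted words. The sketch below shows one way to pull the words and weights out of such strings. It assumes the usual `weight*"word" + ...` layout. The sample `topics` list, its words, and its weights are made up for illustration and are not results from the Enron data, and `parse_topic` is a hypothetical helper, not a gensim function.

```python
import re

# Hypothetical output in the shape described above: (topic_id, formatted string).
# The words and weights are invented for illustration only.
topics = [
    (0, '0.030*"energy" + 0.020*"power" + 0.010*"market"'),
    (1, '0.040*"meeting" + 0.015*"schedule" + 0.012*"call"'),
]

# Each term is assumed to look like weight*"word".
TERM = re.compile(r'([0-9.]+)\*"([^"]+)"')

def parse_topic(formatted):
    """Turn one formatted topic string into a list of (word, weight) pairs."""
    return [(word, float(weight)) for weight, word in TERM.findall(formatted)]

# Print just the words per topic, which is easier to scan than the raw strings.
for topic_id, formatted in topics:
    words = [word for word, _ in parse_topic(formatted)]
    print(topic_id, words)
```

Reading only the word lists side by side makes it easier to judge whether any topic stands out as unusual.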