
Create dictionary and corpus

To run an LDA topic model, you first need to define a dictionary and a corpus, since both are inputs to the model. You'll continue working with the cleaned text data from the previous exercises: text_clean is already available, and you'll use it to create your dictionary and corpus.

This exercise will take a little longer to execute than usual.

This exercise is part of the course

Fraud Detection in Python


Exercise instructions

  • Import the gensim package and corpora from gensim separately.
  • Define your dictionary by running the correct function on your clean data text_clean.
  • Define the corpus by running doc2bow on each piece of text in text_clean.
  • Print your results so you can see what the dictionary and corpus look like.
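
To make the two objects concrete, here is a plain-Python sketch of the idea behind them. It is not gensim's implementation and not the exercise solution: toy_text_clean is a hypothetical stand-in for text_clean, and token2id, to_bow and toy_corpus are illustrative names. The dictionary maps each unique token to an integer id, and the corpus stores each document as a bag of words, a list of (token_id, count) pairs.

```python
from collections import Counter

# Hypothetical stand-in for text_clean: a list of tokenized documents.
toy_text_clean = [
    ["meeting", "energy", "price", "energy"],
    ["price", "stock", "meeting"],
]

# "Dictionary": give each unique token an integer id,
# here simply in order of first appearance (an assumption of this sketch).
token2id = {}
for doc in toy_text_clean:
    for token in doc:
        if token not in token2id:
            token2id[token] = len(token2id)


def to_bow(doc, token2id):
    """Convert one tokenized document to sorted (token_id, count) pairs."""
    counts = Counter(token2id[token] for token in doc)
    return sorted(counts.items())


# "Corpus": one bag-of-words representation per document.
toy_corpus = [to_bow(doc, token2id) for doc in toy_text_clean]

print(token2id)
print(toy_corpus)
```

In the exercise itself, gensim builds both structures for you; the ids it assigns may differ from the first-appearance order used here, but the printed corpus has the same overall shape: one list of (id, count) pairs per document.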

Hands-on interactive exercise

Try this exercise by completing this sample code.

# Import the packages
import ____
from ____ import ____

# Define the dictionary
dictionary = ____.____(____)

# Define the corpus 
corpus = [dictionary.____(text) for ____ in ____]

# Print corpus and dictionary
print(____)
print(____)