
Create dictionary and corpus

To run an LDA topic model, you first need to define a dictionary and a corpus, because both are inputs to the model. You'll continue working with the cleaned text data from the previous exercises: text_clean is already available, and you'll use it to create the dictionary and corpus.

This exercise will take a little longer to execute than usual.

This exercise is part of the course

Fraud Detection in Python


Exercise instructions

  • Import the gensim package and corpora from gensim separately.
  • Define your dictionary by running the correct function on your clean data text_clean.
  • Define the corpus by running doc2bow on each piece of text in text_clean.
  • Print your results so you can see what the dictionary and corpus look like.
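
To make the two objects concrete, here is a plain-Python sketch of the idea behind them. It is not gensim's implementation and not the exercise solution: toy_docs, build_token_ids, and to_bow are hypothetical names used only for illustration, and the ids gensim assigns may differ. The dictionary maps each unique token to an integer id, and the bag-of-words corpus stores each document as (token_id, count) pairs, which is the shape doc2bow returns.

```python
from collections import Counter

# Hypothetical toy data standing in for text_clean: a list of tokenized documents.
toy_docs = [
    ["fraud", "email", "money"],
    ["money", "transfer", "money"],
]


def build_token_ids(docs):
    """Assign an integer id to each unique token, in order of first appearance."""
    token_to_id = {}
    for doc in docs:
        for token in doc:
            if token not in token_to_id:
                token_to_id[token] = len(token_to_id)
    return token_to_id


def to_bow(doc, token_to_id):
    """Return sorted (token_id, count) pairs for the known tokens in one document."""
    counts = Counter(token_to_id[t] for t in doc if t in token_to_id)
    return sorted(counts.items())


token_to_id = build_token_ids(toy_docs)
toy_corpus = [to_bow(doc, token_to_id) for doc in toy_docs]

print(token_to_id)
print(toy_corpus)
```

Each inner list in toy_corpus describes one document, so the corpus has as many entries as there are documents, while the dictionary is shared across all of them.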

Interactive exercise

Try this exercise by completing the sample code.

# Import the packages
import ____
from ____ import ____

# Define the dictionary
dictionary = ____.____(____)

# Define the corpus
corpus = [dictionary.____(text) for ____ in ____]

# Print corpus and dictionary
print(____)
print(____)