Building a Counter with bag-of-words

In this exercise, you'll build your first (in this course) bag-of-words counter using a Wikipedia article, which has been pre-loaded as article. Try doing the bag-of-words without looking at the full article text, and guessing what the topic is! If you'd like to peek at the title at the end, we've included it as article_title. Note that this article text has had very little preprocessing from the raw Wikipedia database entry.

word_tokenize has been imported for you.
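Before you start, here is a minimal sketch of the same idea on a made-up sentence. The sentence and variable names are just for illustration, and it assumes nltk and its punkt tokenizer data are installed; the exercise below follows the same pattern on the pre-loaded article instead.

# Minimal bag-of-words sketch on a toy sentence (illustrative only)
from collections import Counter
from nltk.tokenize import word_tokenize

sample_text = "The cat saw the dog. The dog saw the cat."
sample_tokens = word_tokenize(sample_text)          # split into word and punctuation tokens
sample_lower = [t.lower() for t in sample_tokens]   # lowercase so "The" and "the" count as one token
print(Counter(sample_lower).most_common(3))
# e.g. [('the', 4), ('cat', 2), ('saw', 2)] -- ties among count-2 tokens may appear in any order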
This exercise is part of the course Introduction to Natural Language Processing in Python.
Exercise instructions
- Import Counter from collections.
- Use word_tokenize() to split the article into tokens.
- Use a list comprehension with t as the iterator variable to convert all the tokens into lowercase. The .lower() method converts text into lowercase.
- Create a bag-of-words counter called bow_simple by using Counter() with lower_tokens as the argument.
- Use the .most_common() method of bow_simple to print the 10 most common tokens.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Import Counter
____
# Tokenize the article: tokens
tokens = ____
# Convert the tokens into lowercase: lower_tokens
lower_tokens = [____ for ____ in ____]
# Create a Counter with the lowercase tokens: bow_simple
bow_simple = ____
# Print the 10 most common tokens
print(____)
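For reference, here is one way the completed scaffold might look, assuming article is the pre-loaded string described above. word_tokenize is already imported in the exercise environment; its import is repeated here only so the snippet is self-contained.

# Import Counter
from collections import Counter
from nltk.tokenize import word_tokenize  # already imported for you in the exercise

# Tokenize the article: tokens
tokens = word_tokenize(article)

# Convert the tokens into lowercase: lower_tokens
lower_tokens = [t.lower() for t in tokens]

# Create a Counter with the lowercase tokens: bow_simple
bow_simple = Counter(lower_tokens)

# Print the 10 most common tokens
print(bow_simple.most_common(10))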