Normalizing reviews
It's time to extract some of the important words in your movie review dataset. First, you need to normalize the words, and then count their frequency. Normalization includes converting all words to lowercase, removing special characters, and extracting the root of each word so that its variants are counted as one.
Imagine you have the following two reviews: "The movie surprises me very much" and "Marvel movies always surprise their audience". If you count raw word frequency, you will count "surprises" once and "surprise" once. However, the verb "surprise" appears in both reviews, so its frequency should be two.
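The effect of normalization on these two reviews can be sketched as follows. This is only an illustration under assumptions: the names reviews, crude_stem and normalize are hypothetical and not part of the exercise, and crude_stem is a deliberately simple stand-in that strips a trailing "s". A real stemmer from an NLP library would handle many more cases.

```python
import re
from collections import Counter

# The two example reviews from the text above.
reviews = [
    "The movie surprises me very much",
    "Marvel movies always surprise their audience",
]

def crude_stem(word):
    """Hypothetical stand-in for a real stemmer: strip one trailing 's'."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word

def normalize(text):
    """Lowercase, drop special characters, split into words, stem each word."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", "", text)  # keep letters, digits, whitespace
    return [crude_stem(word) for word in text.split()]

# Count every normalized word across both reviews.
counts = Counter(word for review in reviews for word in normalize(review))
print(counts["surprise"])  # 2
```

After normalization, "surprises" and "surprise" fall into the same bucket, so the count is two. The crude rule also shortens words such as "always" to "alway", which is one reason to prefer a proper stemmer in practice.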
The text of a single example movie review has already been saved in the variable movie. You can use print(movie) to view its contents in the IPython Shell.
This exercise is part of the course
Regular Expressions in Python
Hands-on interactive exercise
Try this exercise by completing this sample code.
# Convert to lowercase and print the result
movie_lower = ____
print(____)