Normalizing reviews

It's time to extract some important words present in your movie review dataset. First, you need to normalize them and then, count their frequency. Part of the normalization implies converting all the words to lowercase, removing special characters and extracting the root of a word so you count the variants as one.

So imagine you have the following reviews: The movie surprises me very much and Marvel movies always surprise their audience. If you count the word frequency, you will count surprises one time and surprise one time. However, the verb surprise appears in both and its frequency should be two.

The text of a movie review for only one example has been already saved in the variable movie. You can use print(movie) to view the variable in the IPython Shell.

Convert the string in the variable movie to lowercase. Print the result.

Basic Concepts of String Manipulation

Formatting Strings

Regular Expressions for Pattern Matching

Advanced Regular Expression Concepts

Exercise

Normalizing reviews

Instructions 1/4