Welcome!
1. Welcome!
Welcome to the course! In this course, we will build upon some of your Python skills and introduce methods for sentiment analysis using movie and product reviews, Twitter data and a lot of literary examples.
2. What is sentiment analysis?
Let's start with defining what sentiment analysis is. Sentiment analysis, also called opinion mining, is the process of understanding the opinion of an author about a subject. In other words, "What is the emotion or opinion of the author of the text about the subject discussed?"
3. What goes into a sentiment analysis system?
In a sentiment analysis system, depending on the context, we usually have three elements. The first is the opinion or emotion. An opinion (also called "polarity") can be positive, neutral or negative. An emotion could be qualitative (like joy, surprise, or anger) or quantitative (like rating a movie on a scale from 1 to 10).
4. What goes into a sentiment analysis system?
The second element in a sentiment analysis system is the subject that is being talked about, such as a book, a movie, or a product. Sometimes one opinion could discuss multiple aspects of the same subject. For example: "The camera on this phone is great but its battery life is rather disappointing." The third element is the opinion holder, or entity, expressing the opinion.
5. Why sentiment analysis?
Sentiment analysis has many practical applications. In social media monitoring, we don't just want to know if people are talking about a brand; we want to know how they are talking about it. Social media isn't our only source of information; we can also find sentiment on forums, blogs, and the news. Most brands analyze all of these sources to enrich their understanding of how customers interact with their brand, what they are happy or unhappy about, and what matters most to consumers. Sentiment analysis is thus very important in brand monitoring, and in fields such as customer and product analytics and market research and analysis.
6. Let's look at movie reviews!
Let's look at the first dataset we will use in this course: a sample of IMDB movie reviews. We have two columns: one for the text of the review, and a second one called "label", which expresses the overall sentiment: class 1 means positive and class 0 means negative.
7. How many positive and negative reviews?
Let's find out how many positive and negative reviews we have in the data. To do this, we call the .value_counts() method on the "label" column. The output is the number of negative reviews (class 0) and positive reviews (class 1).
8. Percentage of positive and negative reviews
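As a starting point, the counting step can be sketched as follows. The DataFrame name `movies` and its handful of rows are hypothetical stand-ins for the IMDB sample; only the "label" column name and the `.value_counts()` call come from the narration.

```python
import pandas as pd

# Hypothetical toy data standing in for the IMDB sample (not the real dataset)
movies = pd.DataFrame({
    "review": [
        "Loved every minute.",
        "Dull and far too long.",
        "A great cast.",
        "Not worth the ticket.",
        "Brilliant.",
    ],
    "label": [1, 0, 1, 0, 1],  # 1 = positive, 0 = negative
})

# Count how many reviews fall into each class
label_counts = movies["label"].value_counts()
print(label_counts)
```

The result is a Series indexed by class, so `label_counts.loc[1]` gives the number of positive reviews.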
If we want to see the number of positives and negatives as a percentage, we can divide the result of .value_counts() by the number of rows, which we obtain with the len() function. We see that the sample is rather balanced: around half of the reviews are positive and half are negative.
9. How long is the longest review?
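A minimal sketch of the proportion calculation, again on hypothetical toy data (the name `movies` and the four rows are assumptions, chosen to be balanced like the sample described):

```python
import pandas as pd

# Hypothetical toy data standing in for the IMDB sample (not the real dataset)
movies = pd.DataFrame({
    "review": ["Great film", "Terrible", "An instant classic", "Fell asleep"],
    "label": [1, 0, 1, 0],
})

# Divide the class counts by the number of rows to get proportions
proportions = movies["label"].value_counts() / len(movies)
print(proportions)

# pandas can also do the division itself via normalize=True
proportions_alt = movies["label"].value_counts(normalize=True)
```

Both expressions give the share of each class as a fraction between 0 and 1; multiply by 100 for percentages.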
How long is the longest review? To find that, we create a pandas Series called length_reviews by selecting the review column of the dataset, followed by .str.len(). Str is short for string. The .str accessor gives us string methods that are applied to each review in the Series. If we skip it and call .len() directly on the Series, we get an AttributeError. The result is a pandas Series with the number of characters in each review. To find the length of the longest review, we call the max() method on the length_reviews Series.
10. How long is the shortest review?
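The length calculation can be sketched like this. The toy rows and the name `movies` are hypothetical; `length_reviews`, the "review" column, `.str.len()` and `max()` follow the narration.

```python
import pandas as pd

# Hypothetical toy data standing in for the IMDB sample (not the real dataset)
movies = pd.DataFrame({
    "review": ["Great film", "Terrible", "An instant classic"],
    "label": [1, 0, 1],
})

# Number of characters in each review, via the .str accessor
length_reviews = movies["review"].str.len()

# Length of the longest review
longest = length_reviews.max()
print(longest)

# Skipping .str fails, because a Series has no .len() method of its own
try:
    movies["review"].len()
    raised = False
except AttributeError:
    raised = True
```

Note that the built-in `len(movies["review"])` does work, but it returns the number of rows in the Series rather than the length of each review.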
To find the length of the shortest review, we call the min() method on the length_reviews Series, instead of the max() method.
11. Let's practice!
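For completeness, a short sketch of the shortest-review step, using the same kind of hypothetical toy data (the rows and the name `movies` are assumptions):

```python
import pandas as pd

# Hypothetical toy data standing in for the IMDB sample (not the real dataset)
movies = pd.DataFrame({
    "review": ["Great film", "Terrible", "An instant classic"],
    "label": [1, 0, 1],
})

# Character count per review, then the smallest of those counts
length_reviews = movies["review"].str.len()
shortest = length_reviews.min()
print(shortest)
```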
Let’s practice what we’ve learned in the exercises!