
Bing lexicon with an inner join explanation

1. Bing lexicon with an inner join

The basis for tidytext sentiment analysis is an inner join. This function lets you compare the words in your text with a known list of words in a subjectivity lexicon.

2. Table joins

The inner_join() function comes from the dplyr package, but joins are also commonly associated with relational databases and SQL. Don't be overwhelmed: the join functions are a family of generic functions that each accept two tables. Let's call them x and y. The Venn diagrams here show the various dplyr join functions, with the exception of the semi-join, which is similar to an inner join but a bit more complicated. To learn more about joins, try out the dplyr course!

3. Table joins

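For a join to work, the two tables need a shared key, which you declare with the “by” parameter. This is a column the tables have in common, such as a customer ID or day of the week. If you omit by, dplyr falls back to matching on every column name the two tables share, so it is safer to state it explicitly. If the column names differ between the tables, you can still use the by parameter to declare which columns hold the shared values. This is shown just below the table of functions.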
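As a minimal sketch of the by parameter, the example below joins two small tables first on a column with the same name and then on columns with different names. The tables customers and orders, and all of their columns, are hypothetical and exist only for illustration.

```r
library(dplyr)

# Hypothetical tables, made up for illustration
customers <- tibble(customer_id = c(1, 2, 3),
                    name = c("Ana", "Ben", "Cy"))
orders <- tibble(customer_id = c(2, 3, 4),
                 total = c(10, 25, 40))

# Case 1: the shared column has the same name in both tables
same_name <- inner_join(customers, orders, by = "customer_id")

# Case 2: the shared column is named differently in each table.
# A named vector maps the column in x (left) to the column in y (right).
orders_renamed <- rename(orders, cust = customer_id)
diff_name <- inner_join(customers, orders_renamed,
                        by = c("customer_id" = "cust"))

print(same_name)
print(diff_name)
```

In both cases only customers 2 and 3 are returned, because they are the only IDs present in both tables.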

4. Comparing inner and anti joins

In these Venn diagrams you can see both an inner join and, for comparison, another function called an anti-join. For these joins to work, you need your text in a table, or tidy format, where each word token is a single row. The join functions then return either the rows the tables have in common or the text rows with specific rows removed, depending on which join you use.

Once the text is organized into a tibble where each word is a distinct row, the inner join is used to find words in common. Much like qdap’s polarity function, the inner join can be used to find the polarized words from a subjectivity lexicon. In contrast to polarity, the inner join does not create the context cluster or account for valence shifters. In this example, table x would be your text, such as a book, and table y would be the subjectivity lexicon. Using the inner join and passing in the tables with a shared column, the sliver of terms in common is returned.

For comparison, to remove words from a text in the tidy format you could use an anti-join. This is similar to using the tm package's removeWords() function. Recall that removing terms can be helpful in text analysis because many terms are uninformative. Often these terms are called stop words and include words like “the” and “is.” In this example, you pass in your text table and the table of stop words. The anti-join returns only the rows in the text table whose word is not a stop word. Although we don't use an anti-join in this course, it's a great comparison to the inner join, which you will do a lot!
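The contrast between the two joins can be sketched on a tiny tidy table. The sentence, the three-word lexicon, and the three-word stop word list below are hand-made stand-ins for illustration, not a real lexicon or stop word list.

```r
library(dplyr)

# Text already in tidy format: one word token per row
text_words <- tibble(
  word = c("the", "food", "is", "great", "but",
           "the", "service", "is", "awful")
)

# Hand-made stand-ins (assumptions for this sketch)
lexicon <- tibble(word = c("great", "awful", "happy"),
                  sentiment = c("positive", "negative", "positive"))
stop_words_small <- tibble(word = c("the", "is", "but"))

# Inner join: keep only text rows whose word appears in the lexicon,
# and attach the lexicon's sentiment column
in_lexicon <- inner_join(text_words, lexicon, by = "word")

# Anti-join: keep only text rows whose word does NOT appear
# in the stop word table
without_stops <- anti_join(text_words, stop_words_small, by = "word")

print(in_lexicon)
print(without_stops)
```

The inner join keeps "great" and "awful" with their sentiment labels, while the anti-join keeps "food", "great", "service", and "awful" and adds no new columns.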

5. Starting with positive/negative

In the next few exercises you will perform an inner join between some book texts and the “bing” lexicon. The bing lexicon contains words classified as “positive” or “negative”. You will end up tallying the positive and negative terms to arrive at a polarity score in line with qdap’s polarity score. Once you master the inner join with positive and negative words, you can transition to inner joins with more emotional states and even emotional intensity. We start with just positive and negative so you can focus on the inner join mechanics.
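A rough sketch of that tallying step follows. It assumes a simple polarity of positive count minus negative count, which may differ from the scoring used in the exercises. The table mini_bing is a hand-made stand-in with the same two columns as the bing lexicon (word and sentiment); in practice the lexicon would typically come from tidytext's get_sentiments("bing"). The book words are invented.

```r
library(dplyr)

# Hypothetical tidy text: one word per row
book_words <- tibble(
  word = c("love", "happy", "terrible", "whale", "good", "sad", "ship")
)

# Hand-made stand-in for the bing lexicon (columns: word, sentiment)
mini_bing <- tibble(
  word = c("love", "happy", "good", "terrible", "sad"),
  sentiment = c("positive", "positive", "positive", "negative", "negative")
)

# Inner join keeps only the polarized words, then count() tallies
# how many rows fall under each sentiment label
tally <- book_words %>%
  inner_join(mini_bing, by = "word") %>%
  count(sentiment)

# Assumed polarity definition for this sketch: positives minus negatives
positive_n <- sum(tally$n[tally$sentiment == "positive"])
negative_n <- sum(tally$n[tally$sentiment == "negative"])
polarity <- positive_n - negative_n

print(tally)
print(polarity)
```

Words such as "whale" and "ship" drop out at the join because they are not in the lexicon, so they never enter the tally.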

6. Let's practice!

Have fun with the exercises!