Remove stop words and additional spaces

A corpus of text usually contains many common words like "a", "an", "the", "of", and "but". In natural language processing, these are called stop words.

Stop words are usually removed during text processing so one can focus on more important words in the corpus to derive insights.
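
As a quick illustration, the sketch below inspects the English stop word list that ships with tm and checks whether the example words mentioned above are on it. The variable names `stop_en` and `example_words` are made up for this example.

```r
library(tm)

# English stop word list shipped with tm
stop_en <- stopwords("english")

# Preview the first few entries
head(stop_en)

# Check whether the example words from the text are on the list
example_words <- c("a", "an", "the", "of", "but")
example_words %in% stop_en
```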

Removing special characters, punctuation, numbers, and stop words leaves extra spaces behind, and these also need to be stripped from the corpus.
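
To see where the extra spaces come from, here is a minimal sketch on a plain character string rather than a corpus. The sentence in `txt` is invented for illustration, and the sketch assumes tm's `removeWords()` and `stripWhitespace()` are applied directly to a character vector.

```r
library(tm)

# Toy sentence, made up for illustration
txt <- "the price of the phone is great"

# Deleting stop words leaves gaps where the words used to be
no_stop <- removeWords(txt, stopwords("english"))

# stripWhitespace() collapses each run of whitespace into a single space
clean <- stripWhitespace(no_stop)

print(no_stop)
print(clean)
```

Note that `stripWhitespace()` collapses runs of whitespace; it does not necessarily trim a space left at the very start or end of a string.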

The corpus that you created in the last exercise has been pre-loaded as twt_corpus_lwr.

The library tm has been pre-loaded for this exercise.

This exercise is part of the course

Analyzing Social Media Data in R

Interactive exercise

Complete the sample code to successfully finish this exercise.

# Remove English stop words from the corpus and view the corpus 
twt_corpus_stpwd <- ___(twt_corpus_lwr, ___, stopwords("___"))
head(twt_corpus_stpwd$content)
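
For readers working outside the exercise environment, the following is a hedged sketch of how both cleaning steps might look on a corpus. It does not use the pre-loaded `twt_corpus_lwr`; instead it builds a small stand-in corpus (`toy_corpus`) from two invented tweets, so the object names and texts here are assumptions. It relies on tm's `tm_map()` together with `removeWords` and `stripWhitespace`, which is one common way to apply these transformations.

```r
library(tm)

# Stand-in for the pre-loaded corpus: two invented, already lowercased tweets
toy_tweets <- c("the new phone is the best of the year",
                "i love this camera but the battery is weak")
toy_corpus <- VCorpus(VectorSource(toy_tweets))

# Step 1: remove English stop words from every document
toy_stpwd <- tm_map(toy_corpus, removeWords, stopwords("english"))

# Step 2: collapse the extra spaces left behind by the removal
toy_clean <- tm_map(toy_stpwd, stripWhitespace)

# Pull the cleaned text back out as a character vector
cleaned <- unlist(lapply(toy_clean, as.character))
print(cleaned)
```

The stand-in uses `VCorpus()`, so the text is extracted with `as.character()` rather than through `$content` as in the exercise template.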

Get started with understanding the power of Twitter data and what you can achieve using social media analysis. In this chapter, you’ll extract your first set of tweets using the Twitter API and functions from the powerful ‘rtweet’ library. Then it’s time to explore how you can use the components from your extracted Twitter data to derive insights for social media analysis.

Exercise 1: Analyzing twitter data
Exercise 2: Power of twitter data
Exercise 3: Pros and cons of twitter data
Exercise 4: Extracting twitter data
Exercise 5: Prerequisites to set up the R environment
Exercise 6: Search and extract tweets
Exercise 7: Search and extract timelines
Exercise 8: Components of twitter data
Exercise 9: User interest and tweet counts
Exercise 10: Compare follower count
Exercise 11: Retweet counts

It’s time to go deeper. Learn how you can apply filters to tweets and analyze Twitter user data using the golden ratio and the Twitter lists they subscribe to. You’ll also learn how to extract trending topics and analyze Twitter data over time to identify interesting insights.

Exercise 1: Filtering tweets
Exercise 2: Filtering for original tweets
Exercise 3: Filtering on tweet language
Exercise 4: Filter based on tweet popularity
Exercise 5: Twitter user analysis
Exercise 6: Extract user information
Exercise 7: Explore users based on the golden ratio
Exercise 8: Subscribers to twitter lists
Exercise 9: Twitter trends
Exercise 10: Available trends
Exercise 11: Trends by country name
Exercise 12: Trends by city and most tweeted trends
Exercise 13: Plotting twitter data over time
Exercise 14: Visualizing frequency of tweets
Exercise 15: Create time series objects
Exercise 16: Compare tweet frequencies for two brands

A picture is worth a thousand words! In this chapter, you’ll discover how you can visualize text from tweets using bar plots and word clouds. You’ll learn how to process tweet text and prepare a clean text corpus for analysis. Imagine being able to extract key discussion topics and people's perceptions about a subject or brand from the tweets they are sharing. You’ll be able to do just that using topic modeling and sentiment analysis.

Exercise 1: Processing twitter text
Exercise 2: Remove URLs and characters other than letters
Exercise 3: Build a corpus and convert to lowercase
Exercise 4: Remove stop words and additional spaces (current exercise)
Exercise 5: Visualize popular terms
Exercise 6: Removing custom stop words
Exercise 7: Visualize popular terms with bar plots
Exercise 8: Word clouds for visualization
Exercise 9: Topic modeling of tweets
Exercise 10: The LDA algorithm
Exercise 11: Create a document term matrix
Exercise 12: Create a topic model
Exercise 13: Twitter sentiment analysis
Exercise 14: Extract sentiment scores
Exercise 15: Perform sentiment analysis

Twitter users tweet, like, follow, and retweet, creating complex network structures. In this final chapter, you’ll learn how to analyze these network structures and visualize the relationships between individual users as a retweet network. By extracting geolocation data from the tweets, you’ll also discover how to display tweet locations on a map and answer questions such as which states or countries are talking about your brand the most. Geographic data adds a new dimension to your Twitter data analysis.

Exercise 1: Twitter network analysis
Exercise 2: Preparing data for a retweet network
Exercise 3: Create a retweet network
Exercise 4: Network centrality measures
Exercise 5: Calculate out-degree scores
Exercise 6: Compute the in-degree scores
Exercise 7: Calculate the betweenness scores
Exercise 8: Visualizing twitter networks
Exercise 9: Create a network plot with attributes
Exercise 10: Network plot based on centrality measure
Exercise 11: Follower count to enhance the network plot
Exercise 12: Putting twitter data on the map
Exercise 13: Extract geolocation coordinates
Exercise 14: Twitter data on the map
Exercise 15: Course wrap-up