h2o practice

R offers several machine learning libraries. The h2o library is easy to use, includes a word2vec implementation, and supports several other machine learning tasks. To use h2o, however, you need to take some additional pre-processing steps with your data. You have a dataset called left_right, which contains tweets that were auto-tweeted during the 2016 US election campaign.

Instead of preparing your data for other text analysis techniques, prepare this dataset for use with the h2o library.

This exercise is part of the course

Introduction to Natural Language Processing in R

Exercise instructions

  • Import the library and initialize an h2o session.
  • Create an h2o object.
  • Tokenize the tweets, which are stored in the content column.
  • Transform the words to lowercase and remove all stop words.
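
To make the three text steps concrete (tokenize, lowercase, remove stop words), here is a minimal base-R sketch of the same idea. It is an analogue only and does not use the h2o functions the exercise asks for. The data frame `toy` and the vector `toy_stop_words` are hypothetical stand-ins for left_right and stop_words$word.

```r
# Toy stand-in for left_right (hypothetical data, not the course dataset)
toy <- data.frame(
  content = c("The Vote is TODAY", "We are voting today!"),
  stringsAsFactors = FALSE
)

# Small illustrative stop word list (assumption; the exercise uses stop_words$word)
toy_stop_words <- c("the", "is", "we", "are")

# Tokenize: split each tweet on runs of non-word characters
tokens <- unlist(strsplit(toy$content, "\\W+"))

# Lowercase every token
tokens <- tolower(tokens)

# Drop empty tokens and stop words
tokens <- tokens[nzchar(tokens) & !(tokens %in% toy_stop_words)]
tokens
```

Base R flattens everything into one character vector. The h2o version instead works on a column of an h2o object, and the exercise's filter deliberately keeps NA rows, which h2o's tokenizer appears to use as separators between documents.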

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Initialize an h2o session
library(___)
___.init()

# Create an h2o object for left_right
h2o_object <- as.___(left_right)

# Tokenize the words from the column of text in left_right
tweet_words <- h2o.___(h2o_object$___, "\\\\W+")

# Lowercase
tweet_words <- h2o.___(tweet_words)
# Remove stopwords from tweet_words
tweet_words <- tweet_words[is.na(___) || (!___ %in% stop_words$word),]
tweet_words