h2o practice
R offers several machine learning libraries. The h2o library is easy to use
and includes a word2vec implementation, and it can also be used for several
other machine learning tasks. To use h2o, however, you need to take additional
pre-processing steps with your data. You have a dataset called left_right,
which contains tweets that were auto-tweeted during the 2016 US election
campaign. Instead of preparing your data for other text analysis techniques,
prepare this dataset for use with the h2o library.
This exercise is part of the course Introduction to Natural Language Processing in R.
Exercise instructions
- Load the library and initialize an h2o session.
- Create an h2o object.
- Tokenize the tweets, which are stored in the content column.
- Transform the words to lowercase and remove all stop words.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Initialize an h2o session
library(___)
___.init()
# Create an h2o object for left_right
h2o_object <- as.___(left_right)
# Tokenize the words from the column of text in left_right
tweet_words <- h2o.___(h2o_object$___, "\\\\W+")
# Lowercase
tweet_words <- h2o.___(tweet_words)
# Remove stopwords from tweet_words
tweet_words <- tweet_words[is.na(___) || (!___ %in% stop_words$word),]
tweet_words
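
For reference, here is one possible way to fill in the template, written as a minimal sketch rather than the official solution. It assumes the h2o package and a Java runtime are installed. Because the course data is not available outside the exercise, left_right and stop_words below are small hypothetical stand-ins (in the course, stop_words is presumably the tidytext dataset, which also has a word column). The h2o.ascharacter() call is an extra defensive step that is not part of the template.

```r
library(h2o)

# Start (or connect to) a local h2o cluster; this needs a Java runtime
h2o.init()

# Hypothetical stand-in for the course's left_right dataset
left_right <- data.frame(
  content = c("The debate is TONIGHT at nine",
              "A new poll shows the race is close"),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for a stop-word table with a `word` column
stop_words <- data.frame(
  word = c("the", "a", "is", "at"),
  stringsAsFactors = FALSE
)

# Create an h2o object for left_right
h2o_object <- as.h2o(left_right)

# Defensive step (not in the template): make sure the text is a string column
tweet_text <- h2o.ascharacter(h2o_object$content)

# Tokenize the tweets, using the same pattern string as the template
tweet_words <- h2o.tokenize(tweet_text, "\\\\W+")

# Lowercase
tweet_words <- h2o.tolower(tweet_words)

# Keep NA rows and drop every token that appears in the stop-word list
tweet_words <- tweet_words[is.na(tweet_words) || (!tweet_words %in% stop_words$word), ]

tweet_words
```

The is.na() half of the filter keeps the missing-value rows in the tokenized column. As far as I understand, h2o.tokenize() uses those rows to separate one tweet from the next, so dropping them would merge all tweets into a single sequence of words.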