Exercise

h2o practice

R offers several machine learning libraries. Among them, the h2o library is easy to use, includes a word2vec implementation, and supports several other machine learning tasks. To use h2o, however, you need to take additional pre-processing steps with your data. You have a dataset called left_right, which contains tweets that were auto-tweeted during the 2016 US election campaign.

Instead of preparing your data for other text analysis techniques, prepare this dataset for use with the h2o library.

Instructions

  • Import the library and initialize an h2o session.
  • Create an h2o object.
  • Tokenize the tweets which are stored in the content column.
  • Transform the words to lowercase and remove all stop words.
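
The four steps above could be sketched roughly as follows. This is one possible approach, not the exercise's official solution. The small left_right data frame and the stop_word_list vector are hypothetical stand-ins so the sketch is self-contained; in the exercise, left_right is already loaded, and a fuller stop-word list (for example, one supplied by a text-mining package) would normally be used. The "\\\\W+" split pattern follows the convention used in h2o's own word2vec demo.

```r
library(h2o)

# Start (or connect to) a local h2o cluster; this requires Java.
h2o.init()

# Hypothetical stand-in for left_right, for illustration only.
left_right <- data.frame(
  content = c("Vote for THE best candidate", "The debate is on tonight"),
  stringsAsFactors = FALSE
)

# Illustrative stop-word list (an assumption, not the course's list).
stop_word_list <- c("the", "for", "is", "on")

# Create an h2o object (an H2OFrame) from the data frame.
h2o_object <- as.h2o(left_right)

# Tokenization expects a string column, so convert explicitly.
tweets <- h2o.ascharacter(h2o_object$content)

# Split each tweet on runs of non-word characters. The result is a
# single column of tokens, with an NA row separating tweets.
tweet_words <- h2o.tokenize(tweets, "\\\\W+")

# Lowercase every token.
tweet_words <- h2o.tolower(tweet_words)

# Drop stop words but keep the NA rows that mark tweet boundaries.
tweet_words <- tweet_words[
  is.na(tweet_words) || (!tweet_words %in% stop_word_list),
]
```

Keeping the NA rows matters if the tokens are later passed to h2o's word2vec, which uses them to tell where one tweet ends and the next begins.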