
h2o practice

There are several machine learning libraries available in R. The h2o library is easy to use and offers a word2vec implementation, and it can also be used for several other machine learning tasks. To use h2o, however, you need to take additional pre-processing steps with your data. You have a dataset called left_right, which contains tweets that were auto-tweeted during the 2016 US election campaign.

Instead of preparing your data for other text analysis techniques, prepare this dataset for use with the h2o library.

This exercise is part of the course

Introduction to Natural Language Processing in R


Instructions

  • Import the library and initialize an h2o session.
  • Create an h2o object.
  • Tokenize the tweets which are stored in the content column.
  • Transform the words to lowercase and remove all stop words.
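To make the goal of these steps concrete, here is a minimal base R sketch of the same tokenize, lowercase, and stop-word-removal pipeline. It does not use the h2o API, so it is an illustration of the intended result rather than the exercise solution. The `tweets` data frame and `toy_stop_words` vector are hypothetical stand-ins for left_right and stop_words$word.

```r
# Hypothetical toy stand-in for left_right (only a content column is assumed).
tweets <- data.frame(
  content = c("The economy is growing FAST", "Vote in the election today"),
  stringsAsFactors = FALSE
)

# Hypothetical, tiny stop-word list; the exercise uses stop_words$word instead.
toy_stop_words <- c("the", "is", "in")

# Tokenize: split each tweet on runs of non-word characters.
tokens <- unlist(strsplit(tweets$content, "\\W+"))

# Lowercase every token.
tokens <- tolower(tokens)

# Drop empty strings and stop words.
tokens <- tokens[nzchar(tokens) & !(tokens %in% toy_stop_words)]

print(tokens)
# [1] "economy"  "growing"  "fast"     "vote"     "election" "today"
```

One difference worth keeping in mind: h2o.tokenize returns a single column in which, as far as the h2o documentation describes it, the tokens of consecutive documents are separated by NA rows. That is why the exercise's filter keeps rows where the token is NA instead of discarding them.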

Hands-on interactive exercise

Try this exercise by completing this sample code.

# Initialize an h2o session
library(___)
___.init()

# Create an h2o object for left_right
h2o_object = as.___(left_right)

# Tokenize the words from the column of text in left_right
tweet_words <- h2o.___(h2o_object$___, "\\\\W+")

# Lowercase
tweet_words <- h2o.___(tweet_words)
# Remove stopwords from tweet_words
tweet_words <- tweet_words[is.na(___) || (!___ %in% stop_words$word),]
tweet_words