In the last chapter, you saw some of the feature transformation functionality of Spark MLlib. If that library were a meal, the feature transformations would be a starter; the main course is a sumptuous selection of machine learning modeling functions! These functions all have names beginning with ml_ and share a similar signature. They take a tibble, a string naming the response variable, a character vector naming the features (input variables), and possibly some other model-specific arguments.
a_tibble %>%
  ml_some_model("response", c("a_feature", "another_feature"), some_other_args)
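To make the template above concrete, here is a minimal sketch using ml_linear_regression(). It assumes a local Spark installation is available and that your version of sparklyr accepts the response and features arguments described here (newer releases also accept a formula). The mtcars data set and the columns mpg, wt, and cyl are chosen purely for illustration.

```r
library(sparklyr)
library(dplyr)

# Assumption: Spark is installed locally (for example via spark_install())
sc <- spark_connect(master = "local")

# Copy a small built-in data set to Spark; mtcars is only an illustrative choice
mtcars_tbl <- copy_to(sc, mtcars, "mtcars", overwrite = TRUE)

# Same shape as the template: a tibble, the response name,
# and a character vector of feature names
fit <- mtcars_tbl %>%
  ml_linear_regression(response = "mpg", features = c("wt", "cyl"))

# Inspect the fitted model
summary(fit)

spark_disconnect(sc)
```

The arguments are named here rather than passed by position, which keeps the call readable and avoids depending on the argument order of a particular sparklyr version.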
Supported machine learning functions include linear regression and its variants, tree-based models (such as ml_decision_tree()), and a few others. You can see the list of all the machine learning functions using ls().
ls("package:sparklyr", pattern = "^ml")
What arguments do all the machine learning model functions take?