Fit a logistic regression model
Once you have your random training and test sets, you can fit a logistic regression model to your training set using the glm() function. glm() fits generalized linear models. It extends lm() to more varied types of regression models beyond plain vanilla ordinary least squares regression.
Be sure to pass the argument family = "binomial" to glm() to specify that you want to do logistic (rather than linear) regression. For example:
glm(Target ~ ., family = "binomial", dataset)
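The sketch below shows the call above on simulated data. The data frame dataset and its columns x and Target are hypothetical stand-ins, and the "true" log-odds used to generate Target are an assumption made only for illustration. family = "binomial" uses the logit link by default, so the fitted coefficients are on the log-odds scale. The same formula fitted with lm() gives a linear probability model whose fitted values are not constrained to [0, 1].

```r
# Simulate a hypothetical dataset with one predictor and a 0/1 target
set.seed(42)
n <- 200
x <- rnorm(n)
# Assumed true model for illustration: log-odds = -0.5 + 2 * x
Target <- rbinom(n, size = 1, prob = plogis(-0.5 + 2 * x))
dataset <- data.frame(x = x, Target = Target)

# Logistic regression: same formula interface as lm(), plus family = "binomial"
fit <- glm(Target ~ ., family = "binomial", dataset)

# Coefficients are on the log-odds (logit) scale
coef(fit)

# For comparison: lm() fits a straight line, so its fitted values can leave [0, 1],
# while the logistic fit's values always stay strictly inside (0, 1)
lin <- lm(Target ~ ., dataset)
range(fitted(lin))
range(fitted(fit))
```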
Don't worry about warnings like "glm.fit: algorithm did not converge" or "glm.fit: fitted probabilities numerically 0 or 1 occurred". These are common on smaller datasets and usually don't cause problems for prediction. They typically mean your dataset is perfectly (or almost perfectly) separable: some combination of the predictors splits the two classes exactly, so the coefficient estimates grow without bound. The coefficients and standard errors from such a fit are unreliable, but the predicted probabilities from R's glm() are usually still usable for classification.
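A small reproduction of the separation case, as a sketch. The data frame sep is hypothetical and built to be perfectly separable (every x of 5 or less is class 0, every x of 6 or more is class 1). The warnings are captured with withCallingHandlers() so they can be inspected. The slope comes out very large because the maximum-likelihood estimate does not exist, yet the predicted probabilities still put every point on the correct side of 0.5.

```r
# Hypothetical, perfectly separable data
sep <- data.frame(x = 1:10, y = c(rep(0, 5), rep(1, 5)))

# Collect warning messages instead of printing them
warns <- character(0)
fit_sep <- withCallingHandlers(
  glm(y ~ x, family = "binomial", data = sep),
  warning = function(w) {
    warns <<- c(warns, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
print(warns)

# Very large slope: the fit keeps pushing the coefficients outward until it stops
coef(fit_sep)

# The predicted probabilities still classify every training point correctly
p_sep <- predict(fit_sep, type = "response")
table(predicted = p_sep > 0.5, actual = sep$y == 1)
```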
Once you have a glm() model fit to your dataset, you can predict the outcome (e.g. rock or mine) on the test set using the predict() function with the argument type = "response". This returns predicted probabilities rather than the default log-odds:
predict(my_model, test, type = "response")
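To see what type = "response" changes, the sketch below fits a model on hypothetical simulated data with a random 60/40 train/test split (both assumptions for illustration) and compares the two prediction scales. Without a type argument, predict() on a glm returns the linear predictor (log-odds). With type = "response" it applies the inverse logit, which base R provides as plogis().

```r
# Hypothetical data and a random 60/40 train/test split
set.seed(1)
n <- 100
df <- data.frame(x = rnorm(n))
df$y <- rbinom(n, 1, plogis(1.5 * df$x))
rows <- sample(n)
train <- df[rows[1:60], ]
test <- df[rows[61:100], ]

my_model <- glm(y ~ x, family = "binomial", train)

# Default type = "link": predictions on the log-odds scale (any real number)
log_odds <- predict(my_model, test)

# type = "response": predicted probabilities that y = 1, always between 0 and 1
p <- predict(my_model, test, type = "response")

# The two scales are related by the inverse logit
all.equal(p, plogis(log_odds))
```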
This is part of the course “Machine Learning with caret in R”.
Exercise instructions
- Fit a logistic regression called model to predict Class using all other variables as predictors. Use the training set for Sonar.
- Predict on the test set using that model. Call the result p like you've done before.
Have a go at this exercise by completing this sample code.
# Fit glm model: model
# Predict on test: p
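One possible way to fill in the sample code, as a sketch. It assumes train and test data frames laid out like the Sonar data from the mlbench package: numeric predictor columns plus a factor Class with levels "M" (mine) and "R" (rock). So that the block runs on its own, it builds a small synthetic stand-in with five hypothetical predictors instead of loading the real data. With a factor response, glm() treats the first level as failure, so p is the predicted probability of the second level, "R".

```r
# Hypothetical synthetic stand-in for Sonar: 5 numeric predictors + factor Class
set.seed(123)
n <- 208
X <- as.data.frame(matrix(runif(n * 5), ncol = 5,
                          dimnames = list(NULL, paste0("V", 1:5))))
Class <- factor(ifelse(runif(n) < plogis(4 * (X$V1 - X$V2)), "R", "M"),
                levels = c("M", "R"))
Sonar <- data.frame(X, Class = Class)

# Shuffle the rows, then take a 60/40 train/test split
Sonar <- Sonar[sample(nrow(Sonar)), ]
n_train <- round(nrow(Sonar) * 0.60)
train <- Sonar[1:n_train, ]
test <- Sonar[(n_train + 1):nrow(Sonar), ]

# Fit glm model: model
model <- glm(Class ~ ., family = "binomial", train)

# Predict on test: p (probability that Class is "R", the second factor level)
p <- predict(model, test, type = "response")
summary(p)
```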