Fit a logistic regression model

Once you have your random training and test sets, you can fit a logistic regression model to the training set using the glm() function. glm() generalizes lm(): in addition to plain ordinary least squares regression, it can fit other types of regression models, including logistic regression.

Be sure to pass the argument family = "binomial" to glm() to specify that you want to do logistic (rather than linear) regression. For example:

glm(Target ~ ., family = "binomial", data = dataset)
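
As a minimal sketch of the call above, the following fits a logistic regression to simulated data using base R only. The data frame sim, its columns x1, x2 and Target, and the seed are hypothetical stand-ins and are not part of the course material.

```r
# Simulated data (hypothetical): two numeric predictors and a binary target
set.seed(1)
n <- 200
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n))

# Generate outcomes from a known logistic relationship
prob <- plogis(0.5 + 1.5 * sim$x1 - 1.0 * sim$x2)
sim$Target <- factor(ifelse(runif(n) < prob, "yes", "no"))

# family = "binomial" requests logistic rather than linear regression
fit <- glm(Target ~ ., family = "binomial", data = sim)

# Coefficients are reported on the log-odds scale
print(coef(fit))
```

With a factor response, glm() treats the first factor level ("no" here) as failure and models the probability of the other level.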

Don't worry about warnings like glm.fit: algorithm did not converge or glm.fit: fitted probabilities numerically 0 or 1 occurred. These are common on smaller datasets and usually don't cause problems for prediction. They typically mean your dataset is perfectly (or almost perfectly) separable: some combination of predictors splits the classes without error, so the maximum-likelihood coefficients grow without bound. glm() still returns a fitted model in this case, and its predictions are generally usable, although the coefficient estimates and their standard errors should not be taken at face value.
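
To see where these warnings come from, here is a small sketch with a toy dataset in which x separates the two classes perfectly. The vectors x and y and the helper variable msgs are made up for illustration; the handler only collects the warning messages so they can be printed.

```r
# Toy data (hypothetical): every x <= 5 is class 0, every x >= 6 is class 1
x <- 1:10
y <- c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1)

# Collect warning messages instead of letting them print at the end
msgs <- character(0)
fit <- withCallingHandlers(
  glm(y ~ x, family = "binomial"),
  warning = function(w) {
    msgs <<- c(msgs, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

print(msgs)

# The model object is still returned and can still produce predictions
print(round(predict(fit, type = "response"), 3))

# The slope is very large because the likelihood keeps improving as it grows
print(coef(fit))
```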

Once you have a glm() model fit to your dataset, you can predict the probability of the outcome (e.g. rock or mine) on the test set using the predict() function with the argument type = "response", which returns probabilities rather than log-odds:

predict(my_model, test, type = "response")
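
The sketch below illustrates what type = "response" changes, again on simulated data. The data frame sim, the 200/100 split, and the 0.5 cutoff are assumptions chosen for the example; only my_model, test and the predict() call come from the text above.

```r
# Simulated data (hypothetical): one predictor and a 0/1 outcome
set.seed(2)
n <- 300
sim <- data.frame(x = rnorm(n))
sim$y <- rbinom(n, size = 1, prob = plogis(2 * sim$x))

train <- sim[1:200, ]
test <- sim[201:300, ]

my_model <- glm(y ~ x, family = "binomial", data = train)

# type = "response" returns probabilities between 0 and 1
p <- predict(my_model, test, type = "response")

# The default, type = "link", returns log-odds instead
log_odds <- predict(my_model, test)

# plogis() maps log-odds to probabilities, so the two should agree
print(all.equal(unname(p), unname(plogis(log_odds))))

# Assumed 0.5 cutoff to turn probabilities into class labels
p_class <- ifelse(p > 0.5, 1, 0)
print(table(predicted = p_class, actual = test$y))
```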

This is a part of the course

“Machine Learning with caret in R”

Exercise instructions

  • Fit a logistic regression called model to predict Class using all other variables as predictors. Use the training set for Sonar.
  • Predict on the test set using that model. Call the result p like you've done before.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Fit glm model: model


# Predict on test: p
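
One possible way to complete the template is sketched below. It assumes the Sonar data comes from the mlbench package and rebuilds a 60/40 split inline so the example runs on its own; the object names rows, shuffled, split and train, and the seed, are assumptions rather than something stated on this page. Expect the glm.fit warnings described earlier, since Sonar has many predictors relative to its number of rows.

```r
library(mlbench)  # assumption: source of the Sonar data
data(Sonar)

# Rebuild a random 60/40 train/test split (hypothetical object names)
set.seed(42)
rows <- sample(nrow(Sonar))
shuffled <- Sonar[rows, ]
split <- round(nrow(shuffled) * 0.60)
train <- shuffled[1:split, ]
test <- shuffled[(split + 1):nrow(shuffled), ]

# Fit glm model: model
model <- glm(Class ~ ., family = "binomial", data = train)

# Predict on test: p
p <- predict(model, test, type = "response")
```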

This course teaches the big ideas in machine learning like how to build and evaluate predictive models.

In this chapter, you'll fit classification models with train() and evaluate their out-of-sample performance using cross-validation and area under the curve (AUC).

Exercise 1: Logistic regression on sonar
Exercise 2: Why a train/test split?
Exercise 3: Try a 60/40 split
Exercise 4: Fit a logistic regression model
Exercise 5: Confusion matrix
Exercise 6: Confusion matrix takeaways
Exercise 7: Calculate a confusion matrix
Exercise 8: Calculating accuracy
Exercise 9: Calculating true positive rate
Exercise 10: Calculating true negative rate
Exercise 11: Class probabilities and predictions
Exercise 12: Probabilities and classes
Exercise 13: Try another threshold
Exercise 14: From probabilities to confusion matrix
Exercise 15: Introducing the ROC curve
Exercise 16: What's the value of a ROC curve?
Exercise 17: Plot an ROC curve
Exercise 18: Area under the curve (AUC)
Exercise 19: Model, ROC, and AUC
Exercise 20: Customizing trainControl
Exercise 21: Using custom trainControl
