
Correlated variables

In this exercise, you will inspect the dataset for correlated variables. It is important to remove them before applying a binary classifier, especially logistic regression. When two or more variables are highly correlated, you should keep only one of them.
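To see why this matters for logistic regression, here is a minimal sketch of the extreme case, where one predictor is an exact multiple of another. The data and the names x1, x2, and y are made up for illustration and are not part of the course dataset. With perfectly collinear predictors, glm() cannot estimate a separate coefficient for the redundant variable and reports it as NA; with strongly but not perfectly correlated predictors, the coefficients are estimable but typically unstable.

```r
# Hypothetical toy data (not the course's studentnetworkdata)
x1 <- 1:10
x2 <- 2 * x1                          # exact multiple of x1: perfectly correlated
y  <- c(0, 0, 1, 0, 0, 1, 1, 0, 1, 1) # binary outcome, not perfectly separable

# Fit a logistic regression with both predictors
fit <- glm(y ~ x1 + x2, family = binomial)

# The coefficient for the redundant predictor x2 cannot be estimated
coef(fit)
```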

First, you will use the corrplot() function from the corrplot package to visualize the correlations. In the correlation plot, blue represents a positive correlation and red a negative correlation. A darker color indicates a stronger correlation. Finally, you will remove the highly correlated variables from the dataset.
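The removal step can be sketched in base R as follows. This is only one possible approach, not necessarily the one the course uses. The toy data frame, the column names, and the 0.9 cutoff are assumptions chosen for illustration.

```r
# Hypothetical toy data (not the course's studentnetworkdata)
toy <- data.frame(
  a = c(1, 2, 3, 4, 5, 6),
  b = c(2.1, 3.9, 6.2, 8.1, 9.8, 12.1),  # roughly 2 * a, so strongly correlated with a
  c = c(5, 1, 4, 2, 6, 3)
)

# Pairwise Pearson correlations between all columns
M <- cor(toy)

# Keep only the upper triangle so each pair is considered once
M_upper <- M
M_upper[lower.tri(M_upper, diag = TRUE)] <- 0

# Assumed cutoff: flag a column if its absolute correlation with any
# earlier column exceeds 0.9
cutoff <- 0.9
to_drop <- colnames(M_upper)[apply(abs(M_upper) > cutoff, 2, any)]

# Drop the flagged columns, keeping the first variable of each correlated pair
reduced <- toy[, !(colnames(toy) %in% to_drop), drop = FALSE]
```

Here b is flagged because it is almost a linear function of a, so reduced keeps only a and c. This simple rule always keeps the earlier column of a pair, so the result depends on column order.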

This exercise is part of the course

Predictive Analytics using Networked Data in R


Hands-on interactive exercise

Try this exercise by completing this sample code.

# Remove the Future column from studentnetworkdata 
no_future <- ___

# Load the corrplot package
library(___)

# Generate the correlation matrix
M <- ___(no_future)

# Plot the correlations
___(M, method = "circle")