Linear discriminant analysis
Linear discriminant analysis (LDA) is a classification (and dimension reduction) method. It finds the linear combinations of the variables that best separate the classes of the target variable. The target can be a binary or a multiclass variable.
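As a minimal sketch of what "finding linear combinations that separate the classes" looks like in practice, the example below fits an LDA on the built-in `iris` data. The dataset and the object names `fit` and `pred` are assumptions chosen for illustration; they are not part of this exercise, which uses the `train` data with `crime` as the target.

```r
library(MASS)  # provides lda()

# iris ships with R; Species is a three-class target (illustrative assumption)
fit <- lda(Species ~ ., data = iris)

# Coefficients of the linear discriminants: one row per predictor,
# one column per discriminant (at most number of classes - 1)
fit$scaling

# Project the observations onto the discriminants and predict their classes
pred <- predict(fit, iris)
head(pred$x)  # discriminant scores (LD1, LD2)
table(predicted = pred$class, actual = iris$Species)
```

With three classes and four predictors, the fit has two discriminants, which is why the scores can be drawn in a two-dimensional plot.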
Linear discriminant analysis is closely related to many other methods, such as principal component analysis (we will look into that next week) and the already familiar logistic regression.
LDA can be visualized with a biplot. We will talk more about biplots next week. The LDA biplot arrow function used in the exercise is (with slight changes) taken from this Stack Overflow message thread.
This exercise is part of the course
Helsinki Open Data Science
Exercise instructions
- Fit a linear discriminant analysis with the function `lda()`. The function takes a formula (like in regression) as its first argument. Use `crime` as the target variable and all the other variables as predictors. Hint! You can type `target ~ .` where the dot means all other variables in the data.
- Print the `lda.fit` object.
- Create a numeric vector of the train set's crime classes (for plotting purposes).
- Use the function `plot()` on the `lda.fit` model. The argument `dimen` can be used to choose how many discriminants are used.
- Adjust the code: add arguments `col = classes` and `pch = classes` to the plot.
- Execute the `lda.arrows()` function definition (if you haven't done that already). Draw the plot with the LDA arrows. Note that in DataCamp you will need to select both lines of code and execute them at the same time for the `lda.arrows()` function to work.
- You can change the `myscale` argument in `lda.arrows()` to see more clearly which way the arrows are pointing.
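The steps above can be sketched on a different dataset so that the exercise itself is left for you to complete. This is only an illustration under stated assumptions: it uses the built-in `iris` data instead of `train`, the names `fit`, `heads` and the value of `myscale` are hypothetical, and the arrows are drawn inline from `fit$scaling` rather than through a helper function.

```r
library(MASS)  # provides lda() and its plot method

# Fit the model: target ~ . uses all other columns as predictors
fit <- lda(Species ~ ., data = iris)
fit  # printing shows priors, group means and discriminant coefficients

# Target classes as numbers, used for colours and plotting symbols
classes <- as.numeric(iris$Species)

# Plot the observations on the first two discriminants
plot(fit, dimen = 2, col = classes, pch = classes)

# Draw one arrow per predictor from the origin, scaled for visibility
heads <- fit$scaling
myscale <- 2  # hypothetical value; increase it if the arrows are too short
arrows(x0 = 0, y0 = 0,
       x1 = myscale * heads[, 1], y1 = myscale * heads[, 2],
       col = "red", length = 0.1)
text(myscale * heads[, 1:2], labels = rownames(heads),
     col = "red", cex = 0.75, pos = 3)
```

The `arrows()` and `text()` calls add to the plot that is already open, which is why the plotting line and the arrow-drawing lines have to be run together.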
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# MASS and train are available
# linear discriminant analysis
lda.fit <- lda("change me!", data = train)
# print the lda.fit object
lda.fit
# the function for lda biplot arrows
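lda.arrows <- function(x, myscale = 1, arrow_heads = 0.1, color = "red", tex = 0.75, choices = c(1,2)){
  # discriminant coefficients: one row per predictor
  heads <- coef(x)
  # arrows from the origin to the (scaled) coefficients of the chosen discriminants
  arrows(x0 = 0, y0 = 0,
         x1 = myscale * heads[,choices[1]],
         y1 = myscale * heads[,choices[2]], col=color, length = arrow_heads)
  # label each arrow with its variable name
  text(myscale * heads[,choices], labels = row.names(heads),
       cex = tex, col=color, pos=3)
}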
# target classes as numeric
classes <- as.numeric(train$crime)
# plot the lda results
plot("change me!", dimen = 2)
lda.arrows(lda.fit, myscale = 1)