Linear discriminant analysis

Linear discriminant analysis (LDA) is a classification (and dimension reduction) method. It finds the linear combinations of the variables that best separate the classes of the target variable. The target can be a binary or a multiclass variable.
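
As a rough illustration of both roles, the sketch below fits an LDA model and then uses predict() to get predicted classes (classification) and the observations projected onto the discriminants (dimension reduction). It assumes the built-in iris data as a stand-in, since the exercise's own train data is not shown here; the object names fit and pred are only for this example.

```r
# Illustrative sketch on the built-in iris data (an assumption, not the exercise data)
library(MASS)

# Fit LDA with Species as the target and all other columns as predictors
fit <- lda(Species ~ ., data = iris)

# Classification: predicted class for each observation in the fitted data
pred <- predict(fit)
table(predicted = pred$class, actual = iris$Species)

# Dimension reduction: observations projected onto the discriminants
# (at most "number of classes - 1" columns, so 2 for the 3 iris species)
head(pred$x)
```

With k classes there are at most k - 1 discriminants, which is why a target with three or more classes can be plotted in two dimensions.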

Linear discriminant analysis is closely related to many other methods, such as principal component analysis (we will look into that next week) and the already familiar logistic regression.

LDA can be visualized with a biplot. We will talk more about biplots next week. The LDA biplot arrow function used in the exercise is (with slight changes) taken from this Stack Overflow message thread.

This exercise is part of the course

Helsinki Open Data Science


Exercise instructions

Fit a linear discriminant analysis with the function lda(). The function takes a formula (like in regression) as its first argument. Use crime as the target variable and all the other variables as predictors. Hint! You can type target ~ . where the dot means all other variables in the data.
Print the lda.fit object.
Create a numeric vector of the train set's crime classes (for plotting purposes).
Use the function plot() on the lda.fit model. The argument dimen can be used to choose how many discriminants are used.
Adjust the code: add the arguments col = classes and pch = classes to the plot.
Execute the lda.arrows() function (if you haven't done that already). Draw the plot with the LDA arrows. Note that in DataCamp you will need to select both lines of code and execute them at the same time for the lda.arrows() function to work.
You can change the myscale argument in lda.arrows() to see more clearly which way the arrows are pointing.
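
The steps above can be sketched on a different data set without giving away the exercise solution. The example below assumes the built-in iris data with Species as the target (a stand-in for train and crime); the names iris.fit and iris.classes are hypothetical, and lda.arrows() is copied from the exercise code so the block runs on its own.

```r
# Illustrative sketch on iris (an assumption; the exercise uses `train` and `crime`)
library(MASS)

# Fit the model: target ~ . uses all other columns as predictors
iris.fit <- lda(Species ~ ., data = iris)

# Print the fitted object (priors, group means, discriminant coefficients)
iris.fit

# Biplot arrow function, as given in the exercise code
lda.arrows <- function(x, myscale = 1, arrow_heads = 0.1, color = "red",
                       tex = 0.75, choices = c(1, 2)) {
  heads <- coef(x)
  arrows(x0 = 0, y0 = 0,
         x1 = myscale * heads[, choices[1]],
         y1 = myscale * heads[, choices[2]],
         col = color, length = arrow_heads)
  text(myscale * heads[, choices], labels = row.names(heads),
       cex = tex, col = color, pos = 3)
}

# Target classes as numbers, used for the colours and symbols in the plot
iris.classes <- as.numeric(iris$Species)

# Plot the first two discriminants, then overlay the arrows
plot(iris.fit, dimen = 2, col = iris.classes, pch = iris.classes)
lda.arrows(iris.fit, myscale = 2)
```

Each arrow shows the coefficients of one predictor on the two plotted discriminants, so a larger myscale only lengthens the arrows for readability and does not change their direction.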

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# MASS and train are available

# linear discriminant analysis
lda.fit <- lda("change me!", data = train)

# print the lda.fit object
lda.fit

# the function for lda biplot arrows
lda.arrows <- function(x, myscale = 1, arrow_heads = 0.1, color = "red", tex = 0.75, choices = c(1,2)){
  heads <- coef(x)
  arrows(x0 = 0, y0 = 0, 
         x1 = myscale * heads[,choices[1]], 
         y1 = myscale * heads[,choices[2]], col=color, length = arrow_heads)
  text(myscale * heads[,choices], labels = row.names(heads), 
       cex = tex, col=color, pos=3)
}

# target classes as numeric
classes <- as.numeric(train$crime)

# plot the lda results
plot("change me!", dimen = 2)
lda.arrows(lda.fit, myscale = 1)
