
Classification models

1. Classification models

So far we have focused on regression models. But deep learning works similarly for classification, that is, for predicting outcomes from a set of discrete options.

2. Classification

For classification, you do a couple of things differently. The biggest changes are: first, you set the loss function to 'categorical_crossentropy' instead of 'mean_squared_error'. This isn't the only possible loss function for classification problems, but it is by far the most common. You may have heard of it before under the name LogLoss. We won't go into the mathematical details of categorical crossentropy here; a lower score is better, but it's still hard to interpret. So I've added the argument metrics=['accuracy']. This means I want to print the accuracy score at the end of each epoch, which makes it easier to see and understand the model's progress. Second, you modify the last layer so it has a separate node for each potential outcome, and you change its activation function to softmax. The softmax activation function ensures the predictions sum to 1, so they can be interpreted as probabilities.
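As a minimal sketch of those two changes (the layer sizes, the 10-feature input shape, and the optimizer here are made up for illustration; I'm using standalone Keras imports, so with TensorFlow 2 you'd import from tensorflow.keras instead):

```python
from keras.models import Sequential
from keras.layers import Dense

# Hypothetical model: 10 input features, 2 possible classes
model = Sequential()
model.add(Dense(32, activation='relu', input_shape=(10,)))
# Last layer: one node per outcome, softmax so predictions sum to 1
model.add(Dense(2, activation='softmax'))

# categorical_crossentropy instead of mean_squared_error;
# metrics=['accuracy'] prints accuracy at the end of each epoch
model.compile(optimizer='sgd',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
```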

3. Quick look at the data

Here is some data for a binary classification problem. We have data from the NBA basketball league. It includes facts about each shot, and the

4. Quick look at the data

shot result is either 0 or 1, indicating whether the shot went in or not. The outcome here is in a single column, which is not uncommon. But for Keras, we'll generally want to convert categorical targets to a format with a separate column for each possible outcome. Keras includes a function to do that, which you will see in the code soon. This setup is consistent with the fact that you will have a separate node in the output layer for each possible class.

5. Transforming to categorical

We now have a new column for each value of shot_result. A 1 in a column indicates that the row took the corresponding value in the original data. This is sometimes called one-hot encoding. If the original data had 3 or 4 or 100 different values, the new array would have 3 or 4 or 100 columns respectively.
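To make the encoding concrete, here is a quick, self-contained illustration of to_categorical on some made-up shot results:

```python
from keras.utils import to_categorical
import numpy as np

# Made-up shot results: 1 means the shot went in, 0 means it missed
shot_result = np.array([0, 1, 1, 0])
print(to_categorical(shot_result))
# [[1. 0.]
#  [0. 1.]
#  [0. 1.]
#  [1. 0.]]
```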

6. Classification

Here is the code to build a model with that data. First, we import the utility function that converts the data from one column to multiple columns: to_categorical. We then read in the data. I like reading in the data with pandas, in case I want to inspect it, but this could be done with numpy. I also use a couple of pandas tricks here which you may or may not be familiar with. Here, I use the drop command to get a version of my data without the target column. We then create our target using the to_categorical function. Then we build our model. It looks similar to models you've seen, except the last layer of the model definition has 2 nodes, one for each possible outcome, and it uses the softmax activation function.
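A sketch of that code follows. The column name shot_result matches the data shown earlier, but the file name, layer sizes, and optimizer are illustrative assumptions rather than the exact course code:

```python
import pandas as pd
from keras.models import Sequential
from keras.layers import Dense
from keras.utils import to_categorical

# Read the data with pandas so it is easy to inspect
data = pd.read_csv('shot_logs.csv')  # hypothetical file name

# Predictors: every column except the target
predictors = data.drop(['shot_result'], axis=1).values

# Target: one-hot encode the single shot_result column
target = to_categorical(data['shot_result'])

n_cols = predictors.shape[1]

model = Sequential()
model.add(Dense(100, activation='relu', input_shape=(n_cols,)))
model.add(Dense(100, activation='relu'))
# 2 output nodes for the 2 possible outcomes, with softmax
model.add(Dense(2, activation='softmax'))

model.compile(optimizer='sgd',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
```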

7. Classification

Let's look at the results now. Both accuracy and loss improve measurably for the first 3 epochs, and then the improvement slows down. Sometimes it gets a little worse for an epoch, sometimes a little better. We will soon see a more sophisticated way to determine how long to train, but training for 10 epochs got us to the flat part of the loss curve, so it worked well in this case.
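The training call behind those results would look something like this (10 epochs as described above; predictors, target, and model as defined in the previous sketch):

```python
# Train for 10 epochs; loss and accuracy are printed after each epoch
model.fit(predictors, target, epochs=10)
```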

8. Let's practice!