1. Multi-class classification
What happens when we have more than two classes to classify? We run into a multi-class classification problem, but don't worry: we just have to make a minor tweak to our neural network architecture.
2. Throwing darts
Identifying who threw which dart in a game of darts is a good example of a multi-class classification problem. Each dart can only be thrown by one competitor, which means our classes are mutually exclusive: no dart can be thrown by two different competitors simultaneously.
3. The dataset
The darts dataset consists of dart throws by different competitors. The coordinates xCoord and yCoord record where each dart landed.
4. The dataset
Based on the landing positions of previously thrown darts, we should be able to distinguish between throwers, provided there's enough variation between their throws. In our pairplot we can see that different players tend to aim at specific regions of the board.
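As an illustration, here is a minimal sketch of how such a pairplot could be produced, assuming the data lives in a pandas DataFrame called darts with columns xCoord, yCoord, and competitor (the file name is hypothetical):

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

darts = pd.read_csv('darts.csv')  # hypothetical file name

# Color each throw by its competitor to reveal per-player regions
sns.pairplot(darts, hue='competitor')
plt.show()
```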
5. The architecture
The model for this dataset has two neurons as inputs, since our predictors are xCoord and yCoord. We will define them using the input_shape argument, just as we've done before.
6. The architecture
In between, there is a series of hidden layers: we are using 3 Dense layers with 128, 64, and 32 neurons, respectively.
7. The architecture
As outputs we have 4 neurons, one per competitor. Let's take a closer look at the output layer now.
8. The output layer
We have 4 outputs, each linked to a possible competitor.
Each competitor has a probability of having thrown a given dart, so we must make sure the total sum of probabilities for the output neurons equals one. We achieve this with the softmax activation function.
Once we have a probability per output neuron, we choose as our prediction the competitor whose output has the highest probability.
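To make this concrete, here is a small numpy sketch of softmax applied to four hypothetical raw output values (the numbers are made up for illustration):

```python
import numpy as np

def softmax(z):
    """Exponentiate and normalize so the outputs sum to 1."""
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()

raw_outputs = np.array([2.0, 1.0, 0.1, -1.0])  # hypothetical pre-activation values
probs = softmax(raw_outputs)
print(probs)             # approximately [0.64 0.23 0.10 0.03]
print(probs.sum())       # 1.0
print(np.argmax(probs))  # 0 -> the predicted competitor
```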
9. Multi-class model
You can build this model as we did in the previous lesson: instantiate a sequential model, add a first hidden layer that also defines the input layer with the input_shape parameter, and finish by adding the remaining hidden layers and an output layer with softmax activation. You will do all this yourself in the exercises.
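As a sketch of the architecture described above (the relu activation for the hidden layers is an assumption; only the layer sizes and the softmax output are specified here):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential()
# First hidden layer also defines the input layer via input_shape:
# two input neurons for our two predictors, xCoord and yCoord
model.add(Dense(128, input_shape=(2,), activation='relu'))
# Remaining hidden layers
model.add(Dense(64, activation='relu'))
model.add(Dense(32, activation='relu'))
# Output layer: one neuron per competitor, softmax so probabilities sum to 1
model.add(Dense(4, activation='softmax'))
```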
10. Categorical cross-entropy
When compiling your model, instead of the binary cross-entropy we used before, we now use categorical cross-entropy, or log loss. Categorical cross-entropy measures the difference between the predicted probabilities and the true label of the class we should have predicted. So if we should have predicted 1 for a given class, looking at the graph we see that we would get high loss values for predicting close to 0 (since we'd be very wrong) and low loss values for predicting closer to 1 (the true label).
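In code, compiling with this loss could look like the following sketch (the adam optimizer and accuracy metric are assumptions, not specified here):

```python
# Compile with categorical cross-entropy as the loss
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
```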
11. Preparing a dataset
Since our outputs are vectors containing the probabilities of each class, our neural network must also be trained with vectors representing this concept. To achieve that, we make use of the to_categorical function from tensorflow.keras.utils.
We first turn our response variable into a categorical variable with pandas Categorical; this allows us to redefine the column using the category codes (cat.codes) of the different categories.
Now that our categories are each represented by a unique integer, we can use the to_categorical function to turn them into one-hot encoded vectors, where each component is 0 except for the one corresponding to the labeled category.
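Putting these two steps together, a minimal sketch, assuming a darts DataFrame with a competitor column, might look like this:

```python
import pandas as pd
from tensorflow.keras.utils import to_categorical

# Turn the response variable into a categorical variable...
darts.competitor = pd.Categorical(darts.competitor)
# ...and redefine the column using the category codes (0, 1, 2, 3)
darts.competitor = darts.competitor.cat.codes

# One-hot encode the integer codes into probability-style vectors
competitors = to_categorical(darts.competitor)
coordinates = darts.drop(['competitor'], axis=1)
```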
12. One-hot encoding
Keras to_categorical essentially performs the process described in the picture above. Label-encoded Apple, Chicken, and Broccoli turn into vectors of 3 components. A 1 is placed to represent the presence of the class and a 0 to indicate its absence.
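For instance, a quick sketch with three label-encoded classes (0 = Apple, 1 = Chicken, 2 = Broccoli):

```python
from tensorflow.keras.utils import to_categorical

# Label-encoded classes: 0 = Apple, 1 = Chicken, 2 = Broccoli
labels = [0, 1, 2]
print(to_categorical(labels))
# [[1. 0. 0.]
#  [0. 1. 0.]
#  [0. 0. 1.]]
```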
13. Let's practice!
Let's further explore these concepts as you build a multi-class model that predicts who threw which dart!