
Batch size and batch normalization

1. Batch size and batch normalization

It’s time to learn the concepts of batch size and batch normalization.

2. Batches

A mini-batch is a subset of data samples. If we were training a neural network with images, each image in our training set would be a sample, and we could take mini-batches of different sizes from the training set.

3. Mini-batch

Remember that during an epoch we feed our data to the network, calculate the errors, and update the network weights. It's not very practical to update the weights only once per epoch, after looking at the error produced by all training samples. In practice, we take a mini-batch of training samples and update the weights after each mini-batch. That way, if our training set has 9 images and we choose a batch_size of 3, we will perform 3 weight updates per epoch, one per mini-batch.
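As a quick sanity check, here is a minimal plain-Python sketch of that arithmetic; the numbers are the hypothetical ones from the example above.

# Hypothetical numbers from the example: 9 training images, batch_size of 3
n_samples = 9
batch_size = 3

# One weight update per mini-batch
updates_per_epoch = n_samples // batch_size
print(updates_per_epoch)  # 3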

4. Mini-batches

Networks tend to train faster with mini-batches since weights are updated more often. Sometimes datasets are so huge that they would struggle to fit in RAM if we didn't use mini-batches. Also, the noise produced by a small batch size can help escape local minima. A couple of disadvantages are the need for more iterations and the need to find a good batch size.

5. Effects of batch sizes

Here you can see how different batch sizes converge towards a minimum as training progresses. Training with all samples at once is shown in blue. Mini-batching is shown in green. Stochastic gradient descent, in red, uses a batch_size of 1. We can see that the path towards the best value for our weights gets noisier the smaller the batch size. They reach the same value after different numbers of iterations.

6. Batch size in Keras

You can set your own batch size with the batch_size parameter of the model's fit method. Keras uses a default batch_size of 32. Powers of two (32, 64, 128, and so on) tend to be used. As a rule of thumb, the bigger your dataset, the bigger the batch size you tend to use.
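For reference, here is a minimal, self-contained sketch of passing batch_size to fit; the toy data, layer sizes, and the chosen batch_size of 16 are illustrative assumptions, not values from the course.

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Hypothetical toy data: 100 samples with 3 features, binary labels
X_train = np.random.rand(100, 3)
y_train = np.random.randint(2, size=100)

# Small binary classification model
model = Sequential()
model.add(Dense(8, input_shape=(3,), activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy')

# batch_size defaults to 32; here we set it to 16,
# so 100 samples give 7 weight updates per epoch (the last batch is smaller)
model.fit(X_train, y_train, epochs=5, batch_size=16)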

7. Normalization in machine learning

Normalization is a common pre-processing step in machine learning algorithms, especially when features have different scales. One way to normalize data is to subtract its mean value and divide by the standard deviation. We almost always normalize our model inputs, since this avoids problems with activation functions and gradients.

8. Normalization in machine learning

This leaves everything centered around 0 with a standard deviation of 1.
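Here is a minimal NumPy sketch of that standardization step; the feature values are made-up numbers for illustration.

import numpy as np

# Hypothetical feature column with its own scale
x = np.array([10.0, 12.0, 8.0, 14.0, 6.0])

# Subtract the mean and divide by the standard deviation
x_norm = (x - x.mean()) / x.std()

print(x_norm.mean())  # approximately 0
print(x_norm.std())   # approximately 1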

9. Reasons for batch normalization

Normalizing a neural network's inputs improves our model. But deeper layers are trained on the outputs of previous layers, and since weights keep getting updated via gradient descent, those layers no longer see normalized inputs; they have to adapt to the previous layers' weight changes, which makes it harder for them to learn their own weights. Batch normalization makes sure that, independently of these changes, the inputs to the next layers are normalized. It does this in a smart way, with trainable parameters that also learn how much of this normalization to keep by scaling or shifting it.
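To make the idea concrete, here is a minimal NumPy sketch of what a batch normalization layer computes for one mini-batch at training time. The input values are made up, gamma and beta stand for the trainable scale and shift parameters, and the small epsilon is an illustrative assumption.

import numpy as np

# Hypothetical mini-batch of layer inputs: 4 samples, 2 features
x = np.array([[1.0, 50.0],
              [2.0, 60.0],
              [3.0, 55.0],
              [4.0, 65.0]])

# Normalize each feature over the mini-batch
mean = x.mean(axis=0)
var = x.var(axis=0)
x_hat = (x - mean) / np.sqrt(var + 1e-5)  # epsilon avoids division by zero

# Trainable scale (gamma) and shift (beta), learned during training;
# initialized here to 1 and 0, which keeps the plain normalization
gamma = np.ones(2)
beta = np.zeros(2)
output = gamma * x_hat + beta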

10. Batch normalization advantages

This improves gradient flow, allows for higher learning rates, reduces dependence on weight initialization, adds regularization to our network, and limits internal covariate shift, which is a funny name for a layer's dependence on the previous layer's outputs when learning its weights. Batch normalization is widely used today in many deep learning models.

11. Batch normalization in Keras

Batch normalization in Keras is applied as a layer. So we can place it in between two layers. We import batch normalization from tensorflow.keras.layers. We then instantiate a sequential model, add an input layer, and then add a batch normalization layer. We finalize this binary classification model with an output layer.
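Here is a minimal sketch following those steps; the number of input features, the layer sizes, and the compile settings are illustrative assumptions.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, BatchNormalization

# Instantiate a sequential model
model = Sequential()

# Input layer (hypothetical: 3 input features, 10 units)
model.add(Dense(10, input_shape=(3,), activation='relu'))

# Batch normalization layer placed between two layers
model.add(BatchNormalization())

# Output layer for binary classification
model.add(Dense(1, activation='sigmoid'))

model.compile(optimizer='adam', loss='binary_crossentropy')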

12. Let's practice!

Let's practice!