
Deep Convolutional GAN

1. Deep Convolutional GAN

Convolutional layers provide better results when processing images than basic linear layers. Let's learn to use them in GANs!

2. Deep Convolutional GAN intuition

To make a GAN more effective for image data, we could replace the linear layers in the discriminator with convolutional layers. In the generator, in order to upsample feature maps, we could use transposed convolutional layers which we have already seen in the U-Net architecture for semantic segmentation. Unfortunately, it's not that simple. Training GANs is often unstable, and simply swapping linear layers for convolutions is not enough, as more adjustments are needed.

3. DCGAN guidelines

Deep Convolutional GAN, or DCGAN for short, is a famous GAN architecture making use of convolutions. In order to stabilize the training, DCGAN authors suggest following some guidelines. Only strided convolutions are used, which we will discuss shortly. There are no linear or pooling layers, but batch normalization is employed after the convolutions. In the generator, ReLU activations are applied, except for the final layer which uses a tanh activation. Throughout the discriminator, Leaky ReLU activations are used. We will see how to implement these guidelines in practice in a moment. First, let's discuss strided convolutions.

4. Strided convolution

A typical convolution has a stride of one. This means that as the kernel slides over the input feature map, it shifts by one pixel at a time. Convolutions with any stride above one are referred to as strided. With a stride of two, for example, the kernel shifts two pixels at a time, both horizontally and vertically. In PyTorch, we can set the stride of a convolution by passing the stride argument to nn.Conv2d.
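To see the effect of the stride, here is a minimal sketch: a stride-two convolution roughly halves the spatial dimensions of its input (the channel counts, kernel size, and padding below are illustrative choices, not taken from the lesson).

```python
import torch
import torch.nn as nn

# A strided convolution: with stride=2, kernel_size=4, padding=1,
# the 32x32 input is downsampled to 16x16.
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=4, stride=2, padding=1)

x = torch.randn(1, 3, 32, 32)  # one RGB image, 32x32 pixels
out = conv(x)
print(out.shape)  # torch.Size([1, 16, 16, 16])
```

The output size follows the usual convolution arithmetic: floor((32 + 2*1 - 4) / 2) + 1 = 16.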

5. Convolutional generator block

Just like with the basic GAN before, we will use custom generator and discriminator block functions to define our GAN. The generator block will consist of a transposed convolution to which we will pass a stride parameter, a batch normalization layer, and a ReLU activation.
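The generator block described above can be sketched as a small helper function; the name dc_gen_block and its exact signature are illustrative assumptions.

```python
import torch
import torch.nn as nn

def dc_gen_block(in_dim, out_dim, kernel_size, stride):
    # Transposed convolution to upsample, then batch norm and ReLU,
    # following the DCGAN guidelines.
    return nn.Sequential(
        nn.ConvTranspose2d(in_dim, out_dim, kernel_size, stride=stride),
        nn.BatchNorm2d(out_dim),
        nn.ReLU(),
    )

# Example: upsample a 1x1 feature map to 4x4 while mapping 16 -> 32 channels.
block = dc_gen_block(16, 32, kernel_size=4, stride=2)
out = block(torch.randn(2, 16, 1, 1))
print(out.shape)  # torch.Size([2, 32, 4, 4])
```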

6. Deep Convolutional Generator

Let's define the generator. As arguments, it accepts the input noise size, in_dim, kernel size, and stride. In the init method, we define a sequential block consisting of three generator blocks followed by a transposed convolution that produces three feature maps, corresponding to the three color channels of the generated image. Finally, we add a tanh activation. In the forward method, before we pass the input to the generator's sequential block, we reshape it with the view method. We make it a tensor of shape len(x), which corresponds to the batch size, by the size of the input noise, by one by one. This reshaping converts the one-dimensional noise vector into a shape compatible with the subsequent convolutional layers in the network.
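A self-contained sketch of this generator follows. The helper function, the class name, and the intermediate channel counts (256, 128, 64) are assumptions made for illustration; the overall structure (three generator blocks, a final transposed convolution to three channels, and a tanh) matches the description above.

```python
import torch
import torch.nn as nn

def dc_gen_block(in_dim, out_dim, kernel_size, stride):
    # Transposed convolution + batch norm + ReLU, per the DCGAN guidelines.
    return nn.Sequential(
        nn.ConvTranspose2d(in_dim, out_dim, kernel_size, stride=stride),
        nn.BatchNorm2d(out_dim),
        nn.ReLU(),
    )

class DCGenerator(nn.Module):
    def __init__(self, in_dim, kernel_size=4, stride=2):
        super().__init__()
        self.in_dim = in_dim
        self.gen = nn.Sequential(
            # Three generator blocks (channel counts are illustrative)
            dc_gen_block(in_dim, 256, kernel_size, stride),
            dc_gen_block(256, 128, kernel_size, stride),
            dc_gen_block(128, 64, kernel_size, stride),
            # Final transposed convolution to 3 color channels
            nn.ConvTranspose2d(64, 3, kernel_size, stride=stride),
            nn.Tanh(),
        )

    def forward(self, x):
        # Reshape the noise vector (batch_size, in_dim) to
        # (batch_size, in_dim, 1, 1) for the convolutional layers.
        x = x.view(len(x), self.in_dim, 1, 1)
        return self.gen(x)

gen = DCGenerator(in_dim=16)
out = gen(torch.randn(2, 16))  # a batch of 2 noise vectors
print(out.shape)  # torch.Size([2, 3, 46, 46])
```

With these kernel and stride choices, the 1x1 input grows to 4, 10, 22, and finally 46 pixels per side; the tanh keeps all pixel values in [-1, 1].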

7. Convolutional discriminator block

Let's look at the discriminator now. First, we define the custom dc_disc_block function. The discriminator block will consist of a strided convolution, a batch norm layer, and a leaky ReLU activation.
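The discriminator block can be sketched as follows; the negative slope of 0.2 for the leaky ReLU is a common DCGAN choice and an assumption here, not stated in the lesson.

```python
import torch
import torch.nn as nn

def dc_disc_block(in_dim, out_dim, kernel_size, stride):
    # Strided convolution to downsample, then batch norm and leaky ReLU.
    return nn.Sequential(
        nn.Conv2d(in_dim, out_dim, kernel_size, stride=stride),
        nn.BatchNorm2d(out_dim),
        nn.LeakyReLU(0.2),  # 0.2 slope is an assumed, commonly used value
    )

# Example: downsample a 10x10 input to 4x4 while mapping 3 -> 16 channels.
block = dc_disc_block(3, 16, kernel_size=4, stride=2)
out = block(torch.randn(2, 3, 10, 10))
print(out.shape)  # torch.Size([2, 16, 4, 4])
```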

8. Deep Convolutional Discriminator

As usual, in the init method, we define the sequential block. It consists of two discriminator blocks we have defined earlier, followed by a convolutional layer that produces the output of size one. Recall this corresponds to the discriminator's prediction of whether its input is a real or a fake image. In the forward method, we pass the input to the discriminator's sequential block. Before we return its output, we reshape it with the view method to (len(x), -1), where len(x) corresponds to the batch size and -1 flattens the output of the convolutional layer.
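A self-contained sketch of this discriminator follows. As before, the class name, channel counts, and the 22x22 input size are illustrative assumptions; the spatial size at which the final convolution collapses to a single value depends on the input resolution.

```python
import torch
import torch.nn as nn

def dc_disc_block(in_dim, out_dim, kernel_size, stride):
    # Strided convolution + batch norm + leaky ReLU (0.2 slope assumed).
    return nn.Sequential(
        nn.Conv2d(in_dim, out_dim, kernel_size, stride=stride),
        nn.BatchNorm2d(out_dim),
        nn.LeakyReLU(0.2),
    )

class DCDiscriminator(nn.Module):
    def __init__(self, kernel_size=4, stride=2):
        super().__init__()
        self.disc = nn.Sequential(
            # Two discriminator blocks (channel counts are illustrative)
            dc_disc_block(3, 64, kernel_size, stride),
            dc_disc_block(64, 128, kernel_size, stride),
            # Final convolution producing a single output channel
            nn.Conv2d(128, 1, kernel_size, stride=stride),
        )

    def forward(self, x):
        x = self.disc(x)
        # Flatten (batch_size, 1, H, W) to (batch_size, H*W)
        return x.view(len(x), -1)

disc = DCDiscriminator()
out = disc(torch.randn(2, 3, 22, 22))  # 22x22 chosen so the output is size one
print(out.shape)  # torch.Size([2, 1])
```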

9. Let's practice!

Now it's your turn to build a Deep Convolutional GAN!
