Evaluating GANs

1. Evaluating GANs

We're nearly there! Let's discuss how to evaluate a trained GAN.

2. Generating images

Since GANs produce visual outputs, the first step in evaluating a GAN is to inspect the images it generates. Let's do that! We will generate nine images. First, we create a random noise tensor of shape 9 by 16, where 16 is the noise input size we used during training. Then, with gradient calculation disabled, we pass the noise to gen, the generator, to obtain fake images. We can print their shape: we have 9 images, each with 3 color channels and a size of 96 by 96 pixels.

Let's visualize them. We iterate over the number of images to generate. For each, we extract the i-th image by slicing fake with square brackets, taking only the i-th element in the first dimension. Next, in order to visualize the image, we must rearrange its dimensions from color channel, height, width to height, width, color channel. We do that by calling the permute method on the image tensor and passing it the desired order of dimensions: 1, 2, and 0. Finally, we can plot the images.
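A minimal sketch of those steps, assuming gen is the trained generator from the previous videos and that matplotlib is available for plotting:

```python
import torch
import matplotlib.pyplot as plt

num_images = 9
# Random noise of shape (9, 16); 16 is the noise size used in training
noise = torch.randn(num_images, 16)

# Disable gradient calculation: we only need a forward pass
with torch.no_grad():
    fake = gen(noise)  # gen: the trained generator (assumed defined earlier)

print(fake.shape)  # expected: torch.Size([9, 3, 96, 96])

for i in range(num_images):
    plt.subplot(3, 3, i + 1)
    # Take the i-th image and rearrange (C, H, W) -> (H, W, C)
    plt.imshow(fake[i].permute(1, 2, 0))
    plt.axis("off")
plt.show()
```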

3. GAN generations

Not bad! They do look a lot like the Pokémon from the training data. Some look as if they might be missing an eye or a leg, but in general, they are okay. But can we find a more precise evaluation method than this visual inspection?

4. Fréchet Inception Distance

A metric commonly used to evaluate GANs is the Fréchet Inception Distance, or FID for short. To understand how it works, we must first mention two related concepts: Inception and the Fréchet distance. Inception is a popular image classification model, while the Fréchet distance is a distance measure between two probability distributions.

Back to the Fréchet Inception Distance. FID uses a pre-trained Inception model to extract features from both the generated and the real images. The extracted features are then used to calculate the mean and covariance for each set of images (generated and real). These statistics encapsulate the distribution of features across the images. Finally, the FID is calculated as the Fréchet distance between the feature distributions of the real and fake images, each parametrized by the mean and covariance calculated before.

A lower FID score means that the distributions of generated and real images are closer in feature space, indicating that the generated images are more similar to the training data and more diverse. While there are no strict guidelines for interpreting FID scores, values below 10 are typically considered good.
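For reference, the Fréchet distance between two Gaussians has a closed form, which is what FID computes from the real-image statistics (mean μ_r, covariance Σ_r) and the generated-image statistics (μ_g, Σ_g):

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^{2}
             + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2 \left( \Sigma_r \Sigma_g \right)^{1/2} \right)
```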

5. FID in PyTorch

Let's compute FID in PyTorch. We start by importing FrechetInceptionDistance from torchmetrics.image.fid. We instantiate the metric, passing it feature equals 64 as an argument. This means we want to extract features from the Inception layer that outputs 64 features; larger feature sizes can be used, too. Next, we update the metric with a sample of fake images. To do this, we call the update method of the metric we defined and pass it the fake images. However, the metric expects pixel values as integers between 0 and 255, while the GAN gives us floats between 0 and 1. To fix that, we multiply the image tensor by 255 and call .to(torch.uint8) on it as we pass it to the update method. We also set real equals false to indicate we are passing fake images. Then, we perform a similar update with real images, this time passing real equals true. Finally, we call the compute method on the metric to get its value. 7.5 is pretty low, indicating high-quality and diverse generations.
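Put together, the steps above look roughly like this. Here, fake is the batch of generated images from before, and real_images is a hypothetical name for a batch of real training images loaded elsewhere:

```python
import torch
from torchmetrics.image.fid import FrechetInceptionDistance

# Use the Inception feature layer that outputs 64 features
fid = FrechetInceptionDistance(feature=64)

# The metric expects uint8 pixels in [0, 255], while the GAN
# outputs floats in [0, 1], so rescale and cast before updating
fid.update((fake * 255).to(torch.uint8), real=False)
fid.update((real_images * 255).to(torch.uint8), real=True)

# Lower is better: the real and fake feature distributions are closer
print(fid.compute())
```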

6. Let's practice!

Let's see how well our GAN performs!