
Panoptic segmentation

1. Panoptic segmentation

Welcome back! It's time to look at the last of the three segmentation types: panoptic segmentation.

2. Panoptic segmentation challenge

Consider this image of cabs on a New York street that we would like to segment. We can perform semantic segmentation to detect the cabs, the buildings behind them, and the street. This is not very useful since it doesn't allow us to distinguish between individual cabs. Let's try instance segmentation instead. Now we can isolate each cab, but we have lost the distinction between the street and the buildings, which are merged into a single background. What we need is a panoptic mask in which all the cabs are isolated while the street and the buildings remain two separate classes. How can we achieve that?

3. Panoptic segmentation workflow

A common approach is to combine the outputs of semantic and instance segmentation. However, creating a proper panoptic mask can be complex: it involves dealing with overlaps and ensuring that unique IDs are assigned to instances. Here, we discuss a straightforward workflow. First, we generate semantic masks with a U-Net and combine them into a single mask with the most likely class for each pixel. This mask will serve as the basis of our panoptic mask. Next, we generate instance masks with Mask R-CNN. Then, we iterate over these instance masks. For each of them, wherever an object is detected with high certainty, we overlay it onto the semantic mask. Let's see what this process looks like in practice.

4. Semantic masks

To generate semantic masks, we load a U-Net and pass it the image tensor. From the shape of the output, we can see that there are three classes, likely corresponding to the cars, the buildings, and the street. We combine these masks into a single mask using torch.argmax along the second dimension, passing dim equal to one. This means that for each pixel, we keep the ID of the class to which it belongs with the highest probability. The image shows the semantic mask: we can distinguish between the street, the buildings, and the cars, each marked with a unique color, but not between individual cars.
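
A minimal sketch of this step is shown below. It assumes a pretrained three-class U-Net is available through the segmentation_models_pytorch package (the course uses its own pretrained model, so the model built here is only a placeholder) and that image_tensor is a preprocessed batch of shape (1, 3, H, W).

import torch
import segmentation_models_pytorch as smp

# Hypothetical 3-class U-Net (cars, buildings, street); in the video the
# model comes pretrained, so this freshly built one is only a stand-in.
unet = smp.Unet(encoder_name="resnet34", classes=3)
unet.eval()

with torch.no_grad():
    semantic_output = unet(image_tensor)        # assumed shape: (1, 3, H, W)

print(semantic_output.shape)                    # three channels -> three classes

# For each pixel, keep the ID of the class with the highest score
semantic_mask = torch.argmax(semantic_output, dim=1)   # shape: (1, H, W)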

5. Instance masks

Let's look at instance masks now. Similarly to what we have done before, we load the model - this time, Mask R-CNN for instance segmentation. We pass it the image tensor and print the output's shape. From it, we can see that the model has identified 80 different instance classes. A quick look tells us that the cars are isolated correctly, but no differentiation between the street and the buildings is made.
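
Below is a minimal sketch of this step using torchvision's pretrained Mask R-CNN; the exact model and preprocessing in the course may differ. It reuses the image_tensor from the previous sketch.

import torch
from torchvision.models.detection import maskrcnn_resnet50_fpn

maskrcnn = maskrcnn_resnet50_fpn(weights="DEFAULT")
maskrcnn.eval()

with torch.no_grad():
    # Mask R-CNN expects a list of (3, H, W) tensors and returns one dict per image
    prediction = maskrcnn([image_tensor.squeeze(0)])[0]

instance_masks = prediction["masks"]            # shape: (num_detections, 1, H, W)
print(instance_masks.shape)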

6. Panoptic masks

Let's combine what we have done into a single panoptic mask. We start by instantiating it as the semantic mask. We can do this by copying the semantic mask tensor using torch.clone. Next, we iterate over the instance masks. We know that the semantic mask has three classes (cars, street, and buildings), which means it takes the values 0, 1, and 2. Therefore, we label the instance classes starting from 3 to avoid collisions. For each instance mask, we find the locations where it is larger than the arbitrary threshold of 0.5, indicating a high probability of an object. At these locations, we overwrite the panoptic mask with the current instance ID. Finally, we increase the instance ID by 1 for the next iteration of the loop. Visual inspection of the resulting panoptic mask tells us that the method worked well: we can distinguish the individual vehicles as well as the road and the buildings in the background.
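
A minimal sketch of this combination loop, assuming semantic_mask and instance_masks come from the two previous sketches:

import torch

# Start from a copy of the semantic mask (class IDs 0, 1 and 2)
panoptic_mask = torch.clone(semantic_mask)

instance_id = 3                                 # first free ID after the semantic classes
for mask in instance_masks:
    # mask holds per-pixel object probabilities; 0.5 is the arbitrary threshold
    panoptic_mask[mask > 0.5] = instance_id
    instance_id += 1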

7. Let's practice!

It's your turn to create a panoptic segmentation mask!
