Object detection using R-CNN
1. Object detection using R-CNN
In this video, we will apply what we have learned about bounding boxes and detect objects using R-CNN models.
2. Region-based CNN family: R-CNN
R-CNN is a family of region-based convolutional models for object detection. These models consist of three modules. First, the R-CNN model generates many region proposals. These are candidate bounding boxes that might contain objects.
3. Region-based CNN family: R-CNN
The second module uses a convolutional neural network to extract features from each region.
4. Region-based CNN family: R-CNN
In the third module, features from each region proposal are used to predict the class and bounding box for that region.
5. R-CNN: backbone
Using a pre-trained model as the backbone is a common strategy. The term backbone refers to the core CNN architecture responsible for feature extraction. The backbone consists of multiple layers of convolutions and pooling operations. These layers extract features for region proposal and object detection.
6. R-CNN: backbone with PyTorch
Let's use the backbone from the pre-trained classification model called VGG16. We import torch.nn, then import the vgg16 model together with its weights from torchvision.models, and initialize the model with the default pre-trained weights. The VGG model has a features block, a pooling layer for size reduction, and a classifier block with fully connected layers.
7. R-CNN: backbone with PyTorch
Our goal is to re-use only the features block from the pre-trained VGG model. The .features attribute provides access to these convolutional layers.
8. R-CNN: backbone with PyTorch
The children() method returns an iterator over all layers of the features block.
9. R-CNN: backbone with PyTorch
To extract the backbone, we convert all layers from the features block into a list and pass them to a new nn.Sequential block.
10. R-CNN: classifier layer
Let's define the classifier layer. It comes on top of the backbone, so its input size must match the backbone's output size. To extract the output size of the VGG backbone, we create a list of all layers in the model's original classifier block. We take the first layer from the list using index zero, read its .in_features attribute, and store this value as input_dimension. Now, we define a new sequential classifier block with two linear layers and a ReLU activation. The first layer's input dimension is the value we stored earlier. The last linear layer has the number of classes as its output size.
11. R-CNN: box regressor layer
Finally, let's define a regressor to predict bounding box coordinates. Like the classifier, it sits on top of the backbone, so we use the same input size. We define a sequential block with two linear layers and a ReLU activation. The in_features argument of the first layer is set to the input dimension from the backbone. The second linear layer has an output size of four, representing the four coordinates to predict.
12. Putting it all together: object detection model
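The box regressor from the previous slide can be sketched in the same way. The hidden size of 32 is an assumption, and input_dimension is hard-coded here to the value VGG16 reports for its first classifier layer so that the snippet does not need to build the full model.

```python
import torch
import torch.nn as nn

# in_features of VGG16's first classifier layer (512 * 7 * 7)
input_dimension = 25088

box_regressor = nn.Sequential(
    nn.Linear(input_dimension, 32),  # hidden size 32 is an assumption
    nn.ReLU(),
    nn.Linear(32, 4),                # four bounding box coordinates
)

# Quick check with a dummy batch of two flattened feature vectors
boxes = box_regressor(torch.randn(2, input_dimension))
print(boxes.shape)
```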
Let's put the backbone, classifier, and box regressor together into one model called ObjectDetectorCNN. In the __init__ method, we extract the VGG16 backbone and assign it to self.backbone. Next, we identify the input size required for the classifier and regressor, and define both of them just as we have seen before.
13. Putting it all together: object detection model
We also define the forward method, which passes the input through the backbone to extract features. It then processes the features separately with the classifier and the bounding box regressor to obtain the two outputs.
14. Running object recognition
With the model at hand, let's recap how to run object recognition for an image. We start by loading an image and transforming it to a tensor. Remember to unsqueeze it in order to add the batch dimension. Next, we pass the image tensor to the model and run non-max suppression over the model's output. Finally, we can draw the bounding box on top of the image.
15. Let's practice!
It's your turn to build the R-CNN!