PaperSummary21 : Mask R-CNN

2 min read · Jan 21, 2025

The paper presents Mask R-CNN, a framework for object instance segmentation that extends Faster R-CNN by adding a branch that predicts a pixel-wise segmentation mask for each region. Its design integrates classification, bounding-box regression and mask prediction in a flexible, efficient manner, and it generalizes to additional tasks such as human pose estimation.

The key steps are:

It operates in two stages: a Region Proposal Network (RPN) first generates candidate bounding boxes, and a second stage then predicts the class, the box offset and, in parallel, a segmentation mask for each candidate region.
A multi-task loss function combines a classification loss, a bounding-box regression loss and a mask-specific loss to optimize the network’s performance.
A Fully Convolutional Network (FCN) is used in the mask prediction branch, preserving the spatial layout of objects and ensuring pixel-to-pixel correspondence.
RoIAlign is introduced as a replacement for RoIPool. It addresses spatial misalignment by keeping the extracted features precisely aligned with the input image, which improves pixel-level accuracy.
The framework is evaluated on the COCO benchmark and uses architectures such as ResNet and ResNeXt as backbones for feature extraction.
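The multi-task loss in the steps above can be written out as follows. The total loss is the form given in the paper; the symbols in the mask term (m for the mask resolution, k for the ground-truth class of the region, y for the ground-truth binary mask and ŷ for the per-pixel sigmoid output) are notation chosen here for illustration.

```latex
L = L_{cls} + L_{box} + L_{mask}

L_{mask} = -\frac{1}{m^2} \sum_{i=1}^{m} \sum_{j=1}^{m}
    \left[ y_{ij} \log \hat{y}^{k}_{ij} + (1 - y_{ij}) \log\left(1 - \hat{y}^{k}_{ij}\right) \right]
```

The mask term is an average binary cross-entropy computed only on the mask of the ground-truth class k, so the masks predicted for other classes do not contribute to the loss and classes do not compete with each other at the pixel level.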
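To make the RoIAlign step above more concrete, here is a minimal pure-Python sketch for a single-channel feature map. It assumes the box is already expressed in feature-map coordinates, uses four regularly spaced sample points per bin and averages them; the function names (bilinear_sample, roi_align) and parameters are illustrative only and are not taken from the paper or from Detectron.

```python
import math


def bilinear_sample(feature, y, x):
    """Bilinearly interpolate a 2D feature map (list of rows) at a fractional (y, x)."""
    h, w = len(feature), len(feature[0])
    # Clamp so that the four neighbouring cells stay inside the map.
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = (1 - dx) * feature[y0][x0] + dx * feature[y0][x1]
    bottom = (1 - dx) * feature[y1][x0] + dx * feature[y1][x1]
    return (1 - dy) * top + dy * bottom


def roi_align(feature, box, out_size=2, samples=2):
    """Pool a (y1, x1, y2, x2) box to out_size x out_size without rounding coordinates."""
    y1, x1, y2, x2 = box
    bin_h = (y2 - y1) / out_size
    bin_w = (x2 - x1) / out_size
    out = [[0.0] * out_size for _ in range(out_size)]
    for i in range(out_size):
        for j in range(out_size):
            total = 0.0
            for sy in range(samples):
                for sx in range(samples):
                    # Regularly spaced sample points inside bin (i, j).
                    y = y1 + (i + (sy + 0.5) / samples) * bin_h
                    x = x1 + (j + (sx + 0.5) / samples) * bin_w
                    total += bilinear_sample(feature, y, x)
            out[i][j] = total / (samples * samples)  # average over the samples
    return out


# Toy 6x6 feature map whose value at (y, x) is 6*y + x.
feature = [[float(6 * r + c) for c in range(6)] for r in range(6)]
pooled = roi_align(feature, box=(0.5, 0.5, 4.5, 4.5))
print(pooled)
```

Because the toy feature map is linear in y and x, each output equals the feature value at the centre of its bin: [[10.5, 12.5], [22.5, 24.5]]. RoIPool would instead round the box and bin boundaries to whole cells before pooling, which is the misalignment RoIAlign avoids.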

Image source: https://arxiv.org/pdf/1703.06870

Mask R-CNN achieved impressive results in instance segmentation and object detection, surpassing the more complex approaches of its time. Its simple and effective framework, with improvements like RoIAlign and flexible mask handling, proved robust and adaptable.

References:

Mask R-CNN (arxiv.org): We present a conceptually simple, flexible, and general framework for object instance segmentation. Our approach…

GitHub - facebookresearch/Detectron (github.com): FAIR's research platform for object detection research, implementing popular algorithms like Mask R-CNN and RetinaNet…


Written by Poonam Saini
