PaperSummary21 : Mask R-CNN

Poonam Saini
2 min read · Jan 21, 2025

The paper presents Mask R-CNN, a framework for object instance segmentation that extends Faster R-CNN by adding a branch that predicts a pixel-wise segmentation mask for each detected object. Its design combines classification, bounding-box regression, and mask prediction in a flexible and efficient way, and it generalizes to other tasks such as human pose estimation.

The key steps are:

  1. It operates in two stages: a Region Proposal Network (RPN) first generates candidate bounding boxes, and a second stage then predicts the class, the box offsets, and a segmentation mask for each candidate in parallel.
  2. A multi-task loss combines the classification loss, the bounding-box regression loss, and the mask loss to train the network.
  3. A small Fully Convolutional Network (FCN) is used in the mask prediction branch, preserving the spatial layout of objects and ensuring pixel-to-pixel correspondence.
  4. RoIAlign is introduced as a replacement for RoIPool. It avoids the coarse quantization that causes spatial misalignment, so the extracted features stay precisely aligned with the input image, which improves pixel-level accuracy.
  5. The framework is evaluated on the COCO benchmark and uses architectures such as ResNet and ResNeXt as backbones for feature extraction.

Image source: https://arxiv.org/pdf/1703.06870
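
To make steps 2 and 4 more concrete, here is a minimal pure-Python sketch of the two ideas: RoIAlign-style pooling (bilinear interpolation at real-valued sample points, with no rounding of the region coordinates) and a per-pixel binary cross-entropy mask loss computed only on the ground-truth class. This is an illustration under my own assumptions, not the authors' implementation: the function names (`bilinear_sample`, `roi_align`, `mask_loss`), the list-of-lists feature map, and the defaults of a 2x2 output with 2x2 sample points per bin are all chosen for readability.

```python
import math


def bilinear_sample(feature, y, x):
    """Interpolate a 2D feature map (list of rows) at a real-valued (y, x)."""
    h, w = len(feature), len(feature[0])
    # Clamp so that the four neighbouring cells stay inside the map.
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = feature[y0][x0] * (1 - dx) + feature[y0][x1] * dx
    bottom = feature[y1][x0] * (1 - dx) + feature[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def roi_align(feature, roi, out_size=2, samples=2):
    """Pool a region into out_size x out_size bins without quantization.

    roi = (y1, x1, y2, x2) in feature-map coordinates, kept as floats.
    Each bin averages samples x samples regularly spaced points.
    """
    y1, x1, y2, x2 = roi
    bin_h = (y2 - y1) / out_size
    bin_w = (x2 - x1) / out_size
    pooled = []
    for i in range(out_size):
        row = []
        for j in range(out_size):
            total = 0.0
            for sy in range(samples):
                for sx in range(samples):
                    py = y1 + (i + (sy + 0.5) / samples) * bin_h
                    px = x1 + (j + (sx + 0.5) / samples) * bin_w
                    total += bilinear_sample(feature, py, px)
            row.append(total / (samples * samples))
        pooled.append(row)
    return pooled


def mask_loss(mask_logits, gt_mask, gt_class):
    """Mean per-pixel binary cross-entropy on the ground-truth class only.

    mask_logits[k] is the predicted mask (logits) for class k;
    masks of the other classes do not contribute to the loss.
    """
    total, count = 0.0, 0
    for logit_row, gt_row in zip(mask_logits[gt_class], gt_mask):
        for z, t in zip(logit_row, gt_row):
            # Numerically stable form of sigmoid cross-entropy.
            total += max(z, 0.0) - z * t + math.log1p(math.exp(-abs(z)))
            count += 1
    return total / count


# Toy 4x4 feature map with value 4*y + x, and a region with fractional borders.
feature = [[4 * y + x for x in range(4)] for y in range(4)]
pooled = roi_align(feature, roi=(0.5, 0.5, 2.5, 2.5))
```

RoIPool would round the region borders (0.5 and 2.5 here) to integer cells before pooling; the sketch keeps them fractional, which is the source of the alignment gain described in step 4. The total training loss per region is then the sum of the classification, box regression, and mask terms from step 2.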

Mask R-CNN achieved strong results in instance segmentation and object detection, surpassing the more complex approaches that preceded it. Its simple, effective framework, with improvements such as RoIAlign and flexible mask handling, shows robustness and adaptability.

Written by Poonam Saini

PhD Student, Research Associate @ Ulm University
