Paper Summary 21: Mask R-CNN
The paper presents Mask R-CNN, a framework for object instance segmentation that extends Faster R-CNN with an additional branch predicting a pixel-wise segmentation mask for each region of interest. The design combines classification, bounding-box regression, and mask prediction in a flexible, efficient way, and it generalizes to other tasks such as human pose estimation.
The key steps are:
- It operates in two stages: a Region Proposal Network (RPN) first generates candidate bounding boxes, and a second stage then predicts the class, the box refinement, and, in parallel, a segmentation mask for each candidate region.
- A multi-task loss combines the classification loss, the bounding-box regression loss, and a mask-specific loss to train all three outputs jointly.
- The mask branch is a Fully Convolutional Network (FCN), which preserves the spatial layout of objects and keeps a pixel-to-pixel correspondence between input and output.
- RoIAlign replaces RoIPool. It avoids the spatial misalignment caused by RoIPool's coordinate quantization, keeping the extracted features precisely aligned with the input image and thereby improving pixel-level accuracy.
- The framework is evaluated on the COCO benchmark and uses ResNet and ResNeXt backbones for feature extraction.
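To make the RoIAlign step concrete, here is a minimal NumPy sketch, not the authors' implementation. It assumes a single-channel feature map, a box already expressed in feature-map coordinates, and average pooling over a small grid of bilinear samples per bin. The function names (`bilinear_sample`, `roi_align`) and the parameters `out_size` and `samples` are illustrative choices, not taken from the paper's code.

```python
import numpy as np

def bilinear_sample(feature, y, x):
    """Bilinearly interpolate a 2-D feature map at a continuous point (y, x)."""
    h, w = feature.shape
    # Clamp the point so that its four integer neighbours lie inside the map.
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = (1 - dx) * feature[y0, x0] + dx * feature[y0, x1]
    bottom = (1 - dx) * feature[y1, x0] + dx * feature[y1, x1]
    return (1 - dy) * top + dy * bottom

def roi_align(feature, box, out_size=2, samples=2):
    """Average samples x samples bilinear samples inside each output bin.

    box is (y1, x1, y2, x2) in feature-map coordinates; it is never rounded,
    which is the difference from RoIPool's quantized bins.
    """
    y1, x1, y2, x2 = box
    bin_h = (y2 - y1) / out_size
    bin_w = (x2 - x1) / out_size
    out = np.zeros((out_size, out_size))
    for i in range(out_size):
        for j in range(out_size):
            vals = []
            for si in range(samples):
                for sj in range(samples):
                    # Regularly spaced sample points inside bin (i, j).
                    yy = y1 + (i + (si + 0.5) / samples) * bin_h
                    xx = x1 + (j + (sj + 0.5) / samples) * bin_w
                    vals.append(bilinear_sample(feature, yy, xx))
            out[i, j] = np.mean(vals)
    return out

# Illustrative input: a linear ramp, so the interpolation is easy to verify.
ramp = np.fromfunction(lambda y, x: 10.0 * y + x, (8, 8))
pooled = roi_align(ramp, (1.5, 2.25, 5.5, 6.25))
```

Because the box coordinates stay fractional, moving the box by a quarter of a feature-map cell changes the pooled values accordingly. A quantizing pooler would first snap the box to integer cells and lose that offset.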
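The multi-task loss can be sketched for a single region of interest as the sum of three terms. The version below is a simplified NumPy illustration under stated assumptions: softmax cross-entropy for classification, smooth L1 for box regression (as in Fast R-CNN), and a per-pixel sigmoid cross-entropy evaluated only on the mask of the ground-truth class. The three terms are weighted equally here, and all function and argument names are hypothetical.

```python
import numpy as np

def mask_loss(mask_logits, gt_mask, gt_class):
    """Mean per-pixel binary cross-entropy on the ground-truth class's mask.

    mask_logits: (K, m, m) raw scores, one m x m mask per class.
    gt_mask:     (m, m) binary target mask.
    Masks predicted for the other classes do not contribute to the loss.
    """
    z = mask_logits[gt_class]
    # Numerically stable BCE with logits: max(z, 0) - z * t + log(1 + exp(-|z|)).
    bce = np.maximum(z, 0) - z * gt_mask + np.log1p(np.exp(-np.abs(z)))
    return bce.mean()

def smooth_l1(pred, target):
    """Smooth L1 box-regression loss, summed over the box coordinates."""
    diff = np.abs(pred - target)
    return np.where(diff < 1.0, 0.5 * diff ** 2, diff - 0.5).sum()

def multitask_loss(class_logits, box_pred, mask_logits, gt_class, gt_box, gt_mask):
    """Sum of classification, box, and mask losses for one region of interest."""
    # Softmax cross-entropy, computed via a shifted log-sum-exp for stability.
    shifted = class_logits - class_logits.max()
    log_probs = shifted - np.log(np.exp(shifted).sum())
    l_cls = -log_probs[gt_class]
    l_box = smooth_l1(box_pred, gt_box)
    l_mask = mask_loss(mask_logits, gt_mask, gt_class)
    return l_cls + l_box + l_mask

# Illustrative call with 3 classes and a 4 x 4 mask (made-up values).
loss = multitask_loss(
    class_logits=np.zeros(3),
    box_pred=np.array([0.1, 0.2, 0.3, 0.4]),
    mask_logits=np.zeros((3, 4, 4)),
    gt_class=1,
    gt_box=np.array([0.1, 0.2, 0.3, 0.4]),
    gt_mask=np.ones((4, 4)),
)
```

Evaluating the mask term only on the ground-truth class means the masks for different classes do not compete with each other; the classification branch alone decides which mask is used.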
Mask R-CNN achieved strong results in both instance segmentation and object detection, surpassing the more complex approaches that preceded it. Its simple, effective design, with improvements such as RoIAlign and flexible mask handling, shows robustness and adaptability.
References: