PaperSummary14: SupMAE
The paper introduces SupMAE, a supervised extension of the Masked Autoencoder (MAE) framework for pre-training vision transformers. Standard MAE is self-supervised: it learns representations by reconstructing masked image patches, but this objective emphasizes local features and offers no explicit global feature learning. SupMAE addresses this limitation by adding a supervised classification branch, so the model learns both local and global features from golden labels (i.e., class labels). Because the classification branch operates only on the subset of visible image patches, pre-training remains efficient.
The main components of the method are:
- Framework: SupMAE adds a supervised classification branch alongside MAE's reconstruction objective. The reconstruction branch predicts the missing pixels with a lightweight decoder, encouraging local features; the classification branch classifies the encoded features of the visible patches, encouraging global features (see the pre-training sketch after this list).
- Pre-training Objectives: The model is trained with a weighted combination of the reconstruction and classification losses, balancing local and global feature learning (the loss weighting is shown below). Random masking also acts as a form of data augmentation for the classification branch, keeping training efficient while making the learned features more robust.
- Training and Fine-Tuning: During pre-training, the encoder processes only the visible 25% of image patches, and the decoder reconstructs the remaining masked ones. During fine-tuning, the encoder, which saw only visible patches during pre-training, is applied to full, uncorrupted images for downstream tasks (see the fine-tuning sketch below).
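A minimal PyTorch-style sketch of the two-branch pre-training step described above. The names `SupMAEPreTrain`, the decoder call signature, and the mean-pooling of visible tokens for classification are simplifying assumptions, not the paper's exact implementation; `encoder` and `decoder` stand in for the ViT blocks used in the paper.

```python
import torch
import torch.nn as nn

class SupMAEPreTrain(nn.Module):
    """Illustrative two-branch SupMAE pre-training module (hypothetical names)."""
    def __init__(self, encoder, decoder, embed_dim=768, num_classes=1000,
                 mask_ratio=0.75):
        super().__init__()
        self.encoder = encoder        # ViT encoder; sees visible patches only
        self.decoder = decoder        # lightweight decoder for pixel reconstruction
        self.cls_head = nn.Linear(embed_dim, num_classes)  # supervised branch
        self.mask_ratio = mask_ratio  # 75% masked -> 25% visible

    def forward(self, patches, labels):
        # patches: (B, N, patch_dim) flattened image patches
        B, N, D = patches.shape
        n_visible = int(N * (1 - self.mask_ratio))

        # Random masking: a fresh random subset each step, which also
        # acts as data augmentation for the classification branch.
        perm = torch.rand(B, N).argsort(dim=1)
        vis_idx, mask_idx = perm[:, :n_visible], perm[:, n_visible:]
        visible = torch.gather(patches, 1,
                               vis_idx.unsqueeze(-1).expand(-1, -1, D))

        # Encode only the visible patches (MAE's efficiency is preserved).
        latent = self.encoder(visible)                  # (B, n_visible, embed_dim)

        # Reconstruction branch: predict the masked pixels (local features).
        # The decoder signature here is an assumption for illustration.
        pred = self.decoder(latent, vis_idx, mask_idx)  # (B, N - n_visible, patch_dim)
        target = torch.gather(patches, 1,
                              mask_idx.unsqueeze(-1).expand(-1, -1, D))
        loss_rec = ((pred - target) ** 2).mean()

        # Classification branch: pool visible tokens and predict the golden
        # label (global features from supervision).
        logits = self.cls_head(latent.mean(dim=1))
        loss_cls = nn.functional.cross_entropy(logits, labels)
        return loss_rec, loss_cls
```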
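A usage example showing the weighted combination of the two objectives, using the sketch above. The weight `lambda_cls` and its value are assumptions; the paper treats this balance as a tunable hyperparameter.

```python
model = SupMAEPreTrain(encoder, decoder)
loss_rec, loss_cls = model(patches, labels)

# Weighted sum balancing local (reconstruction) and global (classification)
# learning; 0.1 is an illustrative value, not the paper's setting.
lambda_cls = 0.1
loss = loss_rec + lambda_cls * loss_cls
loss.backward()
```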
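And a fine-tuning sketch, again with assumed names: the pre-trained encoder now receives the full, uncorrupted set of patches, with no masking, and a fresh task head is attached for the downstream task.

```python
import torch.nn as nn

class SupMAEFineTune(nn.Module):
    """Illustrative fine-tuning wrapper around the pre-trained encoder."""
    def __init__(self, pretrained_encoder, embed_dim=768, num_classes=1000):
        super().__init__()
        self.encoder = pretrained_encoder    # weights carried over from pre-training
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, patches):
        # All N patches are passed in; no masking at fine-tuning time.
        tokens = self.encoder(patches)       # (B, N, embed_dim)
        return self.head(tokens.mean(dim=1)) # pooled representation -> logits
```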
Overall, this hybrid approach is compute-efficient and shows superior performance on few-shot and dense prediction tasks, as well as better transferability.