Paper Summary 14: SupMAE

Poonam Saini
1 min read · Jan 14, 2025


The paper introduces SupMAE, a supervised extension of the Masked Autoencoder (MAE) framework for pre-training vision transformers. Traditional MAE is self-supervised: it learns representations by reconstructing masked image patches, but this objective captures mostly local features and lacks global feature learning. SupMAE addresses this limitation by adding a supervised classification branch, so the model learns both local and global features from gold labels (i.e., class labels). Because the classification branch operates only on the subset of visible patches, the approach remains efficient.
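The patch-masking step at the heart of MAE (and SupMAE) can be sketched in a few lines. This is a toy NumPy illustration, not the authors' code; the 196 patches of dimension 768 are the standard ViT-B layout and are assumed here for concreteness:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_masking(patches, mask_ratio=0.75):
    """Toy split of a patch sequence into visible and masked subsets."""
    n = patches.shape[0]
    n_visible = int(n * (1 - mask_ratio))
    perm = rng.permutation(n)
    visible_idx = np.sort(perm[:n_visible])   # patches the encoder sees
    masked_idx = np.sort(perm[n_visible:])    # patches the decoder reconstructs
    return patches[visible_idx], patches[masked_idx], visible_idx, masked_idx

# A 14x14 grid of image patches, flattened: 196 patches of dimension 768.
patches = rng.standard_normal((196, 768))
visible, masked, vis_idx, msk_idx = random_masking(patches)
print(visible.shape)  # (49, 768): only 25% of patches enter the encoder
```

Only the visible 25% pass through the encoder, which is what makes both the reconstruction and the added classification branch cheap to train.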

The methodology steps are:

  1. Framework: SupMAE adds a supervised classification branch alongside MAE's reconstruction objective. The reconstruction branch restores the missing pixels with a lightweight decoder, encouraging local features; the classification branch classifies the image from the visible patches, encouraging global features.
  2. Pre-training objectives: the total loss is a weighted combination of the reconstruction and classification losses, balancing local and global feature learning. Random masking also acts as a form of data augmentation for the classification branch, keeping training efficient and the learned features robust.
  3. Training and fine-tuning: during pre-training, only 25% of the image patches (the visible ones) are processed for classification, while the remaining masked patches are reconstructed. During fine-tuning, the encoder, although trained on partial inputs, is applied to full, uncorrupted images for downstream tasks.
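The weighted objective in step 2 above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions: the MSE reconstruction target, the cross-entropy form, and the `cls_weight` value are placeholders, not the paper's actual implementation or hyperparameters:

```python
import numpy as np

def supmae_loss(pred_pixels, target_pixels, cls_logits, label, cls_weight=0.01):
    """Toy weighted sum of a reconstruction loss and a classification loss.

    cls_weight is an illustrative value, not the paper's setting.
    """
    # MSE between predicted and true pixels of the masked patches (local features).
    recon = np.mean((pred_pixels - target_pixels) ** 2)
    # Cross-entropy on logits from the pooled visible-patch representation
    # (global features), computed in a numerically stable way.
    logits = cls_logits - cls_logits.max()
    log_probs = logits - np.log(np.exp(logits).sum())
    cls = -log_probs[label]
    return recon + cls_weight * cls
```

With a perfect reconstruction and uniform logits over 10 classes, the loss reduces to `cls_weight * log(10)`, which makes the weighting between the two branches easy to reason about.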

Overall, this hybrid approach is more efficient, achieves superior performance on few-shot and dense prediction tasks, and shows better transferability than self-supervised MAE.


Written by Poonam Saini

PhD Student, Research Associate @ Ulm University
