PaperSummary04: Fine-tuning diffusion models with limited data
The paper proposes the Adapter-Augmented Attention Fine-tuning (A3FT) method for efficiently fine-tuning diffusion models on small datasets, with minimal overfitting and faster convergence. Overfitting is a critical issue when a pre-trained diffusion model is fine-tuned on limited data. A3FT addresses it by fine-tuning only the attention blocks rather than all parameters, and by adding a time-aware adapter module to the pre-trained model to improve learning.
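As a rough illustration of the "attention blocks only" idea, the sketch below picks which parameters stay trainable by matching substrings in their names and reports what fraction of the model that covers. The parameter names, sizes, and the `attn`/`adapter` keywords are hypothetical and are not taken from the paper.

```python
def select_trainable(param_names, keywords=("attn", "adapter")):
    """Return the parameter names that stay trainable; all others are frozen."""
    return [name for name in param_names if any(k in name for k in keywords)]


def trainable_fraction(param_sizes, keywords=("attn", "adapter")):
    """Fraction of all parameters (by element count) that stay trainable."""
    trainable = set(select_trainable(param_sizes, keywords))
    total = sum(param_sizes.values())
    kept = sum(size for name, size in param_sizes.items() if name in trainable)
    return kept / total


# Hypothetical parameter names and element counts, for illustration only.
param_sizes = {
    "down.0.resnet.conv1.weight": 9000,
    "down.0.attn.to_q.weight": 400,
    "down.0.attn.to_k.weight": 400,
    "down.0.adapter.fuse.weight": 200,
}

print(select_trainable(param_sizes))
print(trainable_fraction(param_sizes))
```

In a PyTorch model, the same selection would typically be applied by iterating over `model.named_parameters()` and setting `requires_grad` to `False` on every parameter that is not selected.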
The key points are:
- Time-Fusion Module: The time-aware adapter combines feature vectors with time embeddings.
- Cross-Attention Module: It applies attention with time-aware features.
- Time-Scaling Module: It regulates adapter influence based on time steps.
- The adapter adds minimal overhead: only ~1.8% of the model's parameters.
- The proposed method outperforms both naive fine-tuning and attention-block fine-tuning in terms of cFID and KID scores.
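The three modules listed above can be sketched as a single forward pass. This is a minimal NumPy sketch under stated assumptions, not the paper's implementation: the fusion is assumed to be concatenation followed by a linear projection, the cross-attention is assumed to take queries from the original features and keys/values from the time-fused features, and the time scaling is assumed to be a sigmoid gate computed from the time embedding. The class name, layer shapes, and bottleneck size are all hypothetical.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


class TimeAwareAdapter:
    """Hypothetical adapter: time fusion -> cross-attention -> time scaling."""

    def __init__(self, dim, time_dim, bottleneck, seed=0):
        rng = np.random.default_rng(seed)

        def init(*shape):
            return rng.normal(0.0, 0.02, size=shape)

        self.w_fuse = init(dim + time_dim, bottleneck)  # time fusion (assumed)
        self.w_q = init(dim, bottleneck)                # queries from features
        self.w_k = init(bottleneck, bottleneck)         # keys from fused features
        self.w_v = init(bottleneck, bottleneck)         # values from fused features
        self.w_out = init(bottleneck, dim)              # project back to dim
        self.w_scale = init(time_dim, 1)                # time scaling gate (assumed)

    def gate(self, t_emb):
        """Sigmoid gate in (0, 1) that depends only on the time embedding."""
        return 1.0 / (1.0 + np.exp(-(t_emb @ self.w_scale)))

    def __call__(self, x, t_emb):
        # x: (tokens, dim) feature vectors; t_emb: (time_dim,) time embedding.
        tokens = x.shape[0]
        t = np.broadcast_to(t_emb, (tokens, t_emb.shape[0]))
        # Time fusion: concatenate features with the time embedding, then project.
        fused = np.concatenate([x, t], axis=1) @ self.w_fuse
        # Cross-attention: original features attend to the time-aware features.
        q, k, v = x @ self.w_q, fused @ self.w_k, fused @ self.w_v
        attn = softmax(q @ k.T / np.sqrt(q.shape[1]))
        h = (attn @ v) @ self.w_out
        # Time scaling: the gate regulates how much the adapter contributes.
        return x + self.gate(t_emb) * h


adapter = TimeAwareAdapter(dim=8, time_dim=4, bottleneck=2)
x = np.ones((5, 8))
t_emb = np.zeros(4)
print(adapter(x, t_emb).shape)
```

The residual form `x + gate * h` means the adapter leaves the pre-trained features unchanged whenever its output is zero. The small bottleneck is one plausible way to keep the added parameter count low.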
The A3FT method provides a faster and more efficient way to generate high-quality images while addressing overfitting and computational challenges.
References: