PaperSummary05: DreamBooth

Poonam Saini
1 min read · Jan 5, 2025

The paper presents a fine-tuning approach for personalizing large-scale text-to-image diffusion models. With just 3–5 reference images of a subject, the method can generate new, photorealistic images of that subject in different contexts, guided by text prompts. It introduces a unique identifier for the subject and binds it to the model’s existing semantic knowledge, which enables diverse contextual generations.

The key steps of the methodology are:

  1. Fine-tuning setup: The method starts from a pre-trained diffusion-based text-to-image model and pairs the subject images with a prompt containing a unique identifier and a class descriptor (e.g., “a [V] dog”). Fine-tuning teaches the model to associate the identifier with the subject while leveraging its prior knowledge of the class.
  2. Rare-token identifiers: The identifier is built from tokens that are rare in the vocabulary, so it carries minimal prior association in the language model and can be bound to the specific subject.
  3. Class-specific prior preservation loss: It regularizes the model so that it retains its knowledge of the class and does not overfit to the few-shot examples of the subject. It also encourages diversity of pose and viewpoint in the generated outputs.
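The steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the paper's implementation: the function names `make_prompts` and `prior_preservation_loss` are hypothetical, the arrays stand in for the model's predictions and their training targets, the paper's timestep-dependent weighting is omitted, and the default `prior_weight` of 1.0 is an assumption. The class images in the second term are the ones the paper generates with the model itself before fine-tuning.

```python
from typing import Tuple

import numpy as np


def make_prompts(identifier: str, class_noun: str) -> Tuple[str, str]:
    """Build the two prompts used during fine-tuning (hypothetical helper).

    The instance prompt pairs the rare identifier with the class noun;
    the class prompt contains the class noun only.
    """
    instance_prompt = f"a {identifier} {class_noun}"
    class_prompt = f"a {class_noun}"
    return instance_prompt, class_prompt


def prior_preservation_loss(
    pred_instance: np.ndarray,
    target_instance: np.ndarray,
    pred_class: np.ndarray,
    target_class: np.ndarray,
    prior_weight: float = 1.0,  # assumed default, not taken from the summary
) -> float:
    """Reconstruction loss on the subject images plus a weighted prior term.

    Simplification: both terms are plain mean squared errors; the
    timestep-dependent weights from the paper are left out.
    """
    # Term 1: fit the few subject images under the "a [V] dog" prompt.
    instance_term = np.mean((pred_instance - target_instance) ** 2)
    # Term 2: stay close to the class images under the "a dog" prompt,
    # which discourages forgetting the class and overfitting to the subject.
    prior_term = np.mean((pred_class - target_class) ** 2)
    return float(instance_term + prior_weight * prior_term)
```

Setting `prior_weight` to 0 reduces this to plain fine-tuning on the subject images, which is the setting the prior term is meant to regularize.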

DreamBooth enables recontextualization, novel view synthesis, and property modification. Its limitations are that it struggles with highly complex subjects and may overfit when the prompt too closely mimics the context of the reference images.


Written by Poonam Saini

PhD Student, Research Associate @ Ulm University
