Diffusion models

Introduction

Diffusion models (DMs for short) took the world by storm in 2022, from Dalle 2 to Stable Diffusion. The story started with unconditional image generation (generating an image after training on a dataset of i.i.d. samples from an image distribution) via DDPMs, a step up from the notoriously unstable GANs. Dalle 2 connected large language modeling (via CLIP) with diffusion models (as compared to the VQGAN in Dalle 1). Stable Diffusion made this open source and more “stable” by using latent diffusion models, in which latent representations of images (from a pre-trained autoencoder) are connected to text.
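
To make the DDPM framing slightly more concrete, here is a minimal PyTorch sketch of the forward (noising) process that a DDPM learns to reverse. The schedule endpoints, number of steps, and variable names are illustrative assumptions, not those of any particular paper or library.

```python
import torch

# A minimal sketch of the DDPM forward (noising) process, assuming a linear
# variance schedule. T and the schedule endpoints are illustrative choices.
T = 1000
betas = torch.linspace(1e-4, 0.02, T)        # variance schedule beta_t
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)    # cumulative products alpha_bar_t

def q_sample(x0, t, noise=None):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(alpha_bar_t) * x_0, (1 - alpha_bar_t) * I)."""
    if noise is None:
        noise = torch.randn_like(x0)
    abar = alpha_bars[t].view(-1, *([1] * (x0.dim() - 1)))  # broadcast over image dims
    return abar.sqrt() * x0 + (1.0 - abar).sqrt() * noise

# Usage: noise a batch of images at random timesteps, as is done during
# training, where a network is then trained to predict the added noise.
x0 = torch.rand(8, 3, 32, 32)                # stand-in batch of images in [0, 1]
t = torch.randint(0, T, (8,))
xt = q_sample(x0, t)
```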

Importantly, the paragraph above describes the DDPM probabilistic modeling framework for diffusion models, which, in turn, has deep connections to variational autoencoders. Other generative paradigms exist as well, such as:

  • score-based generative modeling
  • stochastic differential equations

Song’s thesis provides a good overview of the above; see also Song’s blog post. In addition, of course, there are the older paradigms of generative learning:

  • GANs
  • VAEs

Connecting text to image

Much of the hype around diffusion models arrived when users could generate impressive, realistic images from any description they could imagine.