Note
This post serves as a personal study note (from a mathematical viewpoint) on diffusion-based generative models. It is not guaranteed that everything in this post is correct.
🏷️ Introduction
The diffusion model implements an encode/decode process by adding/removing (Gaussian) noise.
Intuitively, the process of adding noise simulates diffusion via Brownian motion: whatever the initial distribution of particles is, it eventually converges to a Gaussian distribution.
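As a rough numerical illustration (a minimal sketch, assuming a DDPM-style variance-preserving update and a hypothetical linear noise schedule, neither of which has been fixed in this post yet), repeatedly mixing Gaussian noise into even a strongly non-Gaussian sample drives it toward a standard Gaussian:

```python
import numpy as np

rng = np.random.default_rng(0)

# A deliberately non-Gaussian start: a bimodal mixture of two point clouds.
x0 = np.concatenate([rng.normal(-3.0, 0.2, 5000), rng.normal(3.0, 0.2, 5000)])

T = 1000
betas = np.linspace(1e-4, 0.02, T)  # hypothetical linear noise schedule

x = x0.copy()
for beta in betas:
    eps = rng.standard_normal(x.shape)
    # Variance-preserving step: shrink the signal, add a matched amount of noise.
    x = np.sqrt(1.0 - beta) * x + np.sqrt(beta) * eps

# After enough steps the samples are statistically close to a standard Gaussian.
print(f"mean ~ {x.mean():+.3f}, std ~ {x.std():.3f}")  # roughly 0 and 1
```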
The backward process is a bit counter-intuitive. The denoising process sounds like solving the inverse heat equation back to the initial state, which is severely unstable numerically unless the terminal state (density) lies in a Gevrey class. Therefore, it seems that the more (and hence smaller) forward steps are taken, the better quality one can expect from the backward process, since each backward step then only has to undo a small amount of noise rather than invert the whole diffusion at once.
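To make the ill-posedness remark concrete, a standard Fourier computation (not specific to this post) suffices: the heat semigroup damps each frequency exponentially, so undoing it amplifies each frequency exponentially,

$$
\partial_t u = \Delta u
\;\Longrightarrow\;
\hat{u}(k, t) = e^{-|k|^2 t}\, \hat{u}(k, 0)
\;\Longrightarrow\;
\hat{u}(k, 0) = e^{+|k|^2 t}\, \hat{u}(k, t),
$$

so a high-frequency perturbation of size $\varepsilon$ in the terminal density gets blown up by a factor $e^{|k|^2 t}$; the reconstruction is stable only when the terminal data are extremely smooth (e.g., Gevrey regular), as noted above.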