Score-Based Generative Modeling through Stochastic Differential Equations

Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, Ben Poole

In International Conference on Learning Representations (ICLR), 2021

score-based models, diffusion models

@inproceedings{song2021scorebased,
title={Score-Based Generative Modeling through Stochastic Differential Equations},
author={Yang Song and Jascha Sohl{-}Dickstein and Diederik P. Kingma and Abhishek Kumar and Stefano Ermon and Ben Poole},
year={2021},
booktitle={9th International Conference on Learning Representations, {ICLR}}
}

TL;DR: Define continuous-time stochastic processes instead of discrete steps, enabling better control and flexibility. The reverse SDE is learned using score matching, and this framework can generalize to various types of noise schedules (like the variance-exploding or variance-preserving SDEs).

1. Introduction and Background

1.1. Denoising Score Matching with Langevin Dynamics

In [Song et al. 2019] we discussed the role of Langevin dynamics in score matching. To recap, Langevin dynamics consists of a stochastic differential equation (SDE) that describes the evolution of a particle in a potential field. We can use the score function, which is the gradient of the log-density of the data distribution, and apply Langevin dynamics to generate new samples from the score function.
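To make Langevin dynamics concrete, here is a minimal NumPy sketch that samples from a toy target whose score is known in closed form: a standard Gaussian, for which \( \nabla_\mathbf{x} \log p(\mathbf{x}) = -\mathbf{x} \). The step size and step count are illustrative choices, not values from the paper:

```python
import numpy as np

def langevin_sample(score_fn, x0, step_size=0.01, n_steps=1000, rng=None):
    """Unadjusted Langevin dynamics: x <- x + (eps/2) * score(x) + sqrt(eps) * z."""
    rng = np.random.default_rng(0) if rng is None else rng
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        z = rng.standard_normal(x.shape)
        x = x + 0.5 * step_size * score_fn(x) + np.sqrt(step_size) * z
    return x

# Toy target: standard Gaussian, whose score is known analytically.
score = lambda x: -x  # grad log N(0, I) = -x

# 5000 independent chains, all started far from the mode.
samples = langevin_sample(score, x0=np.full(5000, 10.0))
print(samples.mean(), samples.std())  # drifts toward mean 0, std 1
```

In practice the analytic score is replaced by a learned score network; everything else in the sampler stays the same.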
Let's define a perturbation kernel: \[ p_\sigma(\mathbf{\hat{x}} \vert \mathbf{x}) := \mathcal{N}(\mathbf{\hat{x}}; \mathbf{x}, \sigma^2 \mathbf{I}) \] where:

  • \( \mathbf{x} \) is the data point.
  • \( \mathbf{\hat{x}} \) is the noisy observation.
  • \( \sigma \) is the noise scale or standard deviation of the noise.
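Sampling from this kernel is just adding scaled Gaussian noise, and the kernel's own score is available in closed form, which is what denoising score matching regresses against. A small sketch (the batch size and \( \sigma \) value are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def perturb(x, sigma, rng):
    """Draw x_hat ~ N(x, sigma^2 I), i.e. a sample from p_sigma(x_hat | x)."""
    return x + sigma * rng.standard_normal(x.shape)

def kernel_score(x_hat, x, sigma):
    """Closed-form score of the kernel: grad_{x_hat} log p_sigma(x_hat | x)."""
    return (x - x_hat) / sigma**2

x = np.zeros(100_000)              # a batch of (identical) data points at 0
x_hat = perturb(x, sigma=2.0, rng=rng)
print(x_hat.mean(), x_hat.std())   # empirically ~ N(0, 2^2)
```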

2. Noise Perturbations

2.1. Noise Perturbations, VP and VE

Both score matching with Langevin dynamics (SMLD) and Denoising Diffusion Probabilistic Models (DDPM) [Ho et al. 2020] add noise progressively to the data over discrete time steps.
Let's derive the variance-exploding and variance-preserving continuous-time stochastic differential equations (SDEs) for these noise perturbations from their discrete expressions.

2.1.1. Variance Exploding Stochastic Differential Equation (VE-SDE)

Variance Exploding (VE) is used in SMLD and consists of progressively increasing the variance of the noise over time. This paper leverages the VE-SDE to create robust denoising trajectories and stable training. But how is this done? We will define two formulations of the stochastic process, one in terms of the standard deviation \( \sigma \) and one in terms of the variance scale \( \beta \) of the noise distribution. This will allow us to control the noise scale over time with variance-exploding or variance-preserving noise.
First, let's consider a Markov chain of \( N \) steps, each with its own noise scale, with perturbation kernels \( p_{\sigma_i}(\mathbf{x} \vert \mathbf{x}_0) \) or, in other words, the probability distributions of the noisy observations \( \mathbf{x} \) given the data point \( \mathbf{x}_0 \), where \( i \) is the time step index. The state at the \( i \)-th time step is given by the Markov chain: \[ \mathbf{x}_i = \mathbf{x}_{i-1} + \sqrt{\sigma_i^2 - \sigma_{i-1}^2} \ \mathbf{z}_{i-1}, \quad i=1, \cdots, N \tag{Eq. 1} \label{eq:markov_chain_ve} \] where:

  • \( \mathbf{z}_{i-1} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}) \) is standard Gaussian noise.
  • \( \sigma_0 = 0 \), i.e. no perturbation is applied at the beginning of the process.
  • \( \mathbf{x}_0 \sim p_{data} \) is a sample from the data distribution.
If we take infinite time steps, \( N \rightarrow \infty \), the Markov chain becomes a continuous-time stochastic process. Let's define an infinitesimal time step \( \Delta t \) and rewrite Eq. 1 as: \[ \mathbf{x}(t + \Delta t) = \mathbf{x}(t) + \sqrt{\sigma^2(t + \Delta t) - \sigma^2(t)} \, \mathbf{z}(t) \overset{\text{1st-order Taylor expansion}}{\approx} \mathbf{x}(t) + \sqrt{\frac{\text{d} [\sigma^2(t)]}{\text{d}t} \Delta t} \, \mathbf{z}(t), \tag{Eq. 2} \label{eq:continuous_time} \] Note that we approximate the incremental variance \( \sigma^2(t + \Delta t) - \sigma^2(t) \) by \( \frac{\text{d} [\sigma^2(t)]}{\text{d}t} \Delta t \) with a 1st-order Taylor expansion, which assumes a small \( \Delta t \). Taking the limit \( \Delta t \rightarrow 0 \), where \( \sqrt{\Delta t} \, \mathbf{z}(t) \rightarrow \text{d}\mathbf{w} \), Eq. 2 becomes the Variance Exploding Stochastic Differential Equation (VE-SDE): \[ \text{d} \mathbf{x} = \sqrt{\frac{\text{d} [\sigma^2(t)]}{\text{d}t}} \, \text{d}\mathbf{w} \] The variable \( \mathbf{w} \) is the Wiener process, or Brownian motion, which drives the random fluctuations of the process over time. The naming of VE-SDE comes from the fact that the variance of the noise increases over time, growing unboundedly.
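The discrete VE chain of Eq. 1 can be sanity-checked numerically: because the per-step variances telescope, the marginal after step \( i \) is \( \mathcal{N}(\mathbf{x}_0, \sigma_i^2 \mathbf{I}) \). A minimal NumPy sketch (the geometric schedule and all constants are illustrative choices, not values from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100
# Geometric noise schedule sigma_1 < ... < sigma_N (a common SMLD-style choice).
sigmas = np.concatenate([[0.0], np.geomspace(0.01, 10.0, N)])  # sigma_0 = 0

x0 = np.full(50_000, 3.0)  # a batch of data points, all at 3.0 for easy checking
x = x0.copy()
for i in range(1, N + 1):
    # Eq. 1: each step adds just enough noise to raise total variance to sigma_i^2.
    x = x + np.sqrt(sigmas[i] ** 2 - sigmas[i - 1] ** 2) * rng.standard_normal(x.shape)

# Telescoping variances: x_N ~ N(x0, sigma_N^2 I), i.e. mean ~ 3, std ~ 10.
print(x.mean(), x.std())
```

Note how the variance keeps growing with the schedule, which is exactly the "exploding" behavior the VE-SDE formalizes.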

2.1.2. Variance Preserving Stochastic Differential Equation (VP-SDE)

Variance Preserving (VP) is used in DDPM and consists of keeping the variance of the noise constant over time. For the VE-SDE, we defined \( \sigma \) as the standard deviation of the noise distribution. In a stochastic process, we can also define \( \beta \) as the variance scale of the noise. In VP-SDE, the Markov chain for the perturbation kernels \( \{ p_{\alpha_i} ( \mathbf{x} \vert \mathbf{x}_0) \}_{i=1}^N \), where \( \alpha_i = 1 - \beta_i \), can be written as: \[ \mathbf{x}_i = \sqrt{1-\beta_i} \ \mathbf{x}_{i-1} + \sqrt{\beta_i} \ \mathbf{z}_{i-1}, \quad i=1, \cdots, N \tag{Eq. 3} \label{eq:markov_chain_vp} \] Eq. 3 is the VP counterpart of Eq. 1, written in terms of the variance scale \( \beta \) of the noise distribution. As we did with the VE-SDE, we would like to obtain the continuous-time equation for the VP-SDE. First, let's again consider a Markov chain of \( N \) steps with \( N \rightarrow \infty \). To prevent the variance from exploding as \( N \) grows, we define a set of auxiliary noise scales \( \{ \hat{\beta}_i = N \beta_i \}_{i=1}^N \). We can rewrite Eq. 3 as: \[ \mathbf{x}_i = \sqrt{1-\frac{\hat{\beta}_i}{N}} \ \mathbf{x}_{i-1} + \sqrt{\frac{\hat{\beta}_i}{N}} \ \mathbf{z}_{i-1}, \quad i=1, \cdots, N \tag{Eq. 4} \label{eq:markov_chain_vp_aux} \] In the limit \( N \rightarrow \infty \), \( \{ \hat{\beta}_i \}_{i=1}^N \) becomes a function \( \beta(t) \) of continuous time \( t \), with step size \( \Delta t = \frac{1}{N} \). We can rewrite Eq. 4 as: \[ \begin{align} \mathbf{x}(t + \Delta t) &= \sqrt{1 - \beta(t + \Delta t) \Delta t} \ \mathbf{x}(t) + \sqrt{\beta(t + \Delta t) \Delta t} \ \mathbf{z}(t) \tag{Eq. 5.1}\\ &\overset{\text{1st-order Taylor expansion}}{\approx} \mathbf{x}(t) - \frac{1}{2} \beta(t + \Delta t) \Delta t \ \mathbf{x}(t) + \sqrt{\beta(t + \Delta t) \Delta t} \ \mathbf{z}(t) \tag{Eq. 5.2}\\ &\overset{\beta(t+\Delta t) \approx \beta(t) \text{ for small } \Delta t}{\approx} \mathbf{x}(t) - \frac{1}{2} \beta(t) \Delta t \ \mathbf{x}(t) + \sqrt{\beta(t) \Delta t} \ \mathbf{z}(t) \tag{Eq. 5.3} \label{eq:continuous_time_vp} \end{align} \] where \( \sqrt{\beta(t) \Delta t} \ \mathbf{z}(t) \) is the noise added at time \( t \).
The limit when \( \Delta t \rightarrow 0 \) of Eq. 5.3 is the Variance Preserving Stochastic Differential Equation (VP-SDE) expressed as: \[ \text{d} \mathbf{x} = -\frac{1}{2} \beta(t) \mathbf{x} \text{d}t + \sqrt{\beta(t)} \text{d} \mathbf{w}(t) \tag{Eq. 6} \label{eq:vp_sde} \]
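The variance-preserving property of Eq. 3 is easy to verify numerically: if the data starts with unit variance, shrinking the signal by \( \sqrt{1-\beta_i} \) and adding \( \beta_i \) worth of noise keeps the total variance at 1 at every step. A short sketch (the linear schedule, in the spirit of DDPM, uses illustrative values):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1000
# Linear beta schedule (illustrative values, DDPM-style).
betas = np.linspace(1e-4, 0.02, N)

x = rng.standard_normal(50_000)  # start from unit-variance "data"
for beta in betas:
    # Eq. 3: Var <- (1 - beta) * Var + beta = 1 whenever Var = 1.
    x = np.sqrt(1.0 - beta) * x + np.sqrt(beta) * rng.standard_normal(x.shape)

print(x.var())  # stays ~ 1: variance preserving
```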

Summary of the VE and VP formulations:

VE-SDE:
  • Discrete: \( \mathbf{x}_i = \mathbf{x}_{i-1} + \sqrt{\sigma_i^2 - \sigma_{i-1}^2} \ \mathbf{z}_{i-1} \)
  • Continuous: \( \mathbf{x}(t + \Delta t) = \mathbf{x}(t) + \sqrt{\sigma^2(t + \Delta t) - \sigma^2(t)} \ \mathbf{z}(t) \)
  • SDE: \( \text{d} \mathbf{x} = \sqrt{\frac{\text{d} [\sigma^2(t)]}{\text{d}t}} \, \text{d}\mathbf{w} \)

VP-SDE:
  • Discrete: \( \mathbf{x}_i = \sqrt{1 - \beta_i} \ \mathbf{x}_{i-1} + \sqrt{\beta_i} \ \mathbf{z}_{i-1} \)
  • Continuous: \( \mathbf{x}(t + \Delta t) = \mathbf{x}(t) - \frac{1}{2} \beta(t) \Delta t \ \mathbf{x}(t) + \sqrt{\beta(t) \Delta t} \ \mathbf{z}(t) \)
  • SDE: \( \text{d} \mathbf{x} = -\frac{1}{2} \beta(t) \mathbf{x} \, \text{d}t + \sqrt{\beta(t)} \, \text{d} \mathbf{w}(t) \)

3. Probability Flow ODE

The reverse-time SDE [Anderson 1982] is given by: \[ \text{d}\mathbf{x} = \left\{ \mathbf{f}(\mathbf{x},t) - \nabla \cdot [\mathbf{G}(\mathbf{x},t) \mathbf{G}(\mathbf{x},t)^\text{T}] - \mathbf{G}(\mathbf{x},t) \mathbf{G}(\mathbf{x},t)^\text{T} \nabla_\mathbf{x} \log p_t(\mathbf{x}) \right\} \text{d}t + \mathbf{G}(\mathbf{x},t) \text{d}\mathbf{\tilde{w}}, \] where \( \mathbf{\tilde{w}} \) is a Wiener process running backwards in time. One of the main contributions of this paper is the derivation of the Probability Flow Ordinary Differential Equation (PF-ODE), a deterministic continuous-time equation that describes the evolution of the data distribution without any stochastic term. It can be shown that the following probability flow ODE (Eq. 7) induces the same marginal probability density \( p_t(\mathbf{x}) \) as the SDE given by Eq. 8. \[ \text{d}\mathbf{x} = \left\{ \mathbf{f}(\mathbf{x},t) - \frac{1}{2} \nabla \cdot [\mathbf{G}(\mathbf{x},t) \mathbf{G}(\mathbf{x},t)^\text{T}] - \frac{1}{2} \mathbf{G}(\mathbf{x},t) \mathbf{G}(\mathbf{x},t)^\text{T} \nabla_\mathbf{x} \log p_t(\mathbf{x}) \right\} \text{d}t \tag{Eq. 7} \label{eq:ode} \] where:

  • \( \mathbf{f}(\mathbf{x},t) \) is the drift coefficient of the SDE.
  • \( \mathbf{G}(\mathbf{x},t) \) is the diffusion coefficient of the SDE.
  • \( \nabla_\mathbf{x} \log p_t(\mathbf{x}) \) is the score of the marginal distribution at time \( t \).
The SDE equation can be written as: \[ \text{d}\mathbf{x} = \mathbf{f}(\mathbf{x},t)\text{d}t + \mathbf{G}(\mathbf{x},t)\text{d}\mathbf{w}, \tag{Eq. 8} \label{eq:sde} \]
This equation can be written as a simplified ODE: \[ \text{d} \mathbf{x} = \mathbf{\tilde{f}}(\mathbf{x}, t)\,\text{d}t \tag{Eq. 9} \label{eq:ode_simplified} \] where: \[ \mathbf{\tilde{f}}(\mathbf{x}, t) := \mathbf{f}(\mathbf{x},t) - \frac{1}{2} \nabla \cdot [\mathbf{G}(\mathbf{x},t) \mathbf{G}(\mathbf{x},t)^\text{T}] - \frac{1}{2} \mathbf{G}(\mathbf{x},t) \mathbf{G}(\mathbf{x},t)^\text{T} \nabla_\mathbf{x} \log p_t(\mathbf{x}) \]
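For a centered Gaussian toy distribution the score is available in closed form, so the VP instance of Eq. 7 (\( \mathbf{f} = -\frac{1}{2}\beta(t)\mathbf{x} \), \( \mathbf{G} = \sqrt{\beta(t)}\,\mathbf{I} \)) can be integrated with plain Euler steps, checking that the deterministic flow reproduces the SDE marginals. A minimal sketch, assuming a constant \( \beta \) and illustrative constants throughout:

```python
import numpy as np

rng = np.random.default_rng(0)
beta = 1.0                 # constant beta(t), for simplicity
T, n_steps = 5.0, 5000
dt = T / n_steps

def var_t(t, s0_sq=4.0):
    """Closed-form VP marginal variance for N(0, s0_sq) data."""
    return s0_sq * np.exp(-beta * t) + 1.0 - np.exp(-beta * t)

x = 2.0 * rng.standard_normal(50_000)  # data ~ N(0, 4): score known analytically
t = 0.0
for _ in range(n_steps):
    score = -x / var_t(t)                              # grad log p_t, centered Gaussian
    dx = (-0.5 * beta * x - 0.5 * beta * score) * dt   # PF-ODE drift (Eq. 7, VP case)
    x = x + dx
    t += dt

# Deterministic trajectories, yet the marginal matches the SDE: Var(x_T) -> var_t(T).
print(x.var(), var_t(T))
```

Unlike the forward SDE simulations above, no noise is injected here; each sample follows a smooth trajectory, but the population variance still evolves exactly as the stochastic process prescribes.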

© Carlos Hernández Oliván. All rights reserved.