VAE MNIST
Vanilla VAE
- Loss function (Negative ELBO, NELBO)
- \[l_i(\theta, \phi) = - \mathbb{E}_{z\sim q_\theta(z\mid x_i)}[\log p_\phi(x_i\mid z)] + \mathbb{KL}(q_\theta(z\mid x_i) \mid\mid p(z))\]
- Reconstruction Loss (Negative Log Likelihood, NLL)
- \[- \mathbb{E}_{z\sim q_\theta(z\mid x_i)}[\log p_\phi(x_i\mid z)] = l_{i}(\tilde{x_i}, x_i)=-w_{i}\left[x_i \cdot \log \sigma\left(\tilde{x_i}\right)+\left(1-x_i\right) \cdot \log \left(1-\sigma\left(\tilde{x_i}\right)\right)\right]\]
- \[w_{i} = 1\]
- KL Divergence
- \[\mathbb{KL}(q_\theta(z\mid x_i) \mid\mid p(z)) = \frac{1}{2}(\log{\sigma_{p}^2} - \log{\sigma_{q}^2} + \frac{\sigma_{q}^2}{\sigma_{p}^2} + \frac{(\mu_{q} - \mu_{p})^2}{\sigma_{p}^2} - 1)\]
- Assuming the approximate posterior is Gaussian, \(q_\theta(z\mid x_i) = \mathcal{N}{(\mu_{q}, \sigma_{q})}\)
- The prior is fixed to a standard normal, \(p(z) = \mathcal{N}{(\mu_{p} = 0, \sigma_{p} = 1)}\) (a short implementation sketch of this loss follows below)
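A minimal PyTorch sketch of the NELBO above, assuming binarized inputs, a decoder that outputs logits \(\tilde{x}_i\), and an encoder that outputs \(\mu_q\) and \(\log\sigma_q^2\); the helper name vanilla_vae_nelbo and the tensor shapes are illustrative, not the actual implementation:

```python
import torch
import torch.nn.functional as F

def vanilla_vae_nelbo(x, x_logits, mu_q, logvar_q):
    # x        : binarized inputs in [0, 1], shape (B, D)
    # x_logits : decoder outputs before the sigmoid (x_tilde), shape (B, D)
    # mu_q, logvar_q : encoder outputs parameterizing q(z|x), shape (B, Z)

    # Reconstruction term: Bernoulli NLL, i.e. BCE with logits summed over pixels (w_i = 1)
    rec = F.binary_cross_entropy_with_logits(x_logits, x, reduction="none").sum(dim=1)

    # KL(q(z|x) || N(0, I)): the closed form above with mu_p = 0, sigma_p = 1,
    # summed over latent dimensions
    kl = 0.5 * (-logvar_q + logvar_q.exp() + mu_q.pow(2) - 1.0).sum(dim=1)

    nelbo = (rec + kl).mean()
    return nelbo, rec.mean(), kl.mean()
```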
Gaussian Mixture VAE (GM-VAE)
- Loss function (Negative ELBO, NELBO)
- same as above
- Reconstruction Loss (Negative Log Likelihood, NLL)
- same as above
- KL Divergence
- In the vanilla VAE, we simply assume that the prior \(p(z)\) is a standard normal. To get better performance, in the GM-VAE we instead fit our posterior to a \(p(z)\) that is a mixture of Gaussians.
- \[\mathbb{KL}(q_\theta(z\mid x_i) \mid\mid p(z)) \approx \mathcal{LL}{(z, \mu_{q}, \sigma_{q})} - \log{\sum_{j=1}^{k}{\pi_j \cdot \exp\left(\mathcal{LL}{(z, \mu_{p(j)}, \sigma_{p(j)})}\right)}}, \quad z \sim q_\theta(z\mid x_i)\]
- \(\mathcal{LL}\) denotes the Gaussian log-likelihood
- The second term is the log-likelihood of the Gaussian mixture, where \(k\) is the number of Gaussian components in the mixture. Since this KL has no closed form, it is estimated with a single sample \(z \sim q_\theta(z\mid x_i)\). In the actual implementation, we simply set the mixture proportion \(\pi_j = \frac{1}{k}\) for all \(k\) mixture components.
- For each Gaussian component \(\mathcal{N}{(\mu_{p(j)}, \sigma_{p(j)})}\), the parameters \(\mu_{p(j)}\) and \(\sigma_{p(j)}\) are randomly sampled from a standard normal distribution \(\mathcal{N}(0, 1)\) and then scaled down appropriately (see the sketch after this list).
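A sketch of this KL estimate under the same assumptions, with the mixture parameters stored as fixed tensors of shape (k, Z); the helper names log_normal and gm_kl_estimate are illustrative. The log-sum-exp is simply a numerically stable way to evaluate the mixture log-likelihood with \(\pi_j = \frac{1}{k}\):

```python
import math
import torch

def log_normal(z, mu, logvar):
    # Diagonal Gaussian log-likelihood LL(z; mu, sigma), summed over the latent dimension
    return -0.5 * (math.log(2 * math.pi) + logvar + (z - mu).pow(2) / logvar.exp()).sum(dim=-1)

def gm_kl_estimate(z, mu_q, logvar_q, mu_p, logvar_p):
    # z              : one sample from q(z|x) via the reparameterization trick, shape (B, Z)
    # mu_q, logvar_q : encoder outputs, shape (B, Z)
    # mu_p, logvar_p : fixed mixture-prior parameters, shape (k, Z),
    #                  drawn once from N(0, 1) and scaled down

    log_q = log_normal(z, mu_q, logvar_q)                    # LL(z; mu_q, sigma_q), shape (B,)

    # log p(z) = log( (1/k) * sum_j N(z; mu_p(j), sigma_p(j)) ),
    # computed with log-sum-exp for numerical stability
    log_p_comp = log_normal(z.unsqueeze(1), mu_p, logvar_p)  # shape (B, k)
    log_p = torch.logsumexp(log_p_comp, dim=1) - math.log(mu_p.size(0))

    return (log_q - log_p).mean()
```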
Experimental Results
Training Curves
VAE
REC loss of VAE
NELBO of VAE
As shown above, simply increasing the size of \(z\) does not always improve the performance of the VAE. With a relatively large \(z\) (e.g., 1000), the model needs many more iterations (~8k) to reconstruct images well.
GMVAE
REC loss of GMVAE
NELBO of GMVAE
Test Results
Model (N=20,000) | NELBO | REC |
---|---|---|
VAE, z=10 | 98.735 | 79.388 |
VAE, z=20 | 96.104 | 71.958 |
VAE, z=50 | 95.947 | 70.720 |
VAE, z=100 | 95.962 | 70.753 |
VAE, z=1000 | 100.170 | 75.964 |
Model (N=20,000, z=20) | NELBO | REC |
---|---|---|
GMVAE, k=1 | 95.159 | 71.105 |
GMVAE, k=10 | 95.029 | 71.681 |
GMVAE, k=100 | 93.431 | 70.782 |
GMVAE, k=250 | 92.741 | 69.881 |
GMVAE, k=500 | 92.397 | 69.786 |
Reconstructed and Generated Samples
Reconstructed
VAE, z=20, N=20,000 | GMVAE, z=20, k=500, N=20,000 |
---|---|
Generated
VAE, z=20, N=20,000 | GMVAE, z=20, k=500, N=20,000 |
---|---|
GIFs
Randomly generated samples from VAE (Bernoulli output) |
Reconstructed samples from VAE given a deterministic MNIST subset (N=20000, z=25) |
Future Work
- Using the IWAE bound
- C-VAE
- Semi-supervised and fully-supervised VAE