KL Divergence (Normal Dist vs Standard Normal Dist)
Variational Autoencoders (VAEs) are popular for their ability to generate data. A classic VAE has two terms in its loss function: a reconstruction error and a KL divergence. The first term pushes the output to resemble the given input, while the second pulls the approximate posterior toward the prior.
In general, this KL-divergence term is formulated as below:
\(\frac{1}{2}\left( -\log{\sigma^2}-1+\mu^2+\sigma^2 \right)\)
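To make the formula concrete, here is a minimal sketch of it as a scalar Python function. The function name and the choice to pass the standard deviation directly are assumptions made for illustration; many VAE implementations pass the log-variance instead, and sum the term over all latent dimensions.

```python
import math


def kl_normal_vs_standard(mu: float, sigma: float) -> float:
    """KL( N(mu, sigma^2) || N(0, 1) ) for a single latent dimension.

    Hypothetical helper: `mu` is the posterior mean and `sigma` is the
    posterior standard deviation (must be strictly positive).
    """
    if sigma <= 0.0:
        raise ValueError("sigma must be strictly positive")
    var = sigma * sigma
    # 1/2 * ( -log(sigma^2) - 1 + mu^2 + sigma^2 )
    return 0.5 * (-math.log(var) - 1.0 + mu * mu + var)


if __name__ == "__main__":
    # The divergence vanishes when the posterior equals the prior.
    print(kl_normal_vs_standard(0.0, 1.0))
    # Shifting the mean or changing the spread makes it positive.
    print(kl_normal_vs_standard(1.0, 1.0))
    print(kl_normal_vs_standard(0.0, 2.0))
```

The term is zero only when \(\mu = 0\) and \(\sigma = 1\), which is why minimizing it pulls the posterior toward the standard normal prior.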
Those who have applied the method without knowing the details might not be aware of it, but this is not an arbitrary formula. In a VAE, we try to make the approximate posterior \(q_{\phi}(z|x)\) as close as possible to the prior \(p_{\theta}(z)\). In the most convenient configuration, the posterior is assumed to follow a normal distribution with mean \(\mu\) and standard deviation \(\sigma\), while the prior is a standard normal distribution.
\(q_{\phi}(z|x) \sim N(\mu,\sigma^2)\) \(p_{\theta}(z) \sim N(0,1)\)
The KL-divergence between these two distributions becomes:
\(KL(q_{\phi}(z|x) \Vert p_{\theta}(z))\)
\(= E_{q_{\phi}(z|x)}\left[\log q_{\phi}(z|x)\right]-E_{q_{\phi}(z|x)}\left[\log p_{\theta}(z) \right]\) (1)
\(= E_{q_{\phi}(z|x)}\left[\log \frac{1}{\sqrt{2\pi}\sigma}\exp{-\frac{(z-\mu)^2}{2\sigma^2}} \right]-E_{q_{\phi}(z|x)}\left[\log \frac{1}{\sqrt{2\pi}}\exp{-\frac{z^2}{2}} \right]\) (2)
\(= E_{q_{\phi}(z|x)}\left[\log \frac{1}{\sqrt{2\pi}\sigma} \right]+E_{q_{\phi}(z|x)}\left[-\frac{(z-\mu)^2}{2\sigma^2} \right]-E_{q_{\phi}(z|x)}\left[\log \frac{1}{\sqrt{2\pi}} \right]-E_{q_{\phi}(z|x)}\left[-\frac{z^2}{2} \right]\) (3)
\(= \log \frac{1}{\sqrt{2\pi}\sigma}-\frac{1}{2}E_{q_{\phi}(z|x)}\left[\left(\frac{z-\mu}{\sigma}\right)^2 \right]-\log \frac{1}{\sqrt{2\pi}}+\frac{1}{2}E_{q_{\phi}(z|x)}\left[z^2 \right]\) (4)
\(= -\frac{\log{\sigma^2}}{2}-\frac{E_{q_{\phi}(z|x)}\left[\left(\frac{(z-\mu)}{\sigma}\right)^2 \right]}{2}+\frac{E_{q_{\phi}(z|x)}\left[z^2 \right]}{2}\) (5)
\(= -\frac{\log{\sigma^2}}{2}-\frac{\left(E_{q_{\phi}(z|x)}\left[\frac{(z-\mu)}{\sigma} \right]\right)^2}{2}-\frac{Var_{q_{\phi}(z|x)}\left[\frac{(z-\mu)}{\sigma} \right]}{2}+\frac{\left(E_{q_{\phi}(z|x)}\left[z \right]\right)^2}{2}+\frac{Var_{q_{\phi}(z|x)}\left[z \right]}{2}\) (6)
\(= -\frac{\log{\sigma^2}}{2}-\frac{1}{2}+\frac{\mu^2}{2}+\frac{\sigma^2 }{2}\) (7)
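\(= \frac{1}{2}\left( -\log{\sigma^2}-1+\mu^2+\sigma^2 \right)\) (8)
Step (6) uses the identity \(E[X^2] = \left(E[X]\right)^2 + Var[X]\). Step (7) follows because, under \(q_{\phi}(z|x)\), the standardized variable \(\frac{z-\mu}{\sigma}\) has mean 0 and variance 1, while \(z\) itself has mean \(\mu\) and variance \(\sigma^2\).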
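As a sanity check on the derivation, the closed form can be compared against a direct numerical evaluation of \(E_{q}\left[\log q - \log p\right]\). The sketch below is only an illustration: the function names, the midpoint-rule integration, and the integration range of 12 standard deviations around \(\mu\) are all assumptions chosen for simplicity, not part of the VAE method itself.

```python
import math


def kl_closed_form(mu: float, sigma: float) -> float:
    """Closed form from step (8): 1/2 * (-log(sigma^2) - 1 + mu^2 + sigma^2)."""
    return 0.5 * (-math.log(sigma ** 2) - 1.0 + mu ** 2 + sigma ** 2)


def kl_numeric(mu: float, sigma: float, n: int = 20_000, width: float = 12.0) -> float:
    """Approximate the integral of q(z) * (log q(z) - log p(z)) with the midpoint rule.

    q is N(mu, sigma^2) and p is N(0, 1). The integral is truncated to
    [mu - width*sigma, mu + width*sigma], outside of which q is negligible.
    """
    lo = mu - width * sigma
    hi = mu + width * sigma
    h = (hi - lo) / n
    log_norm_q = -math.log(math.sqrt(2.0 * math.pi) * sigma)
    log_norm_p = -0.5 * math.log(2.0 * math.pi)
    total = 0.0
    for i in range(n):
        z = lo + (i + 0.5) * h
        # Work with log-densities so that log(0) never occurs in the tails.
        log_q = log_norm_q - 0.5 * ((z - mu) / sigma) ** 2
        log_p = log_norm_p - 0.5 * z * z
        total += math.exp(log_q) * (log_q - log_p)
    return total * h


if __name__ == "__main__":
    for mu, sigma in [(0.0, 1.0), (0.7, 1.3), (-2.0, 0.4)]:
        print(mu, sigma, kl_closed_form(mu, sigma), kl_numeric(mu, sigma))
```

For each pair of parameters, the two printed values should agree to several decimal places, which is consistent with steps (1) through (8).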