KYCNA

KL Divergence (Normal Dist vs Standard Normal Dist)

Variational Autoencoders (VAEs) are popular for their ability to generate data. A classic VAE has two terms in its loss function: a reconstruction error and a KL-divergence. The first term pushes the output to resemble the given input, while the second term pulls the approximate posterior toward the prior.

This KL-divergence term is usually written as:

\(\frac{1}{2}\left( -\log{\sigma^2}-1+\mu^2+\sigma^2 \right)\)
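
As a minimal sketch, the formula can be evaluated directly in Python. The function names `kl_to_standard_normal` and `kl_term` are made up for illustration, and summing over latent dimensions assumes a diagonal Gaussian posterior, which is a common choice but not something stated here.

```python
import math


def kl_to_standard_normal(mu, sigma):
    """KL( N(mu, sigma^2) || N(0, 1) ) for a single latent dimension."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    var = sigma * sigma
    return 0.5 * (-math.log(var) - 1.0 + mu * mu + var)


def kl_term(mus, sigmas):
    """Sum of per-dimension KL terms (assumes independent latent dimensions)."""
    return sum(kl_to_standard_normal(m, s) for m, s in zip(mus, sigmas))


# The term vanishes when the posterior already equals the prior,
# and grows as mu moves away from 0 or sigma moves away from 1.
print(kl_to_standard_normal(0.0, 1.0))
print(kl_to_standard_normal(1.0, 1.0))
print(kl_term([0.0, 1.0], [1.0, 2.0]))
```

The term is zero only for \(\mu = 0\) and \(\sigma = 1\), which is why minimizing it keeps the posterior close to the standard normal prior.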

Those who have applied the method without knowing the details may not be aware of it, but this is not an arbitrary formula. In a VAE, we try to make our approximate posterior \(q_{\phi}(z|x)\) as close as possible to the prior \(p_{\theta}(z)\). In the most convenient configuration, the posterior is assumed to follow a normal distribution with mean \(\mu\) and standard deviation \(\sigma\), while the prior is a standard normal distribution.

\(q_{\phi}(z|x) \sim N(\mu,\sigma^2)\)   \(p_{\theta}(z) \sim N(0,1)\)

The KL-divergence between these two distributions becomes:

\(KL(q_{\phi}(z|x) \Vert p_{\theta}(z))\)

\(= E_{q_{\phi}(z|x)}\left[\log q_{\phi}(z|x)\right]-E_{q_{\phi}(z|x)}\left[\log p_{\theta}(z) \right]\) (1)

\(= E_{q_{\phi}(z|x)}\left[\log\left( \frac{1}{\sqrt{2\pi}\sigma}\exp\left(-\frac{(z-\mu)^2}{2\sigma^2}\right)\right) \right]-E_{q_{\phi}(z|x)}\left[\log\left( \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{z^2}{2}\right)\right) \right]\)   (2)

\(= E_{q_{\phi}(z|x)}\left[\log \frac{1}{\sqrt{2\pi}\sigma} \right]+E_{q_{\phi}(z|x)}\left[-\frac{(z-\mu)^2}{2\sigma^2} \right]-E_{q_{\phi}(z|x)}\left[\log \frac{1}{\sqrt{2\pi}} \right]-E_{q_{\phi}(z|x)}\left[-\frac{z^2}{2} \right]\)   (3)

\(= \log \frac{1}{\sqrt{2\pi}\sigma}-\frac{1}{2}E_{q_{\phi}(z|x)}\left[\left(\frac{z-\mu}{\sigma}\right)^2 \right]-\log \frac{1}{\sqrt{2\pi}}-E_{q_{\phi}(z|x)}\left[-\frac{z^2}{2} \right]\)   (4)

\(= -\frac{\log{\sigma^2}}{2}-\frac{E_{q_{\phi}(z|x)}\left[\left(\frac{(z-\mu)}{\sigma}\right)^2 \right]}{2}+\frac{E_{q_{\phi}(z|x)}\left[z^2 \right]}{2}\) (5)

\(= -\frac{\log{\sigma^2}}{2}-\frac{\left(E_{q_{\phi}(z|x)}\left[\frac{(z-\mu)}{\sigma} \right]\right)^2}{2}-\frac{Var_{q_{\phi}(z|x)}\left[\frac{(z-\mu)}{\sigma} \right]}{2}+\frac{\left(E_{q_{\phi}(z|x)}\left[z \right]\right)^2}{2}+\frac{Var_{q_{\phi}(z|x)}\left[z \right]}{2}\) (6)

Step (6) uses the identity \(E[X^2] = (E[X])^2 + Var[X]\). Under \(q_{\phi}(z|x)\), the standardized variable \(\frac{z-\mu}{\sigma}\) has mean 0 and variance 1, while \(z\) has mean \(\mu\) and variance \(\sigma^2\), so:

\(= -\frac{\log{\sigma^2}}{2}-\frac{1}{2}+\frac{\mu^2}{2}+\frac{\sigma^2 }{2}\) (7)

\(= \frac{1}{2}\left( -\log{\sigma^2}-1+\mu^2+\sigma^2 \right)\) (8)
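
One way to gain some confidence in the result is to compare it against a direct numerical evaluation of line (1). The sketch below approximates the integral of \(q(z)\left(\log q(z) - \log p(z)\right)\) with a midpoint rule; the helper names, the integration width of 12 standard deviations, and the step count are arbitrary choices made for this illustration.

```python
import math


def closed_form_kl(mu, sigma):
    """Result (8): KL( N(mu, sigma^2) || N(0, 1) )."""
    return 0.5 * (-math.log(sigma ** 2) - 1.0 + mu ** 2 + sigma ** 2)


def log_normal_pdf(z, mean, std):
    """Log density of N(mean, std^2) evaluated at z."""
    return (-0.5 * math.log(2.0 * math.pi) - math.log(std)
            - (z - mean) ** 2 / (2.0 * std ** 2))


def numeric_kl(mu, sigma, steps=20_000, width=12.0):
    """Midpoint-rule estimate of E_q[log q(z)] - E_q[log p(z)]."""
    lo = mu - width * sigma
    hi = mu + width * sigma
    dz = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        z = lo + (i + 0.5) * dz
        log_q = log_normal_pdf(z, mu, sigma)
        log_p = log_normal_pdf(z, 0.0, 1.0)
        total += math.exp(log_q) * (log_q - log_p) * dz
    return total


# Print both values side by side for a few (mu, sigma) pairs.
for mu, sigma in [(0.0, 1.0), (1.5, 0.5), (-2.0, 3.0)]:
    print(mu, sigma, closed_form_kl(mu, sigma), numeric_kl(mu, sigma))
```

The two columns should agree to several decimal places. A Monte Carlo estimate using samples from \(q_{\phi}(z|x)\) would work as well, at the cost of sampling noise.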
