The Evidence Lower Bound
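ELBO is short for evidence lower bound. The "evidence" is the marginal probability density of the data, that is, of the observed variables.
Preliminaries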
Let $\mathbf{x}=\mathrm x_{1:n}$ be a set of observed variables and $\mathbf{z}=\mathrm z_{1:m}$ be a set of latent variables, with joint density $\mathrm{p(z,x)}$.
The conditional density of the latent variables given the observations is
$$
\mathrm{p(z|x)=\frac{p(z,x)}{p(x)}}
$$
The denominator contains the marginal density of the observations, also called the evidence. We calculate it by marginalizing out the latent variables from the joint density:
$$
\mathrm{p(x)=\int p(\mathbf{z,x})}d\mathbf{z}
$$
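As a small illustration of marginalization, the sketch below uses a discrete latent variable, so the integral becomes a sum. The three-state prior and the likelihood table are made-up numbers chosen for the example, not part of the original text.

```python
# Hypothetical toy model: latent z in {0, 1, 2}, one binary observation x.
prior = [0.5, 0.3, 0.2]                      # p(z)
likelihood = [[0.9, 0.1],                    # p(x | z=0)
              [0.4, 0.6],                    # p(x | z=1)
              [0.2, 0.8]]                    # p(x | z=2)

x_obs = 1                                    # the value we observed

# Joint density p(z, x_obs) for every value of z.
joint = [prior[z] * likelihood[z][x_obs] for z in range(3)]

# Evidence: marginalize z out of the joint, p(x) = sum_z p(z, x).
evidence = sum(joint)

# Exact conditional: p(z | x) = p(z, x) / p(x).
posterior = [j / evidence for j in joint]

print(evidence)
print(posterior)
```

With a single three-state latent variable the sum has only three terms. With $m$ latent variables of $k$ states each it has $k^m$ terms, which is where the difficulty comes from.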
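In general this integral has no closed form or takes exponential time to compute, so $\mathrm{p(x)}$ is hard to evaluate.
Variational inference
Variational inference approximates the conditional density of the latent variables given the observed variables when that density is intractable.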
Rather than use sampling, the main idea behind variational inference is to use optimization.
First, we specify a family $\mathscr{Q}$ of densities over the latent variables. Each $q(\mathbf{z}) \in \mathscr{Q}$ is a candidate approximation to the exact conditional. Our goal is to find the best candidate, the one closest in KL divergence to the exact conditional.
In other words, variational inference optimizes $q(\mathbf{z})$ so that the KL divergence between $q(\mathbf{z})$ and $\mathrm{p(z|x)}$ is as small as possible:
$$
q^*(\mathbf{z})=\arg \min_{q(\mathbf{z}) \in \mathscr{Q}} kl(q(\mathbf{z})||\mathrm{p(z|x)})
$$
KL divergence
$$
kl(q(\mathbf{z})||\mathrm{p(z|x)})=\mathbb{E}[\log q(\mathbf{z})]-\mathbb{E}[\log \mathrm{p(z|x)}]
$$
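A minimal sketch of this definition for discrete distributions, where the expectation under $q$ is a weighted sum. The function name `kl_divergence` and the two example distributions are assumptions made for illustration, and the code assumes $p$ is positive wherever $q$ is.

```python
import math

def kl_divergence(q, p):
    """kl(q || p) = E_q[log q(z)] - E_q[log p(z)] for discrete distributions.

    Assumes p[i] > 0 wherever q[i] > 0; terms with q[i] == 0 contribute 0.
    """
    return sum(qi * (math.log(qi) - math.log(pi))
               for qi, pi in zip(q, p) if qi > 0)

q = [0.2, 0.5, 0.3]      # hypothetical candidate q(z)
p = [0.1, 0.5, 0.4]      # hypothetical target, standing in for p(z | x)

print(kl_divergence(q, p))   # positive, since q differs from p
print(kl_divergence(p, q))   # a different value: KL is not symmetric
print(kl_divergence(p, p))   # zero when the two distributions coincide
```

The expectation is taken under the first argument, so the order of the arguments matters.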
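Here all expectations are taken with respect to $q(\mathbf{z})$. Substituting $\mathrm{p(z|x)=p(z,x)/p(x)}$, and noting that $\log\mathrm{p(x)}$ does not depend on $\mathbf{z}$, this can be rewritten as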
$$
kl(q(\mathbf{z})||\mathrm{p(\mathbf{z|x})})=\mathbb{E}[\log q(\mathbf{z})]-\mathbb{E}[\log \mathrm{p(\mathbf{z,x})}]+\log\mathrm{p(\mathbf{x})}
$$
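The KL divergence cannot be computed directly, because it depends on the intractable $\log\mathrm{p(\mathbf{x})}$. Since the KL divergence is non-negative, we instead define the evidence lower bound (ELBO):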
$$
elbo(q)=\mathbb{E}[\log \mathrm{p(\mathbf{z,x})}]-\mathbb{E}[\log q(\mathbf{z})]
$$
$$
=\mathbb{E}[\log \mathrm{p(\mathbf{z})}]+\mathbb{E}[\log \mathrm{p(\mathbf{x|z})}]-\mathbb{E}[\log q(\mathbf{z})]
$$
$$
=\mathbb{E}[\log \mathrm{p(\mathbf{x|z})}]-kl(q(\mathbf{z})||p(\mathbf{z}))
$$
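The last rewriting step groups the prior term and the variational term into a KL divergence. Written out, with expectations under $q(\mathbf{z})$:

$$
\mathbb{E}[\log \mathrm{p(\mathbf{z})}]-\mathbb{E}[\log q(\mathbf{z})]=-\mathbb{E}\left[\log \frac{q(\mathbf{z})}{\mathrm{p(\mathbf{z})}}\right]=-kl(q(\mathbf{z})||p(\mathbf{z}))
$$

Read this way, the ELBO balances the expected log-likelihood of the data against how far $q(\mathbf{z})$ moves from the prior.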
$$
elbo=-kl+\log\mathrm{p(\mathbf{x})}
$$
$$
\log p(\mathrm{x})=kl(q(\mathbf{z})||\mathrm{p(z|x)})+elbo(q) \ge elbo(q)
$$
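The decomposition can be checked numerically on a model small enough to compute everything exactly. The sketch below assumes a made-up three-state latent variable and an arbitrary candidate `q`; all numbers and the helper `expect` are illustrative only.

```python
import math

# Hypothetical toy model: latent z in {0, 1, 2}, one binary observation x = 1.
prior = [0.5, 0.3, 0.2]          # p(z)
lik_x1 = [0.1, 0.6, 0.8]         # p(x=1 | z)
joint = [p * l for p, l in zip(prior, lik_x1)]     # p(z, x=1)
evidence = sum(joint)                               # p(x=1)
posterior = [j / evidence for j in joint]           # p(z | x=1)

q = [0.2, 0.5, 0.3]              # an arbitrary candidate q(z)

def expect(dist, values):
    """Expectation of `values` under the discrete distribution `dist`."""
    return sum(d * v for d, v in zip(dist, values))

log_q = [math.log(qi) for qi in q]

# elbo(q) = E[log p(z, x)] - E[log q(z)]
elbo = expect(q, [math.log(j) for j in joint]) - expect(q, log_q)

# kl(q || p(z|x)) = E[log q(z)] - E[log p(z|x)]
kl = expect(q, log_q) - expect(q, [math.log(p) for p in posterior])

# The two printed values agree up to floating-point rounding.
print(math.log(evidence), kl + elbo)
```

Because `kl` is non-negative, `elbo` can never exceed `math.log(evidence)`, whichever `q` is chosen.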
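This inequality explains the name: the ELBO is a lower bound on the log evidence $\log p(\mathrm{x})$.
Because $\log p(\mathrm{x})$ does not depend on $q$, minimizing the KL divergence is equivalent to maximizing the ELBO, so the task finally reduces to finding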
$$
q^*(\mathbf{z})=\arg \min_{q(\mathbf{z}) \in \mathscr{Q}} kl(q(\mathbf{z})||\mathrm{p(z|x)})
$$
$$
=\arg \max_{q(\mathbf{z}) \in \mathscr{Q}} elbo(q)
$$
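To make the optimization concrete, here is a minimal sketch that maximizes the ELBO by grid search over a one-parameter family. The binary latent variable, the numbers, and the grid search are all assumptions chosen for illustration; practical variational inference uses coordinate ascent or gradient-based optimizers.

```python
import math

# Hypothetical toy model: binary latent z, with the observation x held fixed.
prior = [0.7, 0.3]               # p(z)
lik = [0.2, 0.9]                 # p(x | z) evaluated at the observed x
joint = [p * l for p, l in zip(prior, lik)]   # p(z, x)
evidence = sum(joint)                          # p(x)
posterior_z1 = joint[1] / evidence             # exact p(z=1 | x)

def elbo(theta):
    """ELBO of q_theta, where q_theta(z=1) = theta and q_theta(z=0) = 1 - theta."""
    q = [1.0 - theta, theta]
    return sum(qi * (math.log(ji) - math.log(qi)) for qi, ji in zip(q, joint))

# Crude grid search over the family Q = {q_theta : 0 < theta < 1}.
grid = [i / 1000 for i in range(1, 1000)]
best_theta = max(grid, key=elbo)

print(best_theta, posterior_z1)               # maximizer vs. exact posterior
print(elbo(best_theta), math.log(evidence))   # best ELBO vs. log evidence
```

In this toy case the family contains the exact conditional, so the maximizer lands on the grid point nearest to it and the ELBO nearly reaches the log evidence. When the family is more restricted, a gap equal to the remaining KL divergence stays.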
References
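Blei, D. M., Kucukelbir, A., and McAuliffe, J. D. (2017). Variational Inference: A Review for Statisticians. Journal of the American Statistical Association, 112(518), 859-877.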