ELBO

Research

Published: 2023-10-26

The Evidence Lower Bound

ELBO is short for evidence lower bound. The evidence is the probability density of the data, that is, of the observed variables.

Preliminaries

Let $\mathbf{x}=\mathrm x_{1:n}$ be a set of observed variables and $\mathbf{z}=\mathrm z_{1:m}$ be a set of latent variables, with joint density $\mathrm{p(z,x)}$.

The conditional density of the latent variables given the observations is
$$
\mathrm{p(z|x)=\frac{p(z,x)}{p(x)}}
$$

The denominator contains the marginal density of the observations, also called the evidence. We calculate it by marginalizing out the latent variables from the joint density:

$$
\mathrm{p(x)=\int p(\mathbf{z,x})}d\mathbf{z}
$$

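Computing $\mathrm{p(x)}$ above is difficult: for many models the integral has no closed form or is too expensive to evaluate.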
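
To make the marginalization concrete, here is a minimal sketch for a case where it is still tractable: a single discrete latent variable, so the integral reduces to a short sum. The three-state model, its `prior` and `means` values, and all function names are hypothetical and chosen only for illustration.

```python
import math

def normal_pdf(x, mean, std):
    """Density of a univariate Gaussian N(mean, std^2) evaluated at x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))

# Hypothetical toy model: z in {0, 1, 2} selects the mean of a unit-variance Gaussian.
prior = [0.5, 0.3, 0.2]     # p(z)
means = [-2.0, 0.0, 3.0]    # mean of p(x | z) for each z

def joint(z, x):
    """Joint density p(z, x) = p(z) * p(x | z)."""
    return prior[z] * normal_pdf(x, means[z], 1.0)

def evidence(x):
    """Evidence p(x): the joint summed over every latent state."""
    return sum(joint(z, x) for z in range(len(prior)))

def posterior(x):
    """Exact conditional p(z | x) = p(z, x) / p(x)."""
    px = evidence(x)
    return [joint(z, x) / px for z in range(len(prior))]

x_obs = 0.5
print("p(x)   =", evidence(x_obs))
print("p(z|x) =", posterior(x_obs))
```

With $m$ discrete latent variables of $K$ states each, the same sum has $K^m$ terms, which illustrates why the evidence quickly becomes impractical to compute exactly.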

Variational inference

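Variational inference aims to approximate the conditional density of the latent variables given the observed variables, which is hard to compute exactly.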

Rather than use sampling, the main idea behind variational inference is to use optimization.

We specify a family $\mathscr{Q}$ of densities over the latent variables. Each $q(\mathbf{z}) \in \mathscr{Q}$ is a candidate approximation to the exact conditional. Our goal is to find the best candidate, the one closest in KL divergence to the exact conditional.

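In other words, the task of variational inference is to optimize $q(\mathbf{z})$ so that the KL divergence between $q(\mathbf{z})$ and $\mathrm{p(z|x)}$ is as small as possible: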
$$
q^*(\mathbf{z})=\arg \min_{q(\mathbf{z}) \in \mathscr{Q}} kl(q(\mathbf{z})||\mathrm{p(z|x)})
$$

KL divergence

$$
kl(q(\mathbf{z})||\mathrm{p(z|x)})=\mathbb{E}[\log q(\mathbf{z})]-\mathbb{E}[\log \mathrm{p(z|x)}]
$$
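
As a small illustration of this definition, the sketch below evaluates the KL divergence for discrete distributions, where each expectation under $q$ is a weighted sum. The two distributions are hypothetical, and the function assumes the second argument is positive wherever the first one is.

```python
import math

def kl_divergence(q, p):
    """KL(q || p) = E_q[log q] - E_q[log p] for discrete distributions.

    Terms with q[i] == 0 are skipped (0 * log 0 is taken as 0).
    Assumes p[i] > 0 wherever q[i] > 0.
    """
    expected_log_q = sum(qi * math.log(qi) for qi in q if qi > 0)
    expected_log_p = sum(qi * math.log(pi) for qi, pi in zip(q, p) if qi > 0)
    return expected_log_q - expected_log_p

q = [0.7, 0.2, 0.1]   # hypothetical candidate q(z)
p = [0.5, 0.3, 0.2]   # hypothetical target, standing in for p(z | x)

print("KL(q || p) =", kl_divergence(q, p))
print("KL(p || q) =", kl_divergence(p, q))   # not symmetric in its arguments
print("KL(p || p) =", kl_divergence(p, p))   # zero when both distributions agree
```

Both expectations are taken with respect to the first argument, which is why swapping the arguments generally changes the value.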

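Expanding the conditional density, with all expectations taken with respect to $q(\mathbf{z})$, this can be rewritten as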
$$
kl(q(\mathbf{z})||\mathrm{p(\mathbf{z|x})})=\mathbb{E}[\log q(\mathbf{z})]-\mathbb{E}[\log \mathrm{p(\mathbf{z,x})}]+\log\mathrm{p(\mathbf{x})}
$$
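
The intermediate step may be worth spelling out. It uses only the definition $\mathrm{p(z|x)=p(z,x)/p(x)}$ and the fact that $\log\mathrm{p(\mathbf{x})}$ does not depend on $\mathbf{z}$, so its expectation under $q(\mathbf{z})$ is the term itself:

$$
\mathbb{E}[\log \mathrm{p(\mathbf{z|x})}]=\mathbb{E}[\log \mathrm{p(\mathbf{z,x})}-\log\mathrm{p(\mathbf{x})}]=\mathbb{E}[\log \mathrm{p(\mathbf{z,x})}]-\log\mathrm{p(\mathbf{x})}
$$
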
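The KL divergence cannot be computed directly because it contains the intractable term $\log\mathrm{p(\mathbf{x})}$. Since the KL divergence is always non-negative, we instead define the ELBO: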
$$
elbo(q)=\mathbb{E}[\log \mathrm{p(\mathbf{z,x})}]-\mathbb{E}[\log q(\mathbf{z})]
$$

$$
=\mathbb{E}[\log \mathrm{p(\mathbf{z})}]+\mathbb{E}[\log \mathrm{p(\mathbf{x|z})}]-\mathbb{E}[\log q(\mathbf{z})]
$$

$$
=\mathbb{E}[\log \mathrm{p(\mathbf{x|z})}]-kl(q(\mathbf{z})||p(\mathbf{z}))
$$

$$
elbo(q)=-kl(q(\mathbf{z})||\mathrm{p(z|x)})+\log\mathrm{p(\mathbf{x})}
$$

$$
\log p(\mathrm{x})=kl(q(\mathbf{z})||\mathrm{p(z|x)})+elbo(q) \ge elbo(q)
$$

The inequality above explains the name: the ELBO is a lower bound on the log evidence $\log p(\mathrm{x})$.
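
The decomposition can be checked numerically. The sketch below uses a hypothetical three-state latent variable with Gaussian likelihoods, where the evidence and exact conditional are cheap to compute, and compares $\log p(\mathrm{x})$ with the KL divergence plus the ELBO for a few arbitrary choices of $q$. The model values and function names are assumptions made for illustration.

```python
import math

def normal_pdf(x, mean, std):
    """Density of a univariate Gaussian N(mean, std^2) evaluated at x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))

# Hypothetical toy model with z in {0, 1, 2} and one observation.
prior = [0.5, 0.3, 0.2]
means = [-2.0, 0.0, 3.0]
x_obs = 0.5

joint = [prior[z] * normal_pdf(x_obs, means[z], 1.0) for z in range(3)]  # p(z, x)
log_evidence = math.log(sum(joint))                                      # log p(x)
posterior = [j / sum(joint) for j in joint]                              # p(z | x)

def elbo(q):
    """elbo(q) = E_q[log p(z, x)] - E_q[log q(z)]."""
    return sum(qz * (math.log(jz) - math.log(qz)) for qz, jz in zip(q, joint) if qz > 0)

def kl(q, p):
    """KL(q || p) = E_q[log q] - E_q[log p]."""
    return sum(qz * (math.log(qz) - math.log(pz)) for qz, pz in zip(q, p) if qz > 0)

candidates = [[1 / 3, 1 / 3, 1 / 3], [0.1, 0.8, 0.1], posterior]
for q in candidates:
    gap = log_evidence - elbo(q)
    print(f"elbo={elbo(q):.6f}  kl={kl(q, posterior):.6f}  log p(x) - elbo={gap:.6f}")
```

For every candidate the gap between the log evidence and the ELBO should match the KL divergence, and it should vanish when $q$ is the exact conditional.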

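Since $\log p(\mathrm{x})$ does not depend on $q$, minimizing the KL divergence is equivalent to maximizing the ELBO, so the task finally becomes finding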
$$
q^*(\mathbf{z})=\arg \min_{q(\mathbf{z}) \in \mathscr{Q}} kl(q(\mathbf{z})||\mathrm{p(z|x)})
$$

$$
=\arg \max_{q(\mathbf{z}) \in \mathscr{Q}} elbo(q)
$$
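
As a rough sketch of what maximizing the ELBO looks like in practice, consider a hypothetical conjugate model with $z \sim N(0,1)$ and $x|z \sim N(z,1)$, and take $\mathscr{Q}$ to be the Gaussians $N(m, s^2)$. For this model the ELBO has a closed form and the exact conditional is known to be $N(x/2, 1/2)$, so the result can be checked. The model, the step size, and the iteration count are assumptions chosen for illustration, not part of the original text.

```python
import math

x_obs = 1.2  # hypothetical single observation

def elbo(m, log_s):
    """Closed-form ELBO for q(z) = N(m, s^2) under z ~ N(0, 1), x | z ~ N(z, 1)."""
    s2 = math.exp(2.0 * log_s)
    expected_log_prior = -0.5 * math.log(2.0 * math.pi) - 0.5 * (m ** 2 + s2)
    expected_log_lik = -0.5 * math.log(2.0 * math.pi) - 0.5 * ((x_obs - m) ** 2 + s2)
    entropy = 0.5 * math.log(2.0 * math.pi * math.e) + log_s  # equals -E_q[log q(z)]
    return expected_log_prior + expected_log_lik + entropy

def elbo_grad(m, log_s):
    """Gradient of the ELBO with respect to m and log s."""
    s2 = math.exp(2.0 * log_s)
    return x_obs - 2.0 * m, 1.0 - 2.0 * s2

# Plain gradient ascent; optimizing log s keeps the standard deviation positive.
m, log_s = 0.0, 0.0
for _ in range(500):
    grad_m, grad_log_s = elbo_grad(m, log_s)
    m += 0.1 * grad_m
    log_s += 0.1 * grad_log_s

# For this model the marginal of x is N(0, 2), so log p(x) is available exactly.
log_evidence = -0.5 * math.log(2.0 * math.pi * 2.0) - x_obs ** 2 / 4.0

print("fitted mean, variance:", m, math.exp(2.0 * log_s))
print("exact  mean, variance:", x_obs / 2.0, 0.5)
print("elbo:", elbo(m, log_s), " log p(x):", log_evidence)
```

Because the exact conditional belongs to the chosen family here, the maximized ELBO should reach $\log p(\mathrm{x})$. With a more restrictive family, a gap equal to the remaining KL divergence would be left.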