## Kullback-Leibler information and consistency in the Hellinger metric

Suppose $p_{\theta_0}$ is the true density of a random sample $X_1, \ldots, X_n$, while $p_{\theta}$ is the assumed model. The Kullback-Leibler information (often loosely called a distance, although it is not symmetric) is defined as

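$K(p_{\theta}, p_{\theta_0})= E_{\theta_0} \log \frac{p_{\theta_0}(X)}{p_{\theta} (X)}=\int \log\left( \frac{p_{\theta_0}}{p_{\theta}}\right)p_{\theta_0}\,d\mu$

where $E_{\theta_0}$ denotes expectation under the true density $p_{\theta_0}$ and $\mu$ is a dominating measure.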
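
As a quick worked example (an illustration under the assumption of a normal location model, which the general argument does not require), take $p_{\theta}$ to be the $N(\theta, 1)$ density and write $E_{\theta_0}$ for expectation under $p_{\theta_0}$. Then

$\log \frac{p_{\theta_0}(x)}{p_{\theta}(x)} = \frac{1}{2}\left[(x-\theta)^2-(x-\theta_0)^2\right]$

and, since $E_{\theta_0}(X-\theta)^2 = 1+(\theta-\theta_0)^2$ and $E_{\theta_0}(X-\theta_0)^2 = 1$,

$K(p_{\theta}, p_{\theta_0}) = \frac{1}{2}(\theta-\theta_0)^2$

so in this particular model the Kullback-Leibler information is small exactly when $\theta$ is close to $\theta_0$.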

As we will show below, the Kullback-Leibler information has a very useful property: it bounds the squared Hellinger distance from above.

Recall that $\frac{1}{2}\log(w) \leq\sqrt{w}-1$ for all $w>0$ (apply the standard bound $\log u \leq u-1$ with $u=\sqrt{w}$). Hence

$\frac{1}{2}\log\frac{p_{\theta}(x)}{p_{\theta_0}(x)} \leq \sqrt{\frac{p_{\theta}(x)}{p_{\theta_0}(x)}}-1$

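Taking expectations under $p_{\theta_0}$, the left-hand side becomes $-\frac{1}{2}K(p_{\theta}, p_{\theta_0})$, so after multiplying through by $-1$ we obtain

$\frac{1}{2}K(p_{\theta}, p_{\theta_0})\geq 1- E_{\theta_0}\left( \sqrt{\frac{p_{\theta}(X)}{p_{\theta_0}(X)}}\right)$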

Notice that the right-hand side of the inequality can be rewritten as $1- \int p_{\theta}^{1/2} p_{\theta_0}^{1/2}\,d \mu$, which (since each density integrates to one) is equal to

$\frac{1}{2}\int p_{\theta}d \mu +\frac{1}{2}\int p_{\theta_0}d \mu -\int p_{\theta}^{1/2} p_{\theta_0}^{1/2} d \mu= \frac{1}{2}\int (p_{\theta}^{1/2}-p_{\theta_0}^{1/2})^2 d \mu=h^2(p_{\theta},p_{\theta_0})$

where

$h(p_{\theta},p_{\theta_0}) \equiv \left( \frac{1}{2} \int (p_{\theta}^{1/2}-p_{\theta_0}^{1/2})^2 d\mu\right)^{1/2}$

is the Hellinger metric.

We have thus proved that $h^2(p_{\theta},p_{\theta_0})\leq \frac{1}{2}K(p_{\theta}, p_{\theta_0})$.

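Hence, whenever the Kullback-Leibler information $K(p_{\theta}, p_{\theta_0})$ converges to zero, so does the Hellinger distance $h(p_{\theta},p_{\theta_0})$. In other words, convergence in Kullback-Leibler information always yields consistency in the Hellinger metric.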
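
The bound can also be checked numerically. The following is a minimal sketch, assuming unit-variance normal densities with true mean $\theta_0 = 0$ and a simple trapezoidal rule on a finite grid; the helper names (`normal_pdf`, `integrate`, `kl_information`, `hellinger_squared`) are hypothetical and chosen only for this illustration.

```python
import numpy as np


def normal_pdf(x, mean, sd=1.0):
    """Density of N(mean, sd^2) evaluated on the grid x."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))


def integrate(y, x):
    """Trapezoidal rule for the integral of y over the grid x."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


def kl_information(p_model, p_true, x):
    """K(p_model, p_true): integral of log(p_true / p_model) * p_true."""
    return integrate(np.log(p_true / p_model) * p_true, x)


def hellinger_squared(p_model, p_true, x):
    """h^2: one half of the integral of (sqrt(p_model) - sqrt(p_true))^2."""
    return integrate(0.5 * (np.sqrt(p_model) - np.sqrt(p_true)) ** 2, x)


# Assumption: a grid on [-12, 12] is wide enough that the tails are negligible.
x = np.linspace(-12.0, 12.0, 200_001)
theta0 = 0.0
p_true = normal_pdf(x, theta0)

results = []
for theta in (0.1, 0.5, 1.0, 2.0):
    p_model = normal_pdf(x, theta)
    K = kl_information(p_model, p_true, x)
    h2 = hellinger_squared(p_model, p_true, x)
    results.append((theta, h2, K))
    print(f"theta={theta:.1f}  h^2={h2:.6f}  K/2={0.5 * K:.6f}  bound holds: {h2 <= 0.5 * K}")
```

In this model both quantities have closed forms, $K = \frac{1}{2}(\theta-\theta_0)^2$ and $h^2 = 1-\exp\left(-(\theta-\theta_0)^2/8\right)$, so the printed values can be compared against them. As $\theta$ approaches $\theta_0$, both sides of the inequality shrink to zero together.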

——

References:

van de Geer, S. (2000). Empirical Processes in M-Estimation. Cambridge University Press.

van der Vaart, A. W. (2000). Asymptotic Statistics. Cambridge University Press.

Kullback, S. and Leibler, R. A. (1951). On Information and Sufficiency. Ann. Math. Statist. 22(1), 79-86.