UCSC-CRL-96-27: MUTUAL INFORMATION, METRIC ENTROPY, AND RISK IN ESTIMATION OF PROBABILITY DISTRIBUTIONS

December 1, 1996
Computer Science
Assume $\{P_\theta : \theta \in \Theta\}$ is a set of probability distributions with a common dominating measure on a complete separable metric space $Y$. A state $\theta^* \in \Theta$ is chosen by Nature. A statistician obtains $n$ independent observations $Y_1, \ldots, Y_n$ distributed according to $P_{\theta^*}$. For each time $t$ between 1 and $n$, based on the observations $Y_1, \ldots, Y_{t-1}$, the statistician produces an estimated distribution $Q_t$ for $P_{\theta^*}$ and suffers a loss $L(P_{\theta^*}, Q_t)$. The cumulative risk for the statistician is the average total loss up to time $n$. Of special interest in information theory, data compression, mathematical finance, computational learning theory and statistical physics is the special case in which the loss $L(P_{\theta^*}, Q_t)$ is the relative entropy between the true distribution $P_{\theta^*}$ and the estimated distribution $Q_t$. Here the cumulative Bayes risk is the mutual information between the random parameter $\Theta^*$ and the observations $Y_1, \ldots, Y_n$. New bounds on this mutual information are given in terms of the Laplace transform of the Hellinger distance between pairs of distributions indexed by $\Theta$. From these, bounds on the cumulative minimax risk are derived in terms of the metric entropy of $\Theta$ with respect to the Hellinger distance. The assumptions required for these bounds are very general and do not depend on the choice of the dominating measure. They apply to both finite- and infinite-dimensional $\Theta$. They apply in some cases where $Y$ is infinite dimensional, in some cases where $Y$ is not compact, in some cases where the distributions are not smooth, and in some parametric cases where asymptotic normality of the posterior distribution fails. Using these bounds on the cumulative relative entropy risk, we also examine the Bayes and minimax risk of this game at specific times $t$ for various loss functions $L$, including the relative entropy, the squared Hellinger distance, and the $L_1$ distance.
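The identification of the cumulative Bayes risk under relative-entropy loss with the mutual information is the standard chain-rule identity; a brief sketch follows, assuming (notation not fixed by the abstract) a prior $w$ on $\Theta$, that each $Q_t$ is the Bayes predictive distribution $P(\,\cdot\mid Y_1, \ldots, Y_{t-1})$, and writing lower-case letters for densities with respect to the dominating measure:

\[
\mathbb{E}_{\Theta^* \sim w}\,\mathbb{E}\!\left[\sum_{t=1}^{n} D\!\left(P_{\Theta^*} \,\middle\|\, Q_t\right)\right]
= \mathbb{E}\!\left[\log \frac{\prod_{t=1}^{n} p_{\Theta^*}(Y_t)}{\prod_{t=1}^{n} q_t(Y_t)}\right]
= \mathbb{E}\!\left[\log \frac{p(Y_1, \ldots, Y_n \mid \Theta^*)}{p(Y_1, \ldots, Y_n)}\right]
= I(\Theta^*; Y_1, \ldots, Y_n).
\]

The second equality holds because the product of the Bayes predictive densities $q_t(Y_t) = p(Y_t \mid Y_1, \ldots, Y_{t-1})$ telescopes, by the chain rule, to the marginal (mixture) density of $Y_1, \ldots, Y_n$; thus the expected total relative-entropy loss of the Bayes strategy equals the mutual information between the parameter and the observations.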
