Density Estimation in Infinite Dimensional Exponential Families

Bharath Sriperumbudur; Kenji Fukumizu; Arthur Gretton; Aapo Hyvärinen; Revant Kumar

In this paper, we consider an infinite dimensional exponential family $\mathcal{P}$ of probability densities, which are parametrized by functions in a reproducing kernel Hilbert space $\mathcal{H}$, and show it to be quite rich in the sense that a broad class of densities on $\mathbb{R}^d$ can be approximated arbitrarily well in Kullback-Leibler (KL) divergence by elements in $\mathcal{P}$. Motivated by this approximation property, the paper addresses the question of estimating an unknown density $p_0$ through an element in $\mathcal{P}$. Standard techniques like maximum likelihood estimation (MLE) or pseudo MLE (based on the method of sieves), which are based on minimizing the KL divergence between $p_0$ and $\mathcal{P}$, do not yield practically useful estimators because of their inability to efficiently handle the log-partition function. We propose an estimator $\hat{p}_n$ based on minimizing the Fisher divergence, $J(p_0\Vert p)$ between $p_0$ and $p\in \mathcal{P}$, which involves solving a simple finite-dimensional linear system. When $p_0\in\mathcal{P}$, we show that the proposed estimator is consistent, and provide a convergence rate of $n^{-\min\left\{\frac{2}{3},\frac{2\beta+1}{2\beta+2}\right\}}$ in Fisher divergence under the smoothness assumption that $\log p_0\in\mathcal{R}(C^\beta)$ for some $\beta\ge 0$, where $C$ is a certain Hilbert-Schmidt operator on $\mathcal{H}$ and $\mathcal{R}(C^\beta)$ denotes the image of $C^\beta$. We also investigate the misspecified case of $p_0\notin\mathcal{P}$ and show that $J(p_0\Vert\hat{p}_n)\rightarrow \inf_{p\in\mathcal{P}}J(p_0\Vert p)$ as $n\rightarrow \infty$, and provide a rate for this convergence under a similar smoothness condition as above. Through numerical simulations we demonstrate that the proposed estimator outperforms the non-parametric kernel density estimator, and that the advantage of the proposed estimator grows as $d$ increases.
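To illustrate why minimizing the Fisher divergence can reduce to a linear system, the following minimal sketch works through a finite-dimensional analogue. It is not the RKHS estimator $\hat{p}_n$ of the paper. It assumes a one-dimensional family $p_\theta(x)\propto\exp(\theta_1 x+\theta_2 x^2)$ with sufficient statistics $T(x)=(x,x^2)$, chosen here purely for illustration, and uses the standard score matching identity: after integration by parts, $J(p_0\Vert p_\theta)$ equals $E_{p_0}\left[\frac{1}{2}(\theta^\top T'(x))^2+\theta^\top T''(x)\right]$ up to a term that does not depend on $\theta$. The objective is quadratic in $\theta$, so its empirical minimizer solves $G\theta=-b$ with $G$ the sample mean of $T'(x)T'(x)^\top$ and $b$ the sample mean of $T''(x)$. No log-partition function appears. The function name `score_matching_fit` and all variable names are hypothetical.

```python
import random


def score_matching_fit(xs):
    """Score matching for p_theta(x) ∝ exp(theta1 * x + theta2 * x**2).

    Illustrative finite-dimensional analogue only (an assumption of this
    sketch), not the kernel-based estimator described in the abstract.
    """
    n = len(xs)
    # Sufficient statistics T(x) = (x, x^2), so T'(x) = (1, 2x), T''(x) = (0, 2).
    # G is the sample mean of the outer product T'(x) T'(x)^T (symmetric 2x2).
    g11 = 1.0
    g12 = sum(2.0 * x for x in xs) / n
    g22 = sum(4.0 * x * x for x in xs) / n
    # b is the sample mean of T''(x).
    b1, b2 = 0.0, 2.0
    # Minimizing 0.5 * theta^T G theta + theta^T b gives G theta = -b.
    # Solve the 2x2 system with Cramer's rule.
    det = g11 * g22 - g12 * g12
    theta1 = (-b1 * g22 + b2 * g12) / det
    theta2 = (b1 * g12 - b2 * g11) / det
    return theta1, theta2


# Draw samples from a Gaussian, which belongs to this family with
# theta = (mu / sigma^2, -1 / (2 sigma^2)).
rng = random.Random(0)
mu, sigma = 1.0, 2.0
samples = [rng.gauss(mu, sigma) for _ in range(100_000)]

theta_hat = score_matching_fit(samples)
theta_true = (mu / sigma**2, -1.0 / (2.0 * sigma**2))
print("estimated:", theta_hat)
print("true:     ", theta_true)
```

In this toy case the solution coincides with plugging the sample mean and variance into the Gaussian natural parameters. In the infinite dimensional setting of the abstract, the parameter is a function in $\mathcal{H}$, and the abstract states only that the estimator reduces to a finite-dimensional linear system; the details of that system are not reproduced here.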
