Ping Li, Trevor J. Hastie, Kenneth W. Church.
Year: 2007, Volume: 8, Issue: 83, Pages: 2497−2532
For dimension reduction in the l1 norm, the method of Cauchy random projections multiplies the original data matrix A ∈ ℝn×D with a random matrix R ∈ ℝD×k (k≪D) whose entries are i.i.d. samples of the standard Cauchy C(0,1). Because of the impossibility result, one can not hope to recover the pairwise l1 distances in A from B=A×R∈ ℝn×k, using linear estimators without incurring large errors. However, nonlinear estimators are still useful for certain applications in data stream computations, information retrieval, learning, and data mining.
We study three types of nonlinear estimators: the sample median estimators, the geometric mean estimators, and the maximum likelihood estimators (MLE). We derive tail bounds for the geometric mean estimators and establish that k = O(log n / ε2) suffices with the constants explicitly given. Asymptotically (as k→∞), both the sample median and the geometric mean estimators are about 80% efficient compared to the MLE. We analyze the moments of the MLE and propose approximating its distribution of by an inverse Gaussian.