Sunday, March 2, 2014

The Gaussian kernel maps data onto the sphere

It is a fact as surprising as it is trivial that the Gaussian kernel maps your data onto the infinite-dimensional sphere. No computations involving the RKHS basis are required. Take the kernel
$$k(x,y)=\exp(-\gamma \| x-y\|^2)$$
defined on the domain $X$, which induces a map $\Phi: X \rightarrow F$, where $F$ is the associated feature space.
We have $k(x,x)=\exp(0)=1$ for all $x \in X$. Therefore what we have is clearly the sphere, since every $x$ is one unit away from zero in the feature space: $\|\Phi(x)\|^2 = \sum_i \lambda_i \phi_i(x) \phi_i(x) = k(x,x)=1$.

Is there any possible refinement to this? There is! Remember that the Fourier transform of a Gaussian is a Gaussian (with inverted parameters, etc.), so the Fourier coefficient at frequency 0 (i.e., the power of the constant function, or $\cos(0)$) is positive (and the maximum among the coefficients). This means that all data have a positive first entry (the constant function is positive and its coefficient is positive), which means that the map actually goes from the domain to the positive half of the infinite hypersphere. The other basis functions (for coefficients other than zero) are sines and cosines and thus can change sign from point to point. Further characteristics of the mapping depend on the data probability measure.
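The unit-norm claim is easy to check numerically without ever touching the feature map. The snippet below is a minimal sketch, assuming NumPy and some made-up random data; the helper name `gaussian_gram` and the value `gamma=0.5` are illustrative choices, not anything fixed by the argument. It reads the squared feature-space norms off the diagonal of the Gram matrix and uses $\|\Phi(x)-\Phi(y)\|^2 = k(x,x)+k(y,y)-2k(x,y) = 2-2k(x,y)$ for the distances.

```python
import numpy as np

def gaussian_gram(X, gamma=0.5):
    """Gram matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2) (hypothetical helper)."""
    sq_norms = np.sum(X**2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    sq_dists = np.maximum(sq_dists, 0.0)  # clip tiny negatives caused by round-off
    return np.exp(-gamma * sq_dists)

# Made-up data: 50 points in R^3 (an assumption, any data would do).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
K = gaussian_gram(X)

# Squared norm of Phi(x) in feature space is k(x, x), i.e. the Gram diagonal.
feature_sq_norms = np.diag(K)

# Squared feature-space distances: 2 - 2 k(x, y), never exceeding 2.
feature_sq_dists = 2.0 - 2.0 * K

print(np.allclose(feature_sq_norms, 1.0))
print(feature_sq_dists.max() < 2.0)
```

Since $k(x,y)>0$, the squared distance stays strictly below 2, so no two points are ever antipodal or even orthogonal on the sphere.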

If you have been trying to apply differential geometry to kernel methods and have worked with the Gaussian without noticing this, please stop your research and do something else. A good review and analysis of the induced manifold is "Geodesic Analysis on the Gaussian RKHS hypersphere", where the authors again make the same mistake many people make: naming the feature space as the RKHS (IT IS ITS DUAAAAALLLLLL). In fact, they show that the maximum angle between a pair of points is $\pi/2$ (the cosine of the angle between $\Phi(x)$ and $\Phi(y)$ is $k(x,y) \in (0,1]$), which makes the largest possible embedding a quadrant.
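The angle bound can also be seen directly from kernel values: because every point has unit norm, the angle between $\Phi(x)$ and $\Phi(y)$ is $\arccos k(x,y)$. Here is a small sketch, again assuming NumPy, random made-up data, and an arbitrary `gamma`; the function name `pairwise_angles` is hypothetical.

```python
import numpy as np

def pairwise_angles(X, gamma=0.5):
    """Angles (in radians) between feature-space images under the Gaussian kernel."""
    diffs = X[:, None, :] - X[None, :, :]            # all pairwise differences
    K = np.exp(-gamma * np.sum(diffs**2, axis=-1))   # k(x, y), equal to cos(angle)
    return np.arccos(np.clip(K, 0.0, 1.0))           # clip guards against round-off

# Made-up data: 40 points in R^2 (an assumption for illustration).
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 2))
angles = pairwise_angles(X)
max_angle = angles.max()

print(max_angle <= np.pi / 2)
```

Far-apart points approach an angle of $\pi/2$ but never go past it, which is the quadrant picture in numbers.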
