Wednesday, March 26, 2014

Clojure HashMap usage within a Java class

I attended the Hack (Make!) the Bank hackathon, which took place on Level 39 (One Canada Square, London). My team decided to build a funds-reallocation application that makes automatic payments once an account is topped up. Long story short, we got stuck on interoperability with the Java classes providing PayPal access.

We made the mistake of passing a Clojure hash map directly to the Java class. That seemed to work, but the moment we switched to a paper account (which the Java class did by modifying its incoming HashMap argument), the system threw an exception. Ben from Erudine Financial realized that the structure we were passing was immutable, so we simply copied it into a proper, mutable Java HashMap instead:
(java.util.HashMap. {})
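Remember that Clojure's {} is not a java.util.HashMap but an immutable persistent map (a clojure.lang.PersistentArrayMap for small literals, a clojure.lang.PersistentHashMap for larger ones). It implements java.util.Map, so Java code accepts it, but mutating methods such as put throw an UnsupportedOperationException.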
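To see the failure and the fix from the Java side, here is a minimal sketch. It assumes a hypothetical Java method, usePaperAccount, that switches accounts by calling put on the map it receives (the real PayPal classes are not shown in this post). A Collections.unmodifiableMap view stands in for the Clojure persistent map, since both reject put with an UnsupportedOperationException; the class and key names are made up for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class PaypalConfigDemo {

    // Hypothetical stand-in for the Java class we called: it switches to a
    // paper account by mutating the map it receives.
    static void usePaperAccount(Map<String, Object> config) {
        config.put("mode", "paper");
    }

    // Returns true if the switch succeeded, false if the map rejected mutation.
    static boolean tryUsePaperAccount(Map<String, Object> config) {
        try {
            usePaperAccount(config);
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Map<String, Object> base = new HashMap<>();
        base.put("mode", "live");

        // Read-only view: plays the role of a Clojure persistent map, whose
        // java.util.Map mutators also throw UnsupportedOperationException.
        Map<String, Object> immutable = Collections.unmodifiableMap(base);
        System.out.println(tryUsePaperAccount(immutable)); // false

        // The fix: copy the entries into a mutable java.util.HashMap first,
        // which is what (java.util.HashMap. m) does on the Clojure side.
        Map<String, Object> copy = new HashMap<>(immutable);
        System.out.println(tryUsePaperAccount(copy));      // true
        System.out.println(copy.get("mode"));              // paper
    }
}
```

Copying rather than mutating in place also leaves the original Clojure value untouched, which is what the rest of the Clojure code expects.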

This kind of problem shows up especially when proof-of-concept code is clumsily reused in production.

Sunday, March 2, 2014

The Gaussian kernel maps data onto the sphere

It is a fact as surprising as it is trivial that the Gaussian kernel maps your data onto the infinite-dimensional unit sphere. No computation involving the RKHS basis is required. Consider the kernel
$$k(x,y)=\exp(-\gamma \| x-y\|^2), \quad \gamma > 0,$$
defined on the domain $X$, which induces a map $\Phi: X \rightarrow F$, where $F$ is the associated feature space.
We have that $k(x,x)=1$ for all $x \in X$. What we get is therefore clearly the unit sphere, since every mapped point lies at distance one from the origin of the feature space: $\|\Phi(x)\|^2 = \sum_i \lambda_i \phi_i(x)^2 = k(x,x)=1$, where $\lambda_i$ and $\phi_i$ are the eigenvalues and eigenfunctions of the kernel's Mercer expansion.

Is there any possible refinement to this? There is! Remember that the Fourier transform of a Gaussian is a Gaussian (with inverted parameters, etc.), so the zeroth Fourier coefficient (i.e., the power of the constant function $\cos(0)$) is positive, and it is the largest among the coefficients. Since both the constant function and its coefficient are positive, every mapped point has a positive first coordinate, which means the map actually sends the domain into the positive half of the infinite-dimensional hypersphere. The other basis functions (for nonzero frequencies) are sines and cosines, so their coordinates can change sign. Further characteristics of the mapping depend on the probability measure of the data.
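Both claims can be checked numerically with an explicit feature map. The sketch below swaps in a different expansion from the Mercer/Fourier basis discussed above: for scalar inputs, $\exp(-\gamma(x-y)^2) = e^{-\gamma x^2} e^{-\gamma y^2} \sum_n \frac{(2\gamma x y)^n}{n!}$, which gives coordinates $\phi_n(x) = e^{-\gamma x^2}\sqrt{(2\gamma)^n/n!}\,x^n$. It assumes 1-D data, $\gamma = 0.5$ and truncation at 60 terms (all arbitrary choices); the class and method names are hypothetical. The squared norm of the truncated vector should be numerically 1, and the first coordinate $\phi_0(x) = e^{-\gamma x^2}$ is always positive.

```java
public class GaussianFeatureMap {

    // Gaussian kernel k(x, y) = exp(-gamma * (x - y)^2) for scalar inputs.
    static double kernel(double x, double y, double gamma) {
        double d = x - y;
        return Math.exp(-gamma * d * d);
    }

    // Truncated explicit feature map from the Taylor series of exp(2*gamma*x*y):
    // phi_n(x) = exp(-gamma * x^2) * sqrt((2*gamma)^n / n!) * x^n, n = 0..terms-1.
    // This is NOT the Mercer/Fourier basis, just one convenient exact expansion.
    static double[] featureMap(double x, double gamma, int terms) {
        double[] phi = new double[terms];
        double prefactor = Math.exp(-gamma * x * x);
        double coeff = 1.0; // sqrt((2*gamma)^n / n!) * x^n, updated iteratively
        for (int n = 0; n < terms; n++) {
            phi[n] = prefactor * coeff;
            coeff *= Math.sqrt(2.0 * gamma / (n + 1)) * x;
        }
        return phi;
    }

    // Plain Euclidean inner product of two equal-length vectors.
    static double dot(double[] a, double[] b) {
        double s = 0.0;
        for (int i = 0; i < a.length; i++) {
            s += a[i] * b[i];
        }
        return s;
    }

    public static void main(String[] args) {
        double gamma = 0.5;
        int terms = 60;
        double[] xs = {-2.0, -0.3, 0.0, 1.0, 2.5};
        for (double x : xs) {
            double[] phi = featureMap(x, gamma, terms);
            // k(x,x), squared feature norm, and the (positive) first coordinate.
            System.out.printf("x=%5.2f  k(x,x)=%.6f  ||phi(x)||^2=%.6f  phi_0(x)=%.4f%n",
                    x, kernel(x, x, gamma), dot(phi, phi), phi[0]);
        }
    }
}
```

Odd-order coordinates take the sign of $x^n$, so, as with the sines above, only some coordinates are guaranteed positive.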

If you have been trying to apply differential geometry to kernel methods and have worked with the Gaussian without noticing this, please stop your research and do something else. A good review and analysis of the induced manifold is "Geodesic Analysis on the Gaussian RKHS hypersphere", where the authors, once again, make the same mistake many people do: calling the feature space the RKHS (IT IS ITS DUAAAAALLLLLL). In fact, they show that the angle between any pair of mapped points never exceeds $\pi/2$, because $\cos\theta = \langle \Phi(x), \Phi(y) \rangle = k(x,y) > 0$, which makes the largest possible embedding a quadrant.
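The angle bound follows directly from the kernel: with unit-norm feature vectors, $\cos\theta = k(x,y)$, and the Gaussian is strictly positive. The sketch below computes pairwise feature-space angles through the kernel alone, with no explicit map; the 2-D sample points, $\gamma = 0.1$ and the class name are arbitrary assumptions for illustration.

```java
public class GaussianAngles {

    // Gaussian kernel k(x, y) = exp(-gamma * ||x - y||^2) for vectors.
    static double kernel(double[] x, double[] y, double gamma) {
        double sq = 0.0;
        for (int i = 0; i < x.length; i++) {
            double d = x[i] - y[i];
            sq += d * d;
        }
        return Math.exp(-gamma * sq);
    }

    // Angle between Phi(x) and Phi(y): both have unit norm, so
    // cos(theta) = <Phi(x), Phi(y)> = k(x, y) > 0, hence theta < pi/2.
    // Caveat: for very distant points k can underflow to 0.0 in double
    // precision, and acos then returns pi/2 even though the true angle is smaller.
    static double angle(double[] x, double[] y, double gamma) {
        return Math.acos(kernel(x, y, gamma));
    }

    // Largest pairwise angle over a data set.
    static double maxAngle(double[][] data, double gamma) {
        double max = 0.0;
        for (int i = 0; i < data.length; i++) {
            for (int j = i + 1; j < data.length; j++) {
                max = Math.max(max, angle(data[i], data[j], gamma));
            }
        }
        return max;
    }

    public static void main(String[] args) {
        double[][] data = { {0, 0}, {1, 0}, {0, 3}, {-4, 2}, {3, -2} };
        double gamma = 0.1;
        System.out.printf("max angle = %.6f rad, pi/2 = %.6f rad%n",
                maxAngle(data, gamma), Math.PI / 2);
    }
}
```

Increasing $\gamma$ or spreading the points pushes the largest angle towards $\pi/2$, but never past it.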