Machinomics

Thursday, July 19, 2012

Using Whatsapp on your PC

I have never been a huge fan of cell phones. I am one of those people who think that a phone is just a phone: you use it to speak to others and keep in touch, not to carry YouTube around with you.

However, my buddies have started to go silent on me, because they now only use Whatsapp: wherever they are, they agree to meet or do something on short notice. This leaves me in a very miserable position.

What I did is the following:

I installed the Android SDK, which comes with an emulator.
In the AVD Manager, in the image I created to emulate a smartphone, I added the property hw.keyboard with the value yes so I could use my PC keyboard to type into the virtual phone. You can also edit the config.ini located under your user directory, in ~\.android\avd\device.avd\, and add hw.keyboard=yes.
I downloaded Whatsapp from the virtual device.
I installed the app (believe it or not, this was the most difficult part for me: finding out how to reach the downloaded files. I downloaded it three times).
During the first execution, it asked me for my real phone number and sent a regular SMS to my (physical) phone.
I then added my friends' phone numbers to my contacts in Android and it automatically marked them as using Whatsapp.
Then one of them added me to the groups they were using to meet.
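
If you prefer to script the config.ini change from step two, here is a minimal Python sketch. It assumes the file is a plain list of key=value lines; the function name enable_hw_keyboard is my own, and device.avd is just the placeholder directory name used above, so substitute the name of your own virtual device.

```python
from pathlib import Path


def enable_hw_keyboard(config_path):
    """Set hw.keyboard=yes in an AVD config.ini, replacing any existing value."""
    path = Path(config_path)
    lines = path.read_text().splitlines() if path.exists() else []
    # Drop any previous hw.keyboard line so the key appears exactly once.
    kept = [ln for ln in lines if ln.split("=")[0].strip() != "hw.keyboard"]
    kept.append("hw.keyboard=yes")
    path.write_text("\n".join(kept) + "\n")
    return kept


# Example call (the path is a placeholder, adjust it to your device):
# enable_hw_keyboard(Path.home() / ".android" / "avd" / "device.avd" / "config.ini")
```

Restart the emulator afterwards so that it picks up the new setting.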

Sunday, July 15, 2012

Plot classification regions in an SVM

I've been really busy. I can't really claim to be an SVM expert, since my postgraduate work up to now has not dealt very much with machine learning, though lately I've been working with SVMs and kernels.

In fact I have a very advanced technique that really boosts SVM performance, but that will have to wait until the paper is published.

One of the tests I ran dealt with a modified XOR data problem. The problem consists of four groups drawn from bivariate normal distributions. They are assigned to two classes such that the groups of the same class are always separated from each other by some group of the other class.

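library(MASS)  # provides mvrnorm
# Four bivariate normal groups, 50 points each
y=mvrnorm(50,c(3,5),Sigma=diag(c(0.5,1.5)))
y=rbind(y,mvrnorm(50,c(15,13),Sigma=diag(c(1.5,0.5))))
y=rbind(y,mvrnorm(50,c(7,5),Sigma=diag(c(0.5,1.5))))
y=rbind(y,mvrnorm(50,c(15,17),Sigma=diag(c(1.5,0.5))))
# The first two groups are class 1, the last two are class -1
labels=c(rep(1,100),rep(-1,100))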

I put the figure here so that the problem is clear; the explanation of how to get it follows.

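library(e1071)  # provides svm
# Phi holds the kernel features of the training points y (see the previous post)
K.svm=svm(Phi, labels, type="C", kernel="linear",probability=TRUE)
# Grid of 100 x 100 points covering the region to paint
X=as.matrix(expand.grid(list(x = seq(0, 20, length.out=100), y = seq(0, 20, length.out=100))))
# compute the kernel on X here and store it in PhiX
im=predict(K.svm, PhiX,scale=FALSE)
# Reshape the predicted classes into a 100 x 100 matrix (x varies fastest)
im=matrix(as.numeric(im),nrow=100,byrow=FALSE)
image(seq(0, 20, length.out=100),seq(0, 20, length.out=100),im,xlab="",ylab="",col=c("#FFFCCCFF","#FFF000FF")) # or col=heat.colors(2)
points(y)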

We first train the SVM on the kernel features, as explained in the previous post.
Then we create a grid spanning the region we are interested in painting and evaluate the trained SVM on it. Finally, we reshape the grid of classified points into a 2D plane and plot it along with the original points.
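
The grid-and-reshape step can be sketched on its own in a few lines of Python. This is only an illustration of the idea: toy_classifier is a hand-written, hypothetical stand-in for the trained SVM, and the helper names are my own. The result follows the same convention that image() expects, with entry [i][j] holding the class at (xs[i], ys[j]).

```python
def linspace(lo, hi, n):
    """n evenly spaced values from lo to hi inclusive (like seq(lo, hi, length.out=n))."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


def classify_grid(classify, xs, ys):
    """Evaluate classify(x, y) on the grid; result[i][j] is the label at (xs[i], ys[j])."""
    return [[classify(x, y) for y in ys] for x in xs]


def toy_classifier(x, y):
    """Hypothetical stand-in for the trained SVM: a hand-written XOR-like rule."""
    return 1 if (x < 10) == (y < 10) else -1


xs = linspace(0, 20, 5)
ys = linspace(0, 20, 5)
regions = classify_grid(toy_classifier, xs, ys)

for row in regions:
    print(row)
```

With a real model you would replace toy_classifier by a call to the trained predictor and use a finer grid, such as the 100 x 100 one above.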

Monday, July 2, 2012

Fun with Mensa puzzles

If you like to do puzzles regularly or occasionally, or simply feel curious about your IQ, Mensa Denmark has put up a good Flash applet for that:
http://www.iqtest.dk/main.swf

Mensa is an international organization open to people who score at the 98th percentile or higher on a standardized, supervised IQ or other approved intelligence test and, according to its Wikipedia entry, the oldest high-IQ society.

They also publish puzzle collections at a cheap price. Have a look at this listing from Amazon UK.

It is very likely that there is a branch close to you. To join Mensa, you need to score high enough on a test, for which a small fee is required. Google "Mensa <your country>" or go to its main site.

Wednesday, June 27, 2012

The Fourier transform as a diagonalization

One of the benefits of using the Fourier transform of a function is that convolutions become multiplications. This is important when solving a differential equation with its Green's function. If the Green's function comes from a differential operator $D^*D$, where $D$ is a differential operator and $D^*$ is its adjoint, then the Green's function is not singular at the origin, and is continuous. It spans a function space called a reproducing kernel Hilbert space (RKHS): all functions in this space can be written as linear combinations of the Green's function evaluated at one argument, and the solution to the differential equation $D^*D u = y$ is of that form. OK, no more digressing... to the cheese...

In the Fourier domain we operate on frequencies $\omega$. For example, to attenuate noise we decrease the power at the high omegas, which amounts to a convolution (with a Gaussian, for example). If we see this linear operation as a matrix, the convolution operator that has (let's say) one-dimensional Gaussians in its rows (in the time/space domain) becomes diagonal in the Fourier domain.

The page popped up with plenty to follow up on. In particular, I liked this paragraph:

The moral of the story is that the Fourier Transform may be thought of as a change of basis. The Fourier integral projects a function onto the basis functions of a new coordinate system whose basis functions are the complex exponentials. In this new basis, the convolution operator is diagonal and everything is simple. The convolution operator acts on each Fourier component independently by multiplying the component by an associated magnitude and phase.

In Matlab

C=[4 1 2 3; 3 4 1 2; 2 3 4 1; 1 2 3 4]

C =

     4     1     2     3
     3     4     1     2
     2     3     4     1
     1     2     3     4

F=fft(C)

F =

  10.0000            10.0000            10.0000            10.0000
   2.0000 - 2.0000i  -2.0000 - 2.0000i  -2.0000 + 2.0000i   2.0000 + 2.0000i
   2.0000            -2.0000             2.0000            -2.0000
   2.0000 + 2.0000i  -2.0000 + 2.0000i  -2.0000 - 2.0000i   2.0000 - 2.0000i

F*C*F'

ans =

1.0e+003 *

   4.0000                  0                  0                  0
        0             0.0640 - 0.0640i        0                  0
        0                  0             0.0320                  0
        0                  0                  0             0.0640 + 0.0640i
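
A note on what the session above computes: fft(C) transforms each column of C, so F equals W*C, where W is the 4x4 DFT matrix, and F*C*F' is W*(C*C*C')*W'. That product is still diagonal because products of circulant matrices are circulant, but its entries are 4*lambda*|lambda|^2 rather than the eigenvalues lambda of C. The sketch below, in plain Python with no external libraries, builds W explicitly and checks that W*C*W'/n has the eigenvalues of C on its diagonal. The helper names are my own.

```python
import cmath


def dft_matrix(n):
    """Unnormalized DFT matrix W with W[j][k] = exp(-2*pi*i*j*k/n)."""
    return [[cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n)]
            for j in range(n)]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def conj_transpose(a):
    """Conjugate transpose of a complex matrix."""
    return [[a[j][i].conjugate() for j in range(len(a))]
            for i in range(len(a[0]))]


# The circulant matrix from the Matlab session above.
C = [[4, 1, 2, 3],
     [3, 4, 1, 2],
     [2, 3, 4, 1],
     [1, 2, 3, 4]]
n = len(C)
W = dft_matrix(n)

# W * C * W^H / n is diagonal; its diagonal is the DFT of the first column of C.
D = [[z / n for z in row] for row in matmul(matmul(W, C), conj_transpose(W))]

for row in D:
    print([complex(round(z.real, 6), round(z.imag, 6)) for z in row])
```

The diagonal comes out as 10, 2-2i, 2, 2+2i, which is the first column of F above, and these values are consistent with the 4000, 64-64i, 32, 64+64i printed by the session.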

Tuesday, June 26, 2012

I am an Analytic Bastard and I will fight Delusional Geometers to death

This post is dedicated to AMG: friend and enemy, mentor and destroyer, wise and fool.

AMG is obsessed with equating generalized functions (as in Schwartz distribution theory) with probability distributions (as in measure theory). According to AMG I am an Analytic Bastard, and I agree. And this was the last thing I could take from a Delusional Geometer. Therefore I left AMG.

Schwartz distributions are NOT probability distributions

I can say it louder but not clearer.

It doesn't matter that they are both called distributions; it sometimes happens in mathematics that two different things are given the same name. It doesn't matter how hard you try to make them the same, and it doesn't matter how proud you are or how little you think of the people who surround you and who are not Fields medalists.

The fact that Strichartz's book shows a bell-like, compactly supported $C^{\infty}$ function does not imply that it is a distribution. What is more, this function is clearly stated to belong to the set $\mathcal{D}$, the set of test functions. Therefore it is not a distribution itself, but one of the objects to which the linear functional (the Schwartz distribution) is applied. If $\varphi \in \mathcal{D}$ then it is a test function. It looks similar to a DENSITY function (the Gaussian), but the density is not the distribution, nor is the test function the (other kind of) distribution.

Furthermore, even if I force my brain to accept $\varphi \in \mathcal{D}'$ and call it a (Schwartz) distribution, you can't write $\varphi(x)$ outside the integral symbol. It is a functional, which means that it is applied to some $\phi \in \mathcal{D}$, so what makes sense is $\varphi(\phi)$. Don't get angry with me because of this, this is a fact. If $\varphi$ were a linear functional, it could at best be written as $\int \varphi(x) \phi(x) dx$, so why on Earth do you say $\varphi(x)$ anyway?
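
To spell out the notation I have in mind (the symbols $T_f$ and $f$ are mine, introduced only for this illustration): a locally integrable function $f$ defines a distribution by integration against test functions, whereas something like the Dirac delta is a distribution that comes from no such $f$, and writing $\delta(x)$ is only shorthand for the pairing.

```latex
\langle T_f, \phi \rangle = \int f(x)\,\phi(x)\,dx
\quad \text{for all } \phi \in \mathcal{D},
\qquad \text{whereas} \qquad
\langle \delta, \phi \rangle = \phi(0).
```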

At this point, why the hell do you use $\varphi(x)$ to name a "bell-like" function with $x= \arg \max_y \varphi(y)$?

Then all that remains is to go completely off the rails and try to apply density estimation methods to image analysis following this logic:

David Mumford develops an axiomatic theory that describes images as generalized functions (check)
We have methods that work in the density estimation field fairly well (check)
Since YOU (and only you) say generalized functions = probability distributions, then our methods must be very powerful in image analysis (FAIL)

FAIL! Because the only supporting argument you have is your pride.

So, let me out!

Wednesday, June 20, 2012

Pairs trading and colinearity

We sometimes recognize a couple of assets as being correlated. However, the dependence regime changes over time, making this correlation non-linear and dependent on, let's say, a phase. A more robust concept is multiple co-linearity, which means that some linear combination of the returns of those assets has a constant mean and variance.

Let's say that two assets are co-linear and that the returns of one of them have been consistently larger than those of the other. It makes sense to sell short the asset with larger past returns and buy the asset with smaller past returns. This gives us a quantitative model to measure large discrepancies in the return of the linear combination: for example, execute this strategy when the absolute value of that return exceeds twice its standard deviation. This gives a statistical arbitrage opportunity.
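
The two-standard-deviations rule can be sketched as follows. This is a toy illustration, not a trading system: the function name spread_signal, the weight parameter and the choice to centre the spread on its historical mean before comparing it with the threshold are my own assumptions.

```python
from statistics import mean, stdev


def spread_signal(returns_a, returns_b, weight=1.0, threshold=2.0):
    """Signal for the latest value of the spread a - weight * b.

    -1: short A and buy B (spread unusually high)
    +1: buy A and short B (spread unusually low)
     0: no trade
    """
    spread = [a - weight * b for a, b in zip(returns_a, returns_b)]
    history, latest = spread[:-1], spread[-1]
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return 0  # no variation in the history, nothing to compare against
    z = (latest - mu) / sigma
    if z > threshold:
        return -1
    if z < -threshold:
        return 1
    return 0


# Made-up returns: asset A jumps on the last day while B stays flat.
a = [0.01, -0.01, 0.01, -0.01, 0.05]
b = [0.0, 0.0, 0.0, 0.0, 0.0]
print(spread_signal(a, b))
```

In practice the weight would have to be estimated from data, for example by regressing one return series on the other.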

One example that I like to use is the pair EURUSD and GC (NYMEX Gold 100oz), and I used it to get hold of some coins. Another, perhaps more interesting, application would be corn and wheat. They seem to have periods when one of them is the favorite child of agricultural commodity traders. They are normally worth around the same, with wheat historically the more expensive. Corn caught up and had a period in which it was more expensive, but wheat recently had a rally and got to be USD100 more expensive per contract. Obviously, the gap then closed down to a difference of USD20. It then widened again and is now sitting around USD40-USD50. This simple model would have yielded a potential gain of USD80 per contract.

Saturday, June 16, 2012

SQL Server CE Max Database Size

I am crawling an Internet database so I can apply some spectral graph methods to it. I am using my computer at the university. I have just logged in and found out that my crawler was reporting it had (oddly and partially) finished. Then I found out the error was not that the database owners had detected me (I have implemented the protocols to keep a low profile), but that the local database is capped at 256 MB file size.

Long story short, if you want to use more than 256 MB in a desktop SQL Server CE database, add the parameter Max Database Size=1024 (the value is in MB, so this raises the cap to 1 GB) to your connection string. Separate parameters with ';'.
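
As a small illustration of what the resulting connection string looks like, here is a Python sketch that assembles one. The file name crawler.sdf and the helper build_connection_string are hypothetical; only the Data Source and Max Database Size keywords come from SQL Server CE itself.

```python
def build_connection_string(params):
    """Join key/value pairs into a ';'-separated connection string."""
    return ";".join(f"{key}={value}" for key, value in params.items())


conn = build_connection_string({
    "Data Source": "crawler.sdf",  # hypothetical database file name
    "Max Database Size": 1024,     # value is in MB
})
print(conn)
```

The string is then passed to whatever opens the connection in your crawler.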