Machinomics

This blog has moved!

2015-01-04T02:00:00.003-08:00

I finally moved to Github pages!

Check out my new content so far:

Clojure HashMap usage within a Java class

2014-03-26T16:01:00.001-07:00

I attended the Hack (Make!) the Bank hackathon that took place on Level 39 (One Canada Square in London). My team decided to make a funds reallocating application that makes automatic payments once an account is topped up. Long story short, we got stuck at interoperability with the Java classes providing Paypal access.

We made the mistake of inserting a Clojure hash map directly to the Java class, which seemed to work but the moment we switched to paper account in the Java class (by modifying the incoming HashMap argument) the system threw an exeption. Ben from Erudine Financial realized that the structure we were passing was inmutable so we just created the proper Java HashMap instance.

(java.util.HashMap. {})

Remember that Clojure's {} is actually a clojure.core.PersistentHashMap.

This situation happens especially when clumsily using proof of concept code in production.

The Gaussian kernel maps data onto the sphere

2014-03-02T09:56:00.000-08:00

It is a fact as surprising as trivial that the Gaussian kernel maps your data onto the infinite-dimensional sphere. No computation regarding the RKHS basis are required, since, given the kernel
$$k(x,y)=\exp(-\gamma \| x-y\|^2)$$
defined on the domain $X$, inducing a map $\Phi: X \rightarrow F$, where $F$ is the associated feature space.
We have that $k(x,x)=1$ for all $x \in X$. Therefore what we have is clearly the sphere, since all $x$ are one unit await from zero in the feature space $\|\Phi(x)\|^2 = \sum_i \lambda_i \phi_i(x) \phi_i(x) = k(x,x)=1$. Is there any possible refinement to this? There is! Remember that the Fourier transform of a Gaussian is a Gaussian (with inverted paramers, etc), so we have that the Fourier coefficient 0 (i.e., the power of the constant function, or $cos(0)$) is positive (and maximum among the coefficients). This means that all data have a positive first entry (the constant function is positive and its coefficient is positive), which means that the map actually is from the domain to the positive halve of the infinite hypersphere. Other basis functions (for coefficients other that zero) are sines and cosines and thus may change points. Further characteristics of the mapping depend on the data probability measure.

If you have been trying to apply Differential Geometry to kernel methods and have worked with the Gaussian without noticing it, please stop your research and do something else. A good review and analysis on the induced manifold is Geodesic Analysis on the Gaussian RKHS hypersphere, where the authors make again the same mistake many people do: Naming the feature space as the RKHS (IT IS ITS DUAAAAALLLLLL). In fact, they show that the maximum angle between a pair of points is $\pi/2$, which makes the largest possible embedding a quadrant.

On the computer languages market

2014-02-26T14:21:00.000-08:00

There is an interesting series of articles on Dr. Dobb's essentially about what we can call the computer languages market, where they analyze trends and caracteristics, especially in the Editor in chief's article.

They analyze the trends in language usage, which ordinally had not change since the year before (C, Java, Objective-C, C++, C#, PHP, Visual Basic, Python and Javascript), but that show a number of developments. For instance, Perl is leaving room in favor of Python. Perl seems to be dying and with signs of becoming history. They also mentioned a strong resurgence of Javascript but I would like to add that a large number of developments in the web scripting world have been carried out, starting with CoffeeScript, a tiny language that translates 1-to-1 to JavaScript. See their web page for some use of the language. ClojureScript is another compiler, this time to translate Clojure into JavaScript.

I found the following paragraph of particular interest:

By all measures, C++ use declined last year, demonstrating that C++11 was not enough to reanimate the language's fortunes, despite the significant benefits it provides. I have previously opined that Microsoft's contention of a return to native languages being led by C++ was unsupported by evidence. It is now clearly contradicted by the evidence.

I feel encouraged by this statement. I believed C++ is an awful language, and always has been. Back in the late 90's when I started programming and my teenage budget in a remote outpost included a non-disposable 486 DX2 with DOS, we had no choice but to get mainstream technical stuff, and that included C++ as the only advanced systems language. My experience, and I believe everybody's experience, is that programmer's time is more important than running time, and even hardcore C++ programmers recognize they put themselves in pain when facing non-standard tasks with the language. Some would argue that pain is what you pay for system performance, but that is not true (see the paragraph below). C++ will never grow on Big Data (data processing), mobile development or cloud computing, and I would expect that performance computing to also cut C++ usage in favor of more modern, maintainable system languages, such as the D language (or the one I describe below).

An interesting development they had not talked about in the area of systems programming, more than the D language, is Nimrod. It has cool features such as pointers that are traced by a lightweight garbage collector and others that directly translate to C++ pointers. Templates that can be called as operators are included, supporting metaprogramming, it supports inmutable objects, a number of syntactic sugars (such as being able to call len(x), x.len or x.len if there is only one argument, command and function-like calls like echo("hello") and echo "hello"...), first class functions and a very good mixture between Python indentation-defined blocks and Scala function definitions, as opposed to D's C-like syntax. Nimrod compiles to C, C++, Objective C and JavaScript. Definitely a good candidate to learn and to watch.

On the functional but high-performant side of the market we have Erlang, Haskell, Scala and Clojure. The first three are older, especially the first two ones. In my opinion, Clojure is simplest, especially when compared to Scala. This is patent when examining a source coude listing in both languages. Clojure lifts the programmer's productivity to new highs (I was able to code the Denarius' matching engine core in half a day). Definitely the way to go in the future.

Machine learning in a few words

2014-02-10T04:35:00.000-08:00

Machine learning is becoming a buzzword, everybody talks aboit it and few seem to be interested in the math underneath (I find statements like "I wanted to know more but all sources were too statistical/mathematical and I wanted more practical stuff").

Let me tell you something:
First: You can't really use Machine Learning if you don't know the statistical/mathematical basis
Second: You can't really use Machine Learning if you don't know the statistical/mathematical basis
Third: You can't really use Machine Learning if you don't know the statistical/mathematical basis

Machine Learning is just a fancy word for the statistical/mathematical tools lying underneath, whose objective is to extract something that we may loosely call knowledge (or something that we understand) from data (or something chaotic that we do not understand), so that computers may take action based on the inferred knowledge. An example of this would be a robot arm/humanoid: without programming actions on direction/velocity/acceleration vectors based on an established model, we may put sensonrs on a subject's articulations, and from these datapoints learn a regression model on the manifold of natural movements. Another example is in Business Intelligente: We may learn groups of customers (market segmentation) so that we may engage several groups with specific policies or offers target at them.

Maching Learning is applied Statistics/Mathematics. Is very little and very unpractical without Optimization/Operations Research, from the algorithmic and practical/scalable point of view.

I've come to the conclusion that there exists two large main approaches to ML, disregarding the specific technique we are dealing with and its target (i.e., supervised or unsupervised), plus one in the middleway:

Functional approach (Mathematical)
Neural Network/Deep Learning approach (Middle way)
Probabilistic approach (Statistical)

In the functional approach, one uses the theory of Hilbert spaces (and therefore of differential equations and linear algebra). The goal is to find a set of transformations of the data so as to best perform in a score for the task (called functional). These transformations come from a pre-defined set which is not related to the data in any way and are a combination of possibly orthogonal basis of a space of functions (transformation) defined on the domain of the original data. Examples of this are: Linear/Ridge/Sparse Regression (linear or identity transformation for regression), SVM (non-linear using the kernel trick), PCA (SVD/eig computation) and KPCA, Matrix factorizations (for signal separation/clusering...), K-means, Projection pursuit... The basic idea is:

I have my data set and a buch of (linear or non-linear) transformations, find a solution applying these transformations to my data so that I can maximize a score functional that I like for my problem. If I want to predict something (classification/regression) then combine these transformations so that the combination fits best my target variable. If I want to examine the nature of the data (dimensionality reduction, matrix factorization, clustering), then use a combination of my transformations to lose as least information as possible (measured with a functional, again).

In the probabilistic approach, the prior knowledge is an assupmtion of the prior probability and the likelyhood, and works towards obtaining a posterior probability (that the outcome is a given choice given the data just seen). Examples are: Logistic Regression (simple non-linear transformation for classification), Naive Bayes (classification), SNE, Gaussian Processes... The general idea is

I want to see independence at the end of the process, therefore I can assume Gaussian multivariate variables so that a linear or non-linear transformation gives me components that are as independent as possible (for unsupervised learning) or assume a probability distribution at the output and a likelyhood and compute a model so that I best fit the likelihood of seeing the target variable given the data.

Neural networks and deep learning are a different story. I consider them to be their own field, drawing tools from the above. They are neither functional because they are not dealing directly with functions (transformations) in the functional analytic setting, and they are not probabilistic for obvious reasons, but use any probability or information theoretic tool as needed. The fact that they connect output of transformations to inputs makes this field related to the first approach indeed, seen as a chain of transformations (spaces defined on the image of their predecessors), but the focus here is obviously the algorithms and that changes things.

Reinforment learning is nowadays just a fancy word for techniques widely known, studied and used in Stochastic Processes, HMM (Hidden Markov Models) being the only exception. Sometimes they call it Sequential Learning, but it is not widely considered Machine Learning, neither scholarly nor popularly.

So, as you can see, there is nothing new (not at least as the discovery of the fundamental theorem of calculus, or of quantum mechanics).

Regarding novelty, I am annoyed each time I read about ML techniques from the big data guys, who are normally programmers starting to do something more that data aggregation and querying. It seems that there is something both new and exotic in what they are saying, and it it mostly well known techniques from statistics, such as this article, or the beginning of Sean Owen's presentation at Big Data Spain.

Now, the practical side of things require that ML scales to Big Data. That limits the applicability of matrices and non-linear transformations to adapt linear methods, so let's see what the next breakthrough is.

Use Java JARs within Clojure projects

2014-02-08T05:41:00.000-08:00

I am creating a clojure wrapper for the excellent library from Pekka Enberg Falcon, which will provide Denarius with FIX protocol communication capabilities. The new project's name is, obviously, falcon-clj.

Although the ideal packaging would be to include Falcon as a Git submodule, I decided to pre-compile Denarius into a JAR stored into the /lib directory to avoid Leiningen complications in the starting phase and be able to focus on the implementation.

I compiled the library and pushed it into Github and then I received my new laptop, so I installed everything from scratch, including the JDK, and cloned everything back into the new laptop. The next stage would be to use the REPL to quickly get used to Falcon.

=> (import [falcon.fix Session Versions])
UnsupportedClassVersionError falcon/fix/Session : Unsupported major.minor version 51.0 java.lang.ClassLoader.defineClass1 (:-2)

So I digged up a little bit and found some interesting information here, which led me to here and then to correctly configure Eclipse (Aptana) to use an external JRE to run things.

Basically, you need to explicitly tell Eclipse to use the external JDK you are using. You can find the setting under Window -> Preferences -> Java -> Installed JREs.

What lies underneath malloc

2014-02-01T03:42:00.000-08:00

malloc is C's dynamic memory allocation function: memory is assigned to the process heap and then the malloc dynamic assignation does its magic, to make boundaries between dynamically assigned variables. Here's how it works (bear in mind that a malloc implementation is largely compiler-dependent):

At the beginning of the process' heap free space, a special region is allocated to point to the next available byte address in the free region and a previous pointer to the pointers corresponding to the previously allocated block. In case there is no block allocated next, the pointer is null. Otherwise, the pointer will point as many bytes ahead in the process memory as the corresponding allocated block's size.

The function free updates the following block's previous pointer after the recently liberated block, which causes mismatch with the preceding block's next pointer pointing to the deleted block. This creates an accounted free block. To speed up, free can have its own list of free block and sizes elsewhere on the heap.

Some sad goodbies :-(

2014-01-31T12:28:00.000-08:00

I am sadly taking out several blogs from my read list at the right...

Our teacher (by his book) Larry Wasserman decided to stop writing on his blog. There, he mentioned that sparse regressors, being as useful as they are, have a infinite risk in a region near zero, which could be of no practical consequence for most applications but may be too dangerous for some cutting edge applications (precision robotics, high frequency trading...). He left the door open for a comeback, we certainly hope so!

mathbabe (Cathy O'Neil). She writes a lot and I was finding her blog posts a little bit confusing for and many of them not in line with what I want to read or show in my blog, even though I support some of her views and agree on her reasons to short Facebook, which in turn coincide with a recent Princeton application of an empidemology model to social networks.

Peekaboo (Andreas Müller), seems to have left his blog aside and moved on. He has very interesting material there, though.

Strongly typed external data sources language integration in F#

2014-01-27T03:26:00.001-08:00

I have read something that I bot like and consider important: Data sources integration in the language.

I like it because it is a really interesting feature in the sense that data is integrated into the language just like control flow, for example, quoting the paper by Syme et al. (see below), we may use

let data = Samples.DataStore.Freebase.FreebaseData.GetDataContext()
data.Sports.Baseball.Coaching_Positions

Even when specifying the route to te database Coaching_Positions, the IDE itself will show us all options as it finds them, much like code completion with language keywords and loaded types.

On the other hand, this is important because is of high value for the data scientist. Moreover, in F#, they have made it so as to scale with internet data (Big Data) and especially designed it to word from the cloud, making the model especially appealing for modern use. I hope Clojure catches up in this field and adds something of the like soon.

http://research.microsoft.com/pubs/192598/ddfp-information-rich-fp-themes-v5.pdf

Denarius Exchange organization created on Github

2014-01-25T07:40:00.001-08:00

Dear all,

I've transferred the Denarius' repository to an organization focused on developing the exchange. Anybody is invidted to join and contribute.

Te new address is:
https://github.com/denarius-exchange/denarius

On the technical side, a matching is implemented with hash maps now, and we get slightly more speed, but we are introducing vectors and queue operations on them, which will yield about five times more speed.

Next ahead are the connection nodes and protocols implemented on them.

Monty Hall revisited

2014-01-24T07:44:00.000-08:00

If I tell you: There are 1K bitcoins 1 wallet that will be yours if you guess which wallet out of three is the right one, the rest containing an amount of zero bitcoins, and ask you to point out an initial selection, then showed you that, effectively, one of the remaining wallets contains zero bitcoins... and finally giving you the opportunity to change wallet. Would you change? The awnser is yes.

This is so because there is new evidence now that supports a higher probability that the remaining unseen wallet is the right choice, whereas there is none about your current choice. The fact that you selected wallet 1, and given that choice, I showed you wallet 2, that leaves wallet 3 with a posterior probability of 2/3. This does not happen for our current wallet 1, since choosing 1 influenced my decision to show you 2. More precisely: You chose wrongly with probability 2/3. With that probability, I show you the only possible door that I can, leaving the 2/3 for the remaining unseen and unchosen door. On the contrary, you choose well with 1/3 probability, but then I can choose among 2 doors to show you, each with a probability of 1/2. This is how we include my decision (or necessity) to show you 2 into the math (this is the best explanation you are gonna get from all over the internet):
Let's call R "right choice" V "visible incorrect wallet" and S "your choice". We need to compute $P(R=3|V=2,S=1)$, the probability of 3 being the right wallet, after you selected 1 and I showed you that 2 was not right (remember that all priors are 1/3).
$$P(R=3|V=2,S=1)=\frac{P(V=2,S=1|R=3)P(R=3)}{P(V=2,S=1|R=3)P(R=3)+P(V=2,S=1|R=1)P(R=1)}\\=\frac{1\times 1/3}{1\times 1/3 + 1/2 \times 1/3}=2/3$$$P(V=2,S=1|R=3)=1$ is the probability that, given R=3, then I was forced to show you the incorrect wallet remaining (you already chose one incorrect wallet). $P(V=2,S=1|R=1)=1/2$ because there are two possible incorrect wallets (since you selected the correct one) that I can choose from to show you.

Let's compute the same posterior for the case I decide not to change wallet:
$$P(R=1|V=2,S=1)=\frac{P(V=2,S=1|R=1)P(R=1)}{P(V=2,S=1|R=3)P(R=3)+P(V=2,S=1|R=1)P(R=1)}\\=\frac{1/2\times 1/3}{1\times 1/3 + 1/2 \times 1/3}=1/3$$.

Therefore if you change you have more chances of winning the 1000 bitcoins.

Needless to say, this works for every possible combination of $R$, $S$ and $V$.
This happens, as I mentioned, because of the way I was influenced (forced) to show you the incorrect remaining wallets. To see it intuitively, imagine 100 wallets, and that you chose one amongst them, and I am forced to show you 98 incorrect wallets, leaving your choice and another one. Is it more likely that this particular wallet is the correct one (that your choice forced me to leave it) or that you chose wisely amongst 100 wallets? If you choose 99 incorrect wallets, the set that I show you is the same, except for the chosen incorrect wallets each time, and will never contain the particular correct wallet.

There is a cool Android app in case you want to check how the law of large numbers works for this problem.

Things about Python that I find cool

2014-01-18T04:54:00.000-08:00

Python is a formidable language with functional capabilities that is extraordinarily easy to learn and follow. It is interpreted so that scripting and prototyping are possible.

Commentaries follow UNIX style so that you can use it for building scripts (and therefore program CGIs).

Despite striking as too odd at first sight, Python's syntax overweights its cons by offering a clear and simple look which noticeably increases productivity. Python relies on indentation to define its blocks, which means that you get rid of parentheses, brackets, black keywords such as begind or end, etc. This simplifies code and makes it more readable

for i in vector:
     if i==1:
        print i<<1
     else:
        print i**2
     print "Loop: %d" % i

IPython notebook (which requires Tornado libraries), is an interesting addon over IPython (which itself gives you nice command line ammenities). It is a Javascript/CSS served over a local HTTP server which allows you for nice online prototyping on your web browser. Results are inlined much like a Maple or Mathematica session.

It allows for powerful expressions, such as performing a transformation on filtered data with a near natural language expression:

[x**2 for i, x in enumerate(col) if i!=0 or x]

Here we transform all elements in the vector col squaring them but only those whose index is not zero.

Decorators are a powerful way to add behavior to a function, adding code that performs some task before and after the target function is executed. The canonical use of decorators is adding logging behavior to functions. In Python we can use them easily and non-intrusively by adding the operator @ and the decorator function to the definition of the function

def decorator(func):
    def inner(x):
       print "Arguments were ", x
       y = func(x)
       print "Result is", y
    return innter

The argument operators * and ** allow for generics construction within the functions. By convention, one uses *args for sequential expansion of the argument list and of and **kwargs for named expansion of an argument list. This means that you can use * in the function definition to represent a list of arguments passed in order, therefore *args becomes a list of values that you can then use. On the other hand, calling a function that has named parameters with a list of values with the operator * will result in expanding the values to these arguments.

def test(a=1, b=2):
print a, b
args=(1,2)
test(*args)
1 2

Similarly, **kwargs cab be used to either specify a list of generic argument list to make the behavior of the function dynamical (for example, your function is a relay and just gets arguments that passes on to other functions that will interpret them), or be used to expand a map of named arguments in this way

def test(a=1, b=2):
print a, bs
kwargs = {"a":1,"b":2}
test(**kwargs)
1 2

Althoug the following does not refer to the language itself, it has to do with the community, since there exists a service called PyCloud that enables Python practitioners to use CPU power for their Python numerical needs. As per the PyCloud site:

The PiCloud Platform gives you the freedom to develop your algorithms and software without sinking time into all of the plumbing that comes with provisioning, managing, and maintaining servers.

You have 20 free computing hours.

Finally, a powerful and comprehensive set of libraries has been built on Python, especially for Data Analysis: Python probably has the largest collection of opensource libraries for data analysis. It includes: NumPy, SciPy, Pandas, Sklearn, NLTK and many others.

Denarius financial exchange announcement

2014-01-06T17:36:00.003-08:00

I've been busy on Christmas working on Denarius, a financial exchange. This is the announcement that I've benn posting on several sites (see, for example, on bitcointalk).

Dear all,

I just started out a new open source project to implement a financial exchange project.

The implementation is in Clojure, which has already given the project a kick start with the matching engine concurrency. Interested Clojure programmers are invited to join.

Also, programmers working in finance, parallel systems and databases are kindly invited to join. Contributions to the core system design are highly welcome.

This project stems from unsatisfactory use of other existing projects regarding financial exchanges, and is of interest for the bitcoin community since cryptocoins are among the tradeable assets in the software.

This account is newbie's, so diffusion to other parts of bitcointalk is appreciated.

Subscribe just sending an email to: denarius@librelist.com
Source code: https://github.com/analyticbastard/denarius
Wiki: https://github.com/analyticbastard/denarius/wiki/Introduction

Benchmark functions in Python

2013-12-25T05:15:00.000-08:00

This is a function to benchmark functions in Python using decorators, so that you can use it non-intrusively with your current code, just adding the decorator operator @ to the definition of your function. Feel free to modify it with higher resolution time functions.

Clojure concurrency and some niceties

2013-12-22T11:42:00.000-08:00

I stumbled upon the barber problem at some webpage I don't remember; it was a loose and open approach to it. I then imagined my own version and implemented that in Clojure (I'll put the code as a Gist, even though the style just does not match this blog's). Basically, I want to adjust the rate of customers entering the barber shop to a little faster than the barber can dispatch one customer.

To do that, I create two functions, one that increases the queue of customers (with the shop supporting up to 3 customers) amd one that dispatces customers, decreasing the queue. I assume 4 hours for customers to be able to take a seat (the barber's will close after 4 hours and cut the remaining wating customers).

Notice here that the barber cuts the hair only if a random number meets a condition (here making him slower than his customers). I implement this with a watcher function, that gets called whenever the reference changes. In order to be called every time a customer enter the shop, we need to issue and identity change, that does not change the value of the queue but fires the watcher. This is a very nice feature of Clojure.

The important functional element is this

(let [f (if (< @queue 3) inc identity)]
(dosync (alter queue f)) ))

Notice the conditional assignment to the variable in the let block. This removes the boilerplate code needed in the expression section within the let block.

Why are so many people still using Internet Explorer?

2013-12-19T05:28:00.003-08:00

As somebody that know a little bit of HTML and Javascript, I appreciate standards very much. Internet Explorer has broken (though they've restrained themselves) and continue to break every single one of them. Luckily I develop for mobile platforms when I have an idea and have time.

Internet Explorer never follows standards, the interfaces they expose are always Microsoft-y, cumbersome, intrincated, unpractical. Multimedia and interaction is assured with the latest developments of the triad HTML/Javascript/CSS3, yet in Explorer it always renders badly. I know programmers who have suffered making two versions of their webpage: one for Explorer and another for the rest. These are just bad guys.

And we have the following compelling reasons, that are beyond pure political reasons (of following standards):

You are safer by avoiding software that bad guys target. Mac users benefited from this for years. Windows users can lower their attack surface (be less vulnerable) by avoiding popular software. Internet Explorer is popular, so bad guys exploit known problems with the browser. No thanks.
Microsoft fixes bugs in Internet Explorer on a fixed schedule. But, bugs are not discovered on a schedule which means IE users remain vulnerable to know bugs until the next scheduled bug fix roll-out. Neither Firefox nor Chrome, my preferred browsers, are locked into a schedule.
In addition, Microsoft is just slow in fixing Internet Explorer bugs. The last release of IE patches included a fix to a bug that Microsoft had been told about six months ago. The topic of bugs in popular software brings Adobe's Flash Player to mind. Internet Explorer users with Flash enabled in their browser get notified of new versions of Flash using a very flawed system. And, when they are notified, they need to manually install the new version of Flash.
In this day and age, this is not acceptable; Flash is too popular and too buggy. Firefox fails here too. And speaking of Flash, it exists in Internet Explorer as an ActiveX control. The lack of security in ActiveX is what prompted me to jump on the Firefox bandwagon even prior to version 1.0.
ActiveX may be locked down a bit more than it used to be, but how many Internet Explorer users understand the security related prompts about running an ActiveX control, let alone the configuration options for ActiveX? To me, a browser that doesn't support ActiveX is safer. ActiveX was the first approach to extending browsers with extra features and functions. Now, both Firefox and Chrome have a huge number of available extensions. Internet Explorer has only a handful
Buggy browser extensions/plugins are often targeted by bad guys. Both Firefox and Chrome do some checking for outdated extensions. Internet Explorer does none.

Damn, Clojure is fast!

2013-12-17T13:09:00.001-08:00

Well I feel positive today, I struggled for the past two days to make my logistic regression in Clojure faster. I even made up to four different implementations of the logistic regression, with none of them giving satisfactory results. It all was even more disappointing when comparing to my MyML logistic regression implementation.

Well, I found out what the problem was: Actually I was making two fatal errors.

My input data in Clojure were lists instead of vectors
My input data in Python was 256 datapoints, instead of 1000 as in the Clojure version.

Both points stemmed from me being not so careful. In the first case, I knew already that one should use vectors instead of list when going after performance, but I was assuming that my data was in vector form. Staring at the variable X, I wondered if it was vector or list and voila, performance just got x5 better. Then I went to the Python interpreter, checked whether the number of iterations in the gradient descent object was the same as in Clojure, and then checked the data... well, my data was smaller (from a previous test with logistic regression). I re-generated my data and Python just lagged behind. In particular, I give you the figures (notice that I did not bother to put the wrong results I was getting because of my mistakes):

Python: 0.56 sec
Clojure: 0.19 sec (iterative implementation through loop-recur) 0.04 sec (concurrent implementation through agents).

Bear in mind that I am conduncting the tests on my girlfriend's borrowed machine and that I installed Cristoph Gohlke's Numpy distribution, which shippes with Intel's MKL statically-linked libraries, so it should be pretty fast in terms of algebraic computations. Perhaps the lack of performance comes from Python's interpreter itself (read-interpret-execute...). This is even more supporting of Clojure, since we are focusing on the infrastructure of both systems.

I will be putting everything in order, making my logistic regression more idiomatic and building some tests.

Google Highway 101 brainteaser

2013-11-20T18:43:00.000-08:00

Some days ago I read about the famous Google application invitation in the form of an ad on Highway 101 passing through LA.

The ad shows a conceiled URL that we need to guess by computing the first 10-digit prime that appears in the series of consecutive digits of the irrational e. It is an oldie but goodie.

I thought I could solve this in Clojure. Here is my solution.

First we need to compute the number e with as many digits as necessary.
For that we can implementan unbounded spigot algorithm for the number e. I googled about this and found out a blog with superb material for this exercise, so I implemented the ideas in Clojure.

(defn digitse [N n] (if (= n N) (.setScale 2.5M 200)
                      (if (= n 0)
                        (+ 2.0M (* (.divide
                                   (.setScale 1.0M 200 BigDecimal/ROUND_HALF_UP) 2.0M BigDecimal/ROUND_HALF_UP)
                                   (digitse N (inc n) )) )
                      (+ 1.0M (* (.divide
                                   (.setScale 1.0M 200 BigDecimal/ROUND_HALF_UP) (+ n 2.0M) BigDecimal/ROUND_HALF_UP)
                                 (digitse N (inc n)))) ) ))

We then create a subsequence with sufficient decimal places. It turs out that 150 digits is enough. We also convert it to string to improve the partition of our 10-digit (now characters) strings.

(def e (digitse 150 0))

(def se (.subSequence (.toString e) 0 150 ))

We need to define a function that tells us wheter a given number is a prime. As we know, we need only test up to its square root, but here a simple loop over the 10-digit numbers show us that none of the square roots exceeds 89000, so we take the first 8500 prime numbers, starting in 2, from a prime number generator, that I took from here.

(defn gen-primes "Generates an infinite, lazy sequence of prime numbers"
[]
(let [reinsert (fn [table x prime]
                   (update-in table [(+ prime x)] conj prime))]
    (defn primes-step [table d]
                 (if-let [factors (get table d)]
                   (recur (reduce #(reinsert %1 d %2) (dissoc table d) factors)
                          (inc d))
                   (lazy-seq (cons d (primes-step (assoc table (* d d) (list d))
                                                 (inc d))))))
    (primes-step {} 2)))

Please note that this generator is pretty fast, compared to the answers that I have read on the internet. Now we take the primes we need from this lazy sequence.

(def first-primes (take 8500 (gen-primes) ) )

And define a function to test that the condition that some remainder of a given number between any of the prime denominators is zero does not happen, which means that our number is prime.

(defn prime-restricted? [n first-primes]
(= nil (some #(= 0 (rem n %)) first-primes) ))

Finally, we recursivelly (with no stack overhead) until either a prime is found or we exceed the 150 digits capacity (in which case we would just increase it, but we don't need to).

(defn find-first-prime-in-e [se init first-primes]
(if (<= (- (.length se) init) 10)
    nil
    (let [number (java.lang.Long/parseLong (.subSequence se init (+ init 10)) )]
      (println init number)
      (if (prime-restricted? number first-primes)
        number
        (recur se (inc init) first-primes)
        )))
)

The last output lines and the returned value of the previous function are

96 6642742746
97 6427427466
98 4274274663
99 2742746639
100 7427466391
7427466391

So, it turns out that the first 10-digit prime number is in place 100, which makes the URL

7427466391.com

Upon connection, you would get another quizz, which having in mind the quantity you just computed is fairly straighforward.

Two very simple Python functions to ckeck prime numbers and list divisors

2013-11-17T07:17:00.000-08:00

Here are two simple functions (with no error checking, so watch your inputs) to check whether a number is prime and to list all divisors of a number.

To check for prime numbers:

def prime(x): return not any ( ( x % (np.array(range( int (np.sqrt(x)) ) ) + 2) == 0 ).tolist() )

Remember that the fundamental theorem of arithmetics guarantees that every integer can be decomposed in prime numbers, and that it is necessary to divide up only to a number's square root to know whether it is prime, a fact known in ancient Greece among many facts about integer arithmetic (for instance, Euclid proved that there are infinite prime numbers in Proposition 20 of his Elements)

def divisors(x):
return np.array( [k for k in range( 2, int(np.sqrt(x) ) ) if prime(k) and not x % k] )

In this case we need to check whether primes from 2 to half of it (we could optimize this by finding the minimum factor, and the range of checks we really need to do is (min_factor(x) , x/min_factor(x) ) ), since having a factor larger than its half would imply it is prime by the previous principle.

Naive Bayes with Map Reduce

2013-11-07T15:16:00.000-08:00

A fairly straighforward way of implmenting the Naive Bayes classifier for discrete data is using Map Reduce. This is especially useful if you have a bunch of characteristic or naturally discrete data that you can exploit, such as presence/absence, amount of clicks, page/item visited or not, etc.

This can be achieved by first using the data attributes as the key, and the labels as the values on the mapper, in which we need to process the keys and values in this way:

emit the label as key
for each variable (attribute) emit its index (for example, column index) also as key

We only need to emit the category (attribute value) as the value

In the reducer, we need to scan each category and find out how many of the elements in the current key belong to to a category, and divide by the sum of all its categories (which are our values) all which constitutes $P(X_i=x_{i,0}|y=y_0)$, for which we emit a triplet

emit the label as key
for each variable (attribute) emit its index (for example, column index) also as key
emit the category for this attribute of this example

As value we only need to emit the previous division.

To find out a new instance, we look into the dictionary entry corresponding to its attributes and return the bayes quotient.

I've just implemented this in MyML.

As an example:import numpy as np

Xd=np.random.random((256,2))
X=1*(Xd<.5)
y=1*(Xd.sum(axis=1)<.5)

from myml.supervised import bayes

reload(bayes)
nb = bayes.NaiveBayes()
nb.fit(X, y)
nb.predict(X[0,:])
pred=nb.predict(X)

1.0*np.sum(1.0*(pred>.5).reshape((1,len(y)))[0]==y)/len(y)

0.89453125

print X.T
[[0 1 0 1 1 1 1 0 0 1 0 0 0 1 0 1 1 1 0 0 0 0 0 1 0 0 1 1 1 1 1 0 0 0 1 1 0
1 0 0 0 1 1 1 1 1 0 0 1 1 0 0 0 0 0 0 1 0 0 0 0 1 0 0 1 0 1 1 0 0 1 0 1 1
0 0 0 1 1 1 1 1 0 0 0 1 1 0 0 1 0 0 0 1 0 1 0 1 0 1 1 0 0 1 0 0 0 0 1 0 0
1 0 1 0 1 0 1 0 1 0 1 1 0 0 1 1 0 0 0 1 0 1 0 0 1 0 1 0 0 0 1 1 0 0 1 1 1
1 0 1 1 1 1 1 0 1 1 1 0 0 0 0 0 1 1 0 0 0 1 1 1 1 0 1 1 0 1 0 1 0 1 0 0 1
1 1 0 0 1 1 0 1 1 1 1 0 0 0 1 1 1 1 0 0 0 0 0 1 0 1 1 0 0 1 0 1 0 1 1 1 0
0 0 1 1 1 1 0 1 1 0 0 0 1 0 1 1 0 0 1 1 0 1 0 1 0 0 1 1 1 1 1 1 0 1]
[0 0 1 0 1 0 0 0 1 1 1 1 1 1 0 1 0 0 1 1 0 0 0 1 0 1 0 0 0 1 1 1 0 0 0 0 0
1 0 0 1 0 1 0 1 0 1 0 1 0 0 0 1 0 1 1 1 1 0 1 0 1 0 1 0 0 1 0 1 1 1 1 1 1
0 0 1 1 1 0 0 0 0 0 1 0 0 1 0 1 1 1 1 0 0 0 0 0 1 1 1 1 1 1 1 0 0 0 1 0 0
1 1 0 0 1 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 1 0 1 1 1 1 0 1 0 1 1 0 1 1 0 0 0
0 0 1 1 0 1 0 1 0 1 1 0 1 0 1 1 0 1 0 1 0 1 0 1 1 0 1 1 0 0 0 1 1 1 0 1 0
1 1 0 1 1 1 1 0 0 1 1 1 1 0 1 1 0 1 1 0 1 0 0 1 0 0 0 1 0 1 0 0 1 0 0 1 1
1 1 1 1 1 0 1 0 1 0 0 0 0 1 0 0 0 0 1 1 0 1 0 1 1 0 0 0 1 0 1 0 0 0]]

y
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0,
       0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,
       0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,
       0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,
       0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0,
       1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0])

True random numbers in R

2013-11-06T16:24:00.000-08:00

We know that rnorm() computes the output from an internal seed it keeps as a state variable. The rest are computations and, therefore, it allows up to compute the output if we guess the seed.

According to the CRAN package description, the random package in R provides an interface to the true random number service provided by the random.org website. It operates by sampling atmospheric noise via radio tuned to an unused broadcasting frequency together with a skew correction algorithm due to John von Neumann (which I don't know what it means yet). So if you are ever interested in improving your random numbers to better perform your simulations, bear this package in mind.

Why everything in the classification field is irrelevant but two things

2013-11-05T16:04:00.000-08:00

Yes, everything in the binary classification problem in Machine Learning is irrelevant: Throw your Logistic Regression algorithm away, forget about SVM with universal kernels, to hell with neural networks. All you need is:

Multi-dimensional Gaussianization + Naive Bayes

Gaussianization is the process of making a variable Gaussian. One-dimensional Gaussianization is trivial: just take the inverse Gaussian CDF and apply it to any random variable's CDF. This requires knowing our RV's CDF but one-dimensional RVs estimation offer no problem unless we have too few examples, and can be achieved by Gaussian Mixture Models for most cases. Multi-dimensional Gaussianization is more elaborate and there are several procedures. Let's assume we have a procedure to gaussianize a multi-dimensional RV. It is sufficient that the gaussianize version ends up with the identity covariance matrix (which can be directly the output by the procedure or can be done by just a rotation and scaling). Once we get a RV with identity covariance matrix, we know that the one-dimensional RVs in it are independent. This can be the input to a Naive Bayes Classifier and complies with all its assumptions (independence of variables), which automatically yields the best classifier according to the underlyting probability distributions.

In Chen, Scott Shaobing, and Ramesh A. Gopinath. "Gaussianization." (2000), the authors show a method to gaussianize multi-dimensional RVs. It is based on expectation-maximization iterations, in which one estimates the best gaussian distribution and then finds the parameters and rotations that best describe that distribution. At each iteration, the negentropy (the Kullback-Leibler divergence between our currently transformed RV's distribution and a standard Gaussian) is less than the previous interation's. Firstly, by finding a rotation we achieve less dependence, and then by marginal gaussianization we zero-out marginal neg-entropy. This procedure converges weakly (in distribution) and we end up with a multivariate Gaussian. With the chain of estimated rotations and mixture model parameters we can get the transformation we need for new (test) data. Therefore, classification is straighforward with Naive Bayes, and we certainly know that we fully meet its assumptions.

I will be implementing Gaussianization in MyML.

http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.8.7803

Clojure structures, deconstruction and higher-order functions to make our lives better

2013-10-21T13:38:00.000-07:00

Clojure provides an easy and more productive way of writing programs and it is available to us if we want to stay away from the common imperative, sequential, start-to-end way of thinking.

During my time as a developer, both (and especially) in the academia and in production software, I've come across the problem of getting the index of those elements in an array that meet a constrain. This is easy (although after some training, as usual) in Clojure

(defn index-filter [pred coll]
(for [[i elem]
(map-indexed (fn [a b] [a b]) coll) :when (pred elem)]
i))

The way this works is: the for macro associates each i and elem to the pairs in the sequence given by (fn [a b] [a b]) coll). This is filtered by executing the predicate on each element. The predicate, in turn, filters out the elements in which we are not interested. The for body then returns each of the index that passed the condition.

We can separate the functionality into two functions, the first to write the indexing of the original elements as an independent function:

(def make-index (partial map-indexed (fn [a b] [a b]) ))

We use (partial) to make a function that still needs an input argument and associate it into the make-index symbol. Placing it into the general code:

(defn index-filter [pred coll]
(for [[i elem] (make-index coll) :when (pred elem)]
i))

The way you call this function is with a predicate and a collection. For example, Now we have a very elegant solution that is valid for many data types.

MyML: Yeat another Machine Learning library in Python

2013-10-14T10:46:00.000-07:00

Yes I know there are a number of (very) well developed and advance ML libraries already out there, especially for Python. What is the point of starting another one?

Well, first of all, when one starts something, he usually does it for the sake of it. For learning. That is my prime reason. I want to sharpen my Python skills with fairly advanced topics, focusing the library on well designing principles and not-so-mainstream state-of-the art techniques, such as an implementation of

[1] K. Fukumizu, C. Leng - Gradient-based kernel method for feature
extraction and variable selection. NIPS 2012.

that I had already implemented in an ad hoc fashion.

Plus, one cannot help but implementing classical techniques and focus on doing it well for once. Look at this UML chart of the Logistic Regression implementation: The Logistic Regression is just a broker of other classes, it just creates a DifferentiableObjective of subclass Logistic, so that any AbstractGradientDescent method can use this implementation to compute the objective function values and the gradients at the parameter space locations (see the diagram):

The diagram was created with https://www.draw.io/

Therefore, the same logistic regression can be estimated by classical gradient descent such as the current implementation, or one can implement an online, stochastic or natural gradient descent variants (future work) and plug them into the factory, which then uses the user argument values to select the particular algorithm. The same applies to other methods, and one can implement a hige loss or classical regression with quadratic loss and just plug in the gradient descent algorithm.

Github: https://github.com/analyticbastard/myml

Hadoop java.lang.ClassNotFoundException

2013-10-06T13:17:00.002-07:00

Today I ran into some weird Hadoop error. It could not find my mapper class. It turns that I had defined HADOOP_CLASSPATH as only my current directory (where my classes were) and it lacked the generic Mapper class (org.apache.hadoop.mapreduce.Mapper), but instead of Hadoop reporting this later class was missing, it did so with my own class, which was clearly accessible.

So this blog entry is for those who run into this problem too, because there is no help from Stackoverflow regarding this issue.

This is the message you get:

java.lang.RuntimeException: java.lang.ClassNotFoundException: ProcessMapper
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:857)
        at org.apache.hadoop.mapreduce.JobContext.getMapperClass(JobContext.java:199)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:718)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)