Monday, October 21, 2013

Clojure structures, deconstruction and higher-order functions to make our lives better

Clojure provides an easy and more productive way of writing programs and it is available to us if we want to stay away from the common imperative, sequential, start-to-end way of thinking.

During my time as a developer, both (and especially) in the academia and in production software, I've come across the problem of getting the index of those elements in an array that meet a constrain. This is easy (although after some training, as usual) in Clojure
(defn index-filter [pred coll]
  (for [[i elem]
    (map-indexed (fn [a b] [a b]) coll) :when (pred elem)]
   i))
The way this works is: the for macro associates each i and elem  to the pairs in the sequence given by (fn [a b] [a b]) coll). This is filtered by executing the predicate on each element. The predicate, in turn, filters out the elements in which we are not interested. The for body then returns each of the index that passed the condition.

We can separate the functionality into two functions, the first to write the indexing of the original elements as an independent function:
(def make-index (partial map-indexed (fn [a b] [a b]) ))
We use (partial) to make a function that still needs an input argument and associate it into the make-index symbol. Placing it into the general code:
(defn index-filter [pred coll]
  (for [[i elem] (make-index coll) :when (pred elem)]
   i))
The way you call this function is with a predicate and a collection. For example, Now we have a very elegant solution that is valid for many data types.

Monday, October 14, 2013

MyML: Yeat another Machine Learning library in Python

Yes I know there are a number of (very) well developed and advance ML libraries already out there, especially for Python. What is the point of starting another one?

Well, first of all, when one starts something, he usually does it for the sake of it. For learning. That is my prime reason. I want to sharpen my Python skills with fairly advanced topics, focusing the library on well designing principles and not-so-mainstream state-of-the art techniques, such as an implementation of

  [1] K. Fukumizu, C. Leng - Gradient-based kernel method for feature
      extraction and variable selection. NIPS 2012.

that I had already implemented in an ad hoc fashion.

Plus, one cannot help but implementing classical techniques and focus on doing it well for once. Look at this UML chart of the Logistic Regression implementation: The Logistic Regression is just a broker of other classes, it just creates a DifferentiableObjective of subclass Logistic, so that any AbstractGradientDescent method can use this implementation to compute the objective function values and the gradients at the parameter space locations (see the diagram):


The diagram was created with https://www.draw.io/

Therefore, the same logistic regression can be estimated by classical gradient descent such as the current implementation, or one can implement an online, stochastic or natural gradient descent variants (future work) and plug them into the factory, which then uses the user argument values to select the particular algorithm. The same applies to other methods, and one can implement a hige loss or classical regression with quadratic loss and just plug in the gradient descent algorithm.

Github: https://github.com/analyticbastard/myml

Sunday, October 6, 2013

Hadoop java.lang.ClassNotFoundException


Today I ran into some weird Hadoop error. It could not find my mapper class. It turns that I had defined HADOOP_CLASSPATH as only my current directory (where my classes were) and it lacked the generic Mapper class (org.apache.hadoop.mapreduce.Mapper), but instead of Hadoop reporting this later class was missing, it did so with my own class, which was clearly accessible.



So this blog entry is for those who run into this problem too, because there is no help from Stackoverflow regarding this issue.

This is the message you get:

java.lang.RuntimeException: java.lang.ClassNotFoundException: ProcessMapper
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:857)
        at org.apache.hadoop.mapreduce.JobContext.getMapperClass(JobContext.java:199)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:718)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)