Well I feel positive today, I struggled for the past two days to make my logistic regression in Clojure faster. I even made up to four different implementations of the logistic regression, with none of them giving satisfactory results. It all was even more disappointing when comparing to my MyML logistic regression implementation.
Well, I found out what the problem was: Actually I was making two fatal errors.
- My input data in Clojure were lists instead of vectors
- My input data in Python was 256 datapoints, instead of 1000 as in the Clojure version.
Both points stemmed from me being not so careful. In the first case, I knew already that one should use vectors instead of list when going after performance, but I was assuming that my data was in vector form. Staring at the variable X, I wondered if it was vector or list and voila, performance just got x5 better. Then I went to the Python interpreter, checked whether the number of iterations in the gradient descent object was the same as in Clojure, and then checked the data... well, my data was smaller (from a previous test with logistic regression). I re-generated my data and Python just lagged behind. In particular, I give you the figures (notice that I did not bother to put the wrong results I was getting because of my mistakes):
Python: 0.56 sec
Clojure: 0.19 sec (iterative implementation through loop-recur)
0.04 sec (concurrent implementation through agents).
Bear in mind that I am conduncting the tests on my girlfriend's borrowed machine and that I installed Cristoph Gohlke's Numpy distribution, which shippes with Intel's MKL statically-linked libraries, so it should be pretty fast in terms of algebraic computations. Perhaps the lack of performance comes from Python's interpreter itself (read-interpret-execute...). This is even more supporting of Clojure, since we are focusing on the infrastructure of both systems.
I will be putting everything in order, making my logistic regression more idiomatic and building some tests.
No comments:
Post a Comment