Wednesday, August 28, 2013

Clojure for project development

In line with my last Clojure post, and with several comments I have come across on the internet, here is a guide to building a piece of Clojure software that is runnable from the very beginning (i.e., outside of the REPL).

First of all, you need a JVM, obviously. If you don't have one yet, install the latest version of the Java SDK.

Now you want to install Leiningen, a build and dependency management tool favored by the Clojure community. Download the Leiningen script from https://raw.github.com/technomancy/leiningen/stable/bin/lein. Then place it in a directory on your PATH (for example, ~/bin) and make it executable with chmod 755 ~/bin/lein.

Once that is done, we need to tell the script to download the rest of Leiningen. You can easily do that with lein self-install.

Now you can create a Clojure project, called hello:
lein new app hello
This uses the template app to create your new project. Now cd into the new directory, collect the dependencies and run the tests.
cd hello
lein deps
lein test
You'll see a single testcase which deliberately fails:
Testing hello.core-test
FAIL in (replace-me) (core_test.clj:6)
expected: false
  actual: false
Ran 1 tests containing 1 assertions.
1 failures, 0 errors.
Great! Clojure is installed in this project and working! To get a feel for Clojure, let's try out some basic stuff by starting a script console:
lein repl
You'll see something like:

nREPL server started on port 59654 on host 127.0.0.1
REPL-y 0.3.0
Clojure 1.5.1
    Docs: (doc function-name-here)
          (find-doc "part-of-name-here")
  Source: (source function-name-here)
 Javadoc: (javadoc java-object-or-class-here)
    Exit: Control+D or (exit) or (quit)
 Results: Stored in vars *1, *2, *3, an exception in *e

user=>
Type  
(println "Hello World!") 
and press return. You should get:
Hello World!
nil
user=>
Now let's define a function that does that:
(defn greet [] (println "Hello World!"))
The console will respond:
#'user/greet
user=>
Run the function:  
(greet)
Hello World!
nil
user=>
Returning to the project, edit src/hello/core.clj (the basic source skeleton that Leiningen created for you above). Add our greet function to it and call it, so core.clj reads:
(ns hello.core)

(defn greet [] (println "Hello World!"))

(defn -main [] (greet))
The (ns hello.core) line declares the namespace (think Java package) in which the code lives. The -main function plays the role of the regular Java main method; the - prefix is what marks it as such for Clojure.
We can run this via Leiningen:
lein run -m hello.core
The -m argument specifies the namespace in which -main is defined.

Now let's modify our script so we can compile it and run it via the JVM. First, we update the namespace declaration to tell Clojure we want to generate a (Java) class file. We also make greet take a name and build the message with str, which adds no separators between its arguments (so we need a space after Hello), and we change our main function to accept an argument:
(ns hello.core (:gen-class))
(defn greet [who] (println (str "Hello " who "!")))
(defn -main [who] (greet who))
We also need to tell Leiningen about our main class. Edit project.clj and add a :main declaration so it looks like this:
(defproject hello "1.0.0-SNAPSHOT"
  :description "FIXME: write"
  :dependencies [[org.clojure/clojure "1.5.1"]]
  :main hello.core)
Don't worry about the rest of it; that's part of the Leiningen/Maven magic used to ensure the right libraries are available. Now tell Leiningen to compile your script and create a JAR that we can execute via Java:
lein uberjar
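Leiningen creates two JARs, hello-1.0.0-SNAPSHOT.jar and hello-1.0.0-SNAPSHOT-standalone.jar (in the project directory, or under target/ with Leiningen 2), and it's the second one we'll use. Run it from the directory that contains it, passing the name to greet as an argument, since -main now expects one:
java -cp hello-1.0.0-SNAPSHOT-standalone.jar hello.core Sean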
You now have a working project and are ready to write some code for production software.

Sunday, August 25, 2013

The perils of the REPL

Functional [programming] people are proud of their new toy called the REPL, almost as if interactive development were a new concept. I guess that coming from Java and other general-purpose software development languages makes you think it is (although general-purpose but interpreted languages such as Python have always had an interactive interpreter). People who have worked with scientific modelling software such as Matlab or R (myself included) are used to this way of developing: rapidly modelling an idea in a few lines that can be recalled and modified according to one's needs.

This, however, becomes dangerous when developing software. When making a software product, one is a step beyond bare modeling, in the sense that full working conditions must be taken into account, one of them being program startup. I say this because I've read some books on Clojure and always found that they use the REPL to describe the language, skipping the classical software building cycle: write a source file that includes a main function or entry point, compile it and execute the result. The REPL gives you the advantage of quick modeling, but that is very different from writing Clojure source files and integrating them into a whole system intended for production.

Although it is not that difficult, Clojure being a JVM system, almost none of these books explain the entry point of a program. They stick to explaining language syntax and basic libraries on the REPL, forgetting about entry points and other production issues such as multiple-file integration. Some of them don't even include a section on Leiningen or Maven, and jump straight to advanced features such as databases (Redis, MySQL, HBase) or web toolkits. Even those that come with a brief introduction don't put the reader in the context of making a deployable piece of software. Therefore, readers must resort to blogs to find that kind of information.

The REPL is useful, just as useful as it is in the scientific/modeling world. But software developers with a deployable product must deal with things beyond live testing, things that matter more to their business.

Friday, August 2, 2013

Installing Theano on Windows 64 bit (x86_64) with GPU capabilities

Since the Theano team works under Linux, those of us who bought a laptop with a fancy Windows version pre-installed, and decided to keep it for the sake of compatibility with technology-reluctant friends and family (therefore accepting difficulties with everything else), are doomed to hack our way into getting Theano up and running.

In this post I assume you are going with Christoph Gohlke's packages (for the reasons, read a previous post).

Make sure you also have MS Visual C++ and the NVidia CUDA Toolkit. If it is not there already, add the directory of the Visual C++ cl.exe compiler to the path. Mine was under C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\bin.

The first thing you need, after installing Theano, is the nose package, since Gohlke's build needs it at initialization time. Download and install it from Gohlke's site along with Theano.

Next, you need to put this .theanorc in your home directory, C:\Users\<yourname>:
[global]
device = gpu

[nvcc]
compiler_bindir=C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\bin
# flags=-m32 # we have this hard coded for now

[blas]
ldflags =
# ldflags = -lopenblas # placeholder for openblas support
I am not very sure how to use OpenBLAS from here. I assumed that if all CPU operations are done via Numpy and SciPy, then their default BLAS routines are used and no direct call to a third-party BLAS implementation is made, but who knows! (Well, I looked into it a little and it seems Theano calls BLAS directly, so you may want to install OpenBLAS.)
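If you would rather generate the file than type it, the [global] and [nvcc] sections of the .theanorc above can be written out with Python's standard configparser module. This is only a sketch for Python 3: the helper name build_theanorc is my own invention, and it leaves out the [blas] section and the commented lines.

```python
import configparser
import io


def build_theanorc(compiler_bindir, device="gpu"):
    """Return the text of a minimal .theanorc (hypothetical helper, not part of Theano)."""
    # interpolation=None keeps any '%' in a Windows path literal.
    config = configparser.ConfigParser(interpolation=None)
    config["global"] = {"device": device}
    config["nvcc"] = {"compiler_bindir": compiler_bindir}
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    # Path taken from the post; adjust it to your own Visual Studio installation.
    bindir = r"C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\bin"
    print(build_theanorc(bindir))
```

Save the printed text as .theanorc in your home directory.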

OK, we have the NVidia compiler and tools, the MS compiler that nvcc needs and the configuration. The last thing we need is to install a GNU C and C++ compiler that supports 64 bit Windows binary creation. There is a project called MinGW-w64 that does that. I recommend downloading a private build from the user rubenvb, which does not come with an embedded Python environment as the more official build does. Put the bin directory of that installation (where GCC is located) in the Path (Control panel, etc.). Theano needs this, I presume, to compile the symbolic operations to object code and then to CUDA kernels if applicable.

If you run into errors like "GCC: sorry, unimplemented: 64-bit mode not compiled in", then your MinGW build does not support x86_64. The NVidia compiler nvcc can also complain if it finds no cl.exe in the path.
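Before blaming Theano, it may help to check that the three compilers mentioned so far are really visible on the path. The sketch below uses only the standard library (shutil.which, Python 3.3 or later); the function name find_tools is just an illustration.

```python
import shutil


def find_tools(names=("cl", "nvcc", "gcc")):
    """Map each tool name to its full path on PATH, or None if it is missing."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_tools().items():
        print("%s: %s" % (name, path or "NOT FOUND on PATH"))
```

Run it from a freshly opened console, since changes made to the Path in the Control panel do not reach consoles that were already open.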

By the way, all of this was in order to use deep learning techniques for Kaggle competitions, so the end goal was to install PyLearn2. It is not listed among Gohlke's libraries, but it is not low level: everything is based on Theano and maybe other numerical packages such as Numpy. Since it is a pure Python package, you can clone it from Github:
git clone git://github.com/lisa-lab/pylearn2.git
And then perform
cd pylearn2
python setup.py install
There is an easier procedure that does not require you to perform the git operations manually, and it is through pip:
pip install git+git://github.com/lisa-lab/pylearn2.git
You will find pip under your Python installation, within the Scripts directory, if it came with Python or if you installed it with Gohlke's installer.

This will also leave the module correctly accessible through Python.
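If you are not sure where the Scripts directory of your installation is, Python can tell you. This is a small sketch using the standard sysconfig module; the function name scripts_dir is mine, and the pip lookup simply checks for the two usual file names.

```python
import os
import sysconfig


def scripts_dir():
    """Return the directory where this Python installs command-line scripts."""
    return sysconfig.get_path("scripts")


if __name__ == "__main__":
    directory = scripts_dir()
    print("Scripts directory: %s" % directory)
    # pip is installed as pip.exe on Windows and as pip elsewhere.
    candidates = [os.path.join(directory, name) for name in ("pip.exe", "pip")]
    found = [path for path in candidates if os.path.isfile(path)]
    print("pip found at: %s" % (found[0] if found else "not found"))
```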

Edit: Pylearn2's tutorial test is a little too complicated to be a "hello world" test, so I looked for another quick example to check that my installation was complete. A very nice one popped up in this link, which I reproduce here. But first I should mention that this made me realize that Gohlke's Theano is missing three files, something very, very strange since they are called from within Theano. In particular, everything under the theano.compat module is missing. In this case, just copy the contents of the compat directory in Theano's Github repository to a compat directory created in your local theano installation under Python 2.7 (mine is C:\Python27\Lib\site-packages\theano).
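A quick way to confirm that the copy worked is to try importing the modules involved. The following is a minimal sketch; the helper name module_available is hypothetical, and it relies only on importlib.import_module from the standard library.

```python
import importlib


def module_available(name):
    """Return True if the module called `name` can be imported."""
    try:
        importlib.import_module(name)
    except ImportError:
        # Also covers the case where a parent package is missing.
        return False
    return True


if __name__ == "__main__":
    for name in ("theano", "theano.compat", "pylearn2"):
        status = "ok" if module_available(name) else "MISSING"
        print("%s: %s" % (name, status))
```

Note that importing theano with device = gpu set may itself trigger compilation, so the first run can take a while.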

After that, run the code in this link, which is a neural network solving the XOR problem. And we are done.

MinGW-w64: rubenvb build.
Python libraries and builds for Windows: Christoph Gohlke.
Link to a "truer" hello world Pylearn2 program: here.