Machinomics: January 2014

Friday, January 31, 2014

Some sad goodbyes :-(

I am sadly removing several blogs from my reading list on the right...

Our teacher (through his book) Larry Wasserman decided to stop writing on his blog. There, he mentioned that sparse regressors, useful as they are, have infinite risk in a region near zero, which may be of no practical consequence for most applications but could be too dangerous for some cutting-edge applications (precision robotics, high-frequency trading...). He left the door open for a comeback, and we certainly hope he makes one!

mathbabe (Cathy O'Neil). She writes a lot, and I was finding her blog posts a little confusing, with many of them not in line with what I want to read or show in my blog, even though I support some of her views and agree with her reasons to short Facebook, which in turn coincide with a recent Princeton application of an epidemiology model to social networks.

Peekaboo (Andreas Müller) seems to have left his blog aside and moved on. He has very interesting material there, though.

Monday, January 27, 2014

Strongly typed external data sources language integration in F#

I have read about something that I both like and consider important: data source integration in the language.

I like it because it is a really interesting feature: data is integrated into the language just like control flow. For example, quoting the paper by Syme et al. (see below), we may use

let data = Samples.DataStore.Freebase.FreebaseData.GetDataContext()
data.Sports.Baseball.Coaching_Positions

Even as we specify the route to the database Coaching_Positions, the IDE itself will show us all the options as it finds them, much like code completion with language keywords and loaded types.

On the other hand, this is important because it is of high value for the data scientist. Moreover, in F#, it has been made to scale to internet-sized data (Big Data) and was specifically designed to work from the cloud, making the model especially appealing for modern use. I hope Clojure catches up in this field and adds something similar soon.

http://research.microsoft.com/pubs/192598/ddfp-information-rich-fp-themes-v5.pdf

Saturday, January 25, 2014

Denarius Exchange organization created on Github

Dear all,

I've transferred the Denarius repository to an organization focused on developing the exchange. Anybody is invited to join and contribute.

The new address is:
https://github.com/denarius-exchange/denarius

On the technical side, matching is now implemented with hash maps, which gives us slightly more speed, and we are introducing vectors and queue operations on them, which should yield about five times more speed.

Next up are the connection nodes and the protocols implemented on them.
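As a rough illustration of the idea of combining hash maps with queue operations, here is a minimal Python sketch (Python 3; the actual Denarius engine is written in Clojure and may be organized quite differently). The class `OrderBook` and its methods are hypothetical names invented for this example: resting sell orders are kept in a hash map from price to a FIFO queue, so orders at the same price are filled in arrival order.

```python
from collections import deque


class OrderBook:
    """Hypothetical one-sided book of resting sell orders (asks)."""

    def __init__(self):
        # Hash map: price -> FIFO queue of [order_id, remaining_size]
        self.asks = {}

    def add_ask(self, order_id, price, size):
        """Queue a sell order behind earlier orders at the same price."""
        self.asks.setdefault(price, deque()).append([order_id, size])

    def match_buy(self, limit_price, size):
        """Fill a buy order against the cheapest asks first.

        Returns a list of (order_id, price, filled_size) trades.
        """
        trades = []
        # sorted() builds a separate list, so deleting keys below is safe
        for price in sorted(p for p in self.asks if p <= limit_price):
            queue = self.asks[price]
            while queue and size > 0:
                resting = queue[0]            # oldest order at this price
                filled = min(size, resting[1])
                trades.append((resting[0], price, filled))
                resting[1] -= filled
                size -= filled
                if resting[1] == 0:
                    queue.popleft()           # fully filled, drop it
            if not queue:
                del self.asks[price]          # no orders left at this price
            if size == 0:
                break
        return trades
```

For example, after queuing two asks of size 5 at price 10, a buy of size 7 at limit 10 fills the first ask completely and takes 2 from the second.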

Friday, January 24, 2014

Monty Hall revisited

Suppose I tell you: there are 1K bitcoins in one wallet that will be yours if you guess which wallet out of three is the right one, the other two containing zero bitcoins. I ask you to make an initial selection, then show you that, indeed, one of the remaining wallets contains zero bitcoins, and finally give you the opportunity to change wallet. Would you change? The answer is yes.

This is so because there is now new evidence supporting a higher probability that the remaining unseen wallet is the right choice, whereas there is none about your current choice. You selected wallet 1 and, given that choice, I showed you wallet 2, which leaves wallet 3 with a posterior probability of 2/3. This does not happen for your current wallet 1, since choosing 1 influenced my decision to show you 2. More precisely: you chose wrongly with probability 2/3. In that case I show you the only wallet that I can, leaving the 2/3 for the remaining unseen and unchosen wallet. On the contrary, you chose well with probability 1/3, but then I can choose between 2 wallets to show you, each with probability 1/2. This is how we include my decision (or necessity) to show you 2 into the math (this is the best explanation you are gonna get from all over the internet):
Let's call $R$ the "right choice", $V$ the "visible incorrect wallet" and $S$ "your choice". We need to compute $P(R=3|V=2,S=1)$, the probability of 3 being the right wallet after you selected 1 and I showed you that 2 was not right (remember that all priors are 1/3, and that your choice $S$ tells us nothing about $R$ by itself).
$$\begin{aligned}P(R=3|V=2,S=1)&=\frac{P(V=2|R=3,S=1)P(R=3)}{P(V=2|R=3,S=1)P(R=3)+P(V=2|R=1,S=1)P(R=1)}\\&=\frac{1\times 1/3}{1\times 1/3 + 1/2 \times 1/3}=2/3\end{aligned}$$
$P(V=2|R=3,S=1)=1$ is the probability that, given $R=3$, I was forced to show you the remaining incorrect wallet (you had already chosen the other incorrect one). $P(V=2|R=1,S=1)=1/2$ because there are two possible incorrect wallets (since you selected the correct one) that I can choose from to show you. The $R=2$ term does not appear in the denominator because $P(V=2|R=2,S=1)=0$: I never show you the right wallet.

Let's compute the same posterior for the case in which you decide not to change wallet:
$$\begin{aligned}P(R=1|V=2,S=1)&=\frac{P(V=2|R=1,S=1)P(R=1)}{P(V=2|R=3,S=1)P(R=3)+P(V=2|R=1,S=1)P(R=1)}\\&=\frac{1/2\times 1/3}{1\times 1/3 + 1/2 \times 1/3}=1/3\end{aligned}$$

Therefore, if you change, you double your chances of winning the 1000 bitcoins (2/3 against 1/3).
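The two posteriors can also be checked with exact arithmetic. The following is a minimal Python 3 sketch, not part of the original derivation; the function name `posterior` and its parameters are made up for this example. It assumes a uniform prior over the wallets and a host who opens, uniformly at random, one wallet that is neither your choice nor the right one.

```python
from fractions import Fraction


def posterior(shown=2, chosen=1, wallets=(1, 2, 3)):
    """Posterior of each wallet being right, given the choice and the wallet shown."""
    prior = Fraction(1, len(wallets))
    joint = {}
    for right in wallets:
        # Wallets the host may open: neither your choice nor the right one
        options = [w for w in wallets if w != chosen and w != right]
        if shown in options:
            likelihood = Fraction(1, len(options))
        else:
            likelihood = Fraction(0)
        joint[right] = likelihood * prior
    total = sum(joint.values())  # normalizing constant of Bayes' rule
    return {right: p / total for right, p in joint.items()}


for wallet, p in sorted(posterior().items()):
    print("P(R=%d | V=2, S=1) = %s" % (wallet, p))
```

With the defaults (you chose 1, I showed 2), wallet 3 gets 2/3, wallet 1 gets 1/3 and wallet 2 gets 0.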

Needless to say, this works for every possible combination of $R$, $S$ and $V$.
This happens, as I mentioned, because of the way I was influenced (forced) to show you the remaining incorrect wallets. To see it intuitively, imagine 100 wallets: you choose one among them, and I am forced to show you 98 incorrect wallets, leaving your choice and one other. Is it more likely that this other wallet is the correct one (and that your choice forced me to leave it closed), or that you chose wisely among 100 wallets? In the 99 cases out of 100 in which your initial choice is incorrect, the set that I show you is made up of all the other incorrect wallets and never contains the correct one, so the wallet I leave closed is the right one.

There is a cool Android app in case you want to check how the law of large numbers works for this problem.
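If you prefer to check the law of large numbers yourself, a small simulation does the job. This is a minimal Python 3 sketch using only the standard library; the function names `play` and `win_rate` and the parameter `n_wallets` are hypothetical, and the host is assumed to open every wallet except your choice and one other.

```python
import random


def play(switch, n_wallets=3, rng=random):
    """Play one game; return True if the final wallet is the right one."""
    right = rng.randrange(n_wallets)
    choice = rng.randrange(n_wallets)
    if choice == right:
        # Host leaves a random incorrect wallet closed
        other = rng.choice([w for w in range(n_wallets) if w != choice])
    else:
        # Host is forced to leave the right wallet closed
        other = right
    final = other if switch else choice
    return final == right


def win_rate(switch, trials=100000, seed=0, n_wallets=3):
    """Fraction of games won over many trials (seeded for repeatability)."""
    rng = random.Random(seed)
    wins = sum(play(switch, n_wallets, rng) for _ in range(trials))
    return wins / trials


print("switching:", win_rate(True))
print("staying:  ", win_rate(False))
```

The switching rate should settle near 2/3 and the staying rate near 1/3; with `n_wallets=100` switching wins about 99% of the time.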

Saturday, January 18, 2014

Things about Python that I find cool

Python is a formidable language with functional capabilities that is extraordinarily easy to learn and follow. It is interpreted, which makes scripting and quick prototyping easy.

Comments follow the UNIX shell style (#), so a script can start with a shebang line and be run directly as an executable (and therefore be used to program CGIs).

Although it may strike you as odd at first sight, Python's syntax outweighs its drawbacks by offering a clear and simple look which noticeably increases productivity. Python relies on indentation to define its blocks, which means that you get rid of parentheses, brackets, block keywords such as begin or end, etc. This simplifies code and makes it more readable:

vector = [1, 2, 3]  # sample data
for i in vector:
    if i == 1:
        print(i << 1)  # left bit shift: 1 << 1 is 2
    else:
        print(i ** 2)  # power operator
    print("Loop: %d" % i)

IPython notebook (which requires the Tornado libraries) is an interesting add-on to IPython (which itself gives you nice command line amenities). It is a JavaScript/CSS front end served by a local HTTP server, which lets you prototype interactively in your web browser. Results are inlined much like in a Maple or Mathematica session.

Python allows for powerful expressions, such as performing a transformation on filtered data with a near natural language expression:

[x**2 for i, x in enumerate(col) if i!=0 or x]

Here we square the elements of the vector col, keeping only those whose index is not zero or whose value is non-zero (so only a zero in the first position is dropped).
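To see exactly which elements pass the filter, here is a small self-contained example (Python 3) with made-up sample data in `col`, together with the equivalent explicit loop. The condition `i != 0 or x` keeps every element except a falsy value (such as 0) sitting at index 0.

```python
col = [0, 3, 0, 4]  # sample data, chosen for illustration

# List comprehension: filter and transform in one expression
squares = [x**2 for i, x in enumerate(col) if i != 0 or x]

# Equivalent explicit loop
result = []
for i, x in enumerate(col):
    if i != 0 or x:
        result.append(x**2)

print(squares)  # [9, 0, 16]
```

The leading 0 is dropped because its index is zero and its value is falsy, while the 0 at index 2 is kept because its index is not zero.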

Decorators are a powerful way to add behavior to a function, wrapping it in code that performs some task before and after the target function is executed. The canonical use of decorators is adding logging behavior to functions. In Python we can use them easily and non-intrusively by placing the operator @ followed by the decorator's name right above the definition of the function

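def decorator(func):
    def inner(x):
        print("Arguments were %s" % x)
        y = func(x)
        print("Result is %s" % y)
        return y  # hand the result back to the caller
    return inner

@decorator
def square(x):  # example function, wrapped at definition time
    return x ** 2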
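A decorator written for a single argument only fits functions with that exact signature. A more general variant, sketched below in Python 3, forwards whatever arguments it receives and uses `functools.wraps` from the standard library so that the wrapped function keeps its name and docstring. The names `logged` and `add` are hypothetical, chosen for this example.

```python
import functools


def logged(func):
    """Log the arguments and the result of any call to func."""
    @functools.wraps(func)  # keep func's name and docstring on the wrapper
    def inner(*args, **kwargs):
        print("Calling %s with %r %r" % (func.__name__, args, kwargs))
        result = func(*args, **kwargs)
        print("Result is %r" % (result,))
        return result
    return inner


@logged
def add(a, b=0):
    """Add two numbers."""
    return a + b


add(1, b=2)
```

Because of `functools.wraps`, `add.__name__` is still "add" rather than "inner", which keeps tracebacks and documentation tools readable.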

The argument operators * and ** allow for generic constructions within functions. By convention, one uses *args for sequential expansion of the argument list and **kwargs for named expansion of an argument list. This means that you can use * in the function definition to represent a sequence of arguments passed in order, so args becomes a tuple of values that you can then use. On the other hand, calling a function that has named parameters with a sequence of values and the operator * will expand the values into these parameters.

def test(a=1, b=2):
    print("%d %d" % (a, b))

args = (1, 2)
test(*args)  # same as test(1, 2)
# prints: 1 2

Similarly, **kwargs can be used either in a function definition, to collect arbitrary named arguments and make the behavior of the function dynamic (for example, your function is a relay that just receives arguments and passes them on to other functions that will interpret them), or in a call, to expand a map of named arguments in this way

def test(a=1, b=2):
    print("%d %d" % (a, b))

kwargs = {"a": 1, "b": 2}
test(**kwargs)  # same as test(a=1, b=2)
# prints: 1 2
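The examples above show the call side. On the definition side, * and ** collect whatever is passed in, which is what makes a relay function possible. The sketch below (Python 3) uses hypothetical names `describe`, `relay` and `pair`; it shows that positional arguments arrive as a tuple, named ones as a dict, and that both can be forwarded untouched.

```python
def describe(*args, **kwargs):
    """Report how the arguments were collected."""
    return type(args).__name__, type(kwargs).__name__, len(args), sorted(kwargs)


def relay(target, *args, **kwargs):
    """Pass every argument on to target without interpreting it."""
    return target(*args, **kwargs)


def pair(a=1, b=2):
    return a, b


print(describe(1, 2, c=3))   # ('tuple', 'dict', 2, ['c'])
print(relay(pair, 5, b=7))   # (5, 7)
```

The relay never needs to know the signature of the function it forwards to, so `pair` can change its parameters without `relay` being touched.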

Although the following does not refer to the language itself, it has to do with the community: there is a service called PiCloud that lets Python practitioners rent CPU power for their numerical needs. As per the PiCloud site:

The PiCloud Platform gives you the freedom to develop your algorithms and software without sinking time into all of the plumbing that comes with provisioning, managing, and maintaining servers.

You have 20 free computing hours.

Finally, a powerful and comprehensive set of libraries has been built on Python, especially for data analysis: Python probably has the largest collection of open source libraries for data analysis. It includes NumPy, SciPy, pandas, scikit-learn, NLTK and many others.

Monday, January 6, 2014

Denarius financial exchange announcement

I've been busy over Christmas working on Denarius, a financial exchange. This is the announcement that I've been posting on several sites (see, for example, on bitcointalk).

Dear all,

I have just started a new open source project to implement a financial exchange.

The implementation is in Clojure, which has already given the project a kick start with the matching engine concurrency. Interested Clojure programmers are invited to join.

Also, programmers working in finance, parallel systems and databases are kindly invited to join. Contributions to the core system design are highly welcome.

This project stems from an unsatisfactory experience with existing financial exchange projects, and is of interest to the bitcoin community since cryptocoins are among the tradeable assets in the software.

This is a newbie account, so spreading the word to other parts of bitcointalk is appreciated.

Subscribe by just sending an email to: denarius@librelist.com
Source code: https://github.com/analyticbastard/denarius
Wiki: https://github.com/analyticbastard/denarius/wiki/Introduction
