Machinomics: Naive Bayes with Map Reduce

Thursday, November 7, 2013

Naive Bayes with Map Reduce

A fairly straighforward way of implmenting the Naive Bayes classifier for discrete data is using Map Reduce. This is especially useful if you have a bunch of characteristic or naturally discrete data that you can exploit, such as presence/absence, amount of clicks, page/item visited or not, etc.

This can be achieved by first using the data attributes as the key, and the labels as the values on the mapper, in which we need to process the keys and values in this way:

emit the label as key
for each variable (attribute) emit its index (for example, column index) also as key

We only need to emit the category (attribute value) as the value

In the reducer, we need to scan each category and find out how many of the elements in the current key belong to to a category, and divide by the sum of all its categories (which are our values) all which constitutes

$P(X_i=x_{i,0}|y=y_0)$ , for which we emit a triplet

emit the label as key
for each variable (attribute) emit its index (for example, column index) also as key
emit the category for this attribute of this example

As value we only need to emit the previous division.

To find out a new instance, we look into the dictionary entry corresponding to its attributes and return the bayes quotient.

I've just implemented this in MyML.

As an example:import numpy as np

Xd=np.random.random((256,2))
X=1*(Xd<.5)
y=1*(Xd.sum(axis=1)<.5)

from myml.supervised import bayes

reload(bayes)
nb = bayes.NaiveBayes()
nb.fit(X, y)
nb.predict(X[0,:])
pred=nb.predict(X)

1.0*np.sum(1.0*(pred>.5).reshape((1,len(y)))[0]==y)/len(y)

0.89453125

print X.T
[[0 1 0 1 1 1 1 0 0 1 0 0 0 1 0 1 1 1 0 0 0 0 0 1 0 0 1 1 1 1 1 0 0 0 1 1 0
1 0 0 0 1 1 1 1 1 0 0 1 1 0 0 0 0 0 0 1 0 0 0 0 1 0 0 1 0 1 1 0 0 1 0 1 1
0 0 0 1 1 1 1 1 0 0 0 1 1 0 0 1 0 0 0 1 0 1 0 1 0 1 1 0 0 1 0 0 0 0 1 0 0
1 0 1 0 1 0 1 0 1 0 1 1 0 0 1 1 0 0 0 1 0 1 0 0 1 0 1 0 0 0 1 1 0 0 1 1 1
1 0 1 1 1 1 1 0 1 1 1 0 0 0 0 0 1 1 0 0 0 1 1 1 1 0 1 1 0 1 0 1 0 1 0 0 1
1 1 0 0 1 1 0 1 1 1 1 0 0 0 1 1 1 1 0 0 0 0 0 1 0 1 1 0 0 1 0 1 0 1 1 1 0
0 0 1 1 1 1 0 1 1 0 0 0 1 0 1 1 0 0 1 1 0 1 0 1 0 0 1 1 1 1 1 1 0 1]
[0 0 1 0 1 0 0 0 1 1 1 1 1 1 0 1 0 0 1 1 0 0 0 1 0 1 0 0 0 1 1 1 0 0 0 0 0
1 0 0 1 0 1 0 1 0 1 0 1 0 0 0 1 0 1 1 1 1 0 1 0 1 0 1 0 0 1 0 1 1 1 1 1 1
0 0 1 1 1 0 0 0 0 0 1 0 0 1 0 1 1 1 1 0 0 0 0 0 1 1 1 1 1 1 1 0 0 0 1 0 0
1 1 0 0 1 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 1 0 1 1 1 1 0 1 0 1 1 0 1 1 0 0 0
0 0 1 1 0 1 0 1 0 1 1 0 1 0 1 1 0 1 0 1 0 1 0 1 1 0 1 1 0 0 0 1 1 1 0 1 0
1 1 0 1 1 1 1 0 0 1 1 1 1 0 1 1 0 1 1 0 1 0 0 1 0 0 0 1 0 1 0 0 1 0 0 1 1
1 1 1 1 1 0 1 0 1 0 0 0 0 1 0 0 0 0 1 1 0 1 0 1 1 0 0 0 1 0 1 0 0 0]]

y
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0,
       0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,
       0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,
       0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,
       0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0,
       1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0])

Machinomics

Thursday, November 7, 2013

Naive Bayes with Map Reduce

No comments:

Post a Comment

About Me

Visit my blog about Economics and Markets