Monday, July 29, 2013

Mathematical analysis of MapReduce

Everybody is talking about MapReduce. They talk a lot about it even though they barely know what it is. I guess they talk about it because of all Google hype.

To put it simply, MapReduce is
$$\left. F (f_y) \right|_{y=k}$$

where the function $f$ is the map, $F$ is the reduce and $k$ is the key. In the special case the reduce just adds the values, the above becomes
$$\left. \int f_y (x) dx \right|_{y=k}$$
where $x$ are the values and $F$ is a linear functional (i.e., an element of the algebraic dual of the space where $f_y$, for all $y$ -the keys-, live.

The prominent example of computing the maximum temperatures from "Hadoop: The definitive guide" is the operation
$$\left. \| f_y \|_{\infty} \right|_{y=k}$$

It is "just" an abstraction of a basic operation found ubiquitously.

1 comment:

1. Well I post this several days ago with the intention that somebody pointed me out that the inf norm is not equivalent to map reduce with the aggregation of all mapped values.

It is only equivalent in the positive octant of the values only. I don't want any more time to pass without this pointed out correctly, since unintended harm may come out of it.