## Monday, May 28, 2012

### Hypergeometric brain teaser

The statement of the problem is
A bag of N balls has black and white balls. We define a function f such that it is 1 for a black ball and 0 for a white ball, and $\frac{1}{N}\sum_i f(x_i) = p$, where $x_i$ could be either a black or white ball. If we draw balls in groups $G$ of n balls, what would the mathematical expectation of the random variable
$$\frac{1}{n}\sum_{i\in G} f(x_i)$$
Now, $\frac{1}{n}\sum_{i\in G} f(x_i)$ is a random variable because the withdrawal of balls is random. Its expectation is $E[Y]=\frac{1}{n} E[\sum_{i\in G} f(x_i)]$. Now, since f marks balls as 0 or 1, this is a variable that counts black withdrawals. Let's see how we can make groups of black and white balls.
Within a group of n balls, how can we withdrawn $k$ black balls? There are $Np$ black balls in the bag. Therefore
$$\left(\begin{array}{c} Np \\ k \end{array}\right)$$
To fill up the rest of the group, we need to pick up white balls. There are $N-Np$ white balls and we need $n-k$. Therefore
$$\left(\begin{array}{c} N-Np \\ n-k \end{array}\right)$$
We combine the two parts of a possible group.
$$\left(\begin{array}{c} Np \\ k \end{array}\right) \left(\begin{array}{c} N-Np \\ n-k \end{array}\right)$$

Now, how many groups of n balls can we withdraw altogether, disregarding the color of the balls?
$$\left(\begin{array}{c} N \\ n \end{array}\right)$$

To build the probability of a specific group of n balls, k black and n-k white drawn from the bag, we compute its frequency
$$\frac{\left(\begin{array}{c} Np \\ k \end{array}\right) \left(\begin{array}{c} N-Np \\ n-k \end{array}\right)}{\left(\begin{array}{c} N \\ n \end{array}\right)}$$

If we look for a probability distribution with this print, we find that it is the Hypergeometric distribution, which meets the description of the problem
[...] describes the probability of k successes in n draws from a finite population of size N containing m successes without replacement.
Therefore, since cumulative successes $\sum_{i\in G} f(x_i)$ would have a hypergeometric distribution with mean $\frac{nNp}{N}=np$. But we want $\frac{1}{n}\sum_{i\in G} f(x_i)$, so that $E[Y]=\frac{1}{n} E[\sum_{i\in G} f(x_i)]=\frac{np}{n}=p$.

I wrote a program that takes N,n,p as inputs (and a random bag of white and black balls) and computes the expectation, which is always around $p$ (depending on the generated random bag). Do not go very far with $N$, keep it under 30!

push <- function(i,n,N,ind,v) {
es=0;
res=list()
res[[1]]=0
res[[2]]=0

if (i==1) {
for (j in seq(1,N)) {
ind[1]=j
res2=push(i+1,n,N,ind,v)
res[[1]]=res[[1]]+res2[[1]]
res[[2]]=res[[2]]+res2[[2]]
}
print(res[[1]])
print(res[[2]])
return (res[[1]]/(res[[2]]*n))
}
if (i>=n+1) {
es=sum(v[ind])
#print(ind)
res[[1]]=es
res[[2]]=1
return (res)
}
if (ind[i-1]+1<=N) {
for (j in seq(ind[i-1]+1,N)) {
ind[i]=j
res2=push(i+1,n,N,ind,v)
res[[1]]=res[[1]]+res2[[1]]
res[[2]]=res[[2]]+res2[[2]]
}
}
return (res)
}

N=20
p=.2
v=(runif(N)<=p) * 1
ind=rep(0,N)
n=5

push(1,n,N,ind,v)