Monday, May 28, 2012

Hypergeometric brain teaser

The statement of the problem is
A bag of N balls has black and white balls. We define a function f such that it is 1 for a black ball and 0 for a white ball, and $\frac{1}{N}\sum_i f(x_i) = p$, where $x_i$ could be either a black or white ball. If we draw balls in groups $G$ of n balls, what would the mathematical expectation of the random variable
$$\frac{1}{n}\sum_{i\in G} f(x_i) $$
Now, $\frac{1}{n}\sum_{i\in G} f(x_i) $ is a random variable because the withdrawal of balls is random. Its expectation is $E[Y]=\frac{1}{n} E[\sum_{i\in G} f(x_i)]$. Now, since f marks balls as 0 or 1, this is a variable that counts black withdrawals. Let's see how we can make groups of black and white balls.
Within a group of n balls, how can we withdrawn $k$ black balls? There are $Np$ black balls in the bag. Therefore
$$\left(\begin{array}{c} Np \\ k \end{array}\right)$$
To fill up the rest of the group, we need to pick up white balls. There are $N-Np$ white balls and we need $n-k$. Therefore
$$\left(\begin{array}{c} N-Np \\ n-k \end{array}\right)$$
We combine the two parts of a possible group.
 $$\left(\begin{array}{c} Np \\ k \end{array}\right) \left(\begin{array}{c} N-Np \\ n-k \end{array}\right)$$

Now, how many groups of n balls can we withdraw altogether, disregarding the color of the balls?
$$\left(\begin{array}{c} N \\ n \end{array}\right) $$

To build the probability of a specific group of n balls, k black and n-k white drawn from the bag, we compute its frequency
$$\frac{\left(\begin{array}{c} Np \\ k \end{array}\right) \left(\begin{array}{c} N-Np \\ n-k \end{array}\right)}{\left(\begin{array}{c} N \\ n \end{array}\right)}$$

If we look for a probability distribution with this print, we find that it is the Hypergeometric distribution, which meets the description of the problem
[...] describes the probability of k successes in n draws from a finite population of size N containing m successes without replacement.
Therefore, since cumulative successes $\sum_{i\in G} f(x_i) $ would have a hypergeometric distribution with mean $\frac{nNp}{N}=np$. But we want $\frac{1}{n}\sum_{i\in G} f(x_i)$, so that $E[Y]=\frac{1}{n} E[\sum_{i\in G} f(x_i)]=\frac{np}{n}=p$.

I wrote a program that takes N,n,p as inputs (and a random bag of white and black balls) and computes the expectation, which is always around $p$ (depending on the generated random bag). Do not go very far with $N$, keep it under 30!


push <- function(i,n,N,ind,v) {
 es=0;
 res=list()
 res[[1]]=0
 res[[2]]=0

 if (i==1) {
  for (j in seq(1,N)) {
   ind[1]=j
   res2=push(i+1,n,N,ind,v)
   res[[1]]=res[[1]]+res2[[1]]
   res[[2]]=res[[2]]+res2[[2]]
  }
  print(res[[1]])
  print(res[[2]])
  return (res[[1]]/(res[[2]]*n))
 }
 if (i>=n+1) {
  es=sum(v[ind])
  #print(ind)
  res[[1]]=es
  res[[2]]=1
  return (res)
 }
 if (ind[i-1]+1<=N) {
  for (j in seq(ind[i-1]+1,N)) {
   ind[i]=j
   res2=push(i+1,n,N,ind,v)
   res[[1]]=res[[1]]+res2[[1]]
   res[[2]]=res[[2]]+res2[[2]]
  }
 }
 return (res)
}



N=20
p=.2
v=(runif(N)<=p) * 1
ind=rep(0,N)
n=5

push(1,n,N,ind,v) 

No comments:

Post a Comment