Processing math: 100%

Monday, May 28, 2012

Hypergeometric brain teaser

The statement of the problem is
A bag of N balls has black and white balls. We define a function f such that it is 1 for a black ball and 0 for a white ball, and \frac{1}{N}\sum_i f(x_i) = p, where x_i could be either a black or white ball. If we draw balls in groups G of n balls, what would the mathematical expectation of the random variable
\frac{1}{n}\sum_{i\in G} f(x_i)
Now, \frac{1}{n}\sum_{i\in G} f(x_i) is a random variable because the withdrawal of balls is random. Its expectation is E[Y]=\frac{1}{n} E[\sum_{i\in G} f(x_i)]. Now, since f marks balls as 0 or 1, this is a variable that counts black withdrawals. Let's see how we can make groups of black and white balls.
Within a group of n balls, how can we withdrawn k black balls? There are Np black balls in the bag. Therefore
\left(\begin{array}{c} Np \\ k \end{array}\right)
To fill up the rest of the group, we need to pick up white balls. There are N-Np white balls and we need n-k. Therefore
\left(\begin{array}{c} N-Np \\ n-k \end{array}\right)
We combine the two parts of a possible group.
 \left(\begin{array}{c} Np \\ k \end{array}\right) \left(\begin{array}{c} N-Np \\ n-k \end{array}\right)

Now, how many groups of n balls can we withdraw altogether, disregarding the color of the balls?
\left(\begin{array}{c} N \\ n \end{array}\right)

To build the probability of a specific group of n balls, k black and n-k white drawn from the bag, we compute its frequency
\frac{\left(\begin{array}{c} Np \\ k \end{array}\right) \left(\begin{array}{c} N-Np \\ n-k \end{array}\right)}{\left(\begin{array}{c} N \\ n \end{array}\right)}

If we look for a probability distribution with this print, we find that it is the Hypergeometric distribution, which meets the description of the problem
[...] describes the probability of k successes in n draws from a finite population of size N containing m successes without replacement.
Therefore, since cumulative successes \sum_{i\in G} f(x_i) would have a hypergeometric distribution with mean \frac{nNp}{N}=np. But we want \frac{1}{n}\sum_{i\in G} f(x_i), so that E[Y]=\frac{1}{n} E[\sum_{i\in G} f(x_i)]=\frac{np}{n}=p.

I wrote a program that takes N,n,p as inputs (and a random bag of white and black balls) and computes the expectation, which is always around p (depending on the generated random bag). Do not go very far with N, keep it under 30!


push <- function(i,n,N,ind,v) {
 es=0;
 res=list()
 res[[1]]=0
 res[[2]]=0

 if (i==1) {
  for (j in seq(1,N)) {
   ind[1]=j
   res2=push(i+1,n,N,ind,v)
   res[[1]]=res[[1]]+res2[[1]]
   res[[2]]=res[[2]]+res2[[2]]
  }
  print(res[[1]])
  print(res[[2]])
  return (res[[1]]/(res[[2]]*n))
 }
 if (i>=n+1) {
  es=sum(v[ind])
  #print(ind)
  res[[1]]=es
  res[[2]]=1
  return (res)
 }
 if (ind[i-1]+1<=N) {
  for (j in seq(ind[i-1]+1,N)) {
   ind[i]=j
   res2=push(i+1,n,N,ind,v)
   res[[1]]=res[[1]]+res2[[1]]
   res[[2]]=res[[2]]+res2[[2]]
  }
 }
 return (res)
}



N=20
p=.2
v=(runif(N)<=p) * 1
ind=rep(0,N)
n=5

push(1,n,N,ind,v) 

No comments:

Post a Comment