Thursday, April 19, 2012

Shape recognition and shape spaces in computer vision

One of the recurring problems in computer vision and pattern recognition is shape recognition (classification, clustering and retrieval). It is important in the statistical analysis of medical images (shape outliers might indicate disease) and in machine vision (digital recording and analysis of 3D objects based on their planar views).

Classical statistical shape analysis is due to Kendall, Bookstein and Mardia. This classical approach treats shapes as points in shape spaces, whose geometries are those of differentiable manifolds, often endowed with appropriate Riemannian structures.

Imagine we are given a set of variables in which each variable is the position of a prearranged landmark (an interesting point carrying a good deal of information), and our setting is 2D. Then we can model each point as a complex number, so the $j$-th landmark can be written as $p_j = x_j + i y_j$ where $i=\sqrt{-1}$. A k-ad $p=(p_1,\dots,p_k) \in \mathbb{C}^k$ is the vector of the $k$ landmarks of a shape.
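For concreteness, here is a minimal sketch in Python/NumPy (the landmark coordinates are made up for illustration) of storing a planar k-ad as a complex vector:

```python
import numpy as np

# Hypothetical 2D landmarks: k rows of (x, y) coordinates.
xy = np.array([[0.0, 0.0],
               [1.0, 0.2],
               [1.1, 1.0],
               [0.1, 0.9]])

# Model each landmark as a complex number x + iy, giving a k-ad in C^k.
p = xy[:, 0] + 1j * xy[:, 1]
print(p.shape)  # (4,) -- one complex entry per landmark
```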

We now center these points so as to make the representation translation invariant:
$$w_T=p-\bar{p}$$
where $\bar{p} \in \mathbb{C}^k$ is the vector with all elements equal to the k-ad mean, that is, $\bar{p}_j=\frac{1}{k}\sum_{i=1}^{k} p_i$ for every $j$.

Now we make the representation scale invariant by dividing the centered vector $w_T \in \mathbb{C}^k$ by its norm, that is, $w_S = \frac{w_T}{\|w_T\|}$.
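Continuing the sketch, the centering and scaling steps are one-liners (the variable names `w_T` and `w_S` simply mirror the notation above):

```python
# Translation invariance: subtract the k-ad mean from every landmark.
w_T = p - p.mean()

# Scale invariance: divide by the norm of the centered vector.
w_S = w_T / np.linalg.norm(w_T)

# w_S is the "preshape": centered and of unit norm.
print(np.isclose(w_S.sum(), 0.0), np.isclose(np.linalg.norm(w_S), 1.0))
```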

Finally, given the rotation matrix with angle $\theta$
$$\left[\begin{array}{cc}\cos\theta & -\sin\theta \\ \sin\theta & \cos\theta\end{array}\right]$$
one can rotate the centered shape with $Rw_S$ (had we modeled it in $\mathbb{R}^{2\times k}$); in the complex domain the rotation amounts to multiplying each element by $e^{i\theta}$. Now, this is when it gets interesting and useful (and where I see the true magic). To make our representation rotation invariant, we can use the Veronese-Whitney embedding, so that the rotation-invariant representation is the rank-one matrix $w_R=w_S w_S^*$, where $\cdot^*$ denotes the conjugate transpose. This means that, by what we have seen before, any rotation $e^{i\theta} w_S$ of $w_S$ generates
$$(e^{i\theta}w_S)(e^{i\theta}w_S)^* = e^{i\theta}w_S w_S^* e^{-i\theta} = w_S w_S^*$$
since $e^{-i\theta}$ is the complex conjugate of the rotation (meaning that we undo the rotation by rotating by $-\theta$). This is readily seen from the (complex) exponential arithmetic $e^{i\theta}e^{-i\theta} = e^{i(\theta - \theta)} = 1$.
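A quick numerical sanity check of this invariance, continuing from the preshape `w_S` above (the rotation angle is arbitrary):

```python
# Veronese-Whitney embedding: rank-one Hermitian matrix w_S w_S^*.
w_R = np.outer(w_S, w_S.conj())

# Rotate the preshape by an arbitrary angle and embed again.
theta = 0.7  # arbitrary rotation angle in radians
w_S_rot = np.exp(1j * theta) * w_S
w_R_rot = np.outer(w_S_rot, w_S_rot.conj())

# The embedding is unchanged by the rotation.
print(np.allclose(w_R, w_R_rot))  # True
```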

This space is that of the complex Hermitian matrices, and we can define the norm of a shape and the distance between two shapes by
$$\|A\| = \sqrt{\operatorname{trace}(AA^*)} \\
\rho(A,B) = \|A-B\|$$
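Putting the pieces together, here is a sketch of a distance function between two k-ads (the helper name `shape_distance` is mine, not from the post; it reuses the `np` import from the first snippet):

```python
def shape_distance(p1, p2):
    """Distance between two planar k-ads via the Veronese-Whitney embedding."""
    def embed(p):
        w = p - p.mean()              # translation invariance
        w = w / np.linalg.norm(w)     # scale invariance
        return np.outer(w, w.conj())  # rotation invariance
    # Frobenius norm of the difference of the embedded (Hermitian) matrices.
    return np.linalg.norm(embed(p1) - embed(p2))
```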

Now we can compute the mean shape of populations and do some inference. For example, we can compare populations of known healthy livers against livers with disease, and obtain confidence intervals so as to accept or reject the null hypothesis given a new liver shape.
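The post does not spell out how the mean shape is computed; one standard choice (an assumption on my part) is the extrinsic mean: average the embedded matrices and project back onto the shape space by taking the leading eigenvector of the average, as in this sketch:

```python
def extrinsic_mean_shape(kads):
    """Extrinsic mean of a population of k-ads under the Veronese-Whitney embedding."""
    embeddings = []
    for p in kads:
        w = p - p.mean()
        w = w / np.linalg.norm(w)
        embeddings.append(np.outer(w, w.conj()))
    avg = np.mean(embeddings, axis=0)
    # Project back: leading eigenvector of the averaged Hermitian matrix.
    eigvals, eigvecs = np.linalg.eigh(avg)
    return eigvecs[:, -1]  # mean preshape, defined up to a rotation
```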

We see that, modeling in $\mathbb{R}^{2\times k}$ and rotating with the matrix $R$, the Gram matrix $w^T w$ achieves the same result:
$$(Rw)^T(Rw) = w^T R^T R w = w^T w$$
since $R^T R = I$.
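And a quick numerical check of the same identity in the real representation, reusing `w_S` from the sketch above:

```python
# Real 2 x k representation: rows are the x and y coordinates of the preshape.
w = np.vstack([w_S.real, w_S.imag])

# A rotation matrix R for an arbitrary angle.
c, s = np.cos(0.7), np.sin(0.7)
R = np.array([[c, -s], [s, c]])

# The k x k Gram matrix is unchanged by the rotation R w.
print(np.allclose((R @ w).T @ (R @ w), w.T @ w))  # True
```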

Check out: http://stat.duke.edu/~ab216/duke-talk.pdf
