Machinomics: April 2012

Monday, April 30, 2012

Miscellaneous errors with Visual C++

If you compile the project and receive the following error:

fatal error LNK1104: cannot open file 'C:\Program.obj'

This particular issue is caused by specifying a dependency to a lib file that had spaces in its path. The path needs to be surrounded by quotes for the project to compile correctly.

On the Configuration Properties -> Linker -> Input tab of the project’s properties, there is an Additional Dependencies property. This issue was fixed by changing this property from:

C:\Program Files\sofware sdk\lib\library.lib
To:
" C:\Program Files\sofware sdk\lib\library.lib"

Thursday, April 19, 2012

Shape recognition and shape spaces in computer vision

One of the problems in computer vision and pattern recognition is that of shape recognition (classification, clustering and retrieval). It is important in statistical analysis of medical images (shape outliers might indicate disease) and machine vision (digital recording and analysis based on planar views of 3D objects).

Classical statistical shape analysis is due to Kendall, Bookstein and Mardia. This classical shape analysis treats shapes as shape spaces, whose geometries are those of differentiable manifolds often with appropriate Riemannian structures.

Imagine we are given a set of variables in which each variable is a position of a prearranged landmark (interesting point with a load of information) and our setting is 2D. Then we can model the points as complex numbers $p \in \mathbb{C}$. Therefore each point can be written as $p_i=x_i + iy_i$ where $i=\sqrt{-1}$. A k-ad $p={p_i}_k$ is a set of $k$ shape landmarks.

We now center these points so as to make them translation invariant.
$$w_T=p-\bar{p}$$
where $\bar{p} \in \mathbb{C}^k$ is a vector with all elements set to the k-ad mean, this is $p{_Tj}=\frac{1}{k}\sum_k{p_i}$.

Now we make them scale invariant by dividing each element in $w_T \in \mathbb{C}^k$ by the absolute value of $w_T$, this is $w_S = \frac{w_T}{|w_T|}$.

Finally given the rotation matrix with angle $\theta$
$$\left[\begin{array}{cc}cos(\theta) & -sin(\theta) \\ sin(\theta) & cos(\theta)\end{array}\right]$$
one can rotate the centered shape with $Rw_S$ (if we had modeled it in $\mathbb{R}^{2\times k}$, but we are in the complex domain and it is done by multiplying each element by $e^{i\theta}$. Now, this is when it gets interesting and useful (and where I see the true magic). To make our representation rotation invariant, we can use the Veronese-Whitney embedding, so that the rotation invariant representation is the rank one matrix $w_R=w_S w_S^*$, where $\cdot^*$ is the complex conjugate. This means that, by what we have seen before, any rotation of $w_S$, $e^{i\theta} w_S$ generates
$$e^{i\theta}w_S e^{-i\theta}w_S^* = w_S w_S^*$$
since $e^{-i\theta}$ is the complex conjugate of the rotation (meaning that we undo the rotation by rotating $-\theta$ degrees. This is readily seen by the (complex) exponential arithmetic $e^{i\theta}e^{-i\theta} = = e^{i(\theta - \theta)} = 1$.

This space is that of the complex hermitian matrices and we can define the norm of a shape and the distance between two shapes by
$$\|A\| = trace(A) \\
\rho(A,B) = \|A-B\|$$

Now we can compute the mean shape of populations and do some inference. For example, populations of known healthy livers against livers with disease, and get confidence intervals so as to accept or reject null hypothesis given a new liver shape.

We see that modeling in $\mathbb{R}^{2\times k}$ and using the matrix $R$, we achieve the same result:
$$wR(wR)^T = wRR^T w^T = ww^T$$

Check out: http://stat.duke.edu/~ab216/duke-talk.pdf

Setting up CUDA in Windows and Visual C++ 2008, Addendum II

After everything is installed, you will need to set additional options. Credit for this goes to the user Tom of Stackoverflow. I will reproduce here the steps to configure CUDA in newer Toolkit versions. Users interested in older ones should check the link at the end of the post.

In addition to the previous CUDA posts, where you can learn how to install the required tools and make Visual C++ understand what you are using, you need to:

Add the NvCudaRuntimeApi.rules to the project file (right click on the project, Custom Build Rules, tick the relevant box).
Add the CUDA runtime library (right click on the project and choose Properties, then in Linker -> General add $(CUDA_PATH)\lib\$(PlatformName) to the Additional Library Directories and in Linker -> Input add cudart.lib to the Additional Dependencies).
Optionally add the CUDA include files to the search path, required if you include any CUDA files in your .cpp files (as opposed to .cu files) (right click on the project and choose Properties, then in C/C++ -> General add $(CUDA_PATH)\include to the Additional Include Directories).
Then just build your project and the .cu files will be compiled to .obj and added to the link automaticaly
Change the code generation to use statically loaded C runtime to match the CUDA runtime; right click on the project and choose Properties, then in C/C++ -> Code Generation change the Runtime Library to /MT (or /MTd for debug, in which case you will need to mirror this in Runtime API -> Host -> Runtime Library). You might get error LNK4098 otherwise.

Error tips: Having mismatched version of the C runtime can cause a variety of problems; in particular if you have any errors regarding LIBCMT (e.g. LNK4098: defaultlib 'LIBCMT' conflicts with use of other libs) or multiply defined symbols for standard library functions, then this should be your first suspect. Also, you might see LNK2001, LNK2005 when some conflict with the C++ runtime exists. A possible workaround might be using C functions.

If you have Visual Studio 2010, here are the details for you:
http://stackoverflow.com/questions/3778799/how-do-i-start-a-cuda-app-in-visual-studio-2010/7285235#7285235

For older versions of CUDA:
http://stackoverflow.com/questions/2046228/how-do-i-start-a-new-cuda-project-in-visual-studio-2008

Monday, April 16, 2012

Setting up CUDA in Windows and Visual C++ Express 2008 Addendum

With CUDA Toolkit 4.x, things are slightly different.

Syntax highlighting

The necessary files are under
<CUDA_SDK>\C\doc\syntax_highlighting

Building

The following procedure is credited to the user sdtougao from NVIDIA forums. This is what you need to build your projects correctly.

Copy the rules files, <NVIDIA GPU Computing Toolkit>\CUDA\v4.0\extras\visual_studio_integration\ rules to the VS2008 installation directory, <Microsoft Visual Studio 9.0>\VC\VCProjectDefaults
Open a SDK project in VS2008, for example clock.sln, open "custom build rule",select "NvCudaRuntimeApi.rules".

I don't even need to do the second step and SDK examples compile without errors. I can't know if that is related to my previous installation, though.

Intellisense

A registry key file is provided so that double-clicking on it adds the necessary keys to the registry without any extra effort.

Sunday, April 8, 2012

Setting up CUDA on Windows and Visual C++ Express 2008

Assuming we already have Visual C++ 2008 Express installed, we need to perform the following
steps:

Download and install the latest nVidia driver with CUDA that is compatible with the system’s graphics card. Then reboot your system so that the driver loads to memory. Note that nVidia publishes a special series of drivers for desktops and for notebooks cards (the later with some time lag).
Download and install both the CUDA Toolkit and the CUDA SDK. These must be the same version as the driver before. If the examples in the bin directory of the SDK do not work, this indicates you have a wrong version of the driver (and of the whole package).

Syntax highlight and Intellisense

Since the extension .cu is not recognized as C code by Visual C++, it will not highlight the language’s reserved words. NVidia supply a usertype.dat under: “<CUDA SDK install dir>\doc\syntax_highlighting\visual_studio_8”.

Copy this file to: “\Program Files\Microsoft Visual Studio 9.0\Common7\IDE”. Now open Visual Studio and navigate to: tools->options->text editor->file extension. In the extension box type “cu”, make sure you select “Microsoft Visual C++”.

For Intellisense, the process is not automatic as the previous ones. It involves dealing with internal Visual C++ registry settings:

On Windows 7, open the registry editor and navigate to “HKEY_USERS\<user code>\Software\Microsoft\VCExpress\9.0\Languages\Language Services\C/C++” then go to the string entry “NCB Default C/C++ Extensions” and add “;.cu;.cuh” to the end of the list. The <user code> is machine-specific, the best thing to do is look for “NCB Default C/C++ Extensions” in the registry editor. Restart Visual Studio and Intellisense will work for your CUDA files. If there is a specific CUDA C feature, it will not recognize it, but all the normal C keywords will be benefited.

Building

nVidia supply a rules file which you will find under the “<CUDA SDK install dir>\common”
directory.

Select it under “project -> custom build rules -> find existing”. This will allow the IDE to find the compiler nvcc for the .cu files.

After Visual C++ instructs nvcc to compile .cu files, it links the object files (with the native compiler) into an executable file. Inspect the following output:

Debug Win32 ------
1>Compiling with CUDA Build Rule...
1>"D:\my\CUDA\bin\nvcc.exe" -arch sm_10 -ccbin "C:\Program Files\Microsoft Visual Studio
9.0\VC\bin" -use_fast_math -Xcompiler "/EHsc /W3 /nologo /O2 /Zi /MT " -
I"D:\my\CUDA_SDK\common\inc" -maxrregcount=32 --compile -o "Debug\kernel.cu.obj"
"c:\Users\Gabo\Documents\Visual Studio 2008\Projects\more\cuda\kernel.cu"
1>kernel.cu
1>tmpxft_00001314_00000000-3_kernel.cudafe1.gpu
1>tmpxft_00001314_00000000-8_kernel.cudafe2.gpu
1>tmpxft_00001314_00000000-3_kernel.cudafe1.cpp
1>tmpxft_00001314_00000000-13_kernel.ii
1>Compilando...
1>main.cpp
1>math.cpp
1>Building code...
1>Linking...
1>main.obj : warning LNK4075: se omite '/EDITANDCONTINUE' due to the specification
'/INCREMENTAL:NO'
1>Incrustando manifiesto...
1>El registro de compilación se guardó en el "file://c:\Users\Gabo\Documents\Visual Studio
2008\Projects\more\cuda\Debug\BuildLog.htm"
1>cuda - 0 errors, 3 warnings

Notice that the compilation of the .cu files (with nvcc), then the .cpp files (with the Visual C++
compiler) and, last, the linking process.
Remark: nVidia compiler, nvcc, treats differently the objects it finds in the .cu files. If it is a kernel, it compiles it into PTX instructions and stores them into a data section of the object file. On the other hand, if it finds a plain C function, it compiles it as any C compiler would do, storing native hardware instructions into the executable text section of the object file. Remark: Since the device code is executed in the GPU, the kernels live as data in the host code and are copied to the device memory in an I/O operation initiated by the code nvcc put when the C function in the .cu file invoked the kernel. This is explained later.

Linking difficulties

For a seamless compilation, the best thing to do is to copy an existing CUDA project from the SDK to a separate folder, including the libraries (from the “common” directory), rename and redistribute everything and start from there

However, if an output like this is experienced:

1>MSVCRTD.lib(MSVCR90D.dll) : error LNK2005: ya se definió _free en LIBCMT.lib(free.obj)
1>MSVCRTD.lib(MSVCR90D.dll) : error LNK2005: ya se definió _malloc en LIBCMT.lib(malloc.obj)
1>MSVCRTD.lib(MSVCR90D.dll) : error LNK2005: ya se definió _printf en LIBCMT.lib(printf.obj)
1>MSVCRTD.lib(ti_inst.obj) : error LNK2005: ya se definió "private: __thiscall type_info::type_info(class type_info const &)" (??0type_info@@AAE@ABV0@@Z) en LIBCMT.lib(typinfo.obj)
1>MSVCRTD.lib(ti_inst.obj) : error LNK2005: ya se definió "private: class type_info & __thiscall type_info::operator=(class type_info const &)" (??4type_info@@AAEAAV0@ABV0@@Z) en LIBCMT.lib(typinfo.obj)

Adding /NODEFAULTLIB:LIBCMT.lib to “project -> Configuration Properties -> Linker ->
Command Line” in the Additional Options textbox.
Since the .cu files are compiled by nvcc and kinked to the nVidia libraries (according to the
supplied build rules), a general Visual C++ project links to the standard set of libraries, and
some of them get in conflict with nVidia’s. The previous fix avoids using the conflicting
standard libraries.

Friday, April 6, 2012

Generate random matrices with a given condition number (or spectrum)

It is really easy to perform this task. Just generate a random matrix, for example, in Matlab:

n=4
A=rand(n)

then we need no extract the (random) basis with the QR decomposition.

[Q,R]=qr(A)

and then we fix the spectrum. The best possibly conditioned matrix would have a condition number of one. Therefore we can achieve that with

D=diag(ones(n,1))
M=Q*D*Q'

This would produce a normal matrix $MM^T = M^T M$. If we want a non-normal matrix, we just need another random basis, that we can take from another random matrix $B$.

B=rand(n)
[V,R]=qr(A)
M=Q*D*V'

If we want to fix only the first (largest) eigenvalue to one, but leaving a random spectrum, we generate a random sequence

sp=randn(n,1)
sp=sp/max(sp)
D=diag(sp)
M=Q*D*V'

Sunday, April 1, 2012

Practical kernel trick

When using an SVM, a naive usage of the software may lead us to use the standard kernels that are offered to us and perform the analysis as well as we can.

However, one can choose to build his own kernels given some prior information about the problem. For example, we could have a dataset with values of instantiated variables for each datum, but we might also have correlations telling us when those data simultaneously appear. For example, in credit rating, one may have variables such as income for each of the customers. One might also have some kind of information linking customers, such as marital status, housing status, professional category and so on. Therefore, apart from the points in $\mathbb{R}^n$ (wages), we also have some kind of coexpression graph, where two individuals score higher with each other if their categorical variables coincide, so that we build a matrix $S_{i,j}=\mbox{#}(k,j)$, where $\mbox{#}(i,j)$ is the coincidence of the vales for all variables between individual $i$ and individual $j$. We then could make a custom kernels in this way:
$$
K=K_{\sigma} + \gamma S^*
$$
Where $K_{\sigma}$ is the classical Gaussian kernel build from the euclidean distances of the wages, with parameter $\sigma$, and $S^*$ is a projection of the previous $S$ onto the cone of positive semi-definite matrices. $\gamma$ is a tuning parameter. Now, how can we use it with our SVM software?

LibSVM can be fed with the previous kernel matrix $K$ if you work with the C/C++ or the Matlab interface. Just make sure to insert an index from one to the number of data as the first column. If you are working with R with the library e1071, you won't be able to do this.

It is well known to practitioners that the kernel-trick can help us in this case. Since the Mercer's kernel expansion is
$$
k(x,y)=\sum_i \lambda_i \phi_i(x) \phi_i(y) = \Phi(x)^T \Phi(y)
$$
Where $\lambda_i$ and $\phi_i$ are the eigenvalues and eigenfunctions of the integral operator whose positive definite kernel is $k$. $\Phi(x) = [\sqrt{\lambda_1} \phi_1(x), \sqrt{\lambda_2} \phi_2(x),\ldots]$ is, therefore, the "lifting" map from $\mathbb{R}^n$ to the Hilbert space of square-summable infinite sequences $l^2$. Since we are working with a finite approximation to this operator, the eigen-approximation is given by the eigen values and eigenvectors of the kernel matrices above. We will always be able to compute these. Therefore, we substitute $ \Phi(x)^T \Phi(y)$ by its finite approximation using the eigenvalues and eigenvectors of the matrix, so that using a standard linear kernel with the lifting map (the images under the approximated $\Phi$) we achieve the values in $K$.

The following example is in R. Assume we want to compute some credit score input, and the individuals' salaries are the ones given in the variable salaries. The experience with these customers is +1 if they were good customers, and -1 if they were bad customers, and stored in the variable y.

library(e1071)
salaries=c(20000,30000,55000,50000,30000,25000)
y=c(+1,+1,+1,-1,-1,-1)

It is clear that the experience and the salaries are not very correlated. Imagine betting is a huge problem for some segment of the population, and we have the individuals' record of using betting facilities, in this matrix

betting=c(0,0,0,1,1,1)

S=betting%*%t(betting)
Seig=eigen(S)
Seig$vectors
values=Seig$values
values[values<0]=0
D=diag(values)
Ss=Ss=U%*%D%*%t(U)
s=0.000000005
Ks=as.matrix(exp(-s*dist(salaries)^2))
diag(Ks)<-1

The gaussian distances are:
2 0.606530660
3 0.002187491 0.043936934
4 0.011108997 0.135335283 0.882496903
5 0.606530660 1.000000000 0.043936934 0.135335283
6 0.882496903 0.882496903 0.011108997 0.043936934 0.88249690

We now see that the true correlations between these individuals are not very strong. Therefore, supplementing Ks with S is a good idea. Ks+gamma*S is:
1 2 3 4 5 6
1 0.000000000 0.60653066 0.002187491 0.01110900 0.60653066 0.88249690
2 0.606530660 0.00000000 0.043936934 0.13533528 1.00000000 0.88249690
3 0.002187491 0.04393693 0.000000000 0.88249690 0.04393693 0.01110900
4 0.011108997 0.13533528 0.882496903 1.00000000 1.13533528 1.04393693
5 0.606530660 1.00000000 0.043936934 1.13533528 1.00000000 1.88249690
6 0.882496903 0.88249690 0.011108997 1.04393693 1.88249690 1.00000000

We see that there is some more correlation between the betting public. Following the R program

K=Ks+gamma*S
Keig=eigen(K)
phi=Keig$vectors%*%diag(sqrt(Keig$values))

Now we train the SVM with this embedding, or lifting map, $\Phi$. Observe that if $K=VEV^T=VE_q E_q V^T = \Phi^T \Phi$ then the embedding is $\Phi^T=VE_q$, where $E_q$ has the square root of the eigenvalues.

svm(phi, y, type="C", kernel="linear", cost=1)

Subsequent customers (after retraining) will be given a more fitting credit rating in this fashion than using only their salaries.

This trick is also the basis of the celebrated technique of random features (check out the paper).

Machinomics

Monday, April 30, 2012

Miscellaneous errors with Visual C++

Thursday, April 19, 2012

Shape recognition and shape spaces in computer vision

Setting up CUDA in Windows and Visual C++ 2008, Addendum II

Monday, April 16, 2012

Setting up CUDA in Windows and Visual C++ Express 2008 Addendum

Syntax highlighting

Building

Intellisense

Sunday, April 8, 2012

Setting up CUDA on Windows and Visual C++ Express 2008

Syntax highlight and Intellisense

Building

Linking difficulties

Friday, April 6, 2012

Generate random matrices with a given condition number (or spectrum)

Sunday, April 1, 2012

Practical kernel trick

About Me

Visit my blog about Economics and Markets