For the non-believers: it is also valid for numbers with different number of digits. Just be careful to align the most significant digits accordingly.

## Wednesday, May 30, 2012

### Nice little tick for two-number multiplication

For the non-believers: it is also valid for numbers with different number of digits. Just be careful to align the most significant digits accordingly.

## Tuesday, May 29, 2012

### A tale of exponential interest

I have a savings account at ING bank which gives me 4% interest monthly. Each month I receive payment of both the principal I deposited plus the compound interest they gave me over the past months.

Notice that the interest, or the present increment or my balance, $\frac{d x(t)}{dt}$, is equal to the current balance itself times time interest $k$.

$$\frac{d x(t)}{dt} = kx(t)$$

We write each function as a variable for convenience.

$$\frac{d x}{x} = kdt$$

And solve by integration

$$\int \frac{d x}{x} = k \int dt$$

$$\log x = kt + C_0$$

(we abuse notation and name the upper limit of integration with the same name than the integrating variable) yielding

$$x(t) = C_1e^{kt}$$

What does it mean? It means that everybody would see their savings accounts grow exponentially. Now, would it correspond to exponential wealth? Would the currency be debased at the same speed for the interest to be payable or can we extract oil fast enough to cover our land with things and thus effectively increasing our wealth?

$C_1=e^{C_0}$

Here is an interesting discussion at Do the Math.

Notice that the interest, or the present increment or my balance, $\frac{d x(t)}{dt}$, is equal to the current balance itself times time interest $k$.

$$\frac{d x(t)}{dt} = kx(t)$$

We write each function as a variable for convenience.

$$\frac{d x}{x} = kdt$$

And solve by integration

$$\int \frac{d x}{x} = k \int dt$$

$$\log x = kt + C_0$$

(we abuse notation and name the upper limit of integration with the same name than the integrating variable) yielding

$$x(t) = C_1e^{kt}$$

What does it mean? It means that everybody would see their savings accounts grow exponentially. Now, would it correspond to exponential wealth? Would the currency be debased at the same speed for the interest to be payable or can we extract oil fast enough to cover our land with things and thus effectively increasing our wealth?

$C_1=e^{C_0}$

Here is an interesting discussion at Do the Math.

## Monday, May 28, 2012

### Hypergeometric brain teaser

The statement of the problem is

Within a group of n balls, how can we withdrawn $k$ black balls? There are $Np$ black balls in the bag. Therefore

$$\left(\begin{array}{c} Np \\ k \end{array}\right)$$

To fill up the rest of the group, we need to pick up white balls. There are $N-Np$ white balls and we need $n-k$. Therefore

$$\left(\begin{array}{c} N-Np \\ n-k \end{array}\right)$$

We combine the two parts of a possible group.

$$\left(\begin{array}{c} Np \\ k \end{array}\right) \left(\begin{array}{c} N-Np \\ n-k \end{array}\right)$$

Now, how many groups of n balls can we withdraw altogether, disregarding the color of the balls?

$$\left(\begin{array}{c} N \\ n \end{array}\right) $$

To build the probability of a specific group of n balls, k black and n-k white drawn from the bag, we compute its frequency

$$\frac{\left(\begin{array}{c} Np \\ k \end{array}\right) \left(\begin{array}{c} N-Np \\ n-k \end{array}\right)}{\left(\begin{array}{c} N \\ n \end{array}\right)}$$

If we look for a probability distribution with this print, we find that it is the Hypergeometric distribution, which meets the description of the problem

I wrote a program that takes N,n,p as inputs (and a random bag of white and black balls) and computes the expectation, which is always around $p$ (depending on the generated random bag). Do not go very far with $N$, keep it under 30!

A bag of N balls has black and white balls. We define a function f such that it is 1 for a black ball and 0 for a white ball, and $\frac{1}{N}\sum_i f(x_i) = p$, where $x_i$ could be either a black or white ball. If we draw balls in groups $G$ of n balls, what would the mathematical expectation of the random variableNow, $\frac{1}{n}\sum_{i\in G} f(x_i) $ is a random variable because the withdrawal of balls is random. Its expectation is $E[Y]=\frac{1}{n} E[\sum_{i\in G} f(x_i)]$. Now, since f marks balls as 0 or 1, this is a variable that counts black withdrawals. Let's see how we can make groups of black and white balls.

$$\frac{1}{n}\sum_{i\in G} f(x_i) $$

Within a group of n balls, how can we withdrawn $k$ black balls? There are $Np$ black balls in the bag. Therefore

$$\left(\begin{array}{c} Np \\ k \end{array}\right)$$

To fill up the rest of the group, we need to pick up white balls. There are $N-Np$ white balls and we need $n-k$. Therefore

$$\left(\begin{array}{c} N-Np \\ n-k \end{array}\right)$$

We combine the two parts of a possible group.

$$\left(\begin{array}{c} Np \\ k \end{array}\right) \left(\begin{array}{c} N-Np \\ n-k \end{array}\right)$$

Now, how many groups of n balls can we withdraw altogether, disregarding the color of the balls?

$$\left(\begin{array}{c} N \\ n \end{array}\right) $$

To build the probability of a specific group of n balls, k black and n-k white drawn from the bag, we compute its frequency

$$\frac{\left(\begin{array}{c} Np \\ k \end{array}\right) \left(\begin{array}{c} N-Np \\ n-k \end{array}\right)}{\left(\begin{array}{c} N \\ n \end{array}\right)}$$

If we look for a probability distribution with this print, we find that it is the Hypergeometric distribution, which meets the description of the problem

[...] describes the probability of k successes in n draws from a finite population of size N containing m successes without replacement.Therefore, since cumulative successes $\sum_{i\in G} f(x_i) $ would have a hypergeometric distribution with mean $\frac{nNp}{N}=np$. But we want $\frac{1}{n}\sum_{i\in G} f(x_i)$, so that $E[Y]=\frac{1}{n} E[\sum_{i\in G} f(x_i)]=\frac{np}{n}=p$.

I wrote a program that takes N,n,p as inputs (and a random bag of white and black balls) and computes the expectation, which is always around $p$ (depending on the generated random bag). Do not go very far with $N$, keep it under 30!

push <- function(i,n,N,ind,v) { es=0; res=list() res[[1]]=0 res[[2]]=0 if (i==1) { for (j in seq(1,N)) { ind[1]=j res2=push(i+1,n,N,ind,v) res[[1]]=res[[1]]+res2[[1]] res[[2]]=res[[2]]+res2[[2]] } print(res[[1]]) print(res[[2]]) return (res[[1]]/(res[[2]]*n)) } if (i>=n+1) { es=sum(v[ind]) #print(ind) res[[1]]=es res[[2]]=1 return (res) } if (ind[i-1]+1<=N) { for (j in seq(ind[i-1]+1,N)) { ind[i]=j res2=push(i+1,n,N,ind,v) res[[1]]=res[[1]]+res2[[1]] res[[2]]=res[[2]]+res2[[2]] } } return (res) } N=20 p=.2 v=(runif(N)<=p) * 1 ind=rep(0,N) n=5 push(1,n,N,ind,v)

## Sunday, May 27, 2012

### Least squares in C

Just for fun.

I show here how to perform simple regression with least squares.

We implement the closed formulae that result from

$$\min_{\alpha,\,\beta}Q(\alpha,\beta)$$

where

$$Q(\alpha,\beta) = \sum_{i=1}^n\hat{\varepsilon}_i^{\,2} = \sum_{i=1}^n (y_i - \alpha - \beta x_i)^2 $$

We thus compute

$$\frac{\partial}{\partial \beta} Q(\alpha,\beta)$$

and

$$\frac{\partial}{\partial \alpha} Q(\alpha,\beta)$$

Which results in

$$\hat\beta = \frac{ \sum_{i=1}^{n} (x_{i}-\bar{x})(y_{i}-\bar{y}) }{ \sum_{i=1}^{n} (x_{i}-\bar{x})^2 }

= \frac{ \overline{xy} - \bar{x}\bar{y} }{ \overline{x^2} - \bar{x}^2 }

= \frac{ \operatorname{Cov}[x,y] }{ \operatorname{Var}[x] }

$$

$$\hat\alpha = \bar{y} - \hat\beta\,\bar{x}$$

So here is the code:

And you can download the Visual C++ project here.

I show here how to perform simple regression with least squares.

We implement the closed formulae that result from

$$\min_{\alpha,\,\beta}Q(\alpha,\beta)$$

where

$$Q(\alpha,\beta) = \sum_{i=1}^n\hat{\varepsilon}_i^{\,2} = \sum_{i=1}^n (y_i - \alpha - \beta x_i)^2 $$

We thus compute

$$\frac{\partial}{\partial \beta} Q(\alpha,\beta)$$

and

$$\frac{\partial}{\partial \alpha} Q(\alpha,\beta)$$

Which results in

$$\hat\beta = \frac{ \sum_{i=1}^{n} (x_{i}-\bar{x})(y_{i}-\bar{y}) }{ \sum_{i=1}^{n} (x_{i}-\bar{x})^2 }

= \frac{ \overline{xy} - \bar{x}\bar{y} }{ \overline{x^2} - \bar{x}^2 }

= \frac{ \operatorname{Cov}[x,y] }{ \operatorname{Var}[x] }

$$

$$\hat\alpha = \bar{y} - \hat\beta\,\bar{x}$$

So here is the code:

#include "stdafx.h" #include <stdlib.h> double sum(double *x, int n) { double s=0; for (int i=0; i<n; i++) { s+=x[i]; } return s; } double dot(double *x, double *y, int n) { double s=0; for (int i=0; i<n; i++) { s+=x[i]*y[i]; } return s; } void ols(double *x, double *y, int n, double *beta, double *alpha) { double sx = sum(x,n); double sy = sum(y,n); double sx2 = dot(x,x,n); double sxy = dot(x,y,n); double sxx = sx*sx; *beta = (sxy-sx*sy/n) / (sx2-sxx/n); *alpha = sy/n - *beta*sx/n; }

And you can download the Visual C++ project here.

## Sunday, May 20, 2012

### Brainteaser with Montecarlo simulation

Here's a brain teaser.

We work with the assumption of independence, so $P(A,B)=P(A)P(B)=\frac{3}{4}\frac{1}{6} = \frac{3}{24}=\frac{1}{8}$.

We decompose the event B in the intersection with A and the intersection with the complementary of A (being the last one defined as "he did not get a 6"), so that $P(B)=P(B,A) + P(B,\bar{A})$. We already have the first one so we compute $P(B,\bar{A})=\frac{1}{4}\frac{5}{6}=\frac{5}{24}$, making $P(B)=\frac{3}{24} + \frac{5}{24} = \frac{8}{24} = \frac{1}{3}$.

Therefore, $P(A|B) = \frac{P(A,B)}{P(B)} = \frac{\frac{1}{8}}{\frac{1}{3}}=\frac{3}{8} = 0.375$

Now we simulate the mentioned events in R. See that we generate the number of times the man tells the truth independently of the times he gets a six. We then count the times that a real 6 coincide with the man telling the truth in $num$. In the denominator $den$ we count the previous amount plus the number of times he says 6 but does not get a dice with a 6. This means that, among all the times the man reports a 6 (both true or false), we select only those times they are true.

Actually we know that the man has said he either got a 6 or he didn't. So the probability that we are interested in is $P(A|B)$, where the events A and B are A: "The man said 6", B: "He actually got a 6". This conditional probability is $P(A|B) = \frac{P(A,B)}{P(B)}$.A man speaks the truth 3 out of 4 times. He throws a die and reports it to be a 6. What is the probability of it being a 6?

We work with the assumption of independence, so $P(A,B)=P(A)P(B)=\frac{3}{4}\frac{1}{6} = \frac{3}{24}=\frac{1}{8}$.

We decompose the event B in the intersection with A and the intersection with the complementary of A (being the last one defined as "he did not get a 6"), so that $P(B)=P(B,A) + P(B,\bar{A})$. We already have the first one so we compute $P(B,\bar{A})=\frac{1}{4}\frac{5}{6}=\frac{5}{24}$, making $P(B)=\frac{3}{24} + \frac{5}{24} = \frac{8}{24} = \frac{1}{3}$.

Therefore, $P(A|B) = \frac{P(A,B)}{P(B)} = \frac{\frac{1}{8}}{\frac{1}{3}}=\frac{3}{8} = 0.375$

Now we simulate the mentioned events in R. See that we generate the number of times the man tells the truth independently of the times he gets a six. We then count the times that a real 6 coincide with the man telling the truth in $num$. In the denominator $den$ we count the previous amount plus the number of times he says 6 but does not get a dice with a 6. This means that, among all the times the man reports a 6 (both true or false), we select only those times they are true.

N=100000 dice=sample(1:6,N,replace=T) lie=rep(0,N*3/4) lie=c(lie, rep(1,N*1/4)) lie=lie[sample(1:N,replace=F)] num=sum((dice==6)*(lie==0)) den=sum((dice==6)*(lie==0)) + sum((dice!=6)*(lie==1)) res=num/den res > res [1] 0.3739862Note that the output is very close to the theoretical result of 0.375.

## Friday, May 18, 2012

### Batch-processing data files in a directory tree with Matlab

You may find the following code useful to process data with Matlab in a batch fashion. This example processes all the PNG files within the first level of directories that hang under the current directory, that additionaly have "noise" as a part of its name and "gif_" is not to be found within the name.

for i=1:N name=files(i).name; if (files(i).isdir == 0) && strcmp(name(end-3:end),'.png') && _

isempty(strfind(name,'nois')) && _

~isempty(strfind(name,['gif_' num2str(s)]))

im=double(imread(name));

[your processing here]

end end

## Thursday, May 10, 2012

### Conformal mapping in shape representation

If a shape is considered to be a simple closed smooth curve in the complex plane, meaning that they have derivatives of all orders and that do not cross with themselves, then the Riemannian mapping theorem states that it is possible to map the unit disc conformally (meaning that infinitesimal angle between each two crossing curves is equal to the infinitesimal angle between the transformed curves) to the interior of any such shape. The conformal transformation is unique up to any preceding MÃ¶bius transformations mapping the unit disc to itself, this is, maps of the form

$$z \rightarrow \frac{az + b}{\bar{b}z + \bar{a}}$$

If we identify $\nabla_{-}$ with the interior of the shape, and $\Gamma_{-}$ with the interior of the unit disk, then by the Riemannian mapping theorem there exists a map

$$\Phi_{-} : \nabla_{-} \rightarrow \Gamma_{-}$$

With the previous characteristics. Furthermore, if we identify $\nabla_{+}$ with the exterior of the shape, and $\Gamma_{+}$ with the exterior of the unit disk, the same conditions apply

$$\Phi_{+} : \nabla_{+} \rightarrow \Gamma_{+}$$

Now, we construct

$$\Psi = \Phi_{+} \circ \Phi_{-}$$

This provides a fingerprint of the shape unique up to a Moebius transformation.

In practice, we use the Swarz-Christoffel Matlab toolbox, having a complex variable with the given vertices coordinates in $pol$, we compute the map $m$ by

pol=polygon(dots);

fc=diskmap(pol);

ex=extermap(pol);

iex=scmapinv(ex);

m=composite(fc,iex);

To extract the fingerprint, we need to evaluate the map in the $z$ coordinate given the angle, so that $z=exp(-i \theta)$, and the evaluation is $m(z)$ for all $\theta \in [0,2\pi)$. Now, the fingerprint is the angle of a given evaluation $m(z)$, this is, $\hat{\theta}=-\frac{log(m(z))}{i}$. plotting $(\theta,\hat{\theta})$, we see the fingerprint.

Checkout the paper here.

$$z \rightarrow \frac{az + b}{\bar{b}z + \bar{a}}$$

If we identify $\nabla_{-}$ with the interior of the shape, and $\Gamma_{-}$ with the interior of the unit disk, then by the Riemannian mapping theorem there exists a map

$$\Phi_{-} : \nabla_{-} \rightarrow \Gamma_{-}$$

With the previous characteristics. Furthermore, if we identify $\nabla_{+}$ with the exterior of the shape, and $\Gamma_{+}$ with the exterior of the unit disk, the same conditions apply

$$\Phi_{+} : \nabla_{+} \rightarrow \Gamma_{+}$$

Now, we construct

$$\Psi = \Phi_{+} \circ \Phi_{-}$$

This provides a fingerprint of the shape unique up to a Moebius transformation.

In practice, we use the Swarz-Christoffel Matlab toolbox, having a complex variable with the given vertices coordinates in $pol$, we compute the map $m$ by

pol=polygon(dots);

fc=diskmap(pol);

ex=extermap(pol);

iex=scmapinv(ex);

m=composite(fc,iex);

To extract the fingerprint, we need to evaluate the map in the $z$ coordinate given the angle, so that $z=exp(-i \theta)$, and the evaluation is $m(z)$ for all $\theta \in [0,2\pi)$. Now, the fingerprint is the angle of a given evaluation $m(z)$, this is, $\hat{\theta}=-\frac{log(m(z))}{i}$. plotting $(\theta,\hat{\theta})$, we see the fingerprint.

Checkout the paper here.

## Tuesday, May 1, 2012

### Denoising by convolution using CuFFT

In this post we will create a CUDA program that removes some noise with simple isotropic diffusion. We are going to diffuse the function (image) according to the heat equation

$$u_t = \Delta u$$

Where $\Delta$ is the laplacian operator and $u_t$ is the derivative with respect time. The solution at time $t$ is

$$u(x) = \int G_t (x-y) u_o(y) dy$$

Where $x,y\in \mathbb{R}^2$, $u_0$ is the initial noisy image and $G_t$ is the bivariate (normalized) gaussian kernel with standard deviation $t$.

Since a convolution is equivalent to a multiplication of the Fourier transforms as below (by the convolution theorem, and transforming back), we can use the CuFFT library to perform these steps.

$$\hat{u}(\omega) = \hat{G_t}(\omega)\hat{u_o}(\omega)$$

With this project, I want to hunt down two birds with one shot. I want to make a managed code application with Windows forms that reads and visualizes a JPG file, adds noise and then blurs it to remove some noise.

Now, you must know that you cannot mix managed code and CUDA. The solution to this is to separate the CUDA computation and the presentation application in two different parts. We will encapsulate the CUDA computations into a DLL and the Windows program in a Windows Forms application. Both will be part of the same Visual Studio solution so everything will work in the final step. The Windows Forms will be the main project within the VS solution.

Under the DLL project, you must set up CUDA libraries. For the compiler to look for CUDA and CUDAutils header files, we must add

Then our DLL function takes a matrix of values and performs the Fourier transform of both (previously padded) the input data and the kernel. We can precompute the kernel with Matlab (with fspecial) and copy the values. In this case, for a kernel with $t=1$, we store the values in h_kernel. Then we perform the multiplication and compute the inverse Fourier transform. This code excerpt contains the important lines:

DLL int blur(float *h_Data, int dataH, int dataW, float **d, int *dH, int *dW) {

float

*d_Data,

*d_PaddedData,

*d_Kernel,

*d_PaddedKernel;

fComplex

*d_DataSpectrum,

*d_KernelSpectrum;

cufftHandle

fftPlanFwd,

fftPlanInv;

const int kernelH = 7;

const int kernelW = 6;

...

float *h_ResultGPU = (float *)malloc(fftH * fftW * sizeof(float));

cutilSafeCall( cudaMalloc((void **)&d_Data, dataH * dataW * sizeof(float)) );

...

float h_Kernel[] = {0.0001F,0.0006F,0.0016F,0.0016F,0.0006F,0.0001F,

0.0009F,0.0070F,0.0190F,0.0190F,0.0070F,0.0009F,

0.0043F,0.0314F,0.0854F,0.0854F,0.0314F,0.0043F,

0.0070F,0.0518F,0.1407F,0.1407F,0.0518F,0.0070F,

0.0043F,0.0314F,0.0854F,0.0854F,0.0314F,0.0043F,

0.0009F,0.0070F,0.0190F,0.0190F,0.0070F,0.0009F,

0.0001F,0.0006F,0.0016F,0.0016F,0.0006F,0.0001F};

...

cutilSafeCall( cutilDeviceSynchronize() );

cutilSafeCall( cutilDeviceSynchronize() );

cutilSafeCall( cudaMemcpy(h_ResultGPU,

d_PaddedData, fftH * fftW * sizeof(float), cudaMemcpyDeviceToHost) );

...

*d=h_ResultGPU;

*dH=fftH;

*dW=fftW;

return 1;

}

I have highlighted the lines where the Fourier transform, the multiplication and the inverse Fourier transform with the induced blur take place.

In the application, we must have a way to call the DLL function. This code loads the DLL, creates a function pointer and casts it to the DLL function.

private: int CallMyDLL(float *f, int h, int w, float **d, int *dh, int *dw)

{

HINSTANCE hGetProcIDDLL = LoadLibrary(L"cudaint.dll");

FARPROC lpfnGetProcessID =

GetProcAddress(HMODULE (hGetProcIDDLL),"blur");

typedef int (__stdcall * pICFUNC)

(float *, int, int, float**, int *, int*);

pICFUNC blur;

blur = pICFUNC(lpfnGetProcessID);

int MyReturnVal = blur(f, h, w, d, dh, dw);

FreeLibrary(hGetProcIDDLL);

return MyReturnVal;

}

Then we can call with the pixels as a linearized matrix and the computations will be made within the GPU. Conversions between .Net objects and float matrices are straighforward.

$$u_t = \Delta u$$

Where $\Delta$ is the laplacian operator and $u_t$ is the derivative with respect time. The solution at time $t$ is

$$u(x) = \int G_t (x-y) u_o(y) dy$$

Where $x,y\in \mathbb{R}^2$, $u_0$ is the initial noisy image and $G_t$ is the bivariate (normalized) gaussian kernel with standard deviation $t$.

Since a convolution is equivalent to a multiplication of the Fourier transforms as below (by the convolution theorem, and transforming back), we can use the CuFFT library to perform these steps.

$$\hat{u}(\omega) = \hat{G_t}(\omega)\hat{u_o}(\omega)$$

With this project, I want to hunt down two birds with one shot. I want to make a managed code application with Windows forms that reads and visualizes a JPG file, adds noise and then blurs it to remove some noise.

Now, you must know that you cannot mix managed code and CUDA. The solution to this is to separate the CUDA computation and the presentation application in two different parts. We will encapsulate the CUDA computations into a DLL and the Windows program in a Windows Forms application. Both will be part of the same Visual Studio solution so everything will work in the final step. The Windows Forms will be the main project within the VS solution.

Under the DLL project, you must set up CUDA libraries. For the compiler to look for CUDA and CUDAutils header files, we must add

`$(NVSDKCOMPUTE_ROOT)\C\common\inc;"$(CUDA_PATH)/include";./;../../common/inc;../../../shared/inc`

to project properties -> C/C++ -> General. Then we must add the directory where additional support header files can be found, in this case,
`$(NVSDKCOMPUTE_ROOT)\C\common\inc`

. This must be done under project properties -> CUDA Runtime API -> Additional Include Directories. In project properties ->Linker -> Additional library directories you must add` $(CUDA_PATH)/lib/$(PlatformName);../../common/lib/$(PlatformName);$(NVSDKCOMPUTE_ROOT)\C\common\lib\Win32`

Then our DLL function takes a matrix of values and performs the Fourier transform of both (previously padded) the input data and the kernel. We can precompute the kernel with Matlab (with fspecial) and copy the values. In this case, for a kernel with $t=1$, we store the values in h_kernel. Then we perform the multiplication and compute the inverse Fourier transform. This code excerpt contains the important lines:

DLL int blur(float *h_Data, int dataH, int dataW, float **d, int *dH, int *dW) {

float

*d_Data,

*d_PaddedData,

*d_Kernel,

*d_PaddedKernel;

fComplex

*d_DataSpectrum,

*d_KernelSpectrum;

cufftHandle

fftPlanFwd,

fftPlanInv;

const int kernelH = 7;

const int kernelW = 6;

...

float *h_ResultGPU = (float *)malloc(fftH * fftW * sizeof(float));

cutilSafeCall( cudaMalloc((void **)&d_Data, dataH * dataW * sizeof(float)) );

...

float h_Kernel[] = {0.0001F,0.0006F,0.0016F,0.0016F,0.0006F,0.0001F,

0.0009F,0.0070F,0.0190F,0.0190F,0.0070F,0.0009F,

0.0043F,0.0314F,0.0854F,0.0854F,0.0314F,0.0043F,

0.0070F,0.0518F,0.1407F,0.1407F,0.0518F,0.0070F,

0.0043F,0.0314F,0.0854F,0.0854F,0.0314F,0.0043F,

0.0009F,0.0070F,0.0190F,0.0190F,0.0070F,0.0009F,

0.0001F,0.0006F,0.0016F,0.0016F,0.0006F,0.0001F};

...

**cufftSafeCall( cufftExecR2C(fftPlanFwd,****(cufftReal *)d_PaddedKernel, (cufftComplex *)d_KernelSpectrum) );**cutilSafeCall( cutilDeviceSynchronize() );

**cufftSafeCall( cufftExecR2C(fftPlanFwd,****(cufftReal *)d_PaddedData, (cufftComplex *)d_DataSpectrum) );****modulateAndNormalize(d_DataSpectrum, d_KernelSpectrum, fftH, fftW, 1);**

cufftSafeCall( cufftExecC2R(fftPlanInv,cufftSafeCall( cufftExecC2R(fftPlanInv,

**(cufftComplex *)d_DataSpectrum, (cufftReal *)d_PaddedData) );**cutilSafeCall( cutilDeviceSynchronize() );

cutilSafeCall( cudaMemcpy(h_ResultGPU,

d_PaddedData, fftH * fftW * sizeof(float), cudaMemcpyDeviceToHost) );

...

*d=h_ResultGPU;

*dH=fftH;

*dW=fftW;

return 1;

}

I have highlighted the lines where the Fourier transform, the multiplication and the inverse Fourier transform with the induced blur take place.

In the application, we must have a way to call the DLL function. This code loads the DLL, creates a function pointer and casts it to the DLL function.

private: int CallMyDLL(float *f, int h, int w, float **d, int *dh, int *dw)

{

HINSTANCE hGetProcIDDLL = LoadLibrary(L"cudaint.dll");

FARPROC lpfnGetProcessID =

GetProcAddress(HMODULE (hGetProcIDDLL),"blur");

typedef int (__stdcall * pICFUNC)

(float *, int, int, float**, int *, int*);

pICFUNC blur;

blur = pICFUNC(lpfnGetProcessID);

int MyReturnVal = blur(f, h, w, d, dh, dw);

FreeLibrary(hGetProcIDDLL);

return MyReturnVal;

}

Then we can call with the pixels as a linearized matrix and the computations will be made within the GPU. Conversions between .Net objects and float matrices are straighforward.

I made the source code available here.

Subscribe to:
Posts (Atom)