## Tuesday, May 1, 2012

### Denoising by convolution using CuFFT

In this post we will create a CUDA program that removes some noise with simple isotropic diffusion. We are going to diffuse the function (image) according to the heat equation
$$u_t = \Delta u$$
Where $\Delta$ is the laplacian operator and $u_t$ is the derivative with respect time. The solution at time $t$ is
$$u(x) = \int G_t (x-y) u_o(y) dy$$
Where $x,y\in \mathbb{R}^2$, $u_0$ is the initial noisy image and $G_t$ is the bivariate (normalized) gaussian kernel with standard deviation $t$.

Since a convolution is equivalent to a multiplication of the Fourier transforms as below (by the convolution theorem, and transforming back), we can use the CuFFT library to perform these steps.
$$\hat{u}(\omega) = \hat{G_t}(\omega)\hat{u_o}(\omega)$$

With this project, I want to hunt down two birds with one shot. I want to make a managed code application with Windows forms that reads and visualizes a JPG file, adds noise and then blurs it to remove some noise.

Now, you must know that you cannot mix managed code and CUDA. The solution to this is to separate the CUDA computation and the presentation application in two different parts. We will encapsulate the CUDA computations into a DLL and the Windows program in a Windows Forms application. Both will be part of the same Visual Studio solution so everything will work in the final step. The Windows Forms will be the main project within the VS solution.

Under the DLL project, you must set up CUDA libraries. For the compiler to look for CUDA and CUDAutils header files, we must add
$(NVSDKCOMPUTE_ROOT)\C\common\inc;"$(CUDA_PATH)/include";./;../../common/inc;../../../shared/inc
to  project properties -> C/C++ -> General. Then we must add the directory where additional support header files can be found, in this case,
$(NVSDKCOMPUTE_ROOT)\C\common\inc . This must be done under project properties -> CUDA Runtime API -> Additional Include Directories. In project properties ->Linker -> Additional library directories you must add $(CUDA_PATH)/lib/$(PlatformName);../../common/lib/$(PlatformName);$(NVSDKCOMPUTE_ROOT)\C\common\lib\Win32 Then our DLL function takes a matrix of values and performs the Fourier transform of both (previously padded) the input data and the kernel. We can precompute the kernel with Matlab (with fspecial) and copy the values. In this case, for a kernel with$t=1\$, we store the values in h_kernel. Then we perform the multiplication and compute the inverse Fourier transform. This code excerpt contains the important lines:

DLL int blur(float *h_Data, int dataH, int dataW, float **d, int *dH, int *dW) {
float
*d_Data,
*d_Kernel,

fComplex
*d_DataSpectrum,
*d_KernelSpectrum;

cufftHandle
fftPlanFwd,
fftPlanInv;

const int kernelH = 7;
const int kernelW = 6;
...

float *h_ResultGPU = (float *)malloc(fftH    * fftW * sizeof(float));

cutilSafeCall( cudaMalloc((void **)&d_Data,   dataH   * dataW   * sizeof(float)) );
...

float h_Kernel[] = {0.0001F,0.0006F,0.0016F,0.0016F,0.0006F,0.0001F,
0.0009F,0.0070F,0.0190F,0.0190F,0.0070F,0.0009F,
0.0043F,0.0314F,0.0854F,0.0854F,0.0314F,0.0043F,
0.0070F,0.0518F,0.1407F,0.1407F,0.0518F,0.0070F,
0.0043F,0.0314F,0.0854F,0.0854F,0.0314F,0.0043F,
0.0009F,0.0070F,0.0190F,0.0190F,0.0070F,0.0009F,
0.0001F,0.0006F,0.0016F,0.0016F,0.0006F,0.0001F};

...

cufftSafeCall( cufftExecR2C(fftPlanFwd,

cutilSafeCall( cutilDeviceSynchronize() );

cufftSafeCall( cufftExecR2C(fftPlanFwd,

modulateAndNormalize(d_DataSpectrum, d_KernelSpectrum, fftH, fftW, 1);
cufftSafeCall( cufftExecC2R(fftPlanInv,

cutilSafeCall( cutilDeviceSynchronize() );

cutilSafeCall( cudaMemcpy(h_ResultGPU,

d_PaddedData, fftH * fftW * sizeof(float), cudaMemcpyDeviceToHost) );
...

*d=h_ResultGPU;
*dH=fftH;
*dW=fftW;

return 1;
}

I have highlighted the lines where the Fourier transform, the multiplication and the inverse Fourier transform with the induced blur take place.

In the application, we must have a way to call the DLL function. This code loads the DLL, creates a function pointer and casts it to the DLL function.
private: int CallMyDLL(float *f, int h, int w, float **d, int *dh, int *dw)
{
FARPROC lpfnGetProcessID =

typedef int (__stdcall * pICFUNC)

(float *, int, int, float**, int *, int*);

pICFUNC blur;
blur = pICFUNC(lpfnGetProcessID);

int MyReturnVal = blur(f, h, w, d, dh, dw);

FreeLibrary(hGetProcIDDLL);

return MyReturnVal;
}

Then we can call with the pixels as a linearized matrix and the computations will be made within the GPU. Conversions between .Net objects and float matrices are straighforward.

I made the source code available here.