Sunday, May 27, 2012

Least squares in C

Just for fun.

I show here how to perform simple regression with least squares.

We implement the closed formulae that result from
$$\min_{\alpha,\,\beta}Q(\alpha,\beta)$$
where
$$Q(\alpha,\beta) = \sum_{i=1}^n\hat{\varepsilon}_i^{\,2} = \sum_{i=1}^n (y_i - \alpha - \beta x_i)^2 $$
We thus compute
$$\frac{\partial}{\partial \beta} Q(\alpha,\beta)$$
and
$$\frac{\partial}{\partial \alpha} Q(\alpha,\beta)$$
Which results in
 $$\hat\beta  = \frac{ \sum_{i=1}^{n} (x_{i}-\bar{x})(y_{i}-\bar{y}) }{ \sum_{i=1}^{n} (x_{i}-\bar{x})^2 }
             = \frac{ \overline{xy} - \bar{x}\bar{y} }{ \overline{x^2} - \bar{x}^2 }
              = \frac{ \operatorname{Cov}[x,y] }{ \operatorname{Var}[x] }
$$

$$\hat\alpha  = \bar{y} - \hat\beta\,\bar{x}$$

So here is the code:

#include "stdafx.h"
#include <stdlib.h>

double sum(double *x, int n) {
	double s=0;

	for (int i=0; i<n; i++) {
		s+=x[i];
	}

	return s;
}


double dot(double *x, double *y, int n) {
	double s=0;

	for (int i=0; i<n; i++) {
		s+=x[i]*y[i];
	}

	return s;
}

void ols(double *x, double *y, int n, double *beta, double *alpha) {
	double sx = sum(x,n);
	double sy = sum(y,n);
	double sx2 = dot(x,x,n);
	double sxy = dot(x,y,n);
	double sxx = sx*sx;

	*beta = (sxy-sx*sy/n) / (sx2-sxx/n);
	*alpha = sy/n - *beta*sx/n;
}

And you can download the Visual C++ project here.

No comments:

Post a Comment