## Sunday, May 27, 2012

### Least squares in C

Just for fun.

I show here how to perform simple regression with least squares.

We implement the closed formulae that result from
$$\min_{\alpha,\,\beta}Q(\alpha,\beta)$$
where
$$Q(\alpha,\beta) = \sum_{i=1}^n\hat{\varepsilon}_i^{\,2} = \sum_{i=1}^n (y_i - \alpha - \beta x_i)^2$$
We thus compute
$$\frac{\partial}{\partial \beta} Q(\alpha,\beta)$$
and
$$\frac{\partial}{\partial \alpha} Q(\alpha,\beta)$$
Which results in
$$\hat\beta = \frac{ \sum_{i=1}^{n} (x_{i}-\bar{x})(y_{i}-\bar{y}) }{ \sum_{i=1}^{n} (x_{i}-\bar{x})^2 } = \frac{ \overline{xy} - \bar{x}\bar{y} }{ \overline{x^2} - \bar{x}^2 } = \frac{ \operatorname{Cov}[x,y] }{ \operatorname{Var}[x] }$$

$$\hat\alpha = \bar{y} - \hat\beta\,\bar{x}$$

So here is the code:

#include "stdafx.h"
#include <stdlib.h>

double sum(double *x, int n) {
double s=0;

for (int i=0; i<n; i++) {
s+=x[i];
}

return s;
}

double dot(double *x, double *y, int n) {
double s=0;

for (int i=0; i<n; i++) {
s+=x[i]*y[i];
}

return s;
}

void ols(double *x, double *y, int n, double *beta, double *alpha) {
double sx = sum(x,n);
double sy = sum(y,n);
double sx2 = dot(x,x,n);
double sxy = dot(x,y,n);
double sxx = sx*sx;

*beta = (sxy-sx*sy/n) / (sx2-sxx/n);
*alpha = sy/n - *beta*sx/n;
}