I wanted to write about CUDA because I feel it is the future of non-desktop computation, meaning any process that goes beyond mere data movement and has to apply a scientific computation to extract information from raw data.
CUDA (Compute Unified Device Architecture) is a form of GPGPU that uses the graphics processors on NVIDIA graphics cards as computation units. One can send programs (called kernels) that perform those computations to the graphics device.
The importance of CUDA lies in a hardware architecture aimed specifically at performing vector computations. Scientific and graphics software that makes extensive use of arithmetic operations will therefore benefit from CUDA parallelization; this includes anywhere you see matrix algebra, such as quadratic optimization (including SVMs), PCA, ICA, CCA, and other discretized operations such as the fast Fourier transform, wavelet filter banks and so on. On the other hand, software dominated by large memory transfers is impervious to (any kind of arithmetic) parallelization; this includes databases, web servers, algorithmic/protocol-driven networking software, etc.
It is also interesting for data processing practitioners in the way it copes with large datasets. Modern CPUs contain a large amount of L2 cache, and the tree-structured cache may extend up to three levels. This design exists because typical software has high data locality, meaning a predominance of serial (and thus contiguously accessed) code and data. However, when sweeping quickly across a large dataset, the performance of this cache design decays very rapidly. The GPU memory interface is very different from that of the CPU: GPUs use massively parallel interfaces to connect to their memory. For example, the GTX 280 uses a 512-bit interface to its high-performance GDDR3 memory, approximately 10 times faster than a typical CPU-RAM interface. On the other hand, GPUs lack the vast amount of memory that the main CPU system enjoys. Customized, large-memory parallel GPUs are commercially available, at a higher price than that of a high-performing home gaming system; still, a nice approach to highly parallelized software can be taken notwithstanding.
There is a serious drawback from a software engineering point of view: unlike most programming languages, CUDA is coupled very closely to the hardware implementation. While CPU families change little in order to maintain backward compatibility, CUDA-enabled hardware changes significantly in its basic architectural design.
An important concept to be aware of is the thread hierarchy. CPUs are designed to run just a few threads; GPUs, on the other hand, are designed to process thousands of threads simultaneously. So, in order to take full advantage of your graphics card, the problem must be amenable to being broken down into small pieces. These are the tree-like CUDA thread organizational structures one has to bear in mind (a short indexing sketch follows the list):
Half-Warp: A half-warp is a group of 16 consecutive threads. Half-warp threads are generally executed together and aligned; for instance, threads 0-15 will be in the same half-warp, threads 16-31 in the next, and so on.
Warp: A warp is a group of 32 consecutive threads. On current and future NVIDIA computing devices, threads in the same warp are generally executed together in parallel; therefore, it is a good idea to write your programs as if all threads within the same warp will execute together in parallel.
Block: A block is a collection of threads. For technical reasons, blocks should have at least 192 threads to obtain maximum efficiency and to fully hide latency. Typically, blocks contain 256, 512 or even 768 threads. The important thing to know is that threads within the same block can synchronize with each other and communicate with each other quickly.
Grid: A grid is a collection of blocks. Blocks cannot synchronize with each other, and therefore threads within one block cannot synchronize with threads in another block.
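To make the hierarchy concrete, here is the short indexing sketch promised above; the kernel name and launch parameters are illustrative assumptions of this example, not part of any toolkit sample. Each thread combines blockIdx, blockDim and threadIdx into a unique global index, so launching enough blocks covers the whole array:

__global__ void scale(float *data, float alpha, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)
        data[i] *= alpha;                           // each thread handles one element
}

// Launched, for example, with 256 threads per block:
//   scale<<<(n + 255) / 256, 256>>>(d_data, 2.0f, n);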
Regarding memory, CUDA uses the following kinds:
Global Memory: Global memory can be thought of as the physical memory on your graphics card. All threads can read and write to Global memory.
Shared Memory: A GPU consists of many processors, or multiprocessors. Each multiprocessor has a small amount of shared memory, usually 16 KB. Shared memory is generally used as a very fast working space for the threads within a block, and it is allocated on a block-by-block basis. For example, you may have three blocks running concurrently on the same multiprocessor; this means that the maximum amount of shared memory each block can reserve is 16 KB / 3. Threads within the same block can quickly and easily communicate with each other by writing to and reading from the shared memory. It is worth mentioning that shared memory is at least 100 times faster than global memory, so it is very advantageous if you can use it correctly (a small sketch follows this list).
Texture Memory: A GPU also has texture units and texture memory, which can be taken advantage of in some circumstances. Unlike global memory, texture memory is cached, and it is generally read-only. It can be exploited by using it to store chunks of raw data to process.
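Going back to shared memory, here is a minimal sketch of the idea; the kernel name and the fixed block size of 256 threads are assumptions of this example. Each block loads its chunk of the input into shared memory, reduces it to a single partial sum, and writes one value per block back to global memory:

__global__ void blockSum(const float *in, float *out, int n)
{
    __shared__ float cache[256];                    // one entry per thread of the block

    int tid = threadIdx.x;
    int i   = blockIdx.x * blockDim.x + threadIdx.x;

    cache[tid] = (i < n) ? in[i] : 0.0f;            // load from global into shared memory
    __syncthreads();                                // wait until the whole block has loaded

    for (int s = blockDim.x / 2; s > 0; s >>= 1) {  // tree reduction within the block
        if (tid < s)
            cache[tid] += cache[tid + s];
        __syncthreads();
    }

    if (tid == 0)
        out[blockIdx.x] = cache[0];                 // one partial sum per block
}

This assumes the kernel is launched with exactly 256 threads per block (a power of two); the per-block partial sums can then be added on the CPU or by a second kernel.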
From the software development point of view, the code that runs within the threads is organized as follows:
Kernels: Within a warp, a kernel is executed multiple times in a SIMD (single instruction, multiple data) fashion, which means that the flow of execution in the processors is the same, executing the same instruction, but each processor applies that instruction to different data. We are therefore interested in splitting the software into pieces such that each piece is executed multiple times on different data. For example, in the matrix multiplication A*B, a single kernel may have the task of multiplying a row from A by a column from B. When instantiating this kernel in N x M threads, each instantiation (thread) performs the dot product of a specific row of A and a specific column of B.
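A hedged sketch of such a kernel follows; the names, the row-major layout and the 16x16 block size are assumptions of this example, not taken from the example project mentioned below. Each thread computes one entry of C = A*B as the dot product of one row of A and one column of B:

__global__ void matMulKernel(const float *A, const float *B, float *C,
                             int n, int k, int m)
{
    // A is n x k, B is k x m, C is n x m, all stored row-major in global memory
    int row = blockIdx.y * blockDim.y + threadIdx.y;  // which row of A
    int col = blockIdx.x * blockDim.x + threadIdx.x;  // which column of B

    if (row < n && col < m) {
        float acc = 0.0f;
        for (int p = 0; p < k; ++p)                   // dot product
            acc += A[row * k + p] * B[p * m + col];
        C[row * m + col] = acc;
    }
}

// Host side: one thread per element of C, in 16x16 blocks
//   dim3 block(16, 16);
//   dim3 grid((m + block.x - 1) / block.x, (n + block.y - 1) / block.y);
//   matMulKernel<<<grid, block>>>(dA, dB, dC, n, k, m);

Each thread here reads a full row and a full column from global memory; a more refined version would stage tiles of A and B in the shared memory described above.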
At compilation and linkage time, the kernel (which is CUDA code) is stored as data inside the program, as seen by the operating system; at run time it is sent to the GPU via the corresponding DMA channel. Once in the GPU, the kernel is instantiated on each processor.
I made a quick example that can be compiled directly with Visual C++ with the help of the included project files (assuming one has the CUDA toolkit, driver and Visual C++ environment set up). It can also be compiled anywhere one has a working C++ and NVCC environment.
I will write a follow-up article on the basic setup with Visual C++ 2008 Express, which requires some tweaks to get up and running.
Saturday, March 31, 2012
Tuesday, March 20, 2012
Multiclass SVM with e1071
When dealing with multi-class classification using the package e1071 for R, which encapsulates LibSVM, one faces the problem of correctly predicting values, since the predict function doesn't seem to deal effectively with this case. In fact, testing the very example that comes with the svm help (?svm on the R command line), one sees the function fail (even though the model itself is correctly fitted).
library(e1071)
data(iris)
attach(iris)
model <- svm(Species ~ ., data = iris)
x <- subset(iris, select = -Species)
y <- Species
model <- svm(x, y, probability = TRUE)
This model is correctly fitted. However, the following prediction does not return the correct values:
pred <- predict(model, x)
We see that every prediction is setosa, even though the dataset is equally divided between the three classes (setosa, versicolor and virginica):
> pred 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa Levels: setosa versicolor virginica
It seems that something went wrong with the direct prediction, but one way to overcome this problem is to use the predicted probabilities, which seem to be well computed. Observe how they look:
pred <- predict(model, x, decision.values = TRUE, probability = TRUE)
Now we see that the most probable class is indeed the ground truth:
> attr(pred, "probabilities") setosa versicolor virginica 1 0.980325653 0.011291686 0.008382661 2 0.972927977 0.018061300 0.009010723 3 0.979044327 0.011921880 0.009033793 ... 48 0.977140826 0.013050710 0.009808464 49 0.977831001 0.013359834 0.008809164 50 0.980099521 0.011501036 0.008399444 51 0.017740468 0.954734399 0.027525133 52 0.010394167 0.973918376 0.015687457 ... 97 0.009263806 0.986123276 0.004612918 98 0.008503724 0.988168405 0.003327871 99 0.025068812 0.965415124 0.009516064 100 0.007514580 0.987584706 0.004900714 101 0.012482541 0.002502134 0.985015325 ... 149 0.013669944 0.017618659 0.968711397 150 0.010205071 0.140882630 0.848912299
We can then correctly classify by picking, for each observation, the class with the highest probability:

predsvm <- function(model, newdata)
{
    prob <- attr(predict(model, newdata, probability = TRUE), "probabilities")
    ## pick, for each row, the column (class) with the highest probability
    as.factor(colnames(prob)[apply(prob, 1, which.max)])
}

One might also program the following function, which deals directly with the way the support vector coefficients are stored in the model object, in model$coefs and model$rho:
## Linear kernel function. Note: for the raw decision values below to match the
## fitted model, the model must have been trained with kernel = "linear" (and,
## since svm() scales the data by default, newdata must be scaled the same way);
## for other kernels, replace K accordingly.
K <- function(i, j) crossprod(i, j)

## newdata: a single observation (a numeric vector of features)
predsvm <- function(object, newdata) {
    ## compute start-index of each class's support vectors
    start <- c(1, cumsum(object$nSV) + 1)
    start <- start[-length(start)]
    ## compute kernel values between every support vector and newdata
    kernel <- sapply(1:object$tot.nSV,
                     function(x) K(object$SV[x, ], newdata))
    ## compute raw prediction for classifier (i,j)
    predone <- function(i, j) {
        ## ranges for class i and j:
        ri <- start[i] : (start[i] + object$nSV[i] - 1)
        rj <- start[j] : (start[j] + object$nSV[j] - 1)
        ## coefs for (i,j):
        coef1 <- object$coefs[ri, j - 1]
        coef2 <- object$coefs[rj, i]
        ## return raw values:
        crossprod(coef1, kernel[ri]) + crossprod(coef2, kernel[rj])
    }
    ## compute votes for all pairwise classifiers
    votes <- rep(0, object$nclasses)
    c <- 0  # rho counter
    for (i in 1 : (object$nclasses - 1))
        for (j in (i + 1) : object$nclasses)
            if (predone(i, j) > object$rho[c <- c + 1])
                votes[i] <- votes[i] + 1
            else
                votes[j] <- votes[j] + 1
    ## return winner (index with max. votes)
    object$levels[which(votes %in% max(votes))[1]]
}
Sunday, March 18, 2012
Smoothing splines and interest rate curves
Yield curves are important in economics: they are used by finance professionals to analyze bonds and look for trading opportunities, and by economists to try to understand economic conditions.
The yield is the amount in cash that returns to the owner of a security. For example, if a sovereign bond with maturity at time $T$ is bought at time $t$ for a price $P(t,T)$, the yield is high if $P(t,T)$ is much less than the price at maturity $P(T,T)$ (the amount received by the lender at maturity if there is no default). A higher yield allows the owner to make more profit from the investment, but it may also mean that the security is being sold at a discount as a result of some risk related to it. For a recent example, Greek bonds were sold at such discounts because the market anticipated a default; this made yields soar, with risk-averse investors unloading bonds at the lower prices that risk-taking investors were comfortable buying at. Ultimately everybody took a haircut.
For fixed income mathematics, a set of axioms must be assumed, namely:
- The market trades continuously over its trading horizon: it extends from the current time to some distant future time such that the maturities of all the instruments to be valued fall between now and the trading horizon.
- The market is efficient: information is available to all traders simultaneously, and every trader makes use of all the available information. This is closely related to the next axiom:
- There are no arbitrage opportunities: the price of a portfolio is the sum of its constituent parts.
- The market is complete: any desired cash flow can be obtained from a suitable self-financing strategy based on a portfolio of discount bonds.
It is assumed further that there are no legal barriers to trading and that the market is rational (traders try to maximize their profits).
Now, define the zero yield in terms of the bond price above. The zero yield, as seen at time $t$, of a bond that matures at time $T$, is denoted by $y(t,T)$ and is defined by
$$
P(t,T)=e^{-(T-t) y(t,T) }
$$
for every $t<T$, which means that the current price of the bond is equal to the price at maturity discounted at the continuously compounded rate $y(t,T)$.
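Equivalently, solving the definition for the yield (this is how the observed yields used further below are obtained from observed bond prices):
$$
y(t,T) = -\frac{\ln P(t,T)}{T-t}
$$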
Suppose that at time $t$ we enter into a forward contract to deliver at time $T_1$ a bond that will mature at time $T_2$ (obviously $T_2>T_1$). Let the forward price of the bond be denoted by $P(t,T_1,T_2)$. At the same time, a bond that matures at time $T_1$ is purchased at a price of $P(t,T_1)$, and a bond that matures at time $T_2$ is bought at a price of $P(t,T_2)$. Notice that the no-arbitrage axiom implies that the price of the bond maturing at time $T_2$ must equal the price of the bond maturing at time $T_1$ times the forward price: $P(t,T_2)=P(t,T_1)\,P(t,T_1,T_2)$. Define the implied forward rate $f(t,T_1,T_2)$, valued at time $t$, for the period $(T_1,T_2)$ by
$$
P(t,T_1,T_2)=e^{-(T_2-T_1) f(t,T_1,T_2) }
$$
Notice that $P(t,T_2) =P(t,T_1) P(t,T_1,T_2)=e^{-(T_1-t) y(t,T_1) } e^{-(T_2-T_1) f(t,T_1,T_2) }$; equating exponents with the definition of the zero yield, $P(t,T_2)=e^{-(T_2-t) y(t,T_2)}$, and solving for the forward rate, we get
$$
f(t,T_1,T_2)= \frac{(T_2-t) y(t,T_2) - (T_1 - t) y(t,T_1)}{T_2 - T_1}
$$
which is the period forward rate. However, the instantaneous forward rate is of much greater importance in the theory of the term structure. The instantaneous forward rate for time $T$, as seen at time $t$, is denoted by $f(t,T)$ and is the continuously compounded rate defined by
$$
f(t,T)= \lim_{h \rightarrow 0} {f(t,T,T+h) } = y(t,T) + (T-t) y_T (t,T)
$$
where $y_T = \frac{\partial}{\partial T} y(t,T)$. We can interpret the previous equation as the current yield value plus the instantaneous change of the yield with the maturity time.
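Making the last equality explicit, substitute the period forward rate into the limit:
$$
f(t,T)= \lim_{h \rightarrow 0} \frac{(T+h-t)\, y(t,T+h) - (T-t)\, y(t,T)}{h}
= \frac{\partial}{\partial T} \left[ (T-t)\, y(t,T) \right]
= y(t,T) + (T-t)\, y_T(t,T)
$$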
To interpolate curves, and thus have values for all the points within the interval and not only at the data points that the problem makes available to us, we can use a number of different methods. One needs some complexity to be able to capture what nature is saying, but at the same time this complexity might be caused by some "noise", or perturbations, that we want to avoid (market panics, manipulations or poorly registered data). We can choose a parametric family, such as polynomials, fix the order and fit the coefficients; however, this appears somewhat arbitrary. One can instead choose to interpolate with a non-parametric method such as splines. When using some functional space, one must restrict oneself to functions that meet some criteria, not merely those that approximate well (or even the best approximation in that space); this is to avoid overfitting the said noise.
Now we can fit an approximating yield (or forward) curve to the true yield (or forward) curve, using the information given by actual bond prices as of time $t$, by minimizing the following functional (assuming we fit the yield curve $\varphi$):
$$
R(\varphi,t) = \sum_{i=1}^n (P(t,T_i) - P(t,T_i,\varphi))^2 + \lambda \int \left( \frac{\partial^2}{\partial s^2} \varphi(s) \right)^2 ds.
$$
where $\lambda$ is a regularization parameter, and $P(t,T_i,\varphi)$ is a functional that prices a bond of maturity $T_i$ at time $t$ using the yield curve $\varphi$. For practical derivation purposes, we change the formulation (we now compute the observed yield $y(t,T_i)$ from the observed bond price):
$$
R(\varphi,t) = \sum_{i=1}^n (y(t,T_i) - \varphi(T_i))^2 + \lambda \int \left( \frac{\partial^2}{\partial s^2} \varphi(s) \right)^2 ds.
$$
We then apply standard smoothing-spline theory to compute the solution, which is:
$$
\mathbf{w} = (I+\lambda K)^{-1} \mathbf{y}
$$
where $\mathbf{y}$ is the vector with elements $y(t,T_i)$, $K$ is the matrix whose elements are of the form $\int \frac{\partial^2}{\partial s^2} \phi_i (s) \frac{\partial^2}{\partial s^2} \phi_j (s) ds$ for a chosen spline basis $\{\phi_i\}$ (notice the dot-product structure), and $\mathbf{w}$ is the vector of coefficients in that basis, so that the estimated function is
$$
\varphi = \sum_i w_i \phi_i
$$
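As a sketch of where this expression comes from: write the penalized criterion in matrix form with $\Phi_{ij}=\phi_j(T_i)$ the basis functions evaluated at the maturities; the last step assumes (as the formula above implicitly does) a basis for which $\Phi$ is the identity, i.e. $\phi_j(T_i)=\delta_{ij}$.
$$
R = \|\mathbf{y} - \Phi \mathbf{w}\|^2 + \lambda\, \mathbf{w}^{\top} K \mathbf{w},
\qquad
\nabla_{\mathbf{w}} R = -2\,\Phi^{\top}(\mathbf{y} - \Phi \mathbf{w}) + 2 \lambda K \mathbf{w} = 0
$$
$$
\Rightarrow\; \mathbf{w} = (\Phi^{\top}\Phi + \lambda K)^{-1} \Phi^{\top} \mathbf{y}
\;\overset{\Phi = I}{=}\; (I + \lambda K)^{-1} \mathbf{y}.
$$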
Thursday, March 15, 2012
Beautiful spectacle in the sky
I enjoy Astronomy so much that I examine the night sky every time I can. Tonight I couldn't wait to describe the delightful sight I was witnessing: Jupiter and Venus almost at their closest distance (as they meet the eye, not actual Euclidean distance; they were closest two days ago). Here are some beautiful pictures from the Internet (you may see them on space.com), so that the blog records this spectacular celestial art.
Mostar in Bosnia-Herzegovina
Eagle Harbor, Mich. USA
As per the fabulous site http://www.solarsystemscope.com, the current planet locations are as follows; observe the narrow angle in which we can see Jupiter and Venus.
An observer in California at the time of this post would see Venus above and to the right of Jupiter at sunset (in the previous picture of the solar system, tilt your head to the left and you will see that this is indeed true).
If the doors of perception were cleansed every thing would appear to man as it is, infinite
William Blake
One feels blessed to have been granted the opportunity to grasp such a marvelous look into infinity.
The coin denomination problem
I read somewhere that this is a common interview question at Google. The problem states:
Given a set of coin denominations and a target amount, build an algorithm that returns the minimum number of coins necessary to make up that amount.
The first naive strategy one tends to follow is to use as many of the highest-valued coin as possible, and then continue with lower-valued coins. This usually works with real-world coin systems, which follow a pattern that makes it possible. However, the general problem is slightly more complicated. Consider the following set of coins: {1,2,5,8,10}. This set won't work with that strategy: to make 13, the greedy approach yields 10+2+1 (three coins), while 8+5 needs only two.
A better approach consists of considering the lower-valued coins not only after the higher-valued coins have been subtracted, but at every iteration. When we are finished, we perform the same operation recursively, without using the highest-valued coin at the present recursive call. In the example, we would take the 10-coin out; in the next recursive call, we take the 8-coin out, and so forth. Thus, we take all possibilities into account. Of course, we should track the minimum number of coins found so far in order to prune the tree.
The following program implements the solution to this problem:
/*
 * This program solves the coin denomination problem.
 *
 * You may redistribute this file according to the GPL terms.
 * Copyright 2006 Javier Arriero Pais
 */

#include <stdio.h>
#include <stdlib.h>

#define MAXC 5

int coins[MAXC] = {1, 2, 5, 8, 10};
int amount = 13;

int numcoins (int, int);

int main (int argc, char *argv[])
{
    if (argc > 1)
        amount = strtol (argv[1], NULL, 10);

    printf ("Minimal number of coins: %i\n", numcoins (amount, MAXC));

    return 0;
}

int numcoins (int amount, int maxc)
{
    int i;
    int auxamount = amount, mon = 10000, auxmon = 0, auxmon2;

    if (maxc <= 0) return 0;
    if (maxc == 1) return amount;
    if (amount <= 0) return 0;
    if (amount == 1) return 1;

    i = maxc - 1;
    while (auxamount >= coins[i]) {
        auxamount -= coins[i];
        auxmon2 = numcoins (auxamount, maxc - 1);
        auxmon++;
        if (auxmon + auxmon2 < mon) mon = auxmon2 + auxmon;
    }

    auxmon2 = numcoins (amount, maxc - 1);
    if (auxmon2 != 0 && auxmon2 < mon)
        mon = auxmon2;

    return mon;
}
Sunday, March 11, 2012
Integrating Latex in Blogger
If you wonder how I managed to get the beautiful formulae of the previous post: go to the Dashboard and, under Template, find the "Edit HTML" button. You will be editing the very template used to display your blog. Then, right before the HTML tag "</head>", put this:
<script src='http://cdn.mathjax.org/mathjax/latest/MathJax.js' type='text/javascript'>
MathJax.Hub.Config({
extensions: ["tex2jax.js","TeX/AMSmath.js","TeX/AMSsymbols.js"],
jax: ["input/TeX", "output/HTML-CSS"],
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
displayMath: [ ['$$','$$'], ["\\[","\\]"] ],
},
"HTML-CSS": { availableFonts: ["TeX"] }
});
</script>
Then write your $\LaTeX$ formulae as if you were using your MiKTeX, enclosing them within dollar symbols, or within double dollar symbols if you need an environment similar to displaymath.
By the way, I prevented the parsing of the previous code by the MathJax LaTeX parser (loaded from the http://cdn.mathjax.org server, as you can see) by surrounding the text with the HTML tag <pre>; otherwise the parser flags some of the code above as erroneous LaTeX.
Buffon's needle
What better way to start the blog than with the classical probabilistic puzzle of Buffon's needle?
The statement, as per Wikipedia:
"Suppose we have a floor made of parallel strips of wood, each the same width, and we drop a needle onto the floor. What is the probability that the needle will lie across a line between two strips?"

For the sake of clarity, we will consider the case where the separation of the strips is $t=1$, the needle's length is $l=1$ and the needle can fall equally at every point on the floor. We will call $h$ the distance from the center of the needle to the closest strip line, and $\theta$ the angle between the needle and the strip lines.
Now let's think about the situation in which the needle will cut a strip line. If the angle $\theta$ is zero, then the needle is forced to rest along the strip, therefore $h=0$. If the needle is perpendicular to the strip (the angle is $\frac{\pi}{2}$), then the distance can be anything up to $\frac{1}{2}$. We see that we can rotate a needle from an angle of zero up to $\pi$ before we get the same position of the needle, and that the distance can range from zero (the center of the needle lying on a strip line) to $\frac{1}{2}$ (before being closer to another strip line).
We must add up "as many" combinations of $\theta$ and $h$ as make the needle touch a strip. Since these variables are continuous, we need to integrate over them. Therefore, to "add" all the values, we fix an angle $\theta$ and think of how far left or right (in the picture above) we can move the needle without preventing it from touching the strip. So, when $\theta=0$ the needle cannot move without leaving the strip, and thus $h=0$, its minimum value. On the other hand, when $\theta=\frac{\pi}{2}$ then $h$ is completely free to take values within its possible range from $-\frac{1}{2}$ to $\frac{1}{2}$, that is, up to $\frac{1}{2}$ away from the closest strip. In between, we will be able to move the needle's center within a range $[-\frac{\sin\theta}{2}, \frac{\sin\theta}{2}]$ around the closest strip. Each value must be weighted by its "importance" in the interval we are integrating over. As per our assumptions, every point "weighs" the same (has the same probability density in the interval). For the angles, the density over the whole interval is $\frac{1}{\pi}$, and for the distances, $1$.
$$\int_{0}^{\pi} { \int_{-\frac{\sin\theta}{2}}^{\frac{\sin\theta}{2}} {1 dh} \frac{1}{\pi} d\theta} =\\ \int_{0}^{\pi} {\left( \frac{\sin\theta}{2} - (-\frac{\sin\theta}{2}) \right) \frac{1}{\pi} d\theta} = \\ \frac{1}{\pi} \int_{0}^{\pi} {\sin\theta d\theta} = \\
\frac{1}{\pi} \left(- \cos\pi - (-\cos0) \right) = \frac{2}{\pi}$$
Although we didn't say so explicitly, this equal weighting is called uniform probability. The result would change if the needle were thrown according to some other probability law.
This problem also gives us the chance to estimate $\pi$. If we throw $N$ needles onto the floor and count how many actually cross the strips, dividing that count by $N$ gives an approximation to $\frac{2}{\pi}$ (by the law of large numbers); inverting and multiplying by two, we have an estimate of $\pi$. In this image (taken from here), 100,000 needle throws are simulated. If you focus on the angle, what the histogram tells you is the freedom to move the center of the needle away from the closest strip. The farthest distance would be $\frac{1}{2}$, but the author scaled the histogram to one to match it with the $\sin \theta$ function.
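Purely as an illustration (and since CUDA was discussed earlier in this blog), here is a minimal Monte Carlo sketch of this estimator; the kernel name, the use of cuRAND and all launch parameters are assumptions of this example, not part of the original post. Each thread throws a number of needles with $t=l=1$, counts the crossings, and the host turns the crossing frequency (an estimate of $2/\pi$) into an estimate of $\pi$.

#include <cstdio>
#include <cuda_runtime.h>
#include <curand_kernel.h>

// Each thread simulates 'throwsPerThread' needle throws and stores its crossing count.
__global__ void buffon(unsigned long long seed, int throwsPerThread, unsigned int *hits)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    curandState state;
    curand_init(seed, idx, 0, &state);

    unsigned int count = 0;
    for (int i = 0; i < throwsPerThread; ++i) {
        float h     = 0.5f * curand_uniform(&state);         // distance of the center to the closest strip, in (0, 1/2]
        float theta = 3.14159265f * curand_uniform(&state);  // angle of the needle, in (0, pi]
        if (h <= 0.5f * sinf(theta))                         // the needle crosses a strip line
            ++count;
    }
    hits[idx] = count;
}

int main()
{
    const int threads = 256, blocks = 64, throwsPerThread = 10000;
    const int n = threads * blocks;

    unsigned int *d_hits;
    cudaMalloc(&d_hits, n * sizeof(unsigned int));
    buffon<<<blocks, threads>>>(1234ULL, throwsPerThread, d_hits);

    unsigned int *h_hits = new unsigned int[n];
    cudaMemcpy(h_hits, d_hits, n * sizeof(unsigned int), cudaMemcpyDeviceToHost);

    unsigned long long crossings = 0;
    for (int i = 0; i < n; ++i) crossings += h_hits[i];

    double p = (double) crossings / ((double) n * throwsPerThread);  // estimates 2/pi
    printf("Estimated pi: %f\n", 2.0 / p);

    delete[] h_hits;
    cudaFree(d_hits);
    return 0;
}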