Feeds:
Posts

## Pymc: Bayesian fit for Python

Using a Bayesian fit is totally different from a  least-squared fit. Doing it is also more complicated.  In a chi-squared fit, we minimize a merit function. In a Bayesian fit, we have a set of priors, and a set of observations. To fit a model to those observations, we calculate a likelihood function. Unlike a least-squared problem, here basically each variable has a distribution. and we find the peak of distribution as the solution for unknown parameters. It is way better to read a review or something like this nice paper:

Well, now that we know what is it, how can we do simplest things with that: i.e., fitting a straight line to some data points? Pymc made it easy. It is a python package which contains three different solvers for Bayesian statistics including a Markov chain Monte Carlo (MCMC) estimator.

You can not only use it to do simple fitting stuff like this, but also do more complicated things.  However, I try to show some simple examples of its usage and comparison to a traditional fit in a separate post. Basically I compare fitting a parabola using chi-square and Bayesian method.

save the following in a file (I call it test.py):

#———————————————————–
import numpy, pymc

# create some test data
x = numpy.arange(100) * 0.3
f = 0.1 * x**2 – 2.6 * x – 1.5
numpy.random.seed(76523654)
noise = numpy.random.normal(size=100) * .1     # create some Gaussian noise
f = f + noise                                # add noise to the data

z = numpy.polyfit(x, f, 2)   # the traditional chi-square fit
print ‘The chi-square result: ‘,  z

#priors
sig = pymc.Uniform(‘sig’, 0.0, 100.0, value=1.)

a = pymc.Uniform(‘a’, -10.0, 10.0, value= 0.0)
b = pymc.Uniform(‘b’, -10.0, 10.0, value= 0.0)
c = pymc.Uniform(‘c’, -10.0, 10.0, value= 0.0)

#model
@pymc.deterministic(plot=False)
return a*x**2 + b*x + c

#likelihood
y = pymc.Normal(‘y’, mu=mod_quadratic, tau=1.0/sig**2, value=f, observed=True)
#———————————————————–

Now, go to command line and run the following (or alternatively put them in a file):

import pymc, test              # load the model file
R = pymc.MCMC(test)    #  build the model
R.sample(10000)              # populate and run it
print ‘a   ‘, R.a.value        # print outputs
print ‘b    ‘, R.b.value
print ‘c    ‘, R.c.value
The output looks like this:

The chi-square result: [ 0.0999244  -2.59879643 -1.49278601]
Sampling: 100% [00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000] Iterations: 10000
a    0.0981867109293
b     -2.54514077442
c     -1.77016283024

The correct results are a=0.1, b=2.6, and c=1.5. Both methods gave satisfactory results. Indeed, if you increase number of iterations from 10000 to one million, the answer will be like this:

Sampling: 100% [000000000000000000000000000000000000000000000000000000000000000000000000000000000000000] Iterations: 1000000
a    0.0997945467529
b     -2.59731835923
c     -1.47308938314

As you see, the results are more accurate compared to the former case.

If you like to have more information, you can use the following command:

print ‘a   ‘, R.a.stats()

a    {‘95% HPD interval’: array([ 0.09956635,  0.10027448]), ‘n’: 1000000, ‘quantiles’: {2.5: 0.099545071298350371, 25: 0.099805050609863097, 50: 0.099919386231275664, 75: 0.10003496316945379, 97.5: 0.10025986774435611}, ‘standard deviation’: 0.0050467522199834905, ‘mc error’: 0.0003721035009025152, ‘mean’: 0.099548164235412406}

Recall that in  a Bayesian fit, each variable has a distribution. The following plot shows the data points as well as the two Bayesian fits.

To compare it with a least-square fit, I repeated the experiment with a sample data which has more noise. The resulting fits are compared in the following panel.

There are much more you can learn from the examples of Pymc. I hope in a future post, I can explain other types of fit, like a weighted-least-square fit or a bisector fit.

## Python: solving 1D diffusion equation

The diffusion  equations:

Assuming a constant diffusion coefficient, D, we use the Crank-Nicolson methos (second order accurate in time and space):

u[n+1,j]-u[n,j] = 0.5 a {(u[n+1,j+1] – 2u[n+1,j] + u[n+1,j-1])+(u[n,j+1] – 2u[n,j] + u[n,j-1])}

A linear system of equations, A.U[n+1] = B.U[n], should be solved in each time setp.

As an example, we take a Gaussian pulse and study variation of density with time. Such example can occur in several fields of physics, e.g., quantum mechanics.

As stated above, we have two arrays: A and B. We form them once, and then calculate inverse(A).B:

#———————————————–
# populate the coefficient arrays
#———————————————-
from scipy.linalg import svd
from scipy.linalg import inv

for j in arange(0, nx_step):
coefa[j, (j-1)%nx_step] = – lamb
coefa[j, (j)%nx_step] = 2*lamb + 2.
coefa[j, (j+1)%nx_step] = – lamb
#———————————————–
lamb = – lamb
for j in arange(0, nx_step):
coefb[j, (j-1)%nx_step] = – lamb
coefb[j, (j)%nx_step] = 2*lamb + 2.
coefb[j, (j+1)%nx_step] = – lamb
#———————————————–
coefa = scipy.linalg.inv(coefa)         # inverse of A
coef = numpy.dot(coefa, coefb)
coef = scipy.linalg.inv(coef)

for i in arange(1,nt_step):        #———– main loop ———

ans = scipy.linalg.solve(coef, uo)
uo = ans
plot(uoz,’k’, ans,’r’)
draw()
t[i] = t[i-1] + dt
print dt*nt_step – t[i], ‘      ‘, ans.sum()
rho[:,i] = ans

The result is shown in the following figure: as expected, the peak of the density profile falls down. If one continues the simulation for a longer time, the density distribution will be completely uniform.

## 2D density plot (or 2D histogram)

Sometimes we have to prepare scatter plot of two parameters. When number of elements in each
parameter is a big number, e.g., several thousands, and points are concentrated, it is very difficult
to use a standard scatter plot. The reason is that due to over-population, one cannot say where
the density has a maximum. In this post, I explain alternative ways to prepare such plots.

The below scatter plot has more than two million points. It is absolutely impossible to say where is
the peak of the distribution.

One way to cope with the problem is to over plot the contours of the density on the scatter plot.

Using np.histogram2d function of numpy, one can create a map of two dimensional density function.

It is not always the best idea. Sometimes, it might be better to show the 2D density map itself (below).
The colorbar reads the number of points in each bin.

Now, one might complain that due to large central concentration, the halo of the distribution is not seen in
the 2D density map. There are two solutions for the issue: either we change the color table, or over plot
the contour on the 2D density plot (below). As you see, we can easily show the values of the contours as well.

the Python code to create this plot is the following:

fig = plt.figure()
H, xedges, yedges = np.histogram2d(aa, bb, range=[[293.,1454.0], [464.,1896.0]], bins=(50, 50))
extent = [yedges[0], yedges[-1], xedges[0], xedges[-1]]
levels = (1.0e4, 1.0e3, 1.0e2, 2.0e1)
cset = contour(H, levels, origin=’lower’,colors=[‘black’,’green’,’blue’,’red’],linewidths=(1.9, 1.6, 1.5, 1.4),extent=extent)
plt.clabel(cset, inline=1, fontsize=10, fmt=’%1.0i’)
for c in cset.collections:
c.set_linestyle(‘solid’)

Yet another alternative is just to show the density contours and forget about the rest(below).

## FFT in Python:get the correct frequency

It happens that one uses the standard FFT routine of Python (or better to say Numy.fft) without knowing how can it return the proper frequency. In the following simple example, I show two methods to get it working correctly.

import numpy as np
from scipy import fft
import numpy, matplotlib, scipy.fftpack
from pylab import plt

num = 256
norm = 2.0 / float(num) # normalization factor
sample = 4.
time = np.arange(num) / sample
dx = time[1] – time[0]
f1 = .88
a1 = 2.
kont = a1 * np.sin(2 * np.pi * f1 * time)   # the input
g = fft(kont) * norm
#————————————————–
#– Method 1
freq = np.fft.fftshift(np.fft.fftfreq(kont.shape[-1]))/dx
plt.plot(freq, abs(np.fft.fftshift(g)))
#————————————————–
#– Method 2
#freq = np.fft.fftshift(np.fft.fftfreq(kont.shape[-1]))/dx
#plt.plot(freq, abs(np.fft.fftshift(g)))
#————————————————–
plt.axvline(x=f1, linewidth=2, color=’r’)
plt.xlim(xmin=1e-6)   # to have only positive frequencies
plt.xlabel(r’Frequency’)
plt.ylabel(r’Normalized amplitude’)
plt.title(‘method 1’)
plt.show()

If you have trouble due to  mismatch of some characters, try the PDF version.

## Importance sampling: concept + example

Importance sampling is choosing a good distribution from which to simulate one’s random variables. It involves multiplying the integrand by 1 (usually dressed up in a “tricky fashion”) to yield an expectation of a quantity that varies less than the original integrand over the region of integration.

We now return to it in a slightly more formal way. Suppose that an integrand f can be written as the product of a function h that is almost constant times another, positive, function g. Then its integral over a multidimensional volume V is

f dV =(f /g) gdV =h gdV

Here, we interpreted this equation as suggesting a change of variable to G, the indeﬁnite integral of g. That made gdV a perfect differential. We then proceeded to use the basic theorem of Monte Carlo integration. A more general interpretation of the above equation is that we can integrate f  by instead sampling h — not, however, with uniform probability density dV , but rather with nonuniform density gdV . In this second interpretation, the ﬁrst interpretation follows as the special case.

In this case, random points are chosen from a uniform distribution in the domain A < y < B.  The new integrand, f/g, is close to unity and so the variance (i.e. the value of  sigma) for this function is much smaller than that obtained when evaluating the function  by sampling from a uniform distribution. Sampling from a non-uniform distribution for this function should therefore be more efficient than doing a crude Monte Carlo calculation without importance sampling.

## Numerical example

We want to evaluate the integral of y=x^2 in the range between zero and one. We take g(x)=x. It samples larger values more often. The inverse function of g is √(2x).

Recipes: 1) take N uniform samples between zero and one, 2) build a set of non-uniform sample such that dU=gdV is uniformly sampled. To this end, one has to use the inverse of g(x). In other words, if p(x) is the uniform distribution of x, then q(u)= g-1(x)  will be a uniform distribution for U. 3) evaluate the average of h=f/g for the new sample.

The distribution of the new variable, dU=gdV, is like this:

Now, we have to perform a normal Monte Carlo integration of the function h in U domain. The results are shown in the next plot. For comparison, result of a Monte Carlo integration with uniform sample and the Simpson’s rule were also shown.

Here, each point shows the average of absolute errors after a hundred iterations (except for the Simpson’s method). Results of importance sampling have systematically smaller variance compared to the ones with uniform samples. Hence, the practical trick is to find the best g function.

## g MUST be a good approximation for f

In the following two examples, I show a similar plot as above but for two other functions. In the first case, y=√x, the importance sampling does not help. The variance is comparable to the Monte Carlo method with uniformly distributed samples.

The next example shows that selecting a wrong  g function can make the situation even worse. We use the similar function, g(x)=x, for integration of the unit circle (see the previous post). In other words, we sample regions around one more often than regions about zero while the value of function close to zero is larger than those about one. So, we get a worse variance compared to a completely uniform sample.

## Monte Carlo Integration

In mathematics, Monte Carlo integration is numerical integration using random numbers. That is, Monte Carlo integration methods are algorithms for the approximate evaluation of definite integrals, usually multidimensional ones. The usual algorithms evaluate the integrand at a regular grid. Monte Carlo methods, however, randomly choose the points at which the integrand is evaluated. It is helpful for complex functions or large dimensions.

To evaluate a function f, we sample it N times:

Here the angle brackets denote taking the arithmetic mean over the N sample points,

The fundamental disadvantage of simple Monte Carlo integration is that its accuracy increases only as the square root of N , the number of sampled points. Each new point sampled adds linearly to an accumulated sum that will become the function average, and also linearly to an accumulated sum of squares that will become the variance (see chapter 7 of  Numerical Recipes).

The sampled points should be randomly selected. Assume we take a set of uniform random number with a distribution like below:

## Numerical example: calculate π

We want to calculate the Pi number to a certain level of accuracy. There are several approaches. I take integral of the unit circle in the first quadrant and multiply that with four.

Recipe: 1) generate N random number between zero and one, 2) evaluate function f   for each of them, and 3) perform averaging (above equation). The plot below shows the average of error after 100 times iteration for each number of samples.  For comparison, I put the simple Simpson integration method as well.   The uncertainty is larger than traditional integration method (1/N in this case) and with increasing  number of samples, it goes down slowly.

As discussed above, the variance goes with square root of N. It starts to show advantages over traditional integration methods only in large dimensions. We have limitations for increasing this number. The convergence scheme  makes calculations computationally expensive. The CPU run time for the above calculations is shown below.

In one dimension (the present example), it goes linearly. In two dimensions, it goes with square of N !  That severely limits number of iterations in large dimensions. Recall that we selected a set of uniformly distributed random numbers. However, if the function has a sharp peak at certain range and takes small values elsewhere, we need to somehow sample those regions more frequently to enhance the accuracy. That is topic of the next post, the importance sampling.