Archive for December, 2011

Pymc: Bayesian fit for Python

Using a Bayesian fit is quite different from a least-squares fit, and doing it is also more complicated. In a chi-squared fit, we minimize a merit function. In a Bayesian fit, we have a set of priors and a set of observations. To fit a model to those observations, we calculate a likelihood function. Unlike in a least-squares problem, here essentially every variable has a distribution, and we take the peak of the posterior distribution as the solution for the unknown parameters. It is well worth reading a review or something like this nice paper:


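To make the idea concrete before turning to Pymc, here is a minimal sketch (plain Python, no pymc) of the Bayesian recipe: evaluate prior times likelihood over a grid of candidate parameter values and take the peak of the resulting posterior. The data values and noise level below are made up for illustration.

```python
# A minimal sketch of a Bayesian fit without pymc: estimate the mean of
# some noisy data by evaluating the (unnormalized) posterior on a grid of
# candidate values and taking its peak.
data = [1.9, 2.1, 2.0, 2.2, 1.8]   # observations, sample mean = 2.0
sigma = 0.1                         # assumed known noise level

def log_likelihood(mu):
    # Gaussian log-likelihood of the data for a candidate mean mu
    return sum(-0.5 * ((x - mu) / sigma) ** 2 for x in data)

# Flat prior on [0, 4]: the log-posterior is the log-likelihood plus a constant
grid = [i * 0.001 for i in range(4001)]
posterior = [(log_likelihood(mu), mu) for mu in grid]
best_mu = max(posterior)[1]         # the posterior peak
print(best_mu)
```

With a flat prior the posterior peak coincides with the maximum-likelihood answer; an informative prior would pull it away from the data.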
Well, now that we know what it is, how can we do the simplest thing with it, i.e., fitting a curve to some data points? Pymc makes it easy. It is a Python package which contains three different solvers for Bayesian statistics, including a Markov chain Monte Carlo (MCMC) estimator.

You can not only use it for simple fitting tasks like this, but also for more complicated things. Here, however, I try to show some simple examples of its usage and a comparison to a traditional fit. Basically, I compare fitting a parabola using the chi-square and the Bayesian method.

Save the following in a file (I call it test.py):

# A quadratic fit
import numpy, pymc

# create some test data
x = numpy.arange(100) * 0.3
f = 0.1 * x**2 - 2.6 * x - 1.5
noise = numpy.random.normal(size=100) * .1   # create some Gaussian noise
f = f + noise                                # add noise to the data

z = numpy.polyfit(x, f, 2)   # the traditional chi-square fit
print 'The chi-square result: ', z

# priors: the noise level and the three polynomial coefficients
sig = pymc.Uniform('sig', 0.0, 100.0, value=1.)

a = pymc.Uniform('a', -10.0, 10.0, value=0.0)
b = pymc.Uniform('b', -10.0, 10.0, value=0.0)
c = pymc.Uniform('c', -10.0, 10.0, value=0.0)

@pymc.deterministic
def mod_quadratic(x=x, a=a, b=b, c=c):
    return a*x**2 + b*x + c

# likelihood of the observed data, given the model
y = pymc.Normal('y', mu=mod_quadratic, tau=1.0/sig**2, value=f, observed=True)

Now, go to the command line and run the following (or alternatively put it in a file):

import pymc, test            # load the model file
R = pymc.MCMC(test)          # build the model
R.sample(10000)              # populate and run it
print 'a ', R.a.value        # print outputs
print 'b ', R.b.value
print 'c ', R.c.value
The output looks like this:

The chi-square result: [ 0.0999244  -2.59879643 -1.49278601]
Sampling: 100% [00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000] Iterations: 10000
a    0.0981867109293
b     -2.54514077442
c     -1.77016283024

The correct results are a=0.1, b=-2.6, and c=-1.5. Both methods give satisfactory results. Indeed, if you increase the number of iterations from 10,000 to one million, the answer looks like this:

Sampling: 100% [000000000000000000000000000000000000000000000000000000000000000000000000000000000000000] Iterations: 1000000
a    0.0997945467529
b     -2.59731835923
c     -1.47308938314

As you can see, the results are more accurate than in the former case.
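Why do more iterations help? R.sample is drawing samples from the posterior with a Markov chain, and longer chains average away more of the Monte Carlo error. As an illustration of what such a sampler does under the hood, here is a minimal Metropolis random walk over a single unknown (the mean of some synthetic data); this is a sketch of the general algorithm, not pymc's actual implementation.

```python
import math, random

# A minimal Metropolis sampler: random-walk proposals, accepted with
# probability min(1, posterior ratio). Illustrative only.
random.seed(42)
data = [random.gauss(5.0, 1.0) for _ in range(200)]   # true mean = 5.0

def log_posterior(mu):
    # flat prior, Gaussian likelihood with known sigma = 1
    return sum(-0.5 * (x - mu) ** 2 for x in data)

mu = 0.0                          # deliberately poor starting value
current_lp = log_posterior(mu)
samples = []
for step in range(20000):
    proposal = mu + random.gauss(0.0, 0.5)     # random-walk proposal
    prop_lp = log_posterior(proposal)
    if math.log(random.random()) < prop_lp - current_lp:
        mu, current_lp = proposal, prop_lp     # accept the move
    samples.append(mu)

burned = samples[5000:]           # discard burn-in samples
estimate = sum(burned) / len(burned)
print(estimate)                   # close to the true mean, 5.0
```

The early samples wander from the bad starting point toward the high-posterior region, which is why they are discarded as burn-in before averaging.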

If you would like more information, you can use the following command:

print 'a ', R.a.stats()

a    {'95% HPD interval': array([ 0.09956635,  0.10027448]), 'n': 1000000, 'quantiles': {2.5: 0.099545071298350371, 25: 0.099805050609863097, 50: 0.099919386231275664, 75: 0.10003496316945379, 97.5: 0.10025986774435611}, 'standard deviation': 0.0050467522199834905, 'mc error': 0.0003721035009025152, 'mean': 0.099548164235412406}
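All of those numbers are summaries of the raw trace of samples. As a sketch of the idea (mirroring, but not reproducing, pymc's own stats code), here is how the mean, the quantiles, and a central 95% interval can be read off a list of samples; note a true HPD interval is the narrowest 95% region, which for a symmetric posterior is close to the central interval used here. The trace below is synthetic, standing in for the samples of 'a'.

```python
import random

# Derive summary statistics from a raw MCMC trace (synthetic here).
random.seed(1)
trace = [random.gauss(0.1, 0.005) for _ in range(100000)]

def quantile(sorted_vals, q):
    # simple nearest-rank quantile; q is in percent
    idx = min(len(sorted_vals) - 1, int(q / 100.0 * len(sorted_vals)))
    return sorted_vals[idx]

s = sorted(trace)
mean = sum(s) / len(s)
interval = (quantile(s, 2.5), quantile(s, 97.5))   # central 95% interval
print(mean, interval)
```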

Recall that in  a Bayesian fit, each variable has a distribution. The following plot shows the data points as well as the two Bayesian fits.

To compare it with a least-square fit, I repeated the experiment with a sample data which has more noise. The resulting fits are compared in the following panel.

There is much more you can learn from the examples of Pymc. I hope that in a future post I can explain other types of fit, like a weighted least-squares fit or a bisector fit.
