Using a Bayesian fit is totally different from a least-squared fit. Doing it is also more complicated. In a chi-squared fit, we minimize a merit function. In a Bayesian fit, we have a set of priors, and a set of observations. To fit a model to those observations, we calculate a likelihood function. Unlike a least-squared problem, here basically each variable has a distribution. and we find the peak of distribution as the solution for unknown parameters. It is way better to read a review or something like this nice paper:

http://adsabs.harvard.edu/abs/1992ApJ…398..146G

Well, now that we know what is it, how can we do simplest things with that: i.e., fitting a straight line to some data points? Pymc made it easy. It is a python package which contains three different solvers for Bayesian statistics including a Markov chain Monte Carlo (MCMC) estimator.

You can not only use it to do simple fitting stuff like this, but also do more complicated things. However, I try to show some simple examples of its usage and comparison to a traditional fit in a separate post. Basically I compare fitting a parabola using chi-square and Bayesian method.

save the following in a file (I call it test.py):

# A quadratic fit

#———————————————————–

import numpy, pymc

# create some test data

x = numpy.arange(100) * 0.3

f = 0.1 * x**2 – 2.6 * x – 1.5

numpy.random.seed(76523654)

noise = numpy.random.normal(size=100) * .1 # create some Gaussian noise

f = f + noise # add noise to the data

z = numpy.polyfit(x, f, 2) # the traditional chi-square fit

print ‘The chi-square result: ‘, z

#priors

sig = pymc.Uniform(‘sig’, 0.0, 100.0, value=1.)

a = pymc.Uniform(‘a’, -10.0, 10.0, value= 0.0)

b = pymc.Uniform(‘b’, -10.0, 10.0, value= 0.0)

c = pymc.Uniform(‘c’, -10.0, 10.0, value= 0.0)

#model

@pymc.deterministic(plot=False)

def mod_quadratic(x=x, a=a, b=b, c=c):

return a*x**2 + b*x + c

#likelihood

y = pymc.Normal(‘y’, mu=mod_quadratic, tau=1.0/sig**2, value=f, observed=True)

#———————————————————–

Now, go to command line and run the following (or alternatively put them in a file):

import pymc, test # load the model file

R = pymc.MCMC(test) # build the model

R.sample(10000) # populate and run it

print ‘a ‘, R.a.value # print outputs

print ‘b ‘, R.b.value

print ‘c ‘, R.c.value

The output looks like this:

The chi-square result: [ 0.0999244 -2.59879643 -1.49278601]

Sampling: 100% [00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000] Iterations: 10000

a 0.0981867109293

b -2.54514077442

c -1.77016283024

The correct results are a=0.1, b=2.6, and c=1.5. Both methods gave satisfactory results. Indeed, if you increase number of iterations from 10000 to one million, the answer will be like this:

Sampling: 100% [000000000000000000000000000000000000000000000000000000000000000000000000000000000000000] Iterations: 1000000

a 0.0997945467529

b -2.59731835923

c -1.47308938314

As you see, the results are more accurate compared to the former case.

If you like to have more information, you can use the following command:

print ‘a ‘, R.a.stats()

a {‘95% HPD interval’: array([ 0.09956635, 0.10027448]), ‘n’: 1000000, ‘quantiles’: {2.5: 0.099545071298350371, 25: 0.099805050609863097, 50: 0.099919386231275664, 75: 0.10003496316945379, 97.5: 0.10025986774435611}, ‘standard deviation’: 0.0050467522199834905, ‘mc error’: 0.0003721035009025152, ‘mean’: 0.099548164235412406}

Recall that in a Bayesian fit, each variable has a distribution. The following plot shows the data points as well as the two Bayesian fits.

To compare it with a least-square fit, I repeated the experiment with a sample data which has more noise. The resulting fits are compared in the following panel.

There are much more you can learn from the examples of Pymc. I hope in a future post, I can explain other types of fit, like a weighted-least-square fit or a bisector fit.

Read Full Post »