A Bayesian fit is quite different from a least-squares fit, and carrying one out is also more involved. In a chi-square fit, we minimize a merit function. In a Bayesian fit, we start from a set of priors and a set of observations; to fit a model to those observations, we compute a likelihood function. Unlike in a least-squares problem, here essentially every variable has a distribution, and we take the peak of that distribution as the estimate of the unknown parameters. It is well worth reading a review, or a nice paper like this one:
http://adsabs.harvard.edu/abs/1992ApJ...398..146G
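Before turning to any package, the idea that "every variable has a distribution" can be made concrete with a toy random-walk Metropolis sampler written in plain numpy. This is only an illustrative sketch (the step size, seed, and sample counts are my own choices, and real MCMC samplers are far more sophisticated): it draws samples of a single parameter, the mean of some Gaussian data, and the histogram of those samples approximates the posterior whose peak is our estimate.

```python
# Toy Metropolis sampler in plain numpy: estimate the mean of Gaussian data.
# Illustrative sketch only -- step size and iteration counts are arbitrary.
import numpy

numpy.random.seed(0)
data = numpy.random.normal(loc=3.0, scale=1.0, size=200)  # the "observations"

def log_posterior(mu):
    # flat prior on mu, Gaussian likelihood with known sigma = 1,
    # so the log-posterior is just the log-likelihood up to a constant
    return -0.5 * numpy.sum((data - mu) ** 2)

mu = 0.0          # starting guess
trace = []
for _ in range(5000):
    proposal = mu + numpy.random.normal(scale=0.2)   # random-walk proposal
    # accept with probability min(1, posterior ratio)
    if numpy.log(numpy.random.rand()) < log_posterior(proposal) - log_posterior(mu):
        mu = proposal
    trace.append(mu)

trace = numpy.array(trace[1000:])   # discard burn-in samples
print(trace.mean())                 # peak/mean of the posterior, near 3.0
```

The chain `trace` is a set of draws from the posterior of `mu`; its mean (or mode) is the Bayesian estimate, and its spread is the uncertainty.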
Well, now that we know what it is, how can we do the simplest thing with it, i.e., fit a straight line to some data points? PyMC makes it easy. It is a Python package that contains three different solvers for Bayesian statistics, including a Markov chain Monte Carlo (MCMC) estimator.
You can use it not only for simple fitting tasks like this, but also for far more complicated problems. Here, however, I show some simple examples of its usage and a comparison to a traditional fit: basically, I compare fitting a parabola using the chi-square and the Bayesian methods.
Save the following in a file (I call it test.py):
# A quadratic fit
#-----------------------------------------------------------
import numpy, pymc

# create some test data
x = numpy.arange(100) * 0.3
f = 0.1 * x**2 - 2.6 * x - 1.5
numpy.random.seed(76523654)
noise = numpy.random.normal(size=100) * .1  # create some Gaussian noise
f = f + noise  # add noise to the data

z = numpy.polyfit(x, f, 2)  # the traditional chi-square fit
print 'The chi-square result: ', z

# priors
sig = pymc.Uniform('sig', 0.0, 100.0, value=1.)
a = pymc.Uniform('a', -10.0, 10.0, value=0.0)
b = pymc.Uniform('b', -10.0, 10.0, value=0.0)
c = pymc.Uniform('c', -10.0, 10.0, value=0.0)

# model
@pymc.deterministic(plot=False)
def mod_quadratic(x=x, a=a, b=b, c=c):
    return a*x**2 + b*x + c

# likelihood
y = pymc.Normal('y', mu=mod_quadratic, tau=1.0/sig**2, value=f, observed=True)
#-----------------------------------------------------------
Now, go to the command line and run the following (or, alternatively, put it in a file):
import pymc, test       # load the model file
R = pymc.MCMC(test)     # build the model
R.sample(10000)         # populate and run it
print 'a ', R.a.value   # print outputs
print 'b ', R.b.value
print 'c ', R.c.value
The output looks like this:
The chi-square result: [ 0.0999244 -2.59879643 -1.49278601]
Sampling: 100% [00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000] Iterations: 10000
a 0.0981867109293
b -2.54514077442
c -1.77016283024
The correct results are a=0.1, b=-2.6, and c=-1.5. Both methods give satisfactory results. Indeed, if you increase the number of iterations from 10,000 to one million, the answer looks like this:
Sampling: 100% [000000000000000000000000000000000000000000000000000000000000000000000000000000000000000] Iterations: 1000000
a 0.0997945467529
b -2.59731835923
c -1.47308938314
As you see, the results are more accurate than in the former case.
If you would like more information, you can use the following command:
print 'a ', R.a.stats()
a {'95% HPD interval': array([ 0.09956635,  0.10027448]), 'n': 1000000, 'quantiles': {2.5: 0.099545071298350371, 25: 0.099805050609863097, 50: 0.099919386231275664, 75: 0.10003496316945379, 97.5: 0.10025986774435611}, 'standard deviation': 0.0050467522199834905, 'mc error': 0.0003721035009025152, 'mean': 0.099548164235412406}
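The quantile and mean entries in that summary can be reproduced from the raw chain with plain numpy. In pymc the chain itself is available as R.trace('a')[:]; below, a synthetic Gaussian array stands in for it (an assumption for illustration, since the point is only how the numbers are computed):

```python
# Reproduce the 'mean', 'standard deviation', and 'quantiles' entries of the
# stats() summary from a raw chain.  A synthetic Gaussian array stands in for
# the real chain R.trace('a')[:].
import numpy

numpy.random.seed(1)
trace = numpy.random.normal(loc=0.1, scale=0.005, size=100000)  # stand-in chain

stats = {
    'mean': trace.mean(),
    'standard deviation': trace.std(),
    'quantiles': {q: numpy.percentile(trace, q) for q in (2.5, 25, 50, 75, 97.5)},
}
print(stats)
```

The 95% HPD interval is slightly different: it is the narrowest interval containing 95% of the samples, not simply the 2.5-97.5 percentile range, although for a symmetric posterior the two nearly coincide.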
Recall that in a Bayesian fit, each variable has a distribution. The following plot shows the data points as well as the two Bayesian fits.
To compare it with a least-squares fit, I repeated the experiment with a sample that has more noise. The resulting fits are compared in the following panel.
There is much more you can learn from the PyMC examples. I hope in a future post I can explain other types of fit, such as a weighted least-squares fit or a bisector fit.