Posts Tagged ‘science’

The diffusion  equations:

Assuming a constant diffusion coefficient, D, we use the Crank-Nicolson methos (second order accurate in time and space):

u[n+1,j]-u[n,j] = 0.5 a {(u[n+1,j+1] – 2u[n+1,j] + u[n+1,j-1])+(u[n,j+1] – 2u[n,j] + u[n,j-1])}

A linear system of equations, A.U[n+1] = B.U[n], should be solved in each time setp.

As an example, we take a Gaussian pulse and study variation of density with time. Such example can occur in several fields of physics, e.g., quantum mechanics.

As stated above, we have two arrays: A and B. We form them once, and then calculate inverse(A).B:

# populate the coefficient arrays
from scipy.linalg import svd
from scipy.linalg import inv

for j in arange(0, nx_step):
    coefa[j, (j-1)%nx_step] = – lamb
    coefa[j, (j)%nx_step] = 2*lamb + 2.
    coefa[j, (j+1)%nx_step] = – lamb
lamb = – lamb
for j in arange(0, nx_step):
    coefb[j, (j-1)%nx_step] = – lamb
    coefb[j, (j)%nx_step] = 2*lamb + 2.
    coefb[j, (j+1)%nx_step] = – lamb
coefa = scipy.linalg.inv(coefa)         # inverse of A
coef = numpy.dot(coefa, coefb)  
coef = scipy.linalg.inv(coef)

for i in arange(1,nt_step):        #———– main loop ———

    ans = scipy.linalg.solve(coef, uo)
    uo = ans
    plot(uoz,’k’, ans,’r’)
    t[i] = t[i-1] + dt
    print dt*nt_step – t[i], ‘      ‘, ans.sum()
    rho[:,i] = ans

The result is shown in the following figure: as expected, the peak of the density profile falls down. If one continues the simulation for a longer time, the density distribution will be completely uniform.

Read Full Post »

Sometimes we have to prepare scatter plot of two parameters. When number of elements in each
parameter is a big number, e.g., several thousands, and points are concentrated, it is very difficult
to use a standard scatter plot. The reason is that due to over-population, one cannot say where
the density has a maximum. In this post, I explain alternative ways to prepare such plots.

The below scatter plot has more than two million points. It is absolutely impossible to say where is
the peak of the distribution.

One way to cope with the problem is to over plot the contours of the density on the scatter plot.

Using np.histogram2d function of numpy, one can create a map of two dimensional density function.

It is not always the best idea. Sometimes, it might be better to show the 2D density map itself (below).
The colorbar reads the number of points in each bin.

Now, one might complain that due to large central concentration, the halo of the distribution is not seen in
the 2D density map. There are two solutions for the issue: either we change the color table, or over plot
the contour on the 2D density plot (below). As you see, we can easily show the values of the contours as well.

the Python code to create this plot is the following:

fig = plt.figure()
ax = fig.add_subplot(111)
H, xedges, yedges = np.histogram2d(aa, bb, range=[[293.,1454.0], [464.,1896.0]], bins=(50, 50))
extent = [yedges[0], yedges[-1], xedges[0], xedges[-1]]
subplots_adjust(bottom=0.15, left=0.15)
levels = (1.0e4, 1.0e3, 1.0e2, 2.0e1)
cset = contour(H, levels, origin=’lower’,colors=[‘black’,’green’,’blue’,’red’],linewidths=(1.9, 1.6, 1.5, 1.4),extent=extent)
plt.clabel(cset, inline=1, fontsize=10, fmt=’%1.0i’)
for c in cset.collections:

Yet another alternative is just to show the density contours and forget about the rest(below).

Read Full Post »

Importance sampling is choosing a good distribution from which to simulate one’s random variables. It involves multiplying the integrand by 1 (usually dressed up in a “tricky fashion”) to yield an expectation of a quantity that varies less than the original integrand over the region of integration.

We now return to it in a slightly more formal way. Suppose that an integrand f can be written as the product of a function h that is almost constant times another, positive, function g. Then its integral over a multidimensional volume V is

f dV =(f /g) gdV =h gdV

Here, we interpreted this equation as suggesting a change of variable to G, the indefinite integral of g. That made gdV a perfect differential. We then proceeded to use the basic theorem of Monte Carlo integration. A more general interpretation of the above equation is that we can integrate f  by instead sampling h — not, however, with uniform probability density dV , but rather with nonuniform density gdV . In this second interpretation, the first interpretation follows as the special case.

In this case, random points are chosen from a uniform distribution in the domain A < y < B.  The new integrand, f/g, is close to unity and so the variance (i.e. the value of  sigma) for this function is much smaller than that obtained when evaluating the function  by sampling from a uniform distribution. Sampling from a non-uniform distribution for this function should therefore be more efficient than doing a crude Monte Carlo calculation without importance sampling.

Numerical example

We want to evaluate the integral of y=x^2 in the range between zero and one. We take g(x)=x. It samples larger values more often. The inverse function of g is √(2x).

Recipes: 1) take N uniform samples between zero and one, 2) build a set of non-uniform sample such that dU=gdV is uniformly sampled. To this end, one has to use the inverse of g(x). In other words, if p(x) is the uniform distribution of x, then q(u)= g-1(x)  will be a uniform distribution for U. 3) evaluate the average of h=f/g for the new sample.

The distribution of the new variable, dU=gdV, is like this:

Now, we have to perform a normal Monte Carlo integration of the function h in U domain. The results are shown in the next plot. For comparison, result of a Monte Carlo integration with uniform sample and the Simpson’s rule were also shown.

Here, each point shows the average of absolute errors after a hundred iterations (except for the Simpson’s method). Results of importance sampling have systematically smaller variance compared to the ones with uniform samples. Hence, the practical trick is to find the best g function.

g MUST be a good approximation for f

In the following two examples, I show a similar plot as above but for two other functions. In the first case, y=√x, the importance sampling does not help. The variance is comparable to the Monte Carlo method with uniformly distributed samples.

The next example shows that selecting a wrong  g function can make the situation even worse. We use the similar function, g(x)=x, for integration of the unit circle (see the previous post). In other words, we sample regions around one more often than regions about zero while the value of function close to zero is larger than those about one. So, we get a worse variance compared to a completely uniform sample.

Read Full Post »

Monte Carlo Integration

In mathematics, Monte Carlo integration is numerical integration using random numbers. That is, Monte Carlo integration methods are algorithms for the approximate evaluation of definite integrals, usually multidimensional ones. The usual algorithms evaluate the integrand at a regular grid. Monte Carlo methods, however, randomly choose the points at which the integrand is evaluated. It is helpful for complex functions or large dimensions.

To evaluate a function f, we sample it N times:

Here the angle brackets denote taking the arithmetic mean over the N sample points,

The fundamental disadvantage of simple Monte Carlo integration is that its accuracy increases only as the square root of N , the number of sampled points. Each new point sampled adds linearly to an accumulated sum that will become the function average, and also linearly to an accumulated sum of squares that will become the variance (see chapter 7 of  Numerical Recipes).

The sampled points should be randomly selected. Assume we take a set of uniform random number with a distribution like below:

Numerical example: calculate π

We want to calculate the Pi number to a certain level of accuracy. There are several approaches. I take integral of the unit circle in the first quadrant and multiply that with four.

Recipe: 1) generate N random number between zero and one, 2) evaluate function f   for each of them, and 3) perform averaging (above equation). The plot below shows the average of error after 100 times iteration for each number of samples.  For comparison, I put the simple Simpson integration method as well.   The uncertainty is larger than traditional integration method (1/N in this case) and with increasing  number of samples, it goes down slowly.

As discussed above, the variance goes with square root of N. It starts to show advantages over traditional integration methods only in large dimensions. We have limitations for increasing this number. The convergence scheme  makes calculations computationally expensive. The CPU run time for the above calculations is shown below.

In one dimension (the present example), it goes linearly. In two dimensions, it goes with square of N !  That severely limits number of iterations in large dimensions. Recall that we selected a set of uniformly distributed random numbers. However, if the function has a sharp peak at certain range and takes small values elsewhere, we need to somehow sample those regions more frequently to enhance the accuracy. That is topic of the next post, the importance sampling.

Read Full Post »

A word on PDEs

So far, we discussed very simple problems. From now one, we want to simulate some simple problems, e.g., the diffusion equation. For this reason, I would like to have a very short discussion on the PDEs. I refer the interested reader to chapter 19 of Numerical Recipes.

Partial differential equations are at the heart of many, if not most, computer analysis or simulations of continuous physical systems, such as fluids, electromagnetic fields, etc. In most mathematics books, partial differential equations (PDEs) are classified into the three categories, hyperbolic, parabolic, and elliptic, on the basis of their characteristics, or curves of information propagation. The prototypical example of a hyperbolic equation is the one-dimensional wave equation

where v = constant is the velocity of wave propagation. The prototypical parabolic equation is the diffusion equation

where D is the diffusion coefficient.

The prototypical elliptic equation is the Poisson equation

where the source term ρ is given. If the source term is equal to zero, the equation is Laplace’s equation. From a computational point of view, the classification into these three canonical

From a computational point of view, the classification into these three canonical types is not very meaningful. The first two equations both define initial value or Cauchy problems: If information on u (perhaps including time derivative information) is given at some initial time t0 for all x, then the equations describe how u(x, t) propagates itself forward in time. In other words, these two equations describe time evolution. The goal of a numerical code should be to track that time evolution with some desired accuracy. In initial value problems, one’s principal computational concern must be the stability of the algorithm. Many reasonable- looking algorithms for initial value problems just don’t work — they are numerically unstable.

By contrast, the third equation directs us to find a single “static” function u(x, y) which satisfies the equation within some (x, y) region of interest, and which — one must also specify — has some desired behavior on the boundary of the region of interest. These problems are called boundary value problems. In contrast to initial value problems, stability is relatively easy to achieve for boundary value problems. Thus, the efficiency of the algorithms, both in computational load and storage requirements, becomes the principal concern.

Read Full Post »

Older Posts »