Posts Tagged ‘histogram’


Plotting a histogram, or more precisely an estimate of the probability density function (PDF), is a daily task for many people in science and engineering. Turning a histogram into a PDF takes only one extra keyword: “density=True” (called “normed=True” in older Matplotlib releases, where it has since been removed). Since making a histogram is a standard example in Matplotlib, I do not repeat it here in detail. Instead, I discuss an annoying feature of it. The problem is the following: imagine you have two histograms on a plot and want to add a legend. If you select the step style for the histograms and then add a legend, each legend entry is drawn as a rectangle instead of a line. Take two histograms like the ones below:

n, bins, patches = plt.hist(x, 50, density=True, histtype='step', lw=2, color='blue', label='plot a')

n, bins, patches = plt.hist(y, 50, density=True, histtype='step', lw=2, color='red', label='plot b')

Now if we use the legend command as usual:

plt.legend(loc='upper left')

we get the following, which I find very annoying:

As far as I know, there is no way to get rid of this rectangle. The only workaround I could figure out was to use Line2D objects to make a fake legend like this:

from matplotlib.lines import Line2D
plt.legend([Line2D([0], [0], color='b', lw=2), Line2D([0], [0], color='r', lw=2)], ['plot a', 'plot b'], loc='upper left')

which leads to what I expected to be the default behavior:

I hope the Matplotlib developers consider improving this in a future release.
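
For reference, here is a minimal, self-contained sketch of the whole workaround. The arrays x and y are synthetic stand-ins (an assumption, since the post does not say what the data are), and the last line only checks that density=True really gives a histogram that integrates to one.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D

# Synthetic stand-ins for the two data sets (assumption: not the original data)
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 5000)
y = rng.normal(1.0, 1.5, 5000)

fig, ax = plt.subplots()

# density=True scales each histogram so that it integrates to 1 (a PDF estimate)
n_x, bins_x, _ = ax.hist(x, 50, density=True, histtype="step", lw=2, color="blue")
n_y, bins_y, _ = ax.hist(y, 50, density=True, histtype="step", lw=2, color="red")

# Proxy artists: lines that are never drawn on the axes and only feed the legend
proxies = [
    Line2D([0], [0], color="blue", lw=2),
    Line2D([0], [0], color="red", lw=2),
]
legend = ax.legend(proxies, ["plot a", "plot b"], loc="upper left")

# Area under the first histogram: sum of bar height times bin width
area_x = np.sum(n_x * np.diff(bins_x))
```

Because the legend is built from the proxy lines rather than from the histogram patches, no label keyword is needed in the hist calls. Add plt.show() or fig.savefig(...) to see the result.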


Sometimes we have to prepare a scatter plot of two parameters. When each parameter has a large
number of elements, e.g., several thousand, and the points are concentrated, a standard scatter
plot is very hard to read: the points overlap so heavily that one cannot say where the density
has its maximum. In this post, I explain alternative ways to prepare such plots.

The scatter plot below has more than two million points. It is impossible to say where the peak
of the distribution is.

One way to cope with the problem is to overplot the contours of the density on the scatter plot.

Using the np.histogram2d function of NumPy, one can create a map of the two-dimensional density.
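
A minimal sketch of this idea follows. The data are synthetic (a dense core plus a wide halo, which is an assumption and not the data of the figure), and the contour levels are simply chosen as fractions of the peak bin count.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-in data (assumption): a dense core plus a sparse, wide halo
rng = np.random.default_rng(1)
aa = np.concatenate([rng.normal(0.0, 0.3, 20000), rng.normal(0.0, 2.0, 2000)])
bb = np.concatenate([rng.normal(0.0, 0.3, 20000), rng.normal(0.0, 2.0, 2000)])

# 2D histogram: H[i, j] counts the points in x bin i and y bin j
H, xedges, yedges = np.histogram2d(aa, bb, bins=(50, 50))

# Bin centres, used as the coordinates of the contour grid
xc = 0.5 * (xedges[:-1] + xedges[1:])
yc = 0.5 * (yedges[:-1] + yedges[1:])

fig, ax = plt.subplots()
ax.scatter(aa, bb, s=1, color="gray", alpha=0.3)

# contour expects Z[row = y, column = x], hence the transpose of H
levels = H.max() * np.array([0.01, 0.1, 0.5])
cset = ax.contour(xc, yc, H.T, levels=levels, colors=["red", "blue", "black"])
```

The transpose matters: without it, the contours are mirrored about the diagonal with respect to the scatter points.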

This is not always the best idea. Sometimes it is better to show the 2D density map itself (below).
The colorbar gives the number of points in each bin.
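
A density map of this kind can be sketched with imshow, again on synthetic stand-in data (an assumption) and with 50 bins per axis:

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-in data (assumption): a dense core plus a sparse, wide halo
rng = np.random.default_rng(2)
aa = np.concatenate([rng.normal(0.0, 0.3, 20000), rng.normal(0.0, 2.0, 2000)])
bb = np.concatenate([rng.normal(0.0, 0.3, 20000), rng.normal(0.0, 2.0, 2000)])

H, xedges, yedges = np.histogram2d(aa, bb, bins=(50, 50))

fig, ax = plt.subplots()
# Transpose so that aa runs along the horizontal axis and bb along the vertical one
image = ax.imshow(
    H.T,
    origin="lower",
    extent=[xedges[0], xedges[-1], yedges[0], yedges[-1]],
    aspect="auto",
    interpolation="nearest",
)
# The colorbar shows the number of points per bin
cbar = fig.colorbar(image, ax=ax)
cbar.set_label("points per bin")
```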

Now, one might complain that due to the large central concentration, the halo of the distribution is not
visible in the 2D density map. There are two solutions to this issue: either we change the color table, or we
overplot the contours on the 2D density plot (below). As you can see, we can easily label the contours with their values as well.

The Python code to create this plot is the following:

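import numpy as np
import matplotlib.pyplot as plt
# aa and bb are the two 1D data arrays
fig = plt.figure()
ax = fig.add_subplot(111)
H, xedges, yedges = np.histogram2d(aa, bb, range=[[293., 1454.0], [464., 1896.0]], bins=(50, 50))
# H is plotted untransposed, so bb runs along the horizontal axis and aa along the vertical one
extent = [yedges[0], yedges[-1], xedges[0], xedges[-1]]
plt.subplots_adjust(bottom=0.15, left=0.15)
# current Matplotlib requires increasing contour levels; colors and linewidths follow the same order
levels = (2.0e1, 1.0e2, 1.0e3, 1.0e4)
cset = plt.contour(H, levels, origin='lower', colors=['red', 'blue', 'green', 'black'], linewidths=(1.4, 1.5, 1.6, 1.9), extent=extent)
plt.clabel(cset, inline=True, fontsize=10, fmt='%d')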
for c in cset.collections:
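
The other option mentioned above, changing the color table, is not shown in the post. One way to get a similar effect is a logarithmic color scale, which is a swapped-in technique here and not necessarily what was used for the figures. A minimal sketch on synthetic stand-in data (an assumption), combined with labelled contours:

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import LogNorm

# Synthetic stand-in data (assumption): a dense core plus a sparse, wide halo
rng = np.random.default_rng(3)
aa = np.concatenate([rng.normal(0.0, 0.3, 20000), rng.normal(0.0, 2.0, 2000)])
bb = np.concatenate([rng.normal(0.0, 0.3, 20000), rng.normal(0.0, 2.0, 2000)])

H, xedges, yedges = np.histogram2d(aa, bb, bins=(50, 50))
xc = 0.5 * (xedges[:-1] + xedges[1:])
yc = 0.5 * (yedges[:-1] + yedges[1:])

# Empty bins have no logarithm, so mask them out before plotting
H_masked = np.ma.masked_equal(H, 0)

fig, ax = plt.subplots()
image = ax.imshow(
    H_masked.T,
    origin="lower",
    extent=[xedges[0], xedges[-1], yedges[0], yedges[-1]],
    aspect="auto",
    norm=LogNorm(vmin=1, vmax=H.max()),  # logarithmic color scale brings out the halo
)
fig.colorbar(image, ax=ax, label="points per bin")

# Labelled contours on top, with levels chosen as fractions of the peak count
levels = H.max() * np.array([0.01, 0.1, 0.5])
cset = ax.contour(xc, yc, H.T, levels=levels, colors="black")
ax.clabel(cset, inline=True, fontsize=10, fmt="%d")
```

With the logarithmic scale, bins holding a handful of points are still distinguishable from empty ones, while the core no longer saturates the color range.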

Yet another alternative is to show only the density contours and leave out the rest (below).
