## 2D density plot (or 2D histogram)

Sometimes we need to make a scatter plot of two parameters. When each parameter has many
elements, e.g., several thousand, and the points are concentrated, a standard scatter plot is
hard to read: the points overlap so heavily that one cannot tell where the density has its
maximum. In this post, I explain alternative ways to prepare such plots.

The scatter plot below has more than two million points. It is impossible to say where the peak
of the distribution is. One way to cope with the problem is to overplot the contours of the density on the scatter plot. Using NumPy's np.histogram2d function, one can create a map of the two-dimensional density function.
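As a minimal sketch of this idea, the snippet below bins a synthetic data set with np.histogram2d and draws density contours over the scatter points. The data, the variable names, the contour levels, and the output file name are all assumptions made for illustration; they are not the data or settings behind the figures in this post.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove this line for interactive use
import matplotlib.pyplot as plt

# Synthetic stand-in data (assumption): a dense core plus a broad halo.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 0.5, 200_000), rng.normal(0.0, 3.0, 20_000)])
y = np.concatenate([rng.normal(0.0, 0.5, 200_000), rng.normal(0.0, 3.0, 20_000)])

# H[i, j] counts the points whose x falls in bin i and whose y falls in bin j.
H, xedges, yedges = np.histogram2d(x, y, bins=(50, 50))

# Bin centers give contour() explicit coordinates, so no extent is needed.
xcenters = 0.5 * (xedges[:-1] + xedges[1:])
ycenters = 0.5 * (yedges[:-1] + yedges[1:])

fig, ax = plt.subplots()
ax.plot(x, y, ",", color="gray", alpha=0.3)  # pixel markers are fast for many points

# contour() expects Z indexed as [row = y, column = x], hence the transpose of H.
levels = [10, 100, 1000, 10000]  # illustrative values, in increasing order
cset = ax.contour(xcenters, ycenters, H.T, levels=levels,
                  colors=["red", "blue", "green", "black"], linestyles="solid")
ax.clabel(cset, inline=True, fontsize=10, fmt="%d")
ax.set_xlabel("x")
ax.set_ylabel("y")
fig.savefig("scatter_with_contours.png", dpi=150)
```

Passing the bin centers explicitly is one way to avoid having to think about how the extent argument maps onto the histogram grid.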

Overplotting contours is not always the best idea. Sometimes, it might be better to show the 2D density map itself (below).
The colorbar shows the number of points in each bin. Now, one might complain that, because of the strong central concentration, the halo of the distribution is not visible in
the 2D density map. There are two solutions: either change the color table, or overplot
the contours on the 2D density plot (below). As you can see, we can easily label the contours with their values as well. The Python code to create this plot is the following:

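    import numpy as np
    import matplotlib.pyplot as plt

    # aa and bb are the two 1D data arrays (the x and y values of the scatter plot)
    fig = plt.figure()
    H, xedges, yedges = np.histogram2d(aa, bb, range=[[293., 1454.0], [464., 1896.0]], bins=(50, 50))
    # H has shape (x bins, y bins), so contour(H) draws bb horizontally and aa vertically
    extent = [yedges[0], yedges[-1], xedges[0], xedges[-1]]
    # contour levels must be given in increasing order
    levels = (2.0e1, 1.0e2, 1.0e3, 1.0e4)
    # linestyles='solid' replaces the old loop over cset.collections
    cset = plt.contour(H, levels, origin='lower', colors=['red', 'blue', 'green', 'black'],
                       linewidths=(1.4, 1.5, 1.6, 1.9), linestyles='solid', extent=extent)
    plt.clabel(cset, inline=1, fontsize=10, fmt='%1.0i')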
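The other option mentioned above, changing the color table, could look roughly like the sketch below. It assumes a logarithmic color scale (Matplotlib's LogNorm) as the "different color table", which is only one possible reading, and it uses synthetic data, illustrative contour levels, and a made-up output file name. Unlike the code above, it transposes H so that x runs along the horizontal axis.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove this line for interactive use
import matplotlib.pyplot as plt
from matplotlib.colors import LogNorm

# Synthetic stand-in data (assumption): a dense core plus a broad halo.
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0.0, 0.5, 200_000), rng.normal(0.0, 3.0, 20_000)])
y = np.concatenate([rng.normal(0.0, 0.5, 200_000), rng.normal(0.0, 3.0, 20_000)])

H, xedges, yedges = np.histogram2d(x, y, bins=(50, 50))
# Outer bin edges, ordered as (left, right, bottom, top) for the transposed H.
extent = [xedges[0], xedges[-1], yedges[0], yedges[-1]]

fig, ax = plt.subplots()
# Empty bins are masked because a log scale cannot represent zero counts.
counts = np.ma.masked_equal(H.T, 0)
im = ax.imshow(counts, origin="lower", extent=extent, aspect="auto",
               norm=LogNorm(vmin=1, vmax=H.max()), cmap="viridis")
fig.colorbar(im, ax=ax, label="points per bin")

# When origin is given, contour() reads extent as outer cell boundaries,
# the same way imshow() does, so the two layers line up.
cset = ax.contour(H.T, levels=[10, 100, 1000, 10000], origin="lower",
                  extent=extent, colors="white", linewidths=1.0)
ax.clabel(cset, inline=True, fontsize=8, fmt="%d")
fig.savefig("density_map_log.png", dpi=150)
```

With a logarithmic scale, bins holding a handful of points and bins holding tens of thousands both get distinguishable colors, so the halo stays visible next to the core.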

Yet another alternative is just to show the density contours and forget about the rest (below).

### 3 Responses

1. on October 3, 2011 at 5:12 am | drinking water to lose weight

http://howmanycaloriesshouldieatadayinfo.com/drinking-water-to-lose-weight-does-it-work/ Thanks for that awesome posting. It saved MUCH time 🙂

2. on June 23, 2015 at 9:12 am | Sheng-Jun Lin

Hi! I think the extent option of contour() and the xedges/yedges returned by histogram2d() have slightly different meanings.

xedges/yedges are exactly the positions of the bin edges; however, the extent option requires the cell-centered positions of the outermost grid cells.

Thus, for contours, the extent option should be assigned as,
dy = yedges[1]-yedges[0]; dx = xedges[1]-xedges[0] (since the edges increase monotonically on a linear scale)
extent = [yedges[0]+dy/2, yedges[-1]-dy/2, xedges[0]+dx/2, xedges[-1]-dx/2]

But the extent option of imshow() requires exactly the outer edges of the grid. Thus, for the 2D density map, the extent option remains the same, i.e. extent = [yedges[0], yedges[-1], xedges[0], xedges[-1]]

Is my understanding right? Thank you.

3. on August 11, 2015 at 5:18 am | Jo

I still don’t understand how to use your code… :( The thing is, I have 25000 by 25000 scatter points and I want to create a constant density number plot

but I can’t seem to figure it out… :(