Sometimes we have to prepare scatter plot of two parameters. When number of elements in each
parameter is a big number, e.g., several thousands, and points are concentrated, it is very difficult
to use a standard scatter plot. The reason is that due to over-population, one cannot say where
the density has a maximum. In this post, I explain alternative ways to prepare such plots.
The below scatter plot has more than two million points. It is absolutely impossible to say where is
the peak of the distribution.
One way to cope with the problem is to over plot the contours of the density on the scatter plot.
It is not always the best idea. Sometimes, it might be better to show the 2D density map itself (below).
The colorbar reads the number of points in each bin.
Now, one might complain that due to large central concentration, the halo of the distribution is not seen in
the 2D density map. There are two solutions for the issue: either we change the color table, or over plot
the contour on the 2D density plot (below). As you see, we can easily show the values of the contours as well.
the Python code to create this plot is the following:
fig = plt.figure()
ax = fig.add_subplot(111)
H, xedges, yedges = np.histogram2d(aa, bb, range=[[293.,1454.0], [464.,1896.0]], bins=(50, 50))
extent = [yedges, yedges[-1], xedges, xedges[-1]]
levels = (1.0e4, 1.0e3, 1.0e2, 2.0e1)
cset = contour(H, levels, origin=’lower’,colors=[‘black’,’green’,’blue’,’red’],linewidths=(1.9, 1.6, 1.5, 1.4),extent=extent)
plt.clabel(cset, inline=1, fontsize=10, fmt=’%1.0i’)
for c in cset.collections:
Yet another alternative is just to show the density contours and forget about the rest(below).