Contour plots & smoothing: rights and wrongs

From: Mario Roederer (Roederer@Darwin.Stanford.EDU)
Date: Wed Sep 24 1997 - 16:43:28 EST


>   Since I believe in showing the real data, warts and all,
>   I avoid using contour plots

Howard, 

I must take issue with this.  Contour plots (when properly computed) do not
misrepresent bivariate data.  In fact, they can be much more informative than
even color (or gray-scale) dot plots, which most people find much harder to
interpret at a glance.  It's not difficult to make up a series of test plots,
shown in both formats, and demonstrate that an inexperienced person will more
readily estimate the population frequency in a contour plot than in a
color-dot plot.  This, ultimately, is the goal in graphical presentation of
data.
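One common way to make contours read directly as population frequencies (an
assumption here, not necessarily how any particular analysis package does it)
is to use "probability" contours: each level is chosen so that the region
above it encloses a fixed fraction of the events.  A minimal sketch, assuming
a simple fixed-width Gaussian kernel of width `h`; the function names are
hypothetical.

```python
import math


def event_densities(points, h):
    """Fixed-width 2-D Gaussian kernel density, evaluated at every event."""
    n = len(points)
    c = 1.0 / (2.0 * math.pi * h * h * n)
    return [c * sum(math.exp(-((x - u) ** 2 + (y - v) ** 2) / (2.0 * h * h))
                    for (u, v) in points)
            for (x, y) in points]


def probability_levels(densities, fractions):
    """For each fraction p, return the density level whose contour encloses
    at least ceil(p * n) of the n events (the highest-density ones)."""
    ranked = sorted(densities, reverse=True)
    n = len(ranked)
    levels = []
    for p in fractions:
        k = min(n, max(1, math.ceil(p * n)))  # number of events to enclose
        levels.append(ranked[k - 1])
    return levels


# Usage: a small grid of events standing in for a cell population.
events = [(0.2 * i, 0.2 * j) for i in range(-3, 4) for j in range(-3, 4)]
dens = event_densities(events, h=0.5)
for p, level in zip([0.5, 0.9], probability_levels(dens, [0.5, 0.9])):
    enclosed = sum(d >= level for d in dens) / len(dens)
    print(f"{p:.0%} contour at density {level:.4g} encloses {enclosed:.0%} of events")
```

Because each contour corresponds to a stated fraction of events, a reader can
estimate population frequencies from the contours without any color key.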

Of course, you will agree that dot plots are completely inappropriate.
(Everyone:  please stop publishing data with single-color dot plots!)

You also stated that "smoothing" makes the data "look better" than it is.  This
is also not entirely correct: proper smoothing algorithms simply make the
contour plot look like it would if you were to collect a huge number of events.
In other words, proper density estimation algorithms, which are those that
employ a variable kernel-width smoothing algorithm, do not distort the data
presentation and, in general, make the data easier for mere humans to
interpret.  Dave Parks and Marty Bigos have discussed these issues at length in
various chapters on data analysis (for example, in the Handbook of Experimental
Immunology).
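To illustrate what "variable kernel-width" means, here is a minimal sketch of
one well-known adaptive scheme, Abramson's square-root law.  It is a swapped-in
example, not necessarily the algorithm described in the chapters cited above.
A fixed-width pilot estimate is computed first; each event then gets its own
kernel width, narrow where events are dense and wide where they are sparse.
The names `adaptive_kde`, `h` and `alpha` are hypothetical.

```python
import math


def _pilot_density(points, h):
    """Fixed-width 2-D Gaussian kernel estimate at each event (pilot pass)."""
    n = len(points)
    c = 1.0 / (2.0 * math.pi * h * h * n)
    return [c * sum(math.exp(-((x - u) ** 2 + (y - v) ** 2) / (2.0 * h * h))
                    for (u, v) in points)
            for (x, y) in points]


def adaptive_kde(points, h, alpha=0.5):
    """Variable kernel-width density estimate (Abramson-style, alpha = 1/2).

    Each event gets its own width h_i = h * (pilot_i / g) ** -alpha, where g
    is the geometric mean of the pilot densities.
    """
    pilot = _pilot_density(points, h)
    g = math.exp(sum(math.log(p) for p in pilot) / len(pilot))
    widths = [h * (p / g) ** (-alpha) for p in pilot]
    n = len(points)

    def density(x, y):
        # Sum of per-event 2-D Gaussians, each with its own width.
        total = 0.0
        for (u, v), hi in zip(points, widths):
            d2 = (x - u) ** 2 + (y - v) ** 2
            total += math.exp(-d2 / (2.0 * hi * hi)) / (2.0 * math.pi * hi * hi)
        return total / n

    return density, widths


# Usage: a tight cluster of events plus one isolated event.
events = [(0.2 * i, 0.2 * j) for i in range(-3, 4) for j in range(-3, 4)]
events.append((10.0, 10.0))
density, widths = adaptive_kde(events, h=0.5)
print(f"core kernel width {widths[24]:.3f}, outlier kernel width {widths[-1]:.3f}")
```

The dense core keeps sharp detail because its kernels are narrow, while the
isolated event is spread thinly instead of appearing as a spurious peak.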

The main downside of contour plots is that data outside the last contour is
generally not shown.  This problem has a simple solution: show the outlying
events together with the contours.  The contours then give you the frequency
estimation that they are so good at, while the outliers show the low-frequency
events.  This format combines the best qualities of both presentation styles.
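The contour-plus-outliers display comes down to one decision: which events
fall below the lowest contour and should be drawn as individual dots.  A
minimal sketch, assuming a fixed-width Gaussian kernel and a lowest contour
placed at a fraction of the peak density; `split_for_display` and
`floor_fraction` are hypothetical names, and the drawing itself is omitted.

```python
import math


def event_densities(points, h):
    """Fixed-width 2-D Gaussian kernel density, evaluated at every event."""
    n = len(points)
    c = 1.0 / (2.0 * math.pi * h * h * n)
    return [c * sum(math.exp(-((x - u) ** 2 + (y - v) ** 2) / (2.0 * h * h))
                    for (u, v) in points)
            for (x, y) in points]


def split_for_display(points, h, floor_fraction=0.1):
    """Split events into those inside the lowest contour and outliers.

    The lowest contour sits at floor_fraction * peak density (an assumed
    choice); events below it are returned separately so they can be drawn
    as individual dots on top of the contour plot.
    """
    dens = event_densities(points, h)
    floor = floor_fraction * max(dens)
    inside = [p for p, d in zip(points, dens) if d >= floor]
    outliers = [p for p, d in zip(points, dens) if d < floor]
    return inside, outliers


# Usage: a compact population plus two isolated events.
events = [(0.2 * i, 0.2 * j) for i in range(-3, 4) for j in range(-3, 4)]
events += [(10.0, 10.0), (-8.0, 3.0)]
inside, outliers = split_for_display(events, h=0.5)
print(f"{len(inside)} events under the contours, {len(outliers)} drawn as dots")
```

Everything in `inside` is summarized by the contours; only the few points in
`outliers` need to be plotted individually.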

mr
