Last words from Mario (we hope) on data display

From: Mario Roederer (Roederer@Darwin.Stanford.EDU)
Date: Wed Oct 01 1997 - 07:48:40 EST


Alice laid the gauntlet at my feet, and now I will return the favor to all of
those involved in this debate (or who wanted to be involved!).  I will propose
to Cytometry to write a perspective on the topic of FACS data display.  Everyone
now knows my bias against dot plots:  I challenge any of those of you who
champion dot plots (or even color dot plots) to join my effort and write a
"counterpoint" analysis to provide a balancing viewpoint.

This ongoing discussion has been most spirited and, I think, very informative.
I think that we are winding down to repeating ourselves, so I will try to make
these my last words to the mailing list, at least for now!

Calman writes about dot plots:

>   Furthermore it stresses the single cell nature of the
>   data- each dot is a cell.

No, no, no, no, no, no, no!  This is precisely the problem!  In dot plots, each
dot can be one cell, two cells, three cells, or a thousand cells.  You can never
know which.  This is the fundamental error of dot plots.  Once again, the number
of events you display in a dot plot can totally change how it appears--you are
thereby inadvertently massaging the data.
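
To make the saturation point concrete, here is a minimal sketch in Python.
Everything in it is an assumption made for illustration: the simulated
population is a single 2-D Gaussian, the display is a 64 x 64 pixel grid, and
the names simulate_events and occupied_pixels are hypothetical, not taken from
any FACS package.  It counts how many pixels a dot plot would ink for a given
number of events.

```python
import random

def simulate_events(n, seed=0):
    """Simulate n two-parameter events from one Gaussian population.

    Values are clamped to [0, 1) so they always fall on the display grid.
    This is a toy stand-in for listmode data, not a model of a real sample.
    """
    rng = random.Random(seed)
    events = []
    for _ in range(n):
        x = min(max(rng.gauss(0.5, 0.1), 0.0), 0.999999)
        y = min(max(rng.gauss(0.5, 0.1), 0.0), 0.999999)
        events.append((x, y))
    return events

def occupied_pixels(events, bins=64):
    """Count the distinct pixels a dot plot would ink for these events.

    A pixel is either inked or not, so any number of events landing on the
    same pixel contribute exactly one dot.
    """
    return len({(int(x * bins), int(y * bins)) for x, y in events})

if __name__ == "__main__":
    for n in (1000, 10000, 100000):
        inked = occupied_pixels(simulate_events(n))
        print(n, "events ->", inked, "inked pixels,",
              round(n / inked, 1), "events per dot on average")
```

Under these assumptions, collecting 100 times more events inks far fewer than
100 times more pixels: the core of the population saturates while the fringe
keeps growing, so the same sample looks different at different event counts.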

I agree with Alice that the precise way of contouring can affect how the more
frequent populations appear.  This is why the choice of contouring algorithms is
so important!  One of the most robust (in that it is objective, not allowing
"user-defined" contouring levels, etc.) is probability contouring.  This method
of contouring has been adopted by SAS Institute for use in their bivariate
displays--in addition, it is offered by several FACS data analysis packages.

This method of contouring generates displays that are independent of the number
of events collected -- something that no other display can do.  By contrast,
using dot plots, color dot plots, or user-defined thresholding, I can draw a
variety of conclusions about the same sample depending solely on how many events
I choose to collect (or display)!
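
The independence from event count can be illustrated with a minimal sketch.
It assumes one common reading of probability contouring: a contour level is
chosen so that the region it encloses holds a fixed fraction of all events.
The post does not spell out the algorithm, so treat this as an illustration
of the idea rather than the method used by any particular package.  The
simulated Gaussian population, the 64 x 64 grid, and the names bin_events and
bins_enclosing are all hypothetical.

```python
import random
from collections import Counter

def bin_events(n, bins=64, seed=1):
    """Histogram n simulated two-parameter events onto a bins x bins grid."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n):
        x = min(max(rng.gauss(0.5, 0.1), 0.0), 0.999999)
        y = min(max(rng.gauss(0.5, 0.1), 0.0), 0.999999)
        counts[(int(x * bins), int(y * bins))] += 1
    return counts

def bins_enclosing(counts, fraction):
    """Return how many of the densest bins are needed to hold `fraction`
    of all events.

    This is the area (in bins) inside a contour drawn at that probability
    level: bins are taken from densest to sparsest until the running total
    reaches the requested share of events.
    """
    total = sum(counts.values())
    running = 0
    for k, c in enumerate(sorted(counts.values(), reverse=True), start=1):
        running += c
        if running >= fraction * total:
            return k
    return len(counts)

if __name__ == "__main__":
    for n in (20000, 200000):
        area = bins_enclosing(bin_events(n), 0.5)
        print(n, "events -> 50% contour encloses", area, "bins")
```

Because the level is defined by a fraction of events rather than by an
absolute count, the enclosed area should stay roughly the same when ten times
more events are collected, apart from sampling noise.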

Jim Houston is 100% correct that the precise method of data display is critical.
I urge reviewers and editors to demand that this information be included in all
FACS data displays.

Finally, one last word about "raw data."  Let us not delude ourselves into
thinking that dot plots or unsmoothed plots are "raw"--these themselves are
presentations of highly processed data.  Much more raw is the listmode data--why
not publish tables of the basic values, then?  (e.g., "Note how frequent are the
events which have parameter 2 values between 1200 and 1240, and parameter 3
values between 800 and 950.  This suggests...") 

Of course, this is nonsense:  but I bring it up to drive home the point that
these displays are only called "raw" in order to subtly convey the mistaken
impression that the original measurements haven't been tampered with.  Of
course, even the listmode data is only as raw as a well-done steak.  There has
been a lot of signal processing that converts the original photoelectron counts
of the PMT into a computer stored value--there is averaging, smoothing,
background correction, etc., etc.

Once again, this brings us to the fundamental point of data display:  to convey
information accurately to the reader.  I highly recommend a book by Edward
Tufte, "The Visual Display of Quantitative Information," about this topic
(especially to programmers developing analysis packages).  This fabulous book
shows how misleading different styles of graphs can be, and discusses some of
the underlying principles of data display--principles largely ignored by
developers of FACS data analysis programs.

There was some discussion about art vs. science.  Do not mistake artistry for
disinformation!  Of course there is art in science, and in the presentation of
scientific data.  If not, we would only see tables of numbers that would be
incomprehensible--we are, after all, only human.

mr


