Data displays

From: Mario Roederer (Roederer@Darwin.Stanford.EDU)
Date: Fri Sep 26 1997 - 11:04:44 EST


Ah, Ken, I don't know if you realize what a can of worms you are opening!  But
it's always my pleasure to go fishing.

>   Being a relative newcomer to flow
>   cytometry, I don't see what is wrong with dot plots.
>   Please educate me.  If dot plots are "completely
>   inappropriate", is there ever a situation when they
>   would be appropriate to use?

First, it's important to understand the guiding principle of flow cytometric
data presentation: to convey the underlying biological process to the reader.
Thus, we should use the graphs that are best at conveying such information,
without misleading the reader by de-emphasizing or over-emphasizing certain
attributes.

Note that this does not require the most "accurate" rendition of the data (i.e.,
unsmoothed data).  As has been discussed before, appropriate smoothing does not
alter the data; it simply makes it more understandable.  (However, you must be
wary of smoothing algorithms: some simply smear the data out without regard to
local-area precision.  This is why variable-kernel-width smoothing is
important.  Ask the people who provided your data analysis package whether they
use such smoothing, or whether they simply smooth by averaging the data, which
DOES "change" the data!)
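To make the distinction concrete, here is a minimal one-dimensional sketch.  It
is not taken from any analysis package.  The fixed-width estimator uses one
Gaussian kernel width `h` everywhere.  The variable-width version follows
Abramson's square-root rule, which is an assumed choice of variable-kernel
scheme: it shrinks kernels where a pilot estimate is dense and widens them where
it is sparse.  The function names and the `alpha` parameter are hypothetical.

```python
import math


def gaussian(u):
    """Standard normal kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def fixed_kde(data, grid, h):
    """Fixed-width smoothing: every event gets the same kernel width h."""
    n = len(data)
    return [sum(gaussian((x - xi) / h) for xi in data) / (n * h) for x in grid]


def adaptive_kde(data, grid, h, alpha=0.5):
    """Variable-kernel-width smoothing (Abramson-style, assumed here).

    A fixed-width pilot estimate is evaluated at each event; events in
    dense regions get narrower kernels and events in sparse regions get
    wider ones, so sharp peaks are not smeared into their surroundings.
    """
    n = len(data)
    pilot = fixed_kde(data, data, h)  # always > 0: each event sees its own kernel
    # Geometric mean of the pilot densities, used to normalize the widths
    g = math.exp(sum(math.log(p) for p in pilot) / n)
    widths = [h * (p / g) ** (-alpha) for p in pilot]
    return [
        sum(gaussian((x - xi) / hi) / hi for xi, hi in zip(data, widths)) / n
        for x in grid
    ]
```

With a tight cluster on a sparse background, the adaptive estimate gives a
higher, sharper peak than the fixed one at the same `h`, because the fixed
kernel spreads the cluster's events over a wider region than the cluster
actually occupies.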

Back to dot plots.  While dot plots are fairly easy to understand, they
under-emphasize the frequencies of populations with many events.  Just consider
the case where you collect 10,000,000 events: what would this dot plot look
like?  Basically, a big smear: you would be unable to tell where the populations
are, much less judge how frequent the cells within a population are.
Note also that if you were to collect a 10,000-event file and a 50,000-event
file from the same sample, the two dot plots may look quite different, since
the rarer populations would suddenly have five times as many dots and look much
more frequent, while the dense populations, whose pixels are already filled,
would look about the same!  And that means that one of the plots must be
misleading the reader (if not both).
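The saturation effect can be sketched with a toy simulation.  Everything here is
an assumption for illustration: a 256 x 256 pixel plot over a 0 to 1024 scale, a
tight population holding about 90% of events and a diffuse one holding about
10%, and the function names.  A monochromatic dot plot only shows whether a
pixel is lit, so the sketch counts lit pixels per population.

```python
import random


def lit_pixels(points, bins=256, hi=1024.0):
    """Pixels lit by at least one dot; a monochrome plot shows nothing more."""
    scale = bins / hi
    lit = set()
    for x, y in points:
        col = min(bins - 1, max(0, int(x * scale)))
        row = min(bins - 1, max(0, int(y * scale)))
        lit.add((col, row))
    return len(lit)


def simulate(n_events, rare_fraction=0.1, seed=1):
    """Return (lit pixels of abundant population, lit pixels of rare population)."""
    rng = random.Random(seed)
    abundant, rare = [], []
    for _ in range(n_events):
        if rng.random() < rare_fraction:
            rare.append((rng.gauss(700, 120), rng.gauss(700, 120)))  # diffuse
        else:
            abundant.append((rng.gauss(200, 15), rng.gauss(200, 15)))  # tight
    return lit_pixels(abundant), lit_pixels(rare)


for n in (10_000, 50_000):
    a, r = simulate(n)
    print(f"{n} events: abundant lights {a} pixels, rare lights {r} pixels")
```

Going from 10,000 to 50,000 events, the rare population's lit area grows
several-fold, while the abundant population's grows only modestly because its
pixels were already lit.  The apparent ratio between the two changes even though
the true frequencies did not.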

So most people try to get around this by showing exactly the same number of
dots (usually around 10,000) in each plot.  But this is still a problem: if the
events are highly clustered, you still have no way to estimate population
frequencies.
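A similar toy sketch shows why a fixed dot count does not rescue things.  It
assumes a hypothetical 200,000-event file in which about 90% of events sit in a
tight cluster and about 10% are diffuse.  It randomly subsamples 10,000 dots and
counts the lit pixels per population on an assumed 256 x 256 grid.

```python
import random


def lit_pixels(points, bins=256, hi=1024.0):
    """Number of distinct pixels lit by at least one dot."""
    scale = bins / hi
    return len({(min(bins - 1, max(0, int(x * scale))),
                 min(bins - 1, max(0, int(y * scale)))) for x, y in points})


rng = random.Random(2)
# Hypothetical file: (label, x, y) for each event
events = [("tight", rng.gauss(200, 15), rng.gauss(200, 15)) if rng.random() >= 0.1
          else ("diffuse", rng.gauss(700, 120), rng.gauss(700, 120))
          for _ in range(200_000)]

shown = rng.sample(events, 10_000)  # the "same number of dots" convention
for label in ("tight", "diffuse"):
    pts = [(x, y) for lab, x, y in shown if lab == label]
    print(f"{label}: {len(pts)} dots on {lit_pixels(pts)} lit pixels")
```

In this setup the diffuse population lights more pixels than the tight one even
though it has roughly one-ninth as many dots, so the visible area inverts the
real frequencies.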

Bottom line: monochromatic dot plots, while often conveying information, just
as often hide important information.  Many people who've been in flow for a long
time can attest to how much published FACS data has been misinterpreted because
of the use of dot plots.

The answer is to use graphs that accurately convey population density
information.  One option is a dot plot in which the color of each dot conveys
the local density (as Howard suggested).  However, it is still difficult for
most people to estimate population frequencies just by looking at these graphs.
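One simple way to color dots by density is sketched below.  It is an
assumption, not necessarily what Howard proposed: events are binned onto a
coarse grid, each dot takes the event count of its cell, and that count is
mapped onto a hypothetical five-color palette on a log scale, so that both
sparse and dense regions get distinguishable colors.

```python
import math
from collections import Counter

# Hypothetical palette, ordered from lowest to highest density
PALETTE = ["blue", "cyan", "green", "yellow", "red"]


def pixel_of(x, y, bins=64, hi=1024.0):
    """Grid cell containing the event (x, y) on a 0..hi scale."""
    s = bins / hi
    return (min(bins - 1, max(0, int(x * s))), min(bins - 1, max(0, int(y * s))))


def density_colors(points, bins=64, hi=1024.0, palette=PALETTE):
    """Return one color per event, chosen by the event count of its cell."""
    counts = Counter(pixel_of(x, y, bins, hi) for x, y in points)
    top = max(counts.values())
    colors = []
    for x, y in points:
        c = counts[pixel_of(x, y, bins, hi)]
        # Log scale: a count of 1 maps to palette[0], the densest cell to palette[-1]
        level = 0 if top == 1 else int(math.log(c) / math.log(top) * (len(palette) - 1))
        colors.append(palette[level])
    return colors
```

The colors only rank densities; they do not tell the reader what fraction of
events lies in each region, which is the difficulty noted above.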

Contour plots are also an excellent way to convey density information.  When an
appropriate contouring algorithm is used, such as "probability contours", even
readers who are not experienced analysts can readily estimate population
frequencies.  The downside of contours is that you miss rare populations that
fall outside the last contour line.  Of course, this is easily remedied by
displaying the outlying events as dots, which combines the best attributes of
contour plots (density estimation) and dot plots (showing every event).
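A minimal sketch of how probability-contour levels and outliers might be
computed follows, under stated assumptions.  The density is a raw 2D histogram
on a hypothetical 32 x 32 grid (a real implementation would smooth it first, as
discussed above), there are 10 equal-probability levels, and the contour lines
themselves are not drawn.  Cells are ranked from densest to sparsest; the k-th
level is the count of the cell at which the cumulative event total first reaches
k/10 of all events, so each band between adjacent contours holds about 10% of
the events.  Events in cells sparser than the outermost level are returned as
outliers to be drawn as dots.  All names are hypothetical.

```python
from collections import Counter


def cell_of(x, y, bins=32, hi=1024.0):
    """Grid cell containing the event (x, y) on a 0..hi scale."""
    s = bins / hi
    return (min(bins - 1, max(0, int(x * s))), min(bins - 1, max(0, int(y * s))))


def probability_levels(points, n_levels=10, bins=32, hi=1024.0):
    """Return (cell counts, density thresholds for the n_levels - 1 contours).

    Cells are visited densest first; the k-th threshold is the count of the
    cell at which the enclosed events first reach k / n_levels of the total.
    """
    counts = Counter(cell_of(x, y, bins, hi) for x, y in points)
    n = len(points)
    levels, enclosed, k = [], 0, 1
    for d in sorted(counts.values(), reverse=True):
        enclosed += d
        # Integer comparison avoids floating-point drift in k / n_levels
        while k < n_levels and enclosed * n_levels >= k * n:
            levels.append(d)
            k += 1
    return counts, levels


def outliers(points, counts, levels, bins=32, hi=1024.0):
    """Events in cells sparser than the outermost contour: draw these as dots."""
    cutoff = levels[-1]
    return [(x, y) for x, y in points if counts[cell_of(x, y, bins, hi)] < cutoff]
```

With 10 levels, roughly 90% of events fall inside the outermost contour and the
rest are drawn individually; with 50 levels (2% bands), only about 2% are left
outside.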

The only time you should even consider using dot plots is when you want to
reveal rare populations and don't care about the frequencies of the others.
However, as long as your data analysis program can draw contour plots with
outliers, there is never a need for dot plots!  Color dot plots can also be
substituted.

mr