Re: Bad Flow Data & reviewing -- What can we do?

From: Ray Hicks (rayh@fcspress.com)
Date: Wed Oct 17 2001 - 20:42:51 EST


Many good points, Mario, but I'm going to take you back a few years to
our discussion of dot plots versus contours, and of how misleading
contours are.  I'd reverse your logic in "remember that contour plots
are also histograms (2D histograms), and they have no numbers on the 'Z'
axis corresponding to event frequency.  Why should univariate histograms
have them?", and suggest that contour plots need even more annotation.

I'm sorely tempted to attach a few figures to this e-mail, but I've
restrained myself, and made them available at:
http://www.fcspress.com/seeWhatIMean.gif
and
http://www.fcspress.com/512AlongTheAxis.gif

The first <http://www.fcspress.com/seeWhatIMean.gif> shows how strikingly
different contour plots of the same data can be (the data is from the
FlowJo tutorial set; the figures were made in FlowJo 3.2 and FCSPress
1.3).  The top left dot plot is from FlowJo and shows the crowding you
object to.  The upper central plot is FlowJo's default contour plot of
SSC vs FSC with ten thousand cells; the upper right plot shows 1,600
cells gated from the same file - it doesn't look like fewer cells, does
it?  The lower left plot is a log 50% contour plot of the data in the
top left and top centre plots - what is one to make of the contours in
it that are based on four cells?  The lower central plot is a dot plot
from FCSPress, plotting data at 512 points along the axis (the data has
a range of 512 "channels"); FCSPress has dithered the plot (using the
"clarify" option) to represent how it would (and does) print on a
printer that isn't limited to screen resolution.  You'll notice that the
higher resolution avoids much of the coalescing into a black blob that
you object to in dot plots (the second figure,
<http://www.fcspress.com/512AlongTheAxis.gif>, shows this graph at full
size with no dithering).  The lower right plot shows a density plot from
FlowJo, whose smoothing belies the sparsity of the data.
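
For anyone who wants to reproduce the effect without my figures, here's
a minimal sketch using synthetic data and matplotlib (none of it is the
FlowJo tutorial data, and the distribution, bin count and contour levels
are all arbitrary choices):

    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(0)
    big = rng.multivariate_normal([200, 150], [[900, 300], [300, 900]],
                                  size=10_000)
    small = big[rng.choice(10_000, size=1_600, replace=False)]  # "gated" subset

    fig, axes = plt.subplots(2, 2, figsize=(8, 8), sharex=True, sharey=True)
    for col, (data, n) in enumerate([(big, 10_000), (small, 1_600)]):
        # Dot plot: every event is drawn, so sparsity shows at a glance.
        axes[0, col].plot(data[:, 0], data[:, 1], ",", color="black")
        axes[0, col].set_title(f"dot plot, n={n}")
        # Contour plot: contours of a 2D histogram; the shapes barely
        # change with n, which is exactly the misleading effect above.
        h, xe, ye = np.histogram2d(data[:, 0], data[:, 1], bins=64)
        xc, yc = 0.5 * (xe[:-1] + xe[1:]), 0.5 * (ye[:-1] + ye[1:])
        axes[1, col].contour(xc, yc, h.T, levels=6)
        axes[1, col].set_title(f"contour, n={n}")
    plt.show()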

What's an expert to do when presented with this kind of thing?  Would
labelling the upper left and lower left plots as having the same number
of cells be enough to make you see them as representing the exact same
data set?  The dot plot of 1,600 cells (not shown, for brevity) clearly
has fewer cells than that of 10,000, and it warns the viewer, expert or
not, how much confidence to place in conclusions drawn from the plot far
better than printing event counts on the two contour plots (upper centre
and upper right) would.

Oh, alright then, I've put a further figure up with two dot plots and two
contour plots with paired numbers of events at:

http://www.fcspress.com/nowDoYouSee.gif

The other issue I take is: how is the collective going to select the
experts?  Surely the people who are publishing this stuff ARE people
"with a modicum of experience in flow".  Putting the responsibility on
editorial boards is probably just going to preserve the status quo.  How
about pressuring your lab-fellows to sling the FACS aspect of papers
they're reviewing anyway in your direction?

Ray

PS As an aside, there's something freaky happening on the axes of these
graphs - they're 512-channel data, but the linear FSC axis runs out just
past 200, and one of the events exceeds the maximum for side scatter
(i.e. the one that jumps above the red line in the left-hand plots).
Has this been fixed in later versions of FlowJo?  Would this be
something an expert could criticise/reject a paper for?
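
A reviewer wouldn't need much machinery to catch that sort of thing.
Here's a hedged sketch of the check (the array and its values are toy
data; real values would come from the FCS file's parameter columns):

    import numpy as np

    def flag_off_scale(values, channel_max=511):
        """Indices of events at or above the stated channel maximum."""
        return np.flatnonzero(np.asarray(values) >= channel_max)

    ssc = np.array([100, 240, 511, 515, 30])   # toy side-scatter values
    print(flag_off_scale(ssc))                 # -> [2 3]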


> From: "Roederer, Mario  (VRC)" <MarioR@mail.nih.gov>
> Date: Tue, 16 Oct 2001 13:00:05 -0400
> To: Cytometry Mailing List <cytometry@flowcyt.cyto.purdue.edu>
> Subject: Bad Flow Data & reviewing -- What can we do?
>
>
> This topic strikes a nerve with many of us.  Indeed, ISAC did at one point
> have the decent notion to have a committee on "data presentation standards"
> or something like that.  I remember seeing something at Montpellier--a
> pamphlet on presentation, I think.  Since then, I haven't heard about the
> progress of this committee.  I made a number of suggestions on the
> committee's effort, as it was a reasonable start, but don't know if that had
> any effect.  Indeed, even this pamphlet had a number of mistaken notions,
> showing how ingrained things can get even within the community.
>
> For example, there was the suggestion that we should always put numbers on
> the Y axis of a univariate histogram ("# of cells").  In reality, these
> numbers are meaningless--they depend on the resolution with which the data
> is binned, which can vary from program to program and instrument to
> instrument.  The reasoning was that the only way to compare histograms was
> to have these numbers to ensure that the data was interpreted properly.
> However, this is a misconception--in reality, the peak height in a histogram
> is rarely meaningful; it is the peak area which carries meaning.  What is
> necessary in a histogram presentation is to identify how many cells were
> collected (and displayed in the histogram), and, if any peak in the
> histogram is cut off, to identify what fraction of the vertical scale is
> shown.  I.e., the only thing worth putting on the Y axis label is "% max",
> where "max" is the maximum peak height.  Admittedly, many of my papers have
> the meaningless numbers on the axis...  but I'm still learning...
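
A quick numerical illustration of that point, using synthetic data (the
distribution, range and bin counts are arbitrary): the same events
binned at two resolutions give very different peak heights, while the
peak area - the fraction of events in the peak - is unchanged, and a
"% max" axis is immune to the binning choice altogether.

    import numpy as np

    rng = np.random.default_rng(1)
    events = rng.normal(100, 10, size=20_000)  # one synthetic peak

    for bins in (256, 1024):
        counts, _ = np.histogram(events, bins=bins, range=(0, 256))
        peak_height = counts.max()                           # binning-dependent
        peak_area = ((events > 70) & (events < 130)).mean()  # binning-proof
        pct_max = 100.0 * counts / peak_height               # "% max" axis
        print(f"{bins:4d} bins: height {peak_height}, "
              f"area {peak_area:.3f}, axis top {pct_max.max():.0f}% max")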
>
> I am sure that even this little discussion may set off a minor
> firestorm--and that's probably good: it will be educational, which is the
> main point of this list!  (By the way, remember that contour plots are also
> histograms (2D histograms), and they have no numbers on the "Z" axis
> corresponding to event frequency.  Why should univariate histograms have
> them?)
>
> Jim Houston asks about the needed information for histograms or dot
> plots--always, the minimum information is the number of events displayed.
> (And yes, I am guilty of not always putting that information in my own
> publications.)  I still strongly advocate against dot plots; there are much
> more informative displays available.
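
One example of a more informative display, sketched with synthetic data
(the bin count is arbitrary, and this is not any particular program's
algorithm): a 2D histogram rendered as a colour map keeps event
frequency visible where a dot plot would saturate to black.

    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(2)
    x, y = rng.multivariate_normal([200, 150], [[900, 300], [300, 900]],
                                   size=10_000).T

    plt.hist2d(x, y, bins=128)
    plt.colorbar(label="events per bin")  # the "Z axis" a contour plot lacks
    plt.xlabel("FSC")
    plt.ylabel("SSC")
    plt.show()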
>
> But the point of this email is not to address the specific defects in data
> presentation, nor even to start to lay them out.  That, in fact, would be
> better done in a book.
>
> Both Jim and Robert Zucker bring up the lack of the Community's involvement
> in peer review.  It is worth noting that JAMA requires every paper to be
> reviewed by a statistician, outside of the normal review.  Why not have the
> same thing for every flow paper?  It seems that the major publications
> should require an expert to review papers containing FACS
> presentations/analyses for appropriateness.  But it won't happen: if we
> can't even police our own Journals to ensure appropriate data presentation,
> then what makes anyone think we have the competence to do so for other
> Journals?
>
> Some years ago, a few of us bandied around an idea of "post-publication"
> review of articles that would be placed online.  The concept was as follows:
> each major journal would be assigned to one or two expert reviewers.  Each
> issue would be examined for articles that had flow cytometry in them, and
> then the reviewer would go over the paper with a predefined list of
> criteria.  The review would explicitly avoid any judgment about the paper's
> conclusions; it would only address whether the flow cytometric analyses were
> properly presented, interpreted, and then to note what additional
> information is required, what possible artifacts need to be eliminated, etc.
> The review process would be fundamentally based on a checklist (e.g., "was
> cell viability assessed?", "what staining controls were performed?", "is the
> data properly compensated?", "did the authors note how many events were
> displayed?", "are the statistical intreprations of low event counts
> appropriate?" etc. etc.... I could envision a 100-item list).  There would
> be "sub-lists" for different types of flow, like "cell cycle",
> "immunophenotyping", "intracellular detection", and "it's obvious I dropped
> my samples off at my local core facility, didn't tell them what was in each
> tube, forgot my controls anyway, had them generate a few graphs for me, and
> then xeroxed them until the dots I didn't like went away, so don't blame me
> because I can't understand the difference between a contour plot and a
> photomultiplier tube."  The reviews would be posted on-line.
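
The checklist itself is easy to imagine as a data structure.  A sketch
(only the common items come from the examples above; the assay-specific
questions are invented for illustration):

    from dataclasses import dataclass

    @dataclass
    class ChecklistItem:
        question: str
        passed: bool | None = None   # None = not yet assessed
        note: str = ""

    COMMON_ITEMS = [
        "Was cell viability assessed?",
        "What staining controls were performed?",
        "Is the data properly compensated?",
        "Did the authors note how many events were displayed?",
        "Are the statistical interpretations of low event counts appropriate?",
    ]

    SUB_LISTS = {
        # assay-specific "sub-lists"; these questions are illustrative only
        "cell cycle": ["Was doublet discrimination applied?"],
        "immunophenotyping": ["Were the gating controls shown?"],
    }

    def build_review(assay):
        """One paper's checklist: common items plus an assay sub-list."""
        return [ChecklistItem(q) for q in COMMON_ITEMS + SUB_LISTS.get(assay, [])]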
>
> The idea of the online post-publication review is that the general
> scientific community, when reviewing an article, could turn to the web site
> and quickly see if there are major problems with the technology that they
> might not appreciate because of the subtleties.  Since the criteria would
> all be published online as well, the goal would be that authors would start
> turning to this site before publication in order to better present data,
> rather than seeing criticisms of their papers show up afterwards.  Authors
> might be allowed to appeal aspects of a review that they feel are
> inappropriate, thereby providing an ongoing evolution of the evaluation
> process.  There might even be a manuscript pre-review service where authors
> could ensure appropriateness before submitting for review.
>
> What would this require?  No more than one or two dozen FACS-savvy people
> to volunteer for this public service. Anyone with a modicum of experience in
> flow would be excellent for this; in fact, it's probably better to recruit
> younger (less jaundiced) people for the process. In reality, the review
> process would be very rapid, since these are not detailed reviews aimed at
> the science of the paper, but only at the data presentation.  I was so hot
> on this idea (now 2 years old) that I even registered a domain for its use
> (http://www.sciwatch.org)--a registration I renew in the hopes that
> something might actually come of it.
>
> In my idealistic vision, eventually journals would turn to the Flow
> community to do this as a standard of practice rather than have it go on
> post-publication.  Journals might even adopt the standard data presentation
> requirements.  People might actually publish FACS data that we can believe.
>
> But maybe we need to start at home first.  I'd like to suggest that
> Cytometry and Communications in Clinical Cytometry both make an editorial
> decision to require all published papers to come up to some minimum
> acceptable standard.  If these journals make the commitment, then perhaps
> there will be enough motivation for a document outlining these procedures to
> be put together.  However much it makes sense, I do not suggest that this be
> done by a committee under the auspices of ISAC, since that effort has
> essentially failed, principally through inaction.  Rather, I think the
> Editorial Boards should empower a group to put such a document together. If
> such an effort works, it can serve as a model for other journals to adopt.
>
> mr
>

