MFI vs %Positive: myths deserve answers

From: Mario Roederer <roederer@drmr.com>
Date: Tue Sep 14 2004 - 20:06:24 EST

In response to a recent posting by a self-trademarked FlowJock:

There's been extensive rigorous analysis of %Positive quantitation 
with dim populations--look it up in the literature, there are a 
number of papers (see bottom of email).  Many third-party software 
programs support these types of analyses; for example, FlowJo can 
compare the negative and stained controls using the five 
somewhat-related algorithms devised by Roy Overton, Bruce Bagwell 
("SED"), Cox (Cox chi-square), Kolmogorov and Smirnov ("K-S"), as 
well as by our own group ("PB" or probability binning).  In general, 
these algorithms agree quite well--although the Cox, K-S and PB 
methods aren't strictly %Positive quantifiers (but can be adapted to 
be such).
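
To make the idea of comparing a negative control with a stained sample concrete, here is a minimal Python sketch of the K-S "Dmax" statistic, the largest gap between the two empirical cumulative distributions. It is not FlowJo's implementation and not the SED algorithm; the data, the 30% positive fraction, and the function name ks_dmax are assumptions made up for illustration. If the stained sample is a mixture of unchanged negatives and brighter positives, Dmax cannot exceed the true fraction positive apart from sampling noise, so it behaves as a conservative estimate.

```python
import bisect
import random

def ks_dmax(control, stained):
    """Largest value of F_control(x) - F_stained(x) over all observed x."""
    c = sorted(control)
    s = sorted(stained)
    dmax = 0.0
    for x in sorted(set(c + s)):
        f_c = bisect.bisect_right(c, x) / len(c)  # empirical CDF of control
        f_s = bisect.bisect_right(s, x) / len(s)  # empirical CDF of stained
        dmax = max(dmax, f_c - f_s)
    return dmax

rng = random.Random(0)
TRUE_FRACTION = 0.30  # hypothetical fraction of dimly positive cells

# Hypothetical log-scale fluorescence values (arbitrary units).
control = [rng.gauss(1.0, 0.3) for _ in range(5000)]
stained = [
    rng.gauss(1.6 if rng.random() < TRUE_FRACTION else 1.0, 0.3)
    for _ in range(5000)
]

dmax = ks_dmax(control, stained)
print(f"true fraction positive: {TRUE_FRACTION:.2f}, Dmax: {dmax:.3f}")
```

With positives only two standard deviations above the negatives, Dmax falls noticeably short of the true fraction, which is presumably why the published methods go beyond the raw statistic.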

In my opinion, for %Positive, the best is the SED algorithm.  This 
has significant support both from basic mathematical principles as 
well as from empirical analyses.  The PB method has the unique facet 
that it can be used to "gate" on cells that are different (e.g., 
positive) in one or more dimensions.

Now let's dispense with a few myths that you brought up!

>... When the whole peak shifts, the whole population is brighter 
>than the Negative control population.  That means it's 100% positive 
>- including those dim cells in the 'positive' peak that aren't as 
>bright as the bright cells in the negative peak.

That's not really true, for a number of reasons.  First, you haven't 
defined what you mean by "peak".  If you mean the "mode" (which is 
what most people mean when they think of peaks), then it's not at all 
true; the mode can be significantly influenced by changes in the 
relative proportions of positive and negative cells; an increase in 
the mode does not mean that the cells are 100% positive.  Even if the 
"whole peak" (by which I assume you mean the bottom percentile as 
well as the top percentile) moves, this does not indicate that all 
the cells are positive!  Consider the simple example of an unstained 
population that is actually composed of two sets of cells: A & B, 
where "B" cells have slightly more autofluorescence than "A", but the 
mixture doesn't resolve and appears to be a single peak.  After 
staining, all of the "A" cells become slightly positive, and are 
slightly brighter than the B cells, which are still negative.  Again, 
the distribution doesn't resolve into 2 peaks.

That's a simple case where half of the population is staining, yet 
"the whole peak shifts"!  Is this a trivial example?  Certainly not: 
Within lymphocytes, B cells and T cells have different 
autofluorescence levels--so if you were to stain one population only 
with a dim reagent, you might mistakenly conclude that all of the 
lymphocytes express that antigen!
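
The A/B example can be checked numerically. The following Python sketch uses made-up values (the means, the common spread, and the event counts are all assumptions, on an arbitrary log-like scale): A and B sit 0.6 standard deviations apart, so the mixture shows a single peak both before and after staining. Only the A cells, half the population, gain signal.

```python
import random
import statistics

rng = random.Random(1)
HALF = 10_000  # events per subset (hypothetical)
SD = 0.25      # common spread of each subset (hypothetical)

# Unstained: A (mean 1.00) and B (mean 1.15) differ slightly in autofluorescence.
unstained = ([rng.gauss(1.00, SD) for _ in range(HALF)] +
             [rng.gauss(1.15, SD) for _ in range(HALF)])
# Stained: A becomes dimly positive (mean 1.30); B stays negative (mean 1.15).
stained = ([rng.gauss(1.30, SD) for _ in range(HALF)] +
           [rng.gauss(1.15, SD) for _ in range(HALF)])

def percentiles(data):
    cuts = statistics.quantiles(data, n=100)  # 99 cut points
    return {5: cuts[4], 50: cuts[49], 95: cuts[94]}

before, after = percentiles(unstained), percentiles(stained)
shifts = {k: after[k] - before[k] for k in before}
for k, shift in shifts.items():
    print(f"{k}th percentile shifts by {shift:+.3f}")
print("fraction of cells that actually stained: 0.50")
```

In this construction the bottom, middle, and top of the distribution all move up by about the same amount, even though exactly half of the cells are positive.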

>Back to the 'small differences' case:  If your negative control is 
>in one location, and the negative cells in the test sample are in 
>the same location but there are a few bright cells, then you can use 
>frequency analysis to get the percentage of those positive cells 
>(use a 2-parameter plot and a polygon region - NEVER a histogram).

Well, that's a blanket statement that I must also disagree with!  Why 
"NEVER" use a histogram?  Admittedly, I often use bivariate plots to 
gate on what is essentially a one-dimensional expression.  But I do so 
with guilty pleasure.

The claim appears to be that you can better separate the dim 
positives from the negatives on a bivariate display.  And this is 
visually supported in many cases.  However, this is purely a visual 
artifact!  It's magic!  It's not mathematically true!  In fact 
it's.... myth!

EXCEPT when there is a relationship between the expression of the dim 
marker and the measurement on the other axis (and there often 
is--particularly with something like SS or FS, when there is a 
size-dependence).  If that's the case, then you can't use just any 
bivariate display, you must use the bivariate display of your 
interesting marker against the parameter which provides additional 
information.

If the other parameter in the bivariate display is not mathematically 
related to the measurement marker, then there is no scientific basis 
for stating that the resolution of the dim cells (ability to gate) is 
better in a bivariate display.  And yes, I'd be happy to follow this 
up with real math if necessary.  One thing to consider is that dot 
plots are heavily influenced by the number of events you collect. 
Pretend that you had collected a trillion events instead of 10,000 -- 
all of a sudden, the distinction on that bivariate plot has 
disappeared!  (And yet, the histogram looks no different).
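
The dependence on event count can be illustrated with a small Python sketch. It assumes a single negative population with a standard normal spread and a fixed, hypothetical gate position; "cloud edge" here simply means the brightest event collected, a stand-in for how far the dots visually extend on a dot plot.

```python
import random

rng = random.Random(2)

# One stream of negative events; prefixes mimic collecting fewer events.
events = [rng.gauss(0.0, 1.0) for _ in range(200_000)]
THRESHOLD = 1.0  # fixed hypothetical gate position, in SD units

def summarize(n):
    subset = events[:n]
    cloud_edge = max(subset)  # brightest dot that would be drawn
    frac_above = sum(x > THRESHOLD for x in subset) / n  # normalized histogram area
    return cloud_edge, frac_above

results = {n: summarize(n) for n in (1_000, 10_000, 200_000)}
for n, (edge, frac) in results.items():
    print(f"{n:>7} events: cloud edge {edge:.2f}, fraction above gate {frac:.3f}")
```

The visual extent of the dot cloud keeps growing as more events are collected, while the fraction above the gate (what a normalized histogram shows) stays essentially the same.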

Furthermore, the assertion that gating on bivariate plots is better 
than on histograms betrays the underlying assumption that the gating 
is completely subjective!  Don't be misled by the typical elliptical 
(or circular) distribution of events in a bivariate display of 
uncorrelated parameters -- this does not help you identify boundaries 
any better than from a histogram, except in a subjective manner.
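
One way to sketch the math behind this claim, under assumptions that are not spelled out in this post: let p be the true fraction of positive cells, f_+ and f_- the distributions of the dim marker x in positive and negative cells, and g the distribution of the second parameter y, taken to be the same for both groups and independent of x. The probability that an event at (x, y) is positive then does not depend on y at all:

```latex
% Assumed symbols: p, f_+, f_-, g as defined in the lead-in above.
\[
P(+\mid x,y)
  = \frac{p\,f_+(x)\,g(y)}{p\,f_+(x)\,g(y) + (1-p)\,f_-(x)\,g(y)}
  = \frac{p\,f_+(x)}{p\,f_+(x) + (1-p)\,f_-(x)}
  = P(+\mid x)
\]
```

The g(y) factor cancels, so under these assumptions no gate drawn in the (x, y) plane can separate positives from negatives better than the best gate on x alone. The cancellation fails as soon as g differs between the two groups or y is correlated with x, which is the exception described earlier.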

Of course, there's nothing wrong with subjectively placing gates, as 
long as you are aware that this is the case.  But if you are 
concerned about accurately estimating %Positive, then certainly any 
subjectivity in gate placement must be removed.

Incidentally, the algorithms referenced above do an excellent job of 
estimating %Positive, whether the expression is bright OR dim. 
Manual gating fails miserably if there's no defined separation.
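
As a rough illustration of how a manual gate behaves with and without separation, the Python sketch below places a gate at the 99th percentile of the negative control (one common convention, assumed here) and applies it to a simulated stained sample in which 40% of cells are truly positive. All values, and the function name manual_gate_fraction, are hypothetical.

```python
import random

rng = random.Random(3)
N = 20_000
TRUE_FRACTION = 0.40     # hypothetical fraction of truly positive cells
NEG_MEAN, SD = 1.0, 0.3  # hypothetical log-scale units

def manual_gate_fraction(shift):
    """Fraction above a gate set at the control's 99th percentile."""
    control = sorted(rng.gauss(NEG_MEAN, SD) for _ in range(N))
    gate = control[int(0.99 * N)]
    stained = [
        rng.gauss(NEG_MEAN + shift if rng.random() < TRUE_FRACTION else NEG_MEAN, SD)
        for _ in range(N)
    ]
    return sum(x > gate for x in stained) / N

bright = manual_gate_fraction(shift=3.0)  # positives 10 SD above negatives
dim = manual_gate_fraction(shift=0.5)     # positives under 2 SD above negatives
print(f"true {TRUE_FRACTION:.2f}; gated (bright) {bright:.3f}; gated (dim) {dim:.3f}")
```

With bright staining the gate recovers roughly the true fraction; with dim staining the same gate reports only a small part of it, because most of the positive cells overlap the negative distribution.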

>   If you have brighter events AND your negative peak moves up, you 
>either have 100% positivity in your sample (with 'dims' and 
>'brights') OR your negative control isn't working properly and you 
>only have a few bright positive events. 

OR... your negative control works just fine, it's just that the stain 
has some nonspecific binding on the nonexpressing cells!  Oh wait -- 
this means your negative control isn't an adequate control... but 
then, that's almost always true.  It's nearly impossible to have the 
perfect negative control.  (And please, don't even get me started on 
isotype "controls" -- something I want to rename as "isotype 
uncontrols").

Nonetheless, the point is that there are lots more possibilities than 
the two you mention.

>PS-The training videos will be available in October.

Well, great! ... but I hope they offer explanations a bit more 
rigorous than your original response...  Perhaps the 
self-assignment of the moniker "FlowJock" is a bit premature.  (PS, I 
sincerely hope you don't try to claim a trademark on a term that has 
been in general use by the community for many years--that would be a 
waste of effort and community good will).

mr (you may consider me as an untrademarked FlowJock)


1) Overton WR. Modified histogram subtraction technique for analysis 
of flow cytometry data. Cytometry. 1988 Nov;9(6):619-26.

3) Roederer M, Treister A, Moore W, Herzenberg LA. Probability 
binning comparison: A metric for quantitating univariate distribution 
differences. Cytometry. 2001 Sep 1;45(1):37-46.

4) Roederer M, Moore W, Treister A, Hardy RR, Herzenberg LA. 
Probability binning comparison: a metric for quantitating 
multivariate distribution differences. Cytometry. 2001 Sep 
1;45(1):47-55.

5) Roederer M, Hardy RR. Frequency difference gating: A multivariate 
method for identifying subsets that differ between samples. 
Cytometry. 2001 Sep 1;45(1):56-64.

6) Cox C, Reeder JE, Robinson RD, Suppes SB, Wheeless LL. Comparison 
of frequency distributions in flow cytometry. Cytometry. 1988 
Jul;9(4):291-8.

Received on Wed Sep 15 15:06:28 2004
