Re: Kolmogorov-Smirnov

From: Kenneth Ault (AULTK@MAIL.MMC.ORG)
Date: Tue Nov 25 1997 - 10:03:17 EST


As someone who made extensive use of the K-S test in a past life, I would
like to point out one important issue that is not dealt with in
statistics textbooks and that, I think, was not explicitly mentioned in
Ted Young's excellent paper that introduced K-S to the flow community.

The K-S test is typically used to compare two frequency distributions
non-parametrically.  The number of degrees of freedom that one uses to
calculate a p value from a K-S statistic is based upon the number of
bins in the frequency distribution histogram.  For flow cytometry data
one is tempted to use the number of channels, i.e. 256 or 1024 etc., as
the degrees of freedom.  Doing this makes almost any two histograms
that are not identical come out as highly significantly different.  In
other words, using channels as degrees of freedom makes the K-S test
far too sensitive to trivial differences in the histograms.
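
The effect is easy to see numerically.  The sketch below assumes the
standard large-sample approximation for the two-sample K-S p value,
p ~ Q(sqrt(n1*n2/(n1+n2)) * D), and treats whatever count is plugged in
as n (here a channel count) as the degrees of freedom.  The function
name, the fixed D = 0.1, and the cutoff for tiny arguments are
illustrative assumptions, and the approximation is rough at small n.

```python
import math


def ks_pvalue_asymptotic(d, n1, n2, terms=100):
    """Approximate two-sample K-S p value for statistic d and counts n1, n2.

    Uses the large-sample Kolmogorov series; n1 and n2 are whatever counts
    are taken as the degrees of freedom (an assumption of this sketch).
    """
    n_eff = n1 * n2 / (n1 + n2)
    lam = math.sqrt(n_eff) * d
    # For very small lam the series converges slowly and Q(lam) is
    # indistinguishable from 1, so short-circuit (illustrative cutoff).
    if lam < 0.2:
        return 1.0
    # Q(lam) = 2 * sum_{j>=1} (-1)^(j-1) * exp(-2 * j^2 * lam^2)
    total = sum(
        (-1) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam)
        for j in range(1, terms + 1)
    )
    return min(1.0, max(0.0, 2.0 * total))


# Same K-S statistic, different counts plugged in as degrees of freedom.
D = 0.1
for n in (16, 256, 1024):
    print(n, ks_pvalue_asymptotic(D, n, n))
```

With D held fixed, the p value falls from essentially 1 toward 0 as the
count grows, so the choice of count largely decides the verdict.
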
   In fact our flow histograms have far fewer degrees of freedom than
they have channels.  The correct value is based upon the CV of your
histogram, i.e. how many distinguishable histograms can you fit
into the number of channels that you have?  For most of our data we have
no way (that I know of) to estimate the correct number of degrees of
freedom.
   For this reason I have always used the K-S statistic as a measure of
the difference between two histograms, but have not used it to calculate
a p value.  I am sure there are others listening to this discussion who
are better qualified to discuss how one calculates degrees of freedom
for a flow histogram - I for one would be interested in such a
discussion.
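
For anyone who wants to use the statistic the same way, as a plain
measure of difference with no p value attached, a minimal sketch
follows.  It assumes the two histograms are lists of per-channel counts
over the same channels; the function name and the toy data are
illustrative only.

```python
from itertools import accumulate


def ks_statistic(hist_a, hist_b):
    """Return the K-S statistic D between two binned histograms.

    D is the largest absolute difference between the two normalized
    cumulative distributions, evaluated channel by channel.
    """
    if len(hist_a) != len(hist_b):
        raise ValueError("histograms must have the same number of channels")
    total_a, total_b = sum(hist_a), sum(hist_b)
    if total_a <= 0 or total_b <= 0:
        raise ValueError("each histogram must contain at least one event")
    # Normalize the running sums so each cumulative curve ends at 1.
    cdf_a = (c / total_a for c in accumulate(hist_a))
    cdf_b = (c / total_b for c in accumulate(hist_b))
    return max(abs(x - y) for x, y in zip(cdf_a, cdf_b))


# Toy example: the second histogram is the first shifted up one channel.
control = [0, 10, 20, 10, 0]
sample = [0, 0, 10, 20, 10]
print(ks_statistic(control, sample))  # prints 0.5
```

Because both curves are normalized, D lies between 0 and 1 and does not
depend on how many events were collected in each histogram.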

Ken Ault
aultk@mail.mmc.org


