As someone who made extensive use of the K-S test in a past life, I would like to point out one important issue that is not dealt with in statistics textbooks, and that I think was not explicitly mentioned in Ted Young's excellent paper that introduced K-S to the flow community.

The K-S test is typically used to compare two frequency distributions non-parametrically. The number of degrees of freedom that one uses to calculate a p value from a K-S statistic is based upon the number of bins in the frequency distribution histogram. For flow cytometry data one is tempted to use the number of channels, i.e. 256 or 1024 etc., as the degrees of freedom. Doing this will make any two histograms that are not identical come out as highly significantly different. In other words, using channels as degrees of freedom makes the K-S test far too sensitive to trivial differences in the histograms.

In fact our flow histograms have far fewer degrees of freedom than they have channels. The correct value is based upon the CV of your histogram, i.e. how many distinguishable histograms can you fit into the number of channels that you have? For most of our data we have no way (that I know of) to estimate the correct number of degrees of freedom. For this reason I have always used the K-S statistic as a measure of the difference between two histograms, but have not used it to calculate a p value.

I am sure there are others listening to this discussion who are better qualified to discuss how one calculates degrees of freedom for a flow histogram. I for one would be interested in such a discussion.

Ken Ault
aultk@mail.mmc.org
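To illustrate what it means to use the K-S statistic purely as a measure of difference, here is a minimal Python sketch. It assumes two histograms binned on the same channels and takes the statistic to be the largest absolute gap between their normalized cumulative distributions. The function name `ks_statistic` and the six-channel example data are hypothetical, chosen only for illustration, and no p value is computed, since that is the step that needs the degrees of freedom discussed above.

```python
from itertools import accumulate


def ks_statistic(hist_a, hist_b):
    """Largest absolute gap between two normalized cumulative histograms.

    Both inputs are sequences of per-channel event counts on the same
    channels. The result lies between 0 (identical shapes) and 1.
    """
    if len(hist_a) != len(hist_b):
        raise ValueError("histograms must have the same number of channels")
    total_a = sum(hist_a)
    total_b = sum(hist_b)
    if total_a <= 0 or total_b <= 0:
        raise ValueError("each histogram must contain at least one event")

    # Running sums give the cumulative counts up to each channel.
    cum_a = accumulate(hist_a)
    cum_b = accumulate(hist_b)

    # Normalize by total events so histograms of different size compare fairly.
    return max(abs(ca / total_a - cb / total_b) for ca, cb in zip(cum_a, cum_b))


# Hypothetical six-channel histograms: the second is the first shifted
# one channel to the right.
control = [0, 10, 40, 40, 10, 0]
shifted = [0, 0, 10, 40, 40, 10]
print(ks_statistic(control, shifted))
```

Because each histogram is normalized by its own event count, the value depends only on the shapes of the two distributions, so it can be compared across samples with different numbers of events.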
This archive was generated by hypermail 2b29 : Wed Apr 03 2002 - 11:50:22 EST