Descriptive statistics

From: Sergey Dzekunov (sergeyd@maxcyte.com)
Date: Thu May 23 2002 - 12:57:44 EST


I am a new member of the discussion group and would like to raise a few
questions of a general nature.

1) I have noticed in previous messages that, when discussing descriptive
statistics of FC data, some people refer to the Kolmogorov-Smirnov test as
the tool for determining which distribution better describes a data set. I
am not sure about specific implementations of this test, but I am afraid
that it applies to continuous distribution functions, whereas binned
distributions should be tested with the Chi-square test for the same
purpose.
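
As an illustration of the binned case, here is a minimal sketch of a
Chi-square comparison of two histograms that share the same bin edges. The
function names are hypothetical, the degrees-of-freedom convention (non-empty
bins minus one) is an assumption, and the p-value routine is a simple series
that is only meant for moderate values of the statistic.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with df degrees of
    freedom, via the series for the regularized incomplete gamma function.
    Sketch only: not robust for very large x."""
    if df < 1:
        raise ValueError("need at least one degree of freedom")
    if x <= 0:
        return 1.0
    a, y = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = a
    for _ in range(10000):
        n += 1.0
        term *= y / n
        total += term
        if term < total * 1e-15:
            break
    lower = total * math.exp(-y + a * math.log(y) - math.lgamma(a))
    return min(1.0, max(0.0, 1.0 - lower))


def chi2_two_histograms(r, s):
    """Compare two histograms (lists of counts) with identical binning.

    Returns (statistic, degrees_of_freedom, p_value). Bins that are empty
    in both histograms are skipped. The scale factors allow the two
    histograms to contain different total numbers of events.
    """
    if len(r) != len(s):
        raise ValueError("histograms must have the same number of bins")
    total_r, total_s = sum(r), sum(s)
    scale_r = math.sqrt(total_s / total_r)
    scale_s = math.sqrt(total_r / total_s)
    stat, used = 0.0, 0
    for ri, si in zip(r, s):
        if ri == 0 and si == 0:
            continue
        used += 1
        stat += (scale_r * ri - scale_s * si) ** 2 / (ri + si)
    df = used - 1  # assumed convention: histograms compared by shape only
    return stat, df, chi2_sf(stat, df)
```

A small p-value suggests that the two binned distributions differ; identical
histograms give a statistic of zero.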

2) Another common task is the comparison of distribution means. To
generalize the first comment, I'd like to share with the group the
following logic, which I was glad to find in this famous book:
"Numerical Recipes in C: The Art of Scientific Computing", Second Edition,
Cambridge University Press, 1988-1992, ISBN 0521431085.
Here is the scheme:
Q: Do two samples have different means?
A: Prior to comparing the averages, one should do the following:
	Step 1: Run the Chi-square test to see if the two distributions are different.
		"No" -- go to step 2. "Yes" -- go to step 5.
	Step 2: Run the F-test to find out if the two data sets have the same variances.
		"Yes" -- go to step 3. "No" -- go to step 4.
	Step 3: Run the t-test to see if the two samples have the same means
		(or significantly different ones). Done.
	Step 4: Use the unequal-variance t-test, but be careful with
		distributions that are substantially different in shape.
		With this in mind, consider it done.
	Step 5: This situation is very likely to be identical to the "peanuts and
		oranges" one. Since the distributions are different, any conclusions
		about differences in their means may be quite speculative.
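
The scheme above can be sketched in code roughly as follows. This is only an
illustration under stated assumptions: the function names are hypothetical,
the two yes/no answers are expected to come from the Chi-square and F tests
at whatever significance level the user has chosen, and only the test
statistics and degrees of freedom are computed (turning them into p-values
needs the F and Student's t distributions from a statistics library).

```python
import math
from statistics import mean, variance


def f_ratio(a, b):
    """Variance ratio (larger over smaller) with its two degrees of freedom."""
    va, vb = variance(a), variance(b)
    if va >= vb:
        return va / vb, len(a) - 1, len(b) - 1
    return vb / va, len(b) - 1, len(a) - 1


def pooled_t(a, b):
    """Student's t for samples assumed to share one variance: (t, df)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1.0 / na + 1.0 / nb))
    return t, na + nb - 2


def unequal_variance_t(a, b):
    """Unequal-variance (Welch) t with approximate degrees of freedom: (t, df)."""
    na, nb = len(a), len(b)
    va, vb = variance(a) / na, variance(b) / nb
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


def choose_mean_test(distributions_differ, variances_equal):
    """Map the answers from steps 1 and 2 onto the step to use next."""
    if distributions_differ:
        return "step 5: comparing the means may be speculative"
    if variances_equal:
        return "step 3: t-test"
    return "step 4: unequal-variance t-test"
```

For example, `choose_mean_test(False, True)` points to the ordinary t-test,
for which `pooled_t(sample_a, sample_b)` gives the statistic.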

3) This part concerns the procedure for determining "percent positive"
cells, which I think everyone working with flow cytometry has to deal with
at least sometimes. Although there is little, if any, statistical power in
this parameter, it is widely used in the validation and comparison of
assays and is almost universal in reporting results of gene
transfection/expression.
The "percent positive" is determined as the percentage of cells above a
threshold arbitrarily established on a control distribution. Such
thresholding cannot distinguish between cells that have increased their
fluorescence by one percent and cells that have increased it ten-fold, as
long as both have crossed the threshold.
Nevertheless, given the popularity and simplicity of this parameter, it
would be worthwhile to standardize it in some fashion. I have put together
a simple algorithm that computes a single number for "percent positive"
and is immune to the influence of the user. However, before splitting
hairs with others and posting this algorithm, I would like to know whether
someone has already tried to solve the same problem.
What does the discussion group think about it? Has any FC society ever
posted guidelines on such algorithms, and specifically on thresholding?
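
For reference, the conventional thresholding described above (not the
standardized algorithm mentioned in this message, which is not given here)
might be sketched as follows. The function names and the default choice of
letting 1% of control cells fall above the threshold are assumptions made
for illustration only.

```python
def threshold_from_control(control, fraction_above=0.01):
    """Pick a threshold so that roughly `fraction_above` of the control
    events lie strictly above it (an arbitrary, user-chosen fraction)."""
    ordered = sorted(control)
    n_above = round(len(ordered) * fraction_above)
    index = max(0, len(ordered) - n_above - 1)
    return ordered[index]


def percent_positive(sample, threshold):
    """Percentage of events whose fluorescence exceeds the threshold."""
    positives = sum(1 for value in sample if value > threshold)
    return 100.0 * positives / len(sample)


# Two hypothetical samples: half of the cells are barely above a threshold
# of 100 in one, and ten-fold above it in the other.
barely_shifted = [101.0] * 50 + [50.0] * 50
strongly_shifted = [1000.0] * 50 + [50.0] * 50
same_score = (percent_positive(barely_shifted, 100.0)
              == percent_positive(strongly_shifted, 100.0))
```

Both hypothetical samples score the same "percent positive", which is the
loss of information noted above.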

I highly regard flow cytometry as an "intuitive" and "artistic" technique
that has as much power as the researcher is capable of making use of. I
must say this explicitly out of respect for the talented scientists who
think far and wide when looking at the data. However, as a biophysicist I
can't help but try to bring more sense into the numbers, as long as they
are such a big part of our scientific language :)
I sincerely wish that articles like this one appeared more often:
Durand R.E. Calibration of Flow Cytometer Detection Systems. Methods in
Cell Biology, Acad. Press 1990, Vol. 33, p. 647

Sincerely,
Sergey M. Dzekunov
MaxCyte, Inc. Rockville, MD


