Howard:

You seemed a bit steamed in your EMail... so it is with trepidation that I step into the fray. Hopefully you won't be "mean" to me.

>...the software gives you the geometric
>mean because it is easy to compute, i.e., it doesn't require transformation
>of the data between log and linear space, while computing the real mean
>would.

Most software doesn't give you the geometric mean because it's "easy"; it does so because it tends to be the most useful statistic. The transformation itself is trivial and adds nothing to the programming effort, and it sure doesn't slow down today's computers noticeably. Any software package capable of software compensation clearly has the capability of computing arithmetic means...

>TAKING THE RATIO OF TWO QUANTITIES ON A LOG SCALE IS SO STUPID THAT ANYONE
>INCLUDING SUCH A CALCULATION IN A PAPER SUBMITTED TO A JOURNAL SHOULD BE
>BANNED FOR A YEAR FROM SUBMITTING ANOTHER PAPER - but, lucky for so many
>people, most of the reviewers and editors, even those associated with some
>really toney journals, are blissfully unaware just how stupid it is.

It is important to clarify that what is stupid is ratioing the channel values of a log scale--i.e., ratioing values that increase linearly on a logarithmic fluorescence scale. There is nothing inherently wrong with ratioing the "scale" values (those that increase exponentially). For example, if one population has a median fluorescence of 10,000 (4th decade, channel 1024) and another has a median fluorescence of 1,000 (3rd decade, channel 768), then it is appropriate to note that the first is 10 times as bright as the second (not 1.33 times, which is what the ratio of the channel numbers would give).

We used ratios when it was appropriate: for example, when we were measuring the fold-increase of beta-gal activity driven by a promoter after stimulation (measured by the FACS-Gal assay). The pre-stimulation condition still expressed considerable beta-gal; when stimulated, we got 5x or 10x as much.
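To make the channel-versus-scale distinction concrete, here is a minimal sketch. It assumes a 1024-channel, 4-decade log scale, so that every 256 channels is one decade; that assumption matches the numbers above (channel 768 -> 1,000 and channel 1024 -> 10,000) but will differ from instrument to instrument. The function names are made up for illustration.

```python
# Assumed instrument settings (not universal): 1024 channels over 4 decades.
CHANNELS = 1024
DECADES = 4


def channel_to_scale(channel):
    """Convert a log-amp channel number to a linear "scale" value."""
    return 10 ** (channel * DECADES / CHANNELS)


def fold_change(channel_a, channel_b):
    """Ratio of the linear scale values behind two channel numbers."""
    # A difference in channels is a ratio in scale values.
    return 10 ** ((channel_a - channel_b) * DECADES / CHANNELS)


bright, dim = 1024, 768
print("ratio of channel numbers:", bright / dim)            # the meaningless one
print("ratio of scale values:   ", fold_change(bright, dim))  # the fold difference
```

Note that on the log scale the meaningful operation is a difference of channels, which is why dividing one channel number by another gives nonsense.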
The ratio of the median fluorescences was appropriate because we found that the RATIO of the post- to pre-stimulation values was conserved across different cell lines, although they had different basal expression levels (and therefore different stimulated levels). This was interesting scientifically--it says something about the log-responsiveness of promoters and enhancers... but I digress. Note that it is rarely correct to ratio against the median or mean autofluorescence--rather, as you point out, subtraction is superior for such a case.

>If you are actually trying to compare flow data with a bulk assay of some
>kind - for example, you have determined the total amount of fluorescent
>label in a solution of 100,000 cells, and you now want to calibrate the
>flow cytometric fluorescence histogram in terms of molecules of label per
>channel - you do need to use the arithmetic mean, as Alice Givan recently
>pointed out, and you therefore need linear data, while you usually have log
>data.

Actually, we found a very good correlation between the MEDIAN fluorescence of a population of cultured cells expressing b-galactosidase (measured by the FACS-Gal assay) and the total b-gal content of the population by a biochemical assay (MUG). Of course, this was because the populations were relatively homogeneous (clonal), with about a 1-decade range in fluorescence--for heterogeneous expression, the median was not a very good correlate of the biochemical activity.

This actually raises the most important point that everyone seems to be dancing around but ignoring: using any statistic is good as long as you justify (to yourself, and to the reviewers) that it is appropriate! In other words, if your cell population is homogeneous, then nearly anything will work. If it's not homogeneous, then you may have a lot of trouble with any single statistic. While I agree that the arithmetic mean is probably going to be the closest for heterogeneous populations, I disagree that it should be used!
The fact that the population cannot be effectively described by the median means that there is an interesting heterogeneity underlying the expression--and therefore it becomes a mistake to reduce the data to a single value. After all, this is where the power of flow cytometry is: in the description of the DISTRIBUTION of expression. The fact that people continue to take pains to reduce our gloriously rich and detailed data to a single number pains me to no end! Much better would be to calculate the 10th, 25th, 50th (median), 75th, and 90th percentiles of a complex distribution: at least now you have 5 parameters to the distribution and therefore a much better chance of accurately describing it (and possibly discovering underlying phenomena hidden by using only a single value).

Here's my bottom line for log distributions:

"If the median is an inadequate description of the distribution, then it is inappropriate to reduce the distribution to a single value by any algorithm."

In such a case, using the arithmetic mean, geometric mean, Mario's mean, Howard's mean, or even God's mean (should that actually differ from Howard's) won't be any better and is only throwing mud onto a beautiful painting of data.

mr

(PS: Mario's never mean.)
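As a rough illustration of the five-percentile summary suggested above, here is a minimal sketch that reports the arithmetic mean, the geometric mean, and the 10th/25th/50th/75th/90th percentiles side by side. Everything in it is an assumption made for the example: the two populations are synthetic log-normal data (not FACS-Gal measurements), the values are taken to be already on the linear scale, and `percentile` and `summarize` are hypothetical helper names.

```python
import math
import random
import statistics


def percentile(sorted_values, p):
    """Percentile p (0..100) of pre-sorted data, by linear interpolation."""
    if not sorted_values:
        raise ValueError("no data")
    rank = (len(sorted_values) - 1) * p / 100
    lo = int(rank)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (rank - lo)


def summarize(values):
    """Single-number statistics plus a five-percentile description."""
    data = sorted(values)
    return {
        "arith_mean": statistics.mean(data),
        "geo_mean": statistics.geometric_mean(data),  # needs positive values
        "percentiles": {p: percentile(data, p) for p in (10, 25, 50, 75, 90)},
    }


rng = random.Random(0)  # fixed seed so the synthetic data is repeatable

# Synthetic, made-up populations in linear fluorescence units.
# Homogeneous: one log-normal peak centred near 1,000.
homogeneous = [rng.lognormvariate(math.log(1000), 0.4) for _ in range(5000)]
# Heterogeneous: two equal peaks, centred near 100 and near 10,000.
heterogeneous = (
    [rng.lognormvariate(math.log(100), 0.4) for _ in range(2500)]
    + [rng.lognormvariate(math.log(10000), 0.4) for _ in range(2500)]
)

for name, population in (("homogeneous", homogeneous),
                         ("heterogeneous", heterogeneous)):
    summary = summarize(population)
    print(name)
    print("  arithmetic mean:", round(summary["arith_mean"]))
    print("  geometric mean: ", round(summary["geo_mean"]))
    for p, value in summary["percentiles"].items():
        print(f"  {p:2d}th percentile:", round(value))
```

For the single-peak population the single-number statistics land close together; for the two-peak population they land far apart, and only the spread of the percentiles hints that there are two groups of cells.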