Assistance wanted to debug open source (statistical) FACS software

From: A.J. Rossini (rossini@blindglobe.net)
Date: Thu Sep 19 2002 - 06:41:46 EST


As part of some statistical methods research I'm doing in
visualization as well as in sensitivity analysis, we've been
developing a package for handling flow cytometry data using the
open-source statistics package, R (www.r-project.org), which runs on
Unix machines, Microsoft Windows, and recent versions of MacOS (9 and
higher, though I've heard it might work on the last versions of 8.x).
R is very much like S/S-PLUS, and most of the same non-commercial
developers (including the inventor/originator of S, upon which S-PLUS
is based) are active members of the R core development team.

Currently, we've implemented routines for reading data from FCS files,
gating the resulting data, constructing a number of visualizations,
and testing for differences (i.e. one- and two-sample tests).
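
For readers who have not met the term, gating means selecting the
subset of events whose measurements fall inside a chosen region.  The
following is a minimal sketch of a rectangular gate, written in Python
purely for illustration; it is not the package's R code.  The function
name rectangle_gate and the channel names "FSC" and "SSC" are
assumptions made for the example.

```python
from typing import Dict, List, Tuple

# One event = one cell's measurements, keyed by channel name (hypothetical names).
Event = Dict[str, float]


def rectangle_gate(events: List[Event],
                   bounds: Dict[str, Tuple[float, float]]) -> List[Event]:
    """Keep events whose value on every gated channel lies in [low, high]."""
    return [
        event for event in events
        if all(low <= event[channel] <= high
               for channel, (low, high) in bounds.items())
    ]


# Made-up events, only to exercise the function.
events = [
    {"FSC": 120.0, "SSC": 80.0},
    {"FSC": 450.0, "SSC": 300.0},
    {"FSC": 900.0, "SSC": 50.0},
]

# Gate on both channels at once; the third event fails both bounds.
gated = rectangle_gate(events, {"FSC": (100.0, 500.0), "SSC": (60.0, 400.0)})
```

Interactive gating would choose the bounds by drawing on a plot; the
filtering step afterwards is the same idea.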

Before we release a version for general public consumption, however,
the last sticking point is reading FCS files: so far we've been
successful with files from three groups, but we've had problems with
files from a fourth.  So what I'm looking for in terms of assistance
is not programming time, but example FCS files to aid in the
debugging, since if we can't read your data, it'll be really annoying
to have to construct text files just to read the data back in.
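
To show where reading can go wrong, here is a rough sketch of the
first step of parsing an FCS file, in Python for illustration only (it
is not the package's R code).  It assumes the FCS 2.0/3.0 layout: a
58-byte ASCII header with a 6-character version string followed by
right-justified 8-character byte offsets, and a TEXT segment whose
first character is the keyword delimiter.  The function names are
hypothetical, and the sketch deliberately ignores escaped (doubled)
delimiters and offsets that do not fit in 8 characters, which are
likely places for files from different platforms to differ.

```python
def parse_fcs_header(header: bytes) -> dict:
    """Read the version string and segment offsets from the 58-byte header."""
    if len(header) < 58:
        raise ValueError("FCS header must be at least 58 bytes")
    version = header[0:6].decode("ascii")
    if not version.startswith("FCS"):
        raise ValueError("not an FCS file: %r" % version)

    def offset(start: int, end: int) -> int:
        # Offsets are ASCII integers, right-justified and space-padded.
        field = header[start:end].decode("ascii").strip()
        return int(field) if field else 0

    return {
        "version": version,
        "text_start": offset(10, 18),
        "text_end": offset(18, 26),   # offset of the last byte, inclusive
        "data_start": offset(26, 34),
        "data_end": offset(34, 42),
    }


def parse_text_segment(segment: bytes) -> dict:
    """Split a TEXT segment into keyword/value pairs.

    Simplification: a delimiter escaped by doubling is not handled.
    """
    text = segment.decode("latin-1")
    delimiter = text[0]
    parts = text[1:].split(delimiter)
    if parts and parts[-1] == "":
        parts.pop()  # the trailing delimiter leaves one empty element
    return dict(zip(parts[0::2], parts[1::2]))


# A tiny synthetic file (header + TEXT only), made up for the example.
text_segment = b"/$TOT/3/$PAR/2/$DATATYPE/F/"
offsets = f"{58:>8}{58 + len(text_segment) - 1:>8}{0:>8}{0:>8}{0:>8}{0:>8}"
blob = b"FCS3.0" + b"    " + offsets.encode("ascii") + text_segment

info = parse_fcs_header(blob[:58])
keywords = parse_text_segment(blob[info["text_start"]:info["text_end"] + 1])
```

The keywords found this way ($PAR, $TOT, $DATATYPE and so on) say how
the DATA segment should be decoded.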

Why might you be interested in helping out?  The end result will be a
platform that builds on R's tools for sophisticated statistical
analysis as well as GUI construction, so that one can quickly prototype
results.  The code will probably be released under either the GNU GPL
or LGPL (there are lots of technical issues with open source licensing
which I'll not go into at this point), and the end product will
probably be suitable as a complementary tool to the common commercial
flow cytometry software packages.  With a bit more help, it may
eventually be suitable for production work, as well.

So:

A. If you would be able to "donate" an anonymized (i.e. source of data
   removed) FCS file for testing, please send private email with the
   generating platform, so that I don't end up testing the same thing
   over and over again, and so that we can arrange for the upload.

B. If you can't send files (many valid reasons for this, I know!) but:

   1. would be interested in testing the system,
   2. are reasonably sophisticated at being able to download and install
      software, and
   3. (most critical!) are comfortable with using a command-line interface
      (Unix shell, DOS window, or similar),

   please contact me and I'll send a quick set of instructions and the
   functions to test (a limited version of the R library).  We are
   hoping to release the initial non-testing version by the end of the
   year.

We currently can: read (most?) FCS files; do programmatic and
interactive gating; produce low-dimensional visualizations; do
2-sample testing (KS, truncated distribution curves, Prob binning
(with K. Baggerly's corrections as well)); and, with a bit of
programming in R, use any other routine R provides (hierarchical
clustering, k-means, clustering of large datasets (Clara, etc.)).  On
summaries of the individual results, one might consider regression
(standard, ANOVA, mixed effects models, GEE), smoothing (kernel,
loess, splines), etc.
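
As a concrete example of a 2-sample comparison, and reading "KS" as the
Kolmogorov-Smirnov test, the sketch below computes the two-sample KS
statistic: the largest gap between the two empirical distribution
functions.  It is written in Python for illustration and is not the
package's R code; the function name ks_statistic is an assumption, and
only the statistic is computed, not a p-value.

```python
from bisect import bisect_right
from typing import Sequence


def ks_statistic(x: Sequence[float], y: Sequence[float]) -> float:
    """Return max |F_x(v) - F_y(v)| over all observed values v."""
    xs, ys = sorted(x), sorted(y)
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    largest_gap = 0.0
    # Both empirical CDFs are step functions that only change at data
    # points, so checking the observed values is enough.
    for v in xs + ys:
        cdf_x = bisect_right(xs, v) / len(xs)  # fraction of x <= v
        cdf_y = bisect_right(ys, v) / len(ys)  # fraction of y <= v
        largest_gap = max(largest_gap, abs(cdf_x - cdf_y))
    return largest_gap


# Made-up intensities for one channel from two samples.
control = [1.0, 2.0, 3.0, 4.0]
treated = [3.0, 4.0, 5.0, 6.0]
d = ks_statistic(control, treated)
```

In R itself the corresponding routine is ks.test, which also reports a
p-value.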

What we are currently working on (research into both
analytics/informatics as well as statistical methods): making some of
those tools easier to work with, designs, novel 2-sample comparisons,
high-dimensional visualizations and structured grand-tours, and a few
other topics related to manuscripts in preparation.

What we probably won't work on (but others may): GUIs,
database-specific interfaces (to Oracle, MySQL, or PostgreSQL
relational storage, HDF5 for high-performance flat-file access),
extending the currently limited annotation system.

(one related non-flow cytometry project that I'm very much involved
with, to provide some ideas of where I'd like to take the project
eventually, is the Bioconductor project (http://www.bioconductor.org)
which is an open source system (statistical analysis and
visualization, as well as some annotation) for the analysis of gene
expression arrays (affy, spotted cDNA, SAGE)).

best,
-tony

--
A.J. Rossini				Rsrch. Asst. Prof. of Biostatistics
U. of Washington Biostatistics		rossini@u.washington.edu
FHCRC/SCHARP/HIV Vaccine Trials Net	rossini@scharp.org
-------------- http://software.biostat.washington.edu/ ----------------
FHCRC: M: 206-667-7025 (fax=4812)|Voicemail is pretty sketchy/use Email
UW:   Th: 206-543-1044 (fax=3286)|Change last 4 digits of phone to FAX
(my tuesday/wednesday/friday locations are completely unpredictable.)


