As part of some statistical methods research I'm doing in visualization and in sensitivity analysis, we've been developing a package for handling flow cytometry data using the open-source statistics package R (www.r-project.org), which runs on Unix machines, Microsoft Windows, and recent versions of MacOS (9 and higher, though I've heard it might work on the last versions of 8.x). R is very much like S/S-PLUS, and most of the same non-commercial developers (including the inventor/originator of S, upon which S-PLUS is based) are active members of the R core development team.

Currently, we've implemented routines for reading data from FCS files, gating the resulting data, constructing a number of visualizations, and testing for differences (i.e. 1- and 2-sample tests).

Before we release a version for general public consumption, however, the last sticking point is reading FCS files: right now, we've been successful with files from 3 groups, but we've had problems with files from a 4th. So what I'm looking for in terms of assistance is not programming time but example FCS files to aid in the debugging, since if we can't read your data, it'll be really annoying to have to construct text files just to read it back in.

Why might you be interested in helping out? The end result will be a platform on which one can build on R's tools for sophisticated statistical analysis, as well as GUI construction, to quickly prototype results. The code will probably be released under either the GNU GPL or LGPL (there are lots of technical issues with open source licensing which I'll not go into at this point), and the end product will probably be suitable as a complementary tool to the common commercial flow cytometry software packages. With a bit more help, it may eventually be suitable for production work as well.

So:

A. If you would be able to "donate" an anonymized (i.e. source of data removed) FCS file for testing, please send private email with the generating platform, so that I don't end up testing the same thing over and over again, and so that we can arrange for the upload.

B. If you can't send files (many valid reasons for this, I know!) but:
   1. would be interested in testing the system,
   2. are reasonably sophisticated at being able to download and install software, and
   3. (most critical!) are comfortable with using a command-line interface (Unix shell, DOS window, or similar),
   please contact me and I'll send a quick set of instructions and the functions to test (a limited version of the R library).

We are hoping to release the initial non-testing version by the end of the year.

We can currently read (most?) FCS files; do programmatic and interactive gating; produce low-dimensional visualizations; do 2-sample testing (KS, truncated distribution curves, Prob binning with K. Baggerly's corrections as well); and, with a bit of programming in R, use any other routine R can do: hierarchical clustering, k-means, clustering of large datasets (Clara, etc.), and, on summaries of the individual results, regression (standard, ANOVA, mixed effects models, GEE), smoothing (kernel, loess, splines), etc.

What we are currently working on (research into both analytics/informatics and statistical methods): making some of those tools easier to work with, designs, novel 2-sample comparisons, high-dimensional visualizations and structured grand tours, and a few other topics related to manuscripts in preparation.

What we probably won't work on (but others may): GUIs, database-specific interfaces (to Oracle, MySQL, or PostgreSQL relational storage, HDF5 for high-performance flat-file access), and extending the currently limited annotation system.
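
For anyone wondering what tends to differ between FCS files from different platforms, here is a minimal sketch of reading the fixed HEADER segment described in the FCS 2.0/3.0 standard: a 6-byte version string, 4 spaces, then six 8-byte right-justified ASCII offsets for the TEXT, DATA, and ANALYSIS segments. It is written in Python purely for illustration and is not the code in the R package. The names parse_fcs_header and describe_fcs_file are hypothetical, and treating a blank offset field as 0 is an assumption about how lenient a reader should be.

```python
HEADER_LENGTH = 58  # 6-byte version + 4 spaces + six 8-byte offset fields

# (field name, starting byte) for each offset in the HEADER segment
SEGMENT_FIELDS = (
    ("text_start", 10), ("text_end", 18),
    ("data_start", 26), ("data_end", 34),
    ("analysis_start", 42), ("analysis_end", 50),
)


def parse_fcs_header(header: bytes) -> dict:
    """Return the version string and segment offsets from an FCS HEADER."""
    if len(header) < HEADER_LENGTH:
        raise ValueError("need at least 58 bytes of header")
    version = header[0:6].decode("ascii", errors="replace")
    if not version.startswith("FCS"):
        raise ValueError(f"not an FCS file: version field is {version!r}")
    result = {"version": version}
    for name, start in SEGMENT_FIELDS:
        field = header[start:start + 8].decode("ascii", errors="replace").strip()
        # Assumption: a blank field is treated as 0 (segment absent).
        result[name] = int(field) if field else 0
    return result


def describe_fcs_file(path: str) -> dict:
    """Hypothetical helper: read only the header bytes of a file on disk."""
    with open(path, "rb") as handle:
        return parse_fcs_header(handle.read(HEADER_LENGTH))
```

An offset of 0 generally means the segment is absent, or, in FCS 3.0 files too large for the 8-digit fields, that the real offsets are recorded in the TEXT segment instead. Reporting the version string and these six numbers alongside a donated file would be a cheap way to see how a troublesome file departs from the ones that already read correctly.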
(One related non-flow-cytometry project that I'm very much involved with, to give some idea of where I'd like to take this project eventually, is the Bioconductor project (http://www.bioconductor.org), an open-source system (statistical analysis and visualization, as well as some annotation) for the analysis of gene expression arrays (affy, spotted cDNA, SAGE).)

best,
-tony

-- 
A.J. Rossini                            Rsrch. Asst. Prof. of Biostatistics
U. of Washington Biostatistics          rossini@u.washington.edu
FHCRC/SCHARP/HIV Vaccine Trials Net     rossini@scharp.org
-------------- http://software.biostat.washington.edu/ ----------------
FHCRC: M: 206-667-7025 (fax=4812)|Voicemail is pretty sketchy/use Email
UW:   Th: 206-543-1044 (fax=3286)|Change last 4 digits of phone to FAX
(my tuesday/wednesday/friday locations are completely unpredictable.)