James G. Booth, Ph.D.

Professor of Statistics

University of Florida

Gainesville, Florida

 

 

Sorting periodically-expressed genes using microarray data: statistical analysis of the yeast cell-cycle data

 

Authors:  Jim Booth, George Casella, Janice Cooke, and John Davis

 

 

We reanalyzed three publicly available cDNA microarray data sets on yeast, with the goal of identifying and classifying genes that are cell cycle-regulated. The data consist of time series from several experiments that used different synchronization methods. We argue that the Fourier analysis of Spellman et al. (2000) can be framed as a multiple linear regression with cosine and sine terms as covariates. The statistical significance of each gene within a data set can then be assessed using the standard F-test based on the R-squared goodness-of-fit criterion. We suggest combining P-values from the different data sets using Fisher’s method. This approach leads to a list of over 1000 genes that are significant at the 1% level, which includes over 90% of 104 genes known a priori to be cell cycle-regulated. There is considerable disagreement between this list and the one obtained by Spellman et al. using their CDC score method.
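The regression framing above can be illustrated with a minimal sketch. It is not the authors' code, and it rests on several assumptions: a single known cell-cycle period per experiment, no missing time points, a non-constant expression series, and independent P-values across data sets. The function names `harmonic_f_test` and `fisher_combine` are hypothetical. Each gene's series is regressed on an intercept, a cosine and a sine term. Because the F-statistic then has 2 numerator degrees of freedom, its upper-tail probability reduces to (1 - R^2)^((n-3)/2). Fisher's method refers -2 times the sum of the log P-values to a chi-square distribution with 2k degrees of freedom.

```python
import math

def harmonic_f_test(times, values, period):
    """Fit y = m + a*cos(wt) + b*sin(wt) by least squares.

    Returns (R^2, P-value of the F-test that a = b = 0).
    Assumes a known period and at least 4 non-constant observations.
    """
    n = len(times)
    w = 2.0 * math.pi / period
    # Design matrix rows: intercept, cosine, sine.
    X = [[1.0, math.cos(w * t), math.sin(w * t)] for t in times]
    # Normal equations (X'X) beta = X'y as an augmented 3x4 matrix.
    A = [[sum(r[i] * r[j] for r in X) for j in range(3)]
         + [sum(r[i] * y for r, y in zip(X, values))] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(3):
        p = max(range(c, 3), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(3):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [x - f * z for x, z in zip(A[r], A[c])]
    beta = [A[i][3] / A[i][i] for i in range(3)]
    fitted = [sum(b * x for b, x in zip(beta, r)) for r in X]
    mean = sum(values) / n
    ss_tot = sum((y - mean) ** 2 for y in values)
    ss_res = sum((y - f) ** 2 for y, f in zip(values, fitted))
    ratio = min(1.0, ss_res / ss_tot)  # equals 1 - R^2, clamped for rounding
    # F = (R^2 / 2) / ((1 - R^2) / (n - 3)); with 2 numerator df the
    # upper tail of F(2, n - 3) is exactly (1 - R^2)^((n - 3) / 2).
    p_value = ratio ** ((n - 3) / 2.0)
    return 1.0 - ratio, p_value

def fisher_combine(p_values):
    """Combine independent P-values (each > 0) with Fisher's method."""
    k = len(p_values)
    half = -sum(math.log(p) for p in p_values)  # half of the chi-square statistic
    # Upper tail of chi-square with 2k df: exp(-x/2) * sum_{i<k} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total
```

In use, `harmonic_f_test` would be applied to each gene within each experiment, and `fisher_combine` to that gene's P-values across experiments. Real microarray series have missing values and uncertain periods, which this sketch does not handle.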

 

As a second part of our analysis, we propose a classification scheme based on the estimated phases of the 104 genes with known function. Finally, we show that a proposed method based on the singular value decomposition of the microarray data matrix is essentially equivalent to our multiple linear regression approach.
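The abstract does not spell out the classification scheme, so the following only illustrates how a phase estimate can be read off fitted cosine and sine coefficients. It uses the identity a*cos(wt) + b*sin(wt) = A*cos(wt - phi), with A = sqrt(a^2 + b^2) and phi = atan2(b, a). The function name `amplitude_and_phase` and the conversion of the phase to a peak time within one cycle are assumptions made for illustration.

```python
import math

def amplitude_and_phase(a, b, period):
    """Rewrite a*cos(wt) + b*sin(wt) as A*cos(wt - phi), with w = 2*pi/period.

    Returns (A, phi in [0, 2*pi), time of peak expression within one cycle).
    """
    amplitude = math.hypot(a, b)
    phase = math.atan2(b, a) % (2.0 * math.pi)
    peak_time = phase * period / (2.0 * math.pi)
    return amplitude, phase, peak_time
```

Genes with similar estimated phases peak at similar points in the cycle, which is the kind of information a phase-based classification could draw on.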