colcor

Detecting column dependence when rows are correlated and estimating the strength of the row correlation

Microarray experiments often yield a normal data matrix X whose rows correspond to genes and columns to samples. We commonly calculate test statistics Z = Xw, where Z_i is a test statistic for the ith gene, and apply false discovery rate (FDR) controlling methods to find interesting genes. For example, Z could measure the difference in expression levels between treatment and control groups, and we could seek differentially expressed genes. The empirical cdf of Z is important for FDR methods, since its mean and variance determine the bias and variance of FDR estimates. Efron (2009b) has shown that if the columns of X are independent, the variance of the empirical cdf of Z depends only on the mean-squared row correlation. Microarray data, however, frequently show signs of column dependence. In this paper, we show that Efron’s result still holds under column dependence, and give a conservative (upwardly biased) estimator for the mean-squared row correlation. We show that Fisher’s transformation for sample correlations is still normalizing and variance stabilizing under column dependence, and use it to construct a permutation-invariant test of column independence. Finally, we argue that estimating the mean-squared row correlation under column dependence is impossible in general. Code to perform our test is available in the R package “colcor,” available on CRAN.
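To make the quantities in the abstract concrete, the following is a minimal Python sketch rather than the colcor package or the paper's estimator. It builds a weight vector w so that Z = Xw is the treatment-minus-control mean difference for each gene, computes a naive plug-in average of squared sample row correlations, and applies Fisher's transformation (the inverse hyperbolic tangent) to a sample correlation. The function names, the simulated matrix, and the group sizes are all illustrative assumptions.

```python
import numpy as np


def two_group_weights(n_treat, n_ctrl):
    """Hypothetical helper: weights w such that X @ w is the difference
    between the treatment mean and the control mean for each row (gene).
    Assumes treatment columns come first, then control columns."""
    return np.concatenate([
        np.full(n_treat, 1.0 / n_treat),
        np.full(n_ctrl, -1.0 / n_ctrl),
    ])


def mean_squared_row_correlation(X):
    """Naive plug-in: average squared sample correlation over all distinct
    pairs of rows. This is NOT the paper's estimator, only an illustration
    of the quantity being estimated."""
    R = np.corrcoef(X)                    # rows of X are treated as variables
    upper = np.triu_indices_from(R, k=1)  # distinct row pairs only
    return float(np.mean(R[upper] ** 2))


def fisher_z(r):
    """Fisher's transformation of a sample correlation."""
    return np.arctanh(r)


# Simulated stand-in for a microarray matrix: 50 genes, 20 samples.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 20))

w = two_group_weights(10, 10)
Z = X @ w                                 # one test statistic per gene
msq = mean_squared_row_correlation(X)
z01 = fisher_z(np.corrcoef(X[0], X[1])[0, 1])
```

Even when the rows are truly independent, squared sample correlations are positive on average, so a plug-in average of this kind tends to overstate the mean-squared row correlation when the number of samples is small.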