Adaptive choice of the number of bootstrap samples in large scale multiple testing

It is common practice to use resampling methods such as the bootstrap to calculate the p-value for each test when performing large scale multiple testing. The precision of the bootstrap p-values, and hence of the false discovery rate (FDR), relies on the number of bootstrap samples used for testing each hypothesis. Clearly, the larger the number of bootstraps, the better the precision. However, the required number of bootstraps can be computationally burdensome, since it multiplies the number of tests to be performed. Further adding to the computational challenge, in some applications the calculation of the test statistic itself may require considerable computation time. As technology improves, one can expect the dimension of the problem to increase as well. For instance, during the early days of microarray technology, the number of probes on a cDNA chip was less than 10,000; now Affymetrix chips come with over 50,000 probes per chip. Motivated by this need, we developed a simple adaptive bootstrap methodology for large scale multiple testing, which reduces the total number of bootstrap calculations while ensuring control of the FDR. The proposed algorithm results in a substantial reduction in the number of bootstrap samples. Based on a simulation study, we found that, relative to the number of bootstraps required for the Benjamini-Hochberg (BH) procedure (the standard FDR methodology), the proposed methodology achieved a very substantial reduction. In some cases the new algorithm required as little as 1/6th the number of bootstraps of the conventional BH procedure. Thus, if the conventional BH procedure used 1,000 bootstraps, the proposed method required only 160 bootstraps. This methodology has been implemented for time-course/dose-response data in our software, ORIOGEN, which is available from the authors upon request.
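The abstract's ingredients can be illustrated in code. The sketch below is not the authors' exact algorithm (which is in the 2008 paper): it shows the standard BH step-up procedure, plus a generic adaptive scheme in which hypotheses whose running bootstrap p-value estimate is already clearly non-significant stop receiving bootstrap draws, so the total bootstrap budget concentrates on the hypotheses near the rejection boundary. The function and parameter names (`resample_stat`, `b_max`, `b_check`, `drop_p`) are hypothetical.

```python
import numpy as np

def bh_threshold(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up procedure: return a boolean rejection vector."""
    m = len(pvals)
    order = np.argsort(pvals)
    sorted_p = pvals[order]
    # Find the largest k with p_(k) <= (k/m) * alpha and reject the k smallest.
    thresh = alpha * np.arange(1, m + 1) / m
    below = np.nonzero(sorted_p <= thresh)[0]
    reject = np.zeros(m, dtype=bool)
    if below.size:
        reject[order[: below.max() + 1]] = True
    return reject

def adaptive_bootstrap_pvals(stats, resample_stat, b_max=1000, b_check=100,
                             drop_p=0.5, rng=None):
    """Illustrative adaptive scheme (not the paper's exact algorithm):
    bootstrap all hypotheses in small batches and retire any hypothesis whose
    running p-value estimate already exceeds drop_p, up to b_max draws each.
    `resample_stat(i, b, rng)` is a user-supplied function returning b
    bootstrap test statistics for hypothesis i."""
    rng = np.random.default_rng(rng)
    m = len(stats)
    exceed = np.zeros(m)              # bootstrap stats >= observed stat
    drawn = np.zeros(m)               # bootstrap draws spent per hypothesis
    active = np.ones(m, dtype=bool)
    while active.any() and drawn[active].min() < b_max:
        idx = np.nonzero(active)[0]
        for i in idx:
            boot = resample_stat(i, b_check, rng)
            exceed[i] += np.sum(boot >= stats[i])
            drawn[i] += b_check
        # Conservative running p-value estimate (add-one correction).
        p_hat = (exceed[idx] + 1) / (drawn[idx] + 1)
        active[idx[p_hat > drop_p]] = False   # clearly non-significant: stop
        active[drawn >= b_max] = False        # budget exhausted: stop
    return (exceed + 1) / (drawn + 1), drawn
```

Hypotheses with large test statistics keep their full `b_max` bootstraps (so their small p-values stay precise for the BH step), while hypotheses far from significance are abandoned after a few batches, which is the source of the savings the abstract describes.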

This software has also been peer reviewed by the journal ACM Transactions on Mathematical Software (TOMS).

References in zbMATH (referenced in 15 articles)

Sorted by year (citations)

  1. Hahn, Georg: Optimal allocation of Monte Carlo simulations to multiple hypothesis tests (2020)
  2. Cinar, Ozan; Ilk, Ozlem; Iyigun, Cem: Clustering of short time-course gene expression data with dissimilar replicates (2018)
  3. Davidov, Ori; Jelsema, Casey M.; Peddada, Shyamal: Testing for inequality constraints in singular models by trimming or winsorizing the variance matrix (2018)
  4. Hahn, Georg: Closure properties of classes of multiple testing procedures (2018)
  5. Wang, Yishi; Stapleton, Ann E.; Chen, Cuixian: Two-sample nonparametric stochastic order inference with an application in plant physiology (2018)
  6. Gandy, Axel; Hahn, Georg: QuickMMCTest: quick multiple Monte Carlo testing (2017)
  7. Gandy, Axel; Hahn, Georg: A framework for Monte Carlo based multiple testing (2016)
  8. Sweeney, Elizabeth; Crainiceanu, Ciprian; Gertheiss, Jan: Testing differentially expressed genes in dose-response studies and with ordinal phenotypes (2016)
  9. Gandy, Axel; Hahn, Georg: MMCTest -- a safe algorithm for implementing multiple Monte Carlo tests (2014)
  10. Sinha, Anshu; Markatou, Marianthi: A platform for processing expression of short time series (PESTS) (2011)
  11. Lim, Changwon; Sen, Pranab K.; Peddada, Shyamal D.: Statistical inference in nonlinear regression under heteroscedasticity (2010)
  12. Mathur, Sunil K.: Statistical bioinformatics with R. (2010)
  13. Peddada, Shyamal D.; Umbach, David M.; Harris, Shawn F.: A response to information criterion-based clustering with order-restricted candidate profiles in short time-course microarray experiments (2009)
  14. Rueda, Cristina; Fernández, Miguel A.; Salvador, Bonifacio: Bayes discriminant rules with ordered predictors (2009)
  15. Guo, Wenge; Peddada, Shyamal: Adaptive choice of the number of bootstrap samples in large scale multiple testing (2008)