Bayesian variable selection regression for genome-wide association studies and other large-scale problems

We consider applying Bayesian Variable Selection Regression, or BVSR, to genome-wide association studies and similar large-scale regression problems. Currently, typical genome-wide association studies measure hundreds of thousands, or millions, of genetic variants (SNPs), in thousands or tens of thousands of individuals, and attempt to identify regions harboring SNPs that affect some phenotype or outcome of interest. This goal can naturally be cast as a variable selection regression problem, with the SNPs as the covariates in the regression. Characteristic features of genome-wide association studies include the following: (i) a focus primarily on identifying relevant variables, rather than on prediction; and (ii) many relevant covariates may have tiny effects, making it effectively impossible to confidently identify the complete “correct” subset of variables. Taken together, these factors put a premium on having interpretable measures of confidence for individual covariates being included in the model, which we argue is a strength of BVSR compared with alternatives such as penalized regression methods. We focus primarily on analysis of quantitative phenotypes, and on appropriate prior specification for BVSR in this setting, emphasizing the idea of considering what the priors imply about the total proportion of variance in outcome explained by relevant covariates. We also emphasize the potential for BVSR to estimate this proportion of variance explained, and hence shed light on the issue of “missing heritability” in genome-wide association studies. More generally, we demonstrate that, despite the apparent computational challenges, BVSR can provide useful inferences in these large-scale problems, and in our simulations produces better power and predictive performance compared with standard single-SNP analyses and the penalized regression method LASSO.
Methods described here are implemented in a software package, pi-MASS, available from the Guan Lab website http://bcm.edu/cnrc/mcmcmc/pimass.
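
The abstract's "proportion of variance in outcome explained by relevant covariates" can be made concrete with a small simulation. The sketch below is illustrative only and is not taken from pi-MASS: it assumes the common definition PVE = Var(Xβ) / (Var(Xβ) + σ²), and the helper names (`simulate_genotypes`, `pve`), the minor allele frequency, the effect-size scale, and the sample sizes are all hypothetical choices.

```python
import random
import statistics


def simulate_genotypes(n_individuals, n_snps, maf, rng):
    """Draw genotypes coded 0/1/2 (minor-allele counts); a hypothetical toy setup."""
    return [
        [int(rng.random() < maf) + int(rng.random() < maf) for _ in range(n_snps)]
        for _ in range(n_individuals)
    ]


def pve(genotypes, effects, residual_variance):
    """Assumed definition: Var(X beta) / (Var(X beta) + sigma^2)."""
    # Genetic value of each individual: the linear predictor X beta.
    fitted = [sum(x * b for x, b in zip(row, effects)) for row in genotypes]
    signal = statistics.pvariance(fitted)
    return signal / (signal + residual_variance)


rng = random.Random(0)
n_snps = 200
X = simulate_genotypes(500, n_snps, maf=0.3, rng=rng)

# Sparse effects: only 5 of the 200 SNPs are "relevant covariates".
effects = [0.0] * n_snps
for j in rng.sample(range(n_snps), 5):
    effects[j] = rng.gauss(0.0, 0.5)

h = pve(X, effects, residual_variance=1.0)
print(f"simulated PVE: {h:.3f}")
```

Reasoning about a prior in terms of this ratio, rather than in terms of individual effect sizes, is the idea the abstract emphasizes: a prior on sparsity and effect size jointly implies a distribution over PVE, which is easier to judge for plausibility.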

References in zbMATH (referenced in 20 articles, 1 standard article)

  1. Zhou, Haiming; Huang, Xianzheng: Bayesian beta regression for bounded responses with unknown supports (2022)
  2. Alexopoulos, Angelos; Bottolo, Leonardo: Bayesian variable selection for Gaussian copula regression models (2021)
  3. Luo, Zhao Tang; Sang, Huiyan; Mallick, Bani: A Bayesian contiguous partitioning method for learning clustered latent variables (2021)
  4. Bar, Haim Y.; Booth, James G.; Wells, Martin T.: A scalable empirical Bayes approach to variable selection in generalized linear models (2020)
  5. Posch, Konstantin; Arbeiter, Maximilian; Pilz, Juergen: A novel Bayesian approach for variable selection in linear regression models (2020)
  6. Crawford, Lorin; Flaxman, Seth R.; Runcie, Daniel E.; West, Mike: Variable prioritization in nonlinear black box methods: a genetic association case study (2019)
  7. Metzner, Selma; Wübbeler, Gerd; Elster, Clemens: Approximate large-scale Bayesian spatial modeling with application to quantitative magnetic resonance imaging (2019)
  8. Narisetty, Naveen N.; Shen, Juan; He, Xuming: Skinny Gibbs: a consistent and scalable Gibbs sampler for model selection (2019)
  9. Zhou, Quan; Guan, Yongtao: Fast model-fitting of Bayesian variable selection regression using the iterative complex factorization algorithm (2019)
  10. Zhou, Quan; Guan, Yongtao: On the null distribution of Bayes factors in linear regression (2018)
  11. Briollais, Laurent; Dobra, Adrian; Liu, Jinnan; Friedlander, Matt; Ozcelik, Hilmi; Massam, Hélène: A Bayesian graphical model for genome-wide association studies (GWAS) (2016)
  12. Thompson, Katherine L.; Linnen, Catherine R.; Kubatko, Laura: Tree-based quantitative trait mapping in the presence of external covariates (2016)
  13. Bonnet, Anna; Gassiat, Elisabeth; Lévy-Leduc, Céline: Heritability estimation in high dimensional sparse linear mixed models (2015)
  14. Dickhaus, Thorsten: Simultaneous Bayesian analysis of contingency tables in genetic association studies (2015)
  15. Wu, Zheyang; Sun, Yiming; He, Shiquan; Cho, Judy; Zhao, Hongyu; Jin, Jiashun: Detection boundary and higher criticism approach for rare and weak genetic effects (2014)
  16. Scutari, Marco; Mackay, Ian; Balding, David: Improving the efficiency of genomic selection (2013)
  17. Sverdlov, Serge; Thompson, Elizabeth A.: Correlation between relatives given complete genotypes: from identity by descent to identity by function (2013)
  18. Guan, Yongtao; Stephens, Matthew: Bayesian variable selection regression for genome-wide association studies and other large-scale problems (2011)
  19. Savitsky, Terrance; Vannucci, Marina; Sha, Naijun: Variable selection for nonparametric Gaussian process priors: Models and computational strategies (2011)
  20. Stingo, Francesco C.; Chen, Yian A.; Tadesse, Mahlet G.; Vannucci, Marina: Incorporating biological information into linear models: a Bayesian approach to the selection of pathways and genes (2011)