Finding large average submatrices in high dimensional data

The search for sample-variable associations is an important problem in the exploratory analysis of high-dimensional data. Biclustering methods search for sample-variable associations in the form of distinguished submatrices of the data matrix. (The rows and columns of a submatrix need not be contiguous.) We propose and evaluate a statistically motivated biclustering procedure (LAS) that finds large average submatrices within a given real-valued data matrix. The procedure operates in an iterative-residual fashion and is driven by a Bonferroni-based significance score that trades off submatrix size against average value. We examine the performance and potential utility of LAS, and compare it with a number of existing methods, through an extensive three-part validation study using two gene expression data sets. The validation study examines quantitative properties of biclusters, biological and clinical assessments using auxiliary information, and classification of disease subtypes using bicluster membership. In addition, we carry out a simulation study to assess the effectiveness and noise sensitivity of the LAS search procedure. These results suggest that LAS is an effective exploratory tool for the discovery of biologically relevant structures in high-dimensional data. Software is available at url{}.
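The abstract describes the search only in words, so the Python sketch below illustrates one way an iterative-residual, Bonferroni-scored search could work. It is not the authors' LAS software. It assumes that a k × l submatrix with average τ inside an m × n matrix is scored as -log[C(m,k) · C(n,l) · Φ̄(τ√(kl))], where Φ̄ is the standard normal upper tail (a Bonferroni-corrected p-value), and that the data have been standardized so background entries behave roughly like N(0, 1) noise. The random column starts, the fixed number of biclusters (used here in place of a significance-based stopping rule), and all function names (`las`, `search`, `best_subset`, `las_score`) are hypothetical choices made for illustration.

```python
import math
import random


def log_binom(n: int, k: int) -> float:
    """Natural log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def log_normal_tail(x: float) -> float:
    """log P(Z > x) for a standard normal Z, guarding against underflow."""
    p = 0.5 * math.erfc(x / math.sqrt(2.0))
    if p > 0.0:
        return math.log(p)
    # erfc underflowed (x is large and positive): use the asymptotic
    # Mills-ratio approximation log(phi(x) / x).
    return -0.5 * x * x - math.log(x) - 0.5 * math.log(2.0 * math.pi)


def las_score(avg: float, k: int, l: int, m: int, n: int) -> float:
    """Assumed Bonferroni-style score of a k x l submatrix with mean `avg`
    in an m x n matrix: -log[C(m,k) * C(n,l) * P(Z > avg * sqrt(k*l))]."""
    return -(log_binom(m, k) + log_binom(n, l)
             + log_normal_tail(avg * math.sqrt(k * l)))


def best_subset(sums, fixed_size, m, n, pick_rows):
    """Given each candidate's sum over the fixed index set, the best set of
    size k is the top-k candidates; scan every k and keep the best score."""
    order = sorted(range(len(sums)), key=lambda i: sums[i], reverse=True)
    best, best_k, total = -math.inf, 0, 0.0
    for k, idx in enumerate(order, start=1):
        total += sums[idx]
        avg = total / (k * fixed_size)
        if pick_rows:
            s = las_score(avg, k, fixed_size, m, n)
        else:
            s = las_score(avg, fixed_size, k, m, n)
        if s > best:
            best, best_k = s, k
    return best, sorted(order[:best_k])


def search(X, restarts=20, seed=0, max_iter=100):
    """Alternate row and column updates from random column starts and
    return (score, rows, cols) for the best fixed point found."""
    rng = random.Random(seed)
    m, n = len(X), len(X[0])
    best = (-math.inf, [], [])
    for _ in range(restarts):
        cols = sorted(rng.sample(range(n), rng.randint(1, n)))
        rows, score = None, -math.inf
        for _ in range(max_iter):
            row_sums = [sum(X[i][j] for j in cols) for i in range(m)]
            _, new_rows = best_subset(row_sums, len(cols), m, n, True)
            col_sums = [sum(X[i][j] for i in new_rows) for j in range(n)]
            score, new_cols = best_subset(col_sums, len(new_rows), m, n, False)
            if new_rows == rows and new_cols == cols:
                break  # neither update changed anything: fixed point
            rows, cols = new_rows, new_cols
        if score > best[0]:
            best = (score, rows, cols)
    return best


def las(X, num_biclusters=3, restarts=20, seed=0):
    """Iterative-residual loop: find a submatrix, subtract its mean, repeat.
    Returns the biclusters and the final residual matrix."""
    R = [list(row) for row in X]
    found = []
    for b in range(num_biclusters):
        score, rows, cols = search(R, restarts, seed + b)
        avg = sum(R[i][j] for i in rows for j in cols) / (len(rows) * len(cols))
        for i in rows:
            for j in cols:
                R[i][j] -= avg
        found.append({"score": score, "rows": rows, "cols": cols, "mean": avg})
    return found, R
```

The top-k step relies on the fact that, for a fixed size and a fixed set on the other axis, the assumed score increases with the submatrix average, so taking the k highest row (or column) sums is optimal for that k. Each half-step therefore cannot lower the score, and the alternation settles at a fixed point. Subtracting the mean of each found submatrix before the next search is what lets later iterations find different, possibly overlapping, structure.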

References in zbMATH (referenced in 18 articles)

  1. Li, Gen: Generalized co-clustering analysis via regularized alternating least squares (2020)
  2. Huang, Lei; Bai, Jiawei; Ivanescu, Andrada; Harris, Tamara; Maurer, Mathew; Green, Philip; Zipunnikov, Vadim: Multilevel matrix-variate analysis and its application to accelerometry-measured physical activity in clinical populations (2019)
  3. Gamarnik, David; Li, Quan: Finding a large submatrix of a Gaussian random matrix (2018)
  4. Hajek, Bruce; Wu, Yihong; Xu, Jiaming: Submatrix localization via message passing (2018)
  5. Arias-Castro, Ery; Liu, Yuchao: Distribution-free detection of a submatrix (2017)
  6. Bhamidi, Shankar; Dey, Partha S.; Nobel, Andrew B.: Energy landscape for large average submatrix detection problems in Gaussian random matrices (2017)
  7. Chi, Eric C.; Allen, Genevera I.; Baraniuk, Richard G.: Convex biclustering (2017)
  8. Chen, Yudong; Xu, Jiaming: Statistical-computational tradeoffs in planted problems and submatrix localization with a growing number of clusters and submatrices (2016)
  9. Butucea, Cristina; Ingster, Yuri I.; Suslina, Irina A.: Sharp variable selection of a sparse submatrix in a high-dimensional noisy matrix (2015)
  10. Deshpande, Yash; Montanari, Andrea: Finding hidden cliques of size √(N/e) in nearly linear time (2015)
  11. Ma, Zongming; Wu, Yihong: Computational barriers in minimax submatrix detection (2015)
  12. Montanari, Andrea: Finding one community in a sparse graph (2015)
  13. Wilson, James D.; Wang, Simi; Mucha, Peter J.; Bhamidi, Shankar; Nobel, Andrew B.: A testing based extraction algorithm for identifying significant communities in networks (2014)
  14. Butucea, Cristina; Ingster, Yuri I.: Detection of a sparse submatrix of a high-dimensional noisy matrix (2013)
  15. Izenman, Alan J.; Harris, Philip W.; Mennis, Jeremy; Jupin, Joseph; Obradovic, Zoran: Local spatial biclustering and prediction of urban juvenile delinquency and recidivism (2011)
  16. Addario-Berry, Louigi; Broutin, Nicolas; Devroye, Luc; Lugosi, Gábor: On combinatorial testing problems (2010)
  17. Lee, Mihee; Shen, Haipeng; Huang, Jianhua Z.; Marron, J. S.: Biclustering via sparse singular value decomposition (2010)
  18. Shabalin, Andrey A.; Weigman, Victor J.; Perou, Charles M.; Nobel, Andrew B.: Finding large average submatrices in high dimensional data (2009)