Missing value estimation for DNA microarrays.

Motivation: Gene expression microarray experiments can generate data sets with multiple missing expression values. Unfortunately, many algorithms for gene expression analysis require a complete matrix of gene array values as input. For example, methods such as hierarchical clustering and K-means clustering are not robust to missing data, and may lose effectiveness even with a few missing values. Methods for imputing missing data are needed, therefore, to minimize the effect of incomplete data sets on analyses, and to increase the range of data sets to which these algorithms can be applied. In this report, we investigate automated methods for estimating missing data.

Results: We present a comparative study of several methods for the estimation of missing values in gene microarray data. We implemented and evaluated three methods: a Singular Value Decomposition (SVD) based method (SVDimpute), weighted K-nearest neighbors (KNNimpute), and row average. We evaluated the methods using a variety of parameter settings and over different real data sets, and assessed the robustness of the imputation methods to the amount of missing data over the range of 1–20% missing values. We show that KNNimpute appears to provide a more robust and sensitive method for missing value estimation than SVDimpute, and both SVDimpute and KNNimpute surpass the commonly used row average method (as well as filling missing values with zeros). We report results of the comparative experiments and provide recommendations and tools for accurate estimation of missing microarray data under a variety of conditions.

Availability: The software is available at
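
To make the comparison in the abstract concrete, the following is a minimal sketch of two of the three approaches it names: row average and weighted K-nearest-neighbour imputation. It is not the authors' software. The function names (`row_average_impute`, `knn_impute`), the use of `None` for a missing value, the root-mean-square distance over shared columns, and the inverse-distance weights are all assumptions made for illustration, since the abstract does not specify these details.

```python
import math


def row_average_impute(matrix):
    """Fill each missing entry (None) with the mean of the observed values in its row."""
    filled = []
    for row in matrix:
        observed = [v for v in row if v is not None]
        mean = sum(observed) / len(observed) if observed else 0.0
        filled.append([mean if v is None else v for v in row])
    return filled


def knn_impute(matrix, k=2):
    """Weighted K-nearest-neighbour imputation over rows (genes).

    Assumptions for this sketch: the distance is the root-mean-square
    difference over columns observed in both rows, and neighbours are
    weighted by inverse distance.
    """
    filled = [list(row) for row in matrix]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate neighbours: other rows that have column j observed.
            candidates = []
            for m, other in enumerate(matrix):
                if m == i or other[j] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                candidates.append((dist, other[j]))
            if not candidates:
                # No usable neighbour: fall back to the row average.
                observed = [v for v in row if v is not None]
                filled[i][j] = sum(observed) / len(observed) if observed else 0.0
                continue
            candidates.sort(key=lambda c: c[0])
            nearest = candidates[:k]
            # Small constant avoids division by zero for identical profiles.
            weights = [1.0 / (d + 1e-9) for d, _ in nearest]
            total = sum(w * v for w, (_, v) in zip(weights, nearest))
            filled[i][j] = total / sum(weights)
    return filled


# Hypothetical toy expression matrix: rows are genes, columns are arrays.
expr = [
    [1.0, 2.0, 3.0],
    [1.0, 2.0, None],
    [5.0, 6.0, 7.0],
    [1.1, 2.1, 3.2],
]
knn_filled = knn_impute(expr, k=2)
avg_filled = row_average_impute(expr)
```

In this toy matrix the second gene closely tracks the first and fourth, so the neighbour-based estimate for its missing value is drawn from their third-column values, whereas the row average only uses the gene's own two observed values. This is the intuition behind the abstract's finding that KNNimpute outperforms row average, though the toy data proves nothing about real data sets.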

References in zbMATH (referenced in 55 articles)

Showing results 1 to 20 of 55.
Sorted by year


  1. Freund, Robert M.; Grigas, Paul; Mazumder, Rahul: An extended Frank-Wolfe method with “in-face” directions, and its application to low-rank matrix completion (2017)
  2. Li, Ying; He, Ye; Zhang, Yu: Analyzing gene expression time-courses based on multi-resolution shape mixture model (2016)
  3. Luo, Shan; Xu, Jinfeng; Chen, Zehua: Extended Bayesian information criterion in the Cox model with a high-dimensional feature space (2015)
  4. Yang, Aijun; Li, Yunxian; Tang, Niansheng; Lin, Jinguan: Bayesian variable selection in multinomial probit model for classifying high-dimensional data (2015)
  5. Matyja, Artur; Siminski, Krzysztof: Comparison of algorithms for clustering incomplete data (2014)
  6. Wasito, Ito: Nearest neighbour in least squares data imputation algorithms for marketing data (2014)
  7. Bailey, R.A.; Schiffl, Katharina; Hilgers, Ralf-Dieter: A note on robustness of D-optimal block designs for two-colour microarray experiments (2013)
  8. Shabalin, Andrey A.; Nobel, Andrew B.: Reconstruction of a low-rank matrix in the presence of Gaussian noise (2013)
  9. Wang, Wan-Lun: Mixtures of common factor analyzers for high-dimensional data with missing information (2013)
  10. Cheng, K.O.; Law, N.F.; Siu, W.C.: Iterative bicluster-based least square framework for estimation of missing values in microarray gene expression data (2012)
  11. Luengo, Julián; Sáez, José A.; Herrera, Francisco: Missing data imputation for fuzzy rule-based classification systems (2012)
  12. Städler, Nicolas; Bühlmann, Peter: Missing values: sparse inverse covariance estimation and an extension to sparse regression (2012)
  13. Alfons, Andreas; Baaske, Wolfgang E.; Filzmoser, Peter; Mader, Wolfgang; Wieser, Roland: Robust variable selection with application to quality of life research (2011)
  14. Julià, Carme; Sappa, Angel D.; Lumbreras, Felipe; Serrat, Joan; López, Antonio: Rank estimation in missing data matrix problems (2011)
  15. Ma, Shiqian; Goldfarb, Donald; Chen, Lifeng: Fixed point and Bregman iterative methods for matrix rank minimization (2011)
  16. Stingo, Francesco C.; Chen, Yian A.; Tadesse, Mahlet G.; Vannucci, Marina: Incorporating biological information into linear models: a Bayesian approach to the selection of pathways and genes (2011)
  17. Allen, Genevera I.; Tibshirani, Robert: Transposable regularized covariance models with an application to missing data imputation (2010)
  18. Dax, Achiya: A minimum norm approach for low-rank approximations of a matrix (2010)
  19. García-Laencina, Pedro J.; Sancho-Gómez, José-Luis; Figueiras-Vidal, Aníbal R.: Pattern classification with missing data: a review (2010)
  20. Hron, K.; Templ, M.; Filzmoser, P.: Imputation of missing values for compositional data using classical and robust methods (2010)
