CorShrink

CorShrink: Empirical Bayes shrinkage estimation of correlations, with applications.

Estimating correlation matrices, and correlations among variables more generally, is a ubiquitous problem in statistics. In many cases, especially when the number of observations is small relative to the number of variables, some form of shrinkage or regularization is needed to improve estimation accuracy. Here, we propose an Empirical Bayes shrinkage approach, CorShrink, which adaptively learns how much to shrink correlations by combining information across all pairs of variables. One key feature of CorShrink, which distinguishes it from most existing methods, is its flexibility in dealing with missing data: CorShrink explicitly accounts for varying amounts of missingness among pairs of variables. Numerical studies suggest that CorShrink is competitive with other popular correlation shrinkage methods, even when no data are missing. We illustrate CorShrink on gene expression data from the GTEx project, which have extensive missing observations and on which existing methods struggle. We also illustrate its flexibility by applying it to estimate cosine similarities between word vectors from word2vec models, thereby generating more accurate word similarity rankings.
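
To make the idea of pair-specific shrinkage concrete, here is a minimal sketch in Python. It is not the CorShrink implementation, and the abstract does not specify the prior or the estimation details. The sketch assumes a Fisher z-transform of each pairwise correlation, the standard approximation that its variance is 1/(n - 3) for n complete observations, and a simple zero-mean normal prior whose variance is estimated by pooling all pairs (a stand-in for whatever prior family the method actually uses). The function names `pairwise_corr` and `shrink_correlations` are hypothetical.

```python
import math

def pairwise_corr(x, y):
    """Pearson correlation over positions where both values are observed.

    Missing values are marked with None. Returns (r, n), where n is the
    number of complete pairs. Assumes neither column is constant over
    the complete pairs.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    return sxy / math.sqrt(sxx * syy), n

def shrink_correlations(columns):
    """Shrink pairwise correlations toward zero, pair by pair.

    columns: list of equal-length lists, with None marking missing values.
    Returns {(i, j): (raw_r, shrunk_r, n)} for i < j.
    """
    stats = {}
    for i in range(len(columns)):
        for j in range(i + 1, len(columns)):
            r, n = pairwise_corr(columns[i], columns[j])
            if n < 4:
                continue  # the 1/(n - 3) variance approximation needs n > 3
            r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
            z = math.atanh(r)       # Fisher z-transform
            s2 = 1.0 / (n - 3)      # approximate sampling variance of z
            stats[(i, j)] = (r, z, s2, n)

    # Assumed prior: z_true ~ N(0, tau2). Since E[z^2] = tau2 + s2, a
    # method-of-moments estimate pools information across all pairs.
    tau2 = max(0.0, sum(z * z - s2 for _, z, s2, _ in stats.values()) / len(stats))

    out = {}
    for key, (r, z, s2, n) in stats.items():
        weight = tau2 / (tau2 + s2)  # fewer observations -> smaller weight
        out[key] = (r, math.tanh(weight * z), n)
    return out

# Hypothetical toy data: the third variable is observed in only 5 of 10 samples.
a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
b = [2, 1, 4, 3, 6, 5, 8, 7, 10, 9]
c = [None, None, None, None, None, 3, 1, 4, 1, 5]
result = shrink_correlations([a, b, c])
for (i, j), (raw, shrunk, n) in sorted(result.items()):
    print(f"pair ({i},{j}): n={n}, raw={raw:.3f}, shrunk={shrunk:.3f}")
```

In this sketch, the pooled prior variance plays the role of "learning how much to shrink" from all pairs, while the pair-specific variance 1/(n - 3) makes correlations computed from fewer complete observations shrink more strongly.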