GeneClust

Two-way clustering techniques (such as hierarchical clustering, K-means clustering, tree-structured vector quantization, self-organizing maps, and principal components analysis) have been used to organize genes into groups or “clusters” with similar behavior across relevant tissue samples or cell lines. However, these procedures seek a single global reordering of the samples or cell lines for all genes. Although they are effective at uncovering gross global structure, they are much less effective when applied to more complex clustering patterns (for example, where gene clusters overlap). This chapter describes gene shaving (Hastie et al., 2000), a simple but effective method for identifying subsets of genes with coherent expression patterns and large variation across samples or conditions. After summarizing the gene-shaving methodology, we describe two software packages implementing the method: a small package written in S (usable in either S-Plus or R) and a considerably faster, mixed-language implementation with a graphical user interface intended for more applied use. The package can perform unsupervised, fully supervised, or partially supervised gene shaving, and the user can specify various parameters pertinent to the algorithm. The package outputs graphical representations of the extracted clusters (as colored heat maps) and diagnostic statistics. We then demonstrate how the graphical tool can be used to analyze two published datasets (the Alon colon data and the NCI60 data).