CLIFF: clustering of high-dimensional microarray data via iterative feature filtering using normalized cuts.

We present CLIFF, an algorithm for clustering biological samples using gene expression microarray data. This clustering problem is difficult for several reasons, in particular the sparsity of the data, the high dimensionality of the feature (gene) space, and the fact that many features are irrelevant or redundant. Our algorithm iterates between two computational processes, feature filtering and clustering. Given a reference partition that approximates the correct clustering of the samples, our feature filtering procedure ranks the features according to their intrinsic discriminability, relevance to the reference partition, and irredundancy to other relevant features, and uses this ranking to select the features to be used in the following round of clustering. Our clustering algorithm, which is based on the concept of a normalized cut, clusters the samples into a new reference partition on the basis of the selected features. On a well-studied problem involving 72 leukemia samples and 7130 genes, we demonstrate that CLIFF outperforms standard clustering approaches that do not consider the feature selection issue, and produces a result that is very close to the original expert labeling of the sample set.
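The alternation between feature filtering and normalized-cut clustering described above can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a two-way split, a Gaussian affinity with a median-distance bandwidth, the standard spectral relaxation of the normalized cut, and a simple signal-to-noise score as a stand-in for the relevance criterion. The discriminability and redundancy criteria are omitted, and all function names and parameters (`n_keep`, `n_rounds`) are hypothetical.

```python
import numpy as np

def normalized_cut_bipartition(X):
    """Two-way normalized cut via the usual spectral relaxation (a sketch)."""
    n = len(X)
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    np.fill_diagonal(d2, 0.0)
    # Assumed bandwidth heuristic: median of the off-diagonal squared distances.
    sigma2 = np.median(d2[~np.eye(n, dtype=bool)])
    W = np.exp(-d2 / (2.0 * sigma2))
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}.
    L_sym = np.eye(n) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L_sym)      # eigenvalues in ascending order
    y = d_inv_sqrt * vecs[:, 1]          # second generalized eigenvector
    return (y > 0).astype(int)           # split at zero

def relevance_scores(X, labels, eps=1e-12):
    """Stand-in relevance score: |mean difference| / (sum of class std devs)."""
    a, b = X[labels == 0], X[labels == 1]
    return np.abs(a.mean(axis=0) - b.mean(axis=0)) / (a.std(axis=0) + b.std(axis=0) + eps)

def agreement(u, v):
    """Fraction of matching labels, ignoring which cluster is called 0 or 1."""
    same = np.mean(u == v)
    return max(same, 1.0 - same)

def iterate_filter_and_cluster(X, n_keep, n_rounds=5):
    # The initial reference partition uses all features.
    labels = normalized_cut_bipartition(X)
    selected = np.arange(X.shape[1])
    for _ in range(n_rounds):
        # Filter: keep the n_keep features most relevant to the reference partition.
        selected = np.argsort(relevance_scores(X, labels))[::-1][:n_keep]
        # Cluster: recompute the partition from the selected features only.
        new_labels = normalized_cut_bipartition(X[:, selected])
        converged = agreement(new_labels, labels) == 1.0
        labels = new_labels
        if converged:
            break
    return labels, selected

# Synthetic usage example: 40 samples, 100 features, the first 20 informative.
rng = np.random.default_rng(0)
truth = np.repeat([0, 1], 20)
X = rng.normal(size=(40, 100))
X[truth == 1, :20] += 6.0
labels, selected = iterate_filter_and_cluster(X, n_keep=20)
```

On real expression data the number of retained features and the affinity bandwidth would need tuning, and a fixed `n_keep` is only a convenience for the sketch.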
References in zbMATH (referenced in 10 articles)
- Yoshida, Ruriko; Fukumizu, Kenji; Vogiatzis, Chrysafis: Multilocus phylogenetic analysis with gene tree clustering (2019)
- Kung, S. Y.; Luo, Yuhui; Mak, Man-Wai: Feature selection for genomic signal processing: unsupervised, supervised, and self-supervised scenarios (2010)
- Nahapetyan, Artyom; Busygin, Stanislav; Pardalos, Panos: An improved heuristic for consistent biclustering problems (2008)
- Higham, Desmond J.; Kalna, Gabriela; Kibble, Milla: Spectral clustering and its use in bioinformatics (2007)
- Jiang, Daxin; Pei, Jian; Ramanathan, Murali; Lin, Chuan; Tang, Chun; Zhang, Aidong: Mining gene-sample-time microarray data: A coherent gene cluster discovery approach (2007)
- Mumey, Brendan; Showe, Louise; Showe, Michael: Discovering classes in microarray data using island counts (2007)
- Busygin, Stanislav; Prokopyev, Oleg A.; Pardalos, Panos M.: Feature selection for consistent biclustering via fractional 0-1 programming (2005)
- Cai, Zhipeng; Heydari, Maysam; Lin, Guohui: Clustering binary oligonucleotide fingerprint vectors for DNA clone classification analysis (2005)
- Mamitsuka, Hiroshi: Query-learning-based iterative feature-subset selection for learning from high-dimensional data sets (2005)
- Tritchler, David; Fallah, Shafagh; Beyene, Joseph: A spectral clustering method for microarray data (2005)