AutoClass - A Bayesian Approach to Classification

We describe a Bayesian approach to the unsupervised discovery of classes in a set of cases, sometimes called finite mixture separation or clustering. The main difference between clustering and our approach is that we search for the “best” set of class descriptions rather than grouping the cases themselves. We describe our classes in terms of probability distribution or density functions, and the locally maximal posterior probability parameters. We rate our classifications with an approximate posterior probability of the distribution function w.r.t. the data, obtained by marginalizing over all the parameters. Approximation is necessitated by the computational complexity of the joint probability, and our marginalization is w.r.t. a local maximum in the parameter space. This posterior probability rating allows direct comparison of alternate density functions that differ in number of classes and/or individual class density functions. We discuss the rationale behind our approach to classification. We give the mathematical development for the basic mixture model, describe the approximations needed for computational tractability, give some specifics of models for several common attribute types, and describe some of the results achieved by the AutoClass program.
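The idea of rating mixture models that differ in their number of classes can be illustrated with a minimal sketch. This is not the AutoClass program or its approximation: it assumes a one-dimensional Gaussian mixture fitted by plain EM, and it swaps in the BIC penalty as a crude stand-in for an approximate marginal posterior around a local maximum. The names `fit_mixture` and `bic_score` and the synthetic data are hypothetical.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def mixture_density(x, weights, means, variances):
    """Density of the finite mixture at x: weighted sum of class densities."""
    return sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances))


def fit_mixture(data, k, iters=200):
    """EM for a k-class 1-D Gaussian mixture (illustrative, not AutoClass).

    Returns (weights, means, variances, log-likelihood) at a local maximum.
    """
    n = len(data)
    ordered = sorted(data)
    grand_mean = sum(data) / n
    grand_var = max(sum((x - grand_mean) ** 2 for x in data) / n, 1e-6)
    # Deterministic start: class means at evenly spaced quantiles of the data.
    means = [ordered[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    variances = [grand_var] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: probability that each case belongs to each class.
        resp = []
        for x in data:
            joint = [w * normal_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)]
            total = max(sum(joint), 1e-300)
            resp.append([p / total for p in joint])
        # M-step: re-estimate each class description from weighted cases.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # floor avoids collapsing classes
    loglik = sum(
        math.log(max(mixture_density(x, weights, means, variances), 1e-300))
        for x in data
    )
    return weights, means, variances, loglik


def bic_score(loglik, k, n):
    """Penalized log-likelihood (BIC-style); an assumed stand-in rating."""
    n_params = 3 * k - 1  # k means, k variances, k - 1 free weights
    return loglik - 0.5 * n_params * math.log(n)


# Synthetic cases drawn from two well-separated classes (assumed data).
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(100)]
data += [rng.gauss(10.0, 1.0) for _ in range(100)]

# Rate models with different numbers of classes on the same scale.
scores = {k: bic_score(fit_mixture(data, k)[3], k, len(data)) for k in (1, 2, 3)}
best_k = max(scores, key=scores.get)
```

Because every candidate is rated on the same scale, models with different numbers of classes can be compared directly; on this synthetic data the two-class model should score well above the one-class model.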

References in zbMATH (referenced in 70 articles)

  1. Ling, Charles X.; Yang, Qiang: Discovering classification from data of multiple sources (2006) ioport
  2. Osei-Bryson, Kweku-Muata; Giles, Kendall: Splitting methods for decision tree induction: An exploration of the relative performance of two entropy-based families (2006)
  3. Silva, Helena Brás; Brito, Paula; Pinto da Costa, Joaquim: A partitional clustering algorithm validated by a clustering tendency index based on graph theory (2006)
  4. Valentini, Giorgio; Ruffino, Francesca: Characterization of lung tumor subtypes through gene expression cluster validity assessment (2006)
  5. Vogel, Julia; Schiele, Bernt: Performance evaluation and optimization for content-based image retrieval (2006)
  6. Wu, Xintao: Incorporating large unlabeled data to enhance EM classification (2006) ioport
  7. Agrawal, Rakesh; Gehrke, Johannes; Gunopulos, Dimitrios; Raghavan, Prabhakar: Automatic subspace clustering of high dimensional data (2005) ioport
  8. Jin, Huidong; Leung, Kwong-Sak; Wong, Man-Leung; Xu, Zong-Ben: Scalable model-based cluster analysis using clustering features (2005) ioport
  9. Liao, T. Warren: Clustering of time series data -- a survey (2005)
  10. Lin, Tsau Young; Chiang, I-Jen: A simplicial complex, a hypergraph, structure in the latent semantic space of document clustering (2005)
  11. McKenna, Stephen J.; Charif, Hammadi Nait: Summarising contextual activity and detecting unusual inactivity in a supportive home environment (2005) ioport
  12. Ryu, Tae-Wan; Eick, Christoph F.: A database clustering methodology and tool (2005) ioport
  13. Shi, Z.; Milios, E.; Zincir-Heywood, N.: Learning stochastic regular grammars by means of a state merging method (2005) ioport
  14. Shi, Z.; Milios, E.; Zincir-Heywood, N.: Post-supervised template induction for information extraction from lists and tables in dynamic Web sources (2005) ioport
  15. Wang, Xiaogang; Zidek, James V.: Selecting likelihood weights by cross-validation (2005)
  16. Zhao, Ying; Karypis, George; Fayyad, Usama: Hierarchical clustering algorithms for document datasets (2005) ioport
  17. Medrano-Soto, Arturo; Christen, J. Andres; Collado-Vides, Julio: BClass: A Bayesian Approach Based on Mixture Models for Clustering and Classification of Heterogeneous Biological Data (2004) not zbMATH
  18. Chen, R.; Sivakumar, K.; Kargupta, H.: Collective mining of Bayesian networks from distributed heterogeneous data (2004) ioport
  19. Mastroianni, Carlo; Talia, Domenico; Trunfio, Paolo: Metadata for managing grid resources in data mining applications (2004)