Unsupervised classification for tiling arrays: ChIP-chip and transcriptome.

Tiling arrays enable large-scale exploration of the genome through probes that cover the whole genome at very high density, up to 2,000,000 probes. The biological questions usually addressed are either the difference in expression between two conditions or the detection of transcribed regions. In this work, we propose to consider both questions simultaneously as an unsupervised classification problem by modeling the joint distribution of the two conditions. In contrast to previous methods, we account for all available information on the probes as well as biological knowledge such as annotation and spatial dependence between probes. Since probes are not biologically relevant units, we propose a classification rule for non-connected regions covered by several probes. Applications to transcriptomic and ChIP-chip data of Arabidopsis thaliana obtained with a NimbleGen tiling array highlight the importance of precise modeling and of region classification. The “TAHMMAnnot” package is implemented in R and C and is freely available from CRAN.

