Methylation

Predicting methylation from sequence and gene expression using deep learning with attention

DNA methylation has been extensively linked to alterations in gene expression and plays a key role in the manifestation of multiple diseases, especially cancer. Hence, the sequence determinants of methylation and the relationship between methylation and expression are of great interest from a molecular biology perspective. Several models have been proposed to predict methylation status. These models, however, have two main limitations: (a) they are limited to specific CpG loci; and (b) they are not easily interpretable. We address these limitations using deep learning with attention. We produce a general model that predicts DNA methylation for a given sample at any CpG position based solely on the sample’s gene expression profile and the sequence surrounding the CpG. Depending on gene-CpG proximity, our model attains a Spearman correlation of up to 0.84 for thousands of CpG sites on two separate test sets of CpG positions and subjects (cancer and healthy samples). Importantly, our approach, especially the use of attention, offers a novel framework with which to extract valuable insights from gene expression data when combined with sequence information. We demonstrate this by linking several motifs and genes to methylation activity, including Nodal and Hand1. The code and trained weights are available at: https://github.com/YakhiniGroup/Methylation.
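
The Spearman correlation reported above is a rank-based measure of agreement between predicted and measured methylation levels. As an illustration only, the following minimal Python sketch shows how such a score could be computed for a single CpG site across samples. The function names (`average_ranks`, `pearson`, `spearman`) and the toy values are hypothetical and are not taken from the linked repository; whether the authors aggregate per CpG, per sample, or over all pairs is not stated here, so the per-CpG framing is an assumption.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy, made-up methylation levels (beta values in [0, 1]) for one
# hypothetical CpG site across five samples. Not data from the paper.
measured = [0.10, 0.35, 0.80, 0.55, 0.92]
predicted = [0.15, 0.30, 0.70, 0.75, 0.88]

rho = spearman(measured, predicted)
print(f"Spearman correlation: {rho:.2f}")
```

Because the score depends only on ranks, it rewards a model for ordering samples correctly at a site even when the predicted values are offset or rescaled relative to the measured ones.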