The promise of personalized medicine is that each patient receives a treatment chosen from a broad range of options rather than a single, generalized standard-of-care treatment. This is especially important in cancer, where each patient's cancer can be viewed as a separate disease caused by a unique set of aberrations. The rapidly decreasing cost of Next Generation Sequencing (NGS) is making this personalized approach a reality. For diseases with relatively high treatment costs, such as cancer, it is now economically viable to obtain whole-genome sequencing data for the affected individual as part of the treatment regimen, and as costs fall further, more and more diseases will follow suit.
However, the genomic sequence of malignant cells captures only part of the abnormalities that lead to malignancy. Other factors, such as gene expression levels and epigenetic signals, have to be taken into account when characterizing a specific cancer and deciding on an individual's treatment regimen. One prominent epigenetic signal whose dysregulation is well established in various types of cancer is the addition of methyl groups at the C5 position of cytosine nucleotides [3, 4].
There are several methods for obtaining genome-wide methylation information using NGS. The most reliable is bisulfite conversion, in which genomic DNA is treated with sodium bisulfite to convert unmethylated cytosines into uracils, which are read as thymines after PCR amplification, while methylated cytosines remain unchanged. Sequencing the converted DNA therefore directly reveals the degree of methylation at any genomic cytosine through the number of cytosines versus thymines observed at that position; however, complete methylome profiling with this method requires sequencing depths far beyond what is currently feasible for larger patient cohorts. The sequencing depth requirements can be substantially reduced by focusing coverage on CpG-rich genomic regions (e.g., using reduced representation bisulfite sequencing), but this comes at the expense of greatly diminished genome-wide coverage. The method used in our lab, MethylCap-seq, instead uses the methyl-binding domain of human MBD2 to enrich fragmented genomic DNA based on methylation content. Sequencing the fragments bound to the MBD2 domain provides a genome-wide view of methylation patterns at reasonable sequencing depths.
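The bisulfite read-out described above can be made concrete with a minimal Python sketch. It assumes a single (top) DNA strand, complete bisulfite conversion, and error-free reads; the functions `bisulfite_convert` and `methylation_level` and the `min_depth` coverage cutoff are hypothetical illustrations, not part of any particular bisulfite analysis tool.

```python
from typing import Optional, Set


def bisulfite_convert(seq: str, methylated: Set[int]) -> str:
    """Return the top-strand sequence as read after bisulfite treatment and PCR.

    Unmethylated cytosines are converted to uracil and read as T; cytosines
    at the 0-based positions listed in `methylated` are protected and stay C.
    """
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq.upper())
    )


def methylation_level(c_count: int, t_count: int,
                      min_depth: int = 1) -> Optional[float]:
    """Fraction of reads showing C (i.e. methylated) at one genomic cytosine.

    Returns None when coverage (C + T) is zero or below `min_depth`.
    """
    if c_count < 0 or t_count < 0:
        raise ValueError("counts must be non-negative")
    depth = c_count + t_count
    if depth == 0 or depth < min_depth:
        return None
    return c_count / depth


# Reference ACGTCG: the C at position 1 is methylated, the C at position 4 is not.
print(bisulfite_convert("ACGTCG", {1}))  # ACGTTG
# 7 reads show C and 3 show T at the same cytosine -> 70% methylated.
print(methylation_level(7, 3))           # 0.7
```

The per-site level is simply the C fraction among C and T calls, which is why depth matters: a cytosine covered by only a handful of reads yields a coarse estimate, and genome-wide coverage at useful depth is what drives the cost.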
Although MethylCap-seq is attractive in terms of cost, it has two limitations. First, its resolution is at the level of the DNA fragment size, i.e., about 150 bp, rather than at the level of individual CpGs. This is not a serious problem as long as one is interested only in characterizing the methylation status of extended genomic regions such as CpG islands, promoters, non-coding RNAs, or gene bodies. Second, the number of reads covering a genomic region indicates only the amount of methylation in that region relative to the sample genome as a whole, so data normalization is required to compare methylation between samples. Because methylation status is determined only indirectly, the method is prone to data quality issues stemming from poorly prepared libraries. Moreover, somewhat paradoxically, one of the first parameters one might want to know, namely the overall methylation level of the sample, cannot be extracted directly from the data: relative methylation is encoded in the relative numbers of reads covering different genomic regions, whereas the total number of reads is fixed by the sequencing run rather than by the actual overall methylation level of the sample.
Here, we first perform a systematic study of the influence of sample quality and of the contribution of additional reads (beyond ~13 million unique aligned reads) in MethylCap-seq data. We then present experimental evidence that a computational approach we recently proposed for determining overall methylation levels from MethylCap-seq data approximates actual overall methylation levels. Together, these studies support the use of MethylCap-seq as a reliable method for obtaining genome-wide methylation information at reasonable cost.
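Why overall methylation cannot be read off MethylCap-seq counts can be illustrated with a deliberately idealized model. This is a sketch under stated assumptions, not the computational approach referred to above: it assumes that reads are distributed across regions in proportion to each region's methylated CpG content, ignores capture bias and CpG-density effects, and normalizes to reads per million. The region names and values are made up, and `expected_read_counts` and `reads_per_million` are hypothetical helpers.

```python
import math
from typing import Dict


def expected_read_counts(methylated_cpgs: Dict[str, float],
                         total_reads: int) -> Dict[str, float]:
    """Idealized capture (assumption): each region receives a share of the
    fixed read budget proportional to its methylated CpG content."""
    total = sum(methylated_cpgs.values())
    if total <= 0:
        raise ValueError("at least one region must carry methylation")
    return {region: total_reads * m / total
            for region, m in methylated_cpgs.items()}


def reads_per_million(counts: Dict[str, float]) -> Dict[str, float]:
    """Library-size normalization: rescale counts to a library of 1e6 reads."""
    library_size = sum(counts.values())
    return {region: 1e6 * c / library_size for region, c in counts.items()}


# Hypothetical samples: sample_b carries twice the methylation of sample_a
# in every region, i.e. twice the overall methylation.
sample_a = {"promoter": 10.0, "gene_body": 30.0, "cpg_island": 60.0}
sample_b = {region: 2 * m for region, m in sample_a.items()}

# Both libraries are sequenced to the same depth.
reads_a = expected_read_counts(sample_a, total_reads=13_000_000)
reads_b = expected_read_counts(sample_b, total_reads=13_000_000)

rpm_a = reads_per_million(reads_a)
rpm_b = reads_per_million(reads_b)

# The normalized profiles are identical, so the twofold difference in
# overall methylation is invisible in the counts alone.
same_profile = all(math.isclose(rpm_a[r], rpm_b[r]) for r in sample_a)
print(same_profile)  # True
```

Under this model, normalized counts capture only how methylation is distributed across regions; estimating the overall level requires information or modeling beyond the read counts themselves.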