The rise of next-generation sequencing (NGS) created a demand for robust tools that detect variants in sequencing read data, typically after the reads have been aligned to a reference sequence. A variety of analysis tools, workflows and approaches are already available to the scientific community, and the detection of common types of genomic variation in haploid and diploid genomes is a rapidly maturing field [1–3].
More recently, NGS has been employed to provide new insight into the genetic mechanisms of cancer, as the technology enables the exploration of tumor genomes at previously infeasible levels of detail. Among many examples, researchers have used it to examine patterns of genomic alteration in non-small-cell carcinoma and melanoma cell lines, to discover novel and possibly tumorigenic mutations in the acute myeloid leukemia genome, and even to inform the clinical treatment of a patient with acute promyelocytic leukemia.
Cancer cells have deviated from the normal (germline) genome of the organism by acquiring and selecting for a set of mutations that enable them to grow rapidly and invasively, to resist regulation and/or possibly to metastasize. These changes range from simple single-base mutations to more complex events of genomic gain, loss or structural rearrangement. They can then drive the cancer process by modifying the function of a protein (e.g. disabling a tumor suppressor gene or activating an oncogene), by silencing a gene’s transcription or by altering a gene’s transcriptional regulation. To separate germline variants from these acquired (somatic) mutations of the malignant tissue, many studies have sampled and sequenced both the tumor tissue and a separate tissue with a normal genomic profile from the same individual. To identify the tumor-unique variants, researchers have often run established variant detection tools on both sequenced genomes and then applied heuristic filters to derive a set of confident calls from the two result sets [5, 6].
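The call-set subtraction described above can be sketched as follows. This is a minimal illustration of the general heuristic, not any specific published pipeline; the `Call` record, the `subtract_normal` function and the `min_normal_depth` threshold are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Call:
    """A variant call emitted by a standard caller (hypothetical minimal record)."""
    chrom: str
    pos: int
    ref: str
    alt: str

def subtract_normal(tumor_calls, normal_calls, normal_depth, min_normal_depth=10):
    """Heuristic somatic filter: keep tumor calls that are absent from the
    normal call set at sites where the normal sample is adequately covered."""
    germline = {(c.chrom, c.pos, c.alt) for c in normal_calls}
    kept = []
    for c in tumor_calls:
        if (c.chrom, c.pos, c.alt) in germline:
            continue  # also called in the normal: treat as germline
        if normal_depth.get((c.chrom, c.pos), 0) < min_normal_depth:
            continue  # normal too shallow to rule out a missed germline call
        kept.append(c)
    return kept

# Hypothetical example: one shared site, one well-covered tumor-only site,
# and one tumor-only site where the normal has only 4 reads.
tumor = [Call("chr1", 100, "A", "G"), Call("chr1", 200, "C", "T"), Call("chr2", 50, "G", "A")]
normal = [Call("chr1", 100, "A", "G")]
depth = {("chr1", 100): 38, ("chr1", 200): 25, ("chr2", 50): 4}
print(subtract_normal(tumor, normal, depth))
```

Only the `chr1:200` call survives here. The weakness of this design is that the final set depends entirely on each caller's independent decision in each sample: a tumor variant missed because of a low allele fraction, or a germline variant missed in a shallow normal, passes straight through the filter.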
Cancer genomes, however, pose unique challenges to variant detection from NGS data that limit the effectiveness of standard methods. Aneuploidy, massive genomic amplifications and structural variations are common in cancer; consequently, the assumption of a diploid genotype (made by most variant calling software) no longer holds. This is further complicated by the fact that specific variations are often rare or unique to each cancer and cannot be compared against a ‘gold standard’ genomic profile, even within the same cancer type. Some cancers are heterogeneous, with some somatic variants appearing only in small subpopulations of cells within the malignant tissue. Such subpopulation variants may nevertheless be critical to tumor viability and are therefore of interest to researchers. Finally, tumor biopsies often suffer from degradation and from contamination with non-malignant tissue to varying degrees, depending on the type of tumor and the biopsy method. Variant calling algorithms that do not take these properties of tumor physiology into account are therefore likely to have a high false-negative rate, which in turn hinders analysis and downstream research.
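To make the false-negative argument concrete, the expected variant allele fraction (VAF) of a somatic mutation can be computed from tumor purity, copy number and subclonality. This is a simplified sketch, assuming reads are sampled in proportion to allele copies, that contaminating cells are diploid and that all malignant cells share one copy number at the locus; the function and parameter names are hypothetical.

```python
def expected_vaf(purity, tumor_copies=2, mutated_copies=1,
                 normal_copies=2, cell_fraction=1.0):
    """Expected fraction of reads carrying a somatic variant in a tumor sample.

    purity:         fraction of cells in the sample that are malignant
    tumor_copies:   total copy number of the locus in malignant cells
    mutated_copies: number of those copies that carry the variant
    normal_copies:  copy number in contaminating normal cells
    cell_fraction:  fraction of malignant cells carrying the variant
                    (below 1 for a subclonal mutation)
    """
    variant_alleles = purity * cell_fraction * mutated_copies
    total_alleles = purity * tumor_copies + (1 - purity) * normal_copies
    return variant_alleles / total_alleles

for label, vaf in [
    ("pure diploid tumor, heterozygous", expected_vaf(1.0)),
    ("30% purity", expected_vaf(0.3)),
    ("1 of 4 copies mutated (amplified)", expected_vaf(1.0, tumor_copies=4)),
    ("60% purity, subclone in 25% of cells", expected_vaf(0.6, cell_fraction=0.25)),
]:
    print(f"{label}: {vaf:.3f}")
```

The four cases give 0.5, 0.15, 0.25 and 0.075. A caller that expects a heterozygous variant near 0.5 sees only the first as typical; the others sit much closer to sequencing-error levels.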
Recently, several tools have been developed or extended with cancer genomics specifically in mind. OncoSNP uses a specialized Bayesian framework to detect genomic aberrations in cancer, but is designed for the analysis of single nucleotide polymorphism (SNP) microarray data. SNVMix is one of the first tools designed for NGS studies; it attempts to resolve point mutations in aneuploid genomes using a binomial mixture model fitted by expectation-maximization (EM). However, SNVMix does not currently support paired normal/tumor analysis. Other approaches include the somatic small-variant caller Strelka, the new somatic extensions of the variant detection tool VarScan, and the specialized Bayesian tool SomaticSniper. All of these methods focus on small genomic events, and none provides specific support for integrated genome/transcriptome analysis, structural variation detection or detection of allelic imbalance.
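The binomial-mixture idea mentioned for SNVMix can be illustrated with a generic EM fit. This sketch is not SNVMix's model or code: it fits a plain mixture of binomials with no priors, and the function names, seed fractions, iteration count and example counts are assumptions chosen for illustration.

```python
import math

def binom_logpmf(k, n, p):
    """log Binomial(k | n, p), computed with log-gamma for numerical stability."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))

def binomial_mixture_em(alt, depth, mu=(0.01, 0.5, 0.99), n_iter=100, eps=1e-6):
    """Fit a K-component binomial mixture to per-site (alt reads, depth) counts.

    mu seeds each component's allele fraction; the defaults are genotype-like
    (hom-ref, het, hom-alt), but any fractions may be used.
    Returns mixing weights, fitted fractions and per-site responsibilities.
    """
    mu = list(mu)
    k_comp = len(mu)
    pi = [1.0 / k_comp] * k_comp
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability that site i belongs to component j
        resp = []
        for a, n in zip(alt, depth):
            logs = [math.log(pi[j]) + binom_logpmf(a, n, mu[j]) for j in range(k_comp)]
            top = max(logs)
            w = [math.exp(x - top) for x in logs]
            total = sum(w)
            resp.append([x / total for x in w])
        # M-step: re-estimate mixing weights and per-component allele fractions
        new_pi = []
        for j in range(k_comp):
            r = [row[j] for row in resp]
            new_pi.append(sum(r) / len(alt) + eps)  # eps keeps log(pi) finite
            reads = sum(rj * n for rj, n in zip(r, depth))
            alts = sum(rj * a for rj, a in zip(r, alt))
            if reads > 0:
                mu[j] = min(max(alts / reads, eps), 1 - eps)
        norm = sum(new_pi)
        pi = [p / norm for p in new_pi]
    return pi, mu, resp

# Hypothetical sites: near 0, near 0.25 (e.g. one of four copies) and near 0.5
alt = [0, 1, 0, 10, 11, 9, 20, 22, 19]
depth = [40] * 9
pi, mu, resp = binomial_mixture_em(alt, depth, mu=(0.05, 0.3, 0.55))
print([round(m, 3) for m in mu])
```

Because the component fractions are learned rather than fixed at 0, 0.5 and 1, one component can settle near 0.25, which is how a mixture of this kind can accommodate allele fractions produced by aneuploidy.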
We present a generalized Bayesian approach for detecting genomic aberrations unique to one sample set, with the goal of extending beyond the detection of point mutations. Our methods are founded on Bayesian statistical theory: they compute the probability of a somatic event by comparing the likelihood of the available evidence under all possible explanations (models), weighting each likelihood by a prior probability for that explanation. While the normal genome is compared against models with certain assumptions, such as diploidy, the tumor data are assessed only in terms of their similarity to the normal data. Increased evidence in either the normal or the tumor profile will therefore increase sensitivity, by providing either more evidence for a somatic change or more evidence for the absence of variation in the normal. Because the model does not assume a particular distribution of variant evidence in the tumor, it is robust to changes that appear at low allelic frequencies, as may be the case with aneuploid genomes or with samples contaminated by stromal cells. Similarly, allelic imbalance is detected by comparing the likelihood of ‘balanced’ transcription, with the evidence expected at heterozygous loci, against the possibility that the tumor and normal variant proportions are independent.
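The model comparison described above can be sketched for a single site using read counts. This is a minimal two-model illustration under simplifying assumptions, not the authors' implementation: counts are treated as binomial, any unknown allele fraction receives a uniform prior and is integrated out analytically, and the function names, the error rate `err` and the priors `prior_somatic` and `prior_imbalance` are hypothetical. As in the text, the normal is tested against a diploid (homozygous reference) model, while the tumor is either left unconstrained or tied to the normal.

```python
import math

def log_binom_coef(n, k):
    """log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

def log_binom(k, n, p):
    """log Binomial(k | n, p) for a fixed allele fraction p."""
    return log_binom_coef(n, k) + k * math.log(p) + (n - k) * math.log1p(-p)

def log_binom_uniform(n):
    """log of Binomial(k | n, f) integrated over f ~ Uniform(0, 1).
    The integral equals 1 / (n + 1) for every k, so k is not needed."""
    return -math.log(n + 1)

def log_shared_uniform(k1, n1, k2, n2):
    """Two samples drawn from one shared unknown fraction f ~ Uniform(0, 1)."""
    log_beta = (math.lgamma(k1 + k2 + 1) + math.lgamma(n1 - k1 + n2 - k2 + 1)
                - math.lgamma(n1 + n2 + 2))
    return log_binom_coef(n1, k1) + log_binom_coef(n2, k2) + log_beta

def posterior(log_likes, priors):
    """Turn prior-weighted log-likelihoods into posterior model probabilities."""
    logs = [math.log(p) + ll for p, ll in zip(priors, log_likes)]
    top = max(logs)
    w = [math.exp(x - top) for x in logs]
    total = sum(w)
    return [x / total for x in w]

def p_somatic(alt_n, depth_n, alt_t, depth_t, err=0.01, prior_somatic=1e-3):
    # Somatic: normal is homozygous reference (its alt reads are errors);
    # the tumor fraction is unconstrained (uniform prior)
    somatic = log_binom(alt_n, depth_n, err) + log_binom_uniform(depth_t)
    # No change: the tumor looks like the normal (one shared unknown fraction)
    same = log_shared_uniform(alt_n, depth_n, alt_t, depth_t)
    return posterior([somatic, same], [prior_somatic, 1 - prior_somatic])[0]

def p_imbalance(alt_n, depth_n, alt_t, depth_t, prior_imbalance=0.1):
    # Balanced: heterozygous locus with both samples at fraction 0.5
    balanced = log_binom(alt_n, depth_n, 0.5) + log_binom(alt_t, depth_t, 0.5)
    # Independent: each sample has its own unknown fraction
    indep = log_binom_uniform(depth_n) + log_binom_uniform(depth_t)
    return posterior([indep, balanced], [prior_imbalance, 1 - prior_imbalance])[0]

print(p_somatic(0, 60, 20, 80))    # no alt reads in normal, 25% in tumor
print(p_somatic(20, 40, 30, 60))   # germline heterozygous site
print(p_imbalance(20, 40, 54, 60)) # balanced in normal, 90% alt in tumor
```

The first site comes out strongly somatic and the second does not. Leaving the tumor fraction uniform, rather than fixing it at 0.5, means a low-fraction variant is not penalized for departing from a diploid expectation; how much evidence is needed still depends on depth and on the chosen priors.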