- Research article
- Open Access
Whole genome and exome sequencing of monozygotic twins discordant for Crohn’s disease
© Petersen et al.; licensee BioMed Central Ltd. 2014
- Received: 25 April 2014
- Accepted: 27 June 2014
- Published: 5 July 2014
Crohn’s disease (CD) is an inflammatory bowel disease caused by genetic and environmental factors. More than 160 susceptibility loci have been identified for IBD, yet a large part of the genetic variance remains unexplained. Recent studies have demonstrated genetic differences between monozygotic twins, who were long thought to be genetically completely identical.
We aimed to test if somatic mutations play a role in CD etiology by sequencing the genomes and exomes of directly affected tissue from the bowel and blood samples of one and the blood-derived exomes of two further monozygotic discordant twin pairs. Our goal was the identification of mutations present only in the affected twins, pointing to novel candidates for CD susceptibility loci. We present a thorough genetic characterization of the sequenced individuals but detected no consistent differences within the twin pairs. An estimate of the CD susceptibility based on known CD loci however hinted at a higher mutational load in all three twin pairs compared to 1,920 healthy individuals.
Somatic mosaicism does not seem to play a role in the discordance of monozygotic CD twins. Our study constitutes the first to perform whole genome sequencing for CD twins and therefore provides a valuable reference dataset for future studies. We present an example framework for mosaicism detection and point to the challenges in these types of analyses.
- Crohn’s disease
- Discordant monozygotic twins
- Somatic mosaicism
- Whole genome sequencing
- Exome sequencing
Crohn’s disease (CD, OMIM #266600) is a complex, chronic inflammatory bowel disease (IBD) affecting between 0.1–16/100,000 persons worldwide, with a higher incidence in the Western world. As a multifactorial disease, a variety of genetic and environmental factors play a role in its etiology. Genome-wide association studies (GWAS) and meta-analyses have so far identified a total of 163 susceptibility loci for IBD , with 140 for CD. These GWAS loci have highlighted important pathways underlying IBD, such as immunity and autophagy. Yet, the identified variants so far explain less than 30% of the cumulative genetic variance for CD and it is thought that a part of the so-called missing heritability may be found in rare variants with larger effect sizes. Next-generation sequencing enables the genome- and exome-wide identification of novel variants and is therefore the current method of choice for finding new rare susceptibility loci for complex diseases. In the past 5 years researchers even identified several monogenic forms of severe early-onset colitis. For example, single mutations in IL10 (interleukin-10 ) and the genes encoding for its receptor (IL10RA and IL10RB) as well as mutations in XIAP (X-linked inhibitor of apoptosis protein [4, 5]) have been shown to cause severe early-onset IBD. Monozygotic (MZ) twins have long served as a model for the influence of environmental factors versus genetic factors (“nature vs. nurture”), since they are believed to be genetically identical. For example, our previously published German epidemiologic twin study revealed that red meat consumption, high antibiotic intake and living abroad before time of diagnosis, especially in countries of the non-developed world, may play a role in disease etiology . Moreover, our group identified several differentially methylated sites in the colonic epigenome of discordant colitis twins with functional consequences, i.e. impact gene expression . Monozygotic discordant UC (Ulcerative colitis) twins were also shown to differ in the bacterial composition of their gut microbiota, with the affected twins showing less diversity than their healthy co-twins . This suggests an important link between disease and the microbiome.
In the past years it has been repeatedly shown that the assumption of genetically identical monozygotic twins needs to be reconsidered. Genetic differences have been discovered between monozygotic twins , which can be regarded as an extreme form of somatic mosaicism, describing the presence of two populations of cells with different genotypes having developed from a single fertilized egg as a result of postzygotic alterations of the genome . Several studies have shown discordant phenotypes between monozygotic twins resulting from genetic differences such as chromosomal mosaicism (e.g. trisomy 21 ) or dominant gene mutations (first shown for Van der Woude syndrome ) and differences in the copy number profiles of monozygotic concordant and discordant twin pairs were detected . These findings challenge the assumption of disease discordance in monozygotic twins reflecting purely environmental effects. The genetic comparison of monozygotic twins discordant for a complex disease has been previously proposed for finding disease-relevant variants in a set of candidate genes . During the past years sequencing costs have dropped dramatically, thus allowing for whole genome- and exome-wide comparisons of monozygotic twins, enabling the identification of these rare genetic events without the need for prior assumptions like focusing on certain candidate genes. The genetic comparison of monozygotic discordant twins therefore represents a promising possibility for finding novel candidates for disease susceptibility that may help explain some part of the missing heritability. Considering the estimated human mutation rate of ~2.2 to ~2.5×10−8 per position per diploid genome [15, 16], approximately 175 mutations per genome are expected per generation, indicating that genetic differences between monozygotic twins may not be so rare. However, whole genome sequencing and the comparison of monozygotic twins discordant for multiple sclerosis yielded no differences . As a consequence, we aimed for a higher sequencing depth (per base coverage) in our study and included samples of the directly affected tissue in addition to blood samples in the sequencing to maximize our chances of detecting relevant somatic mutations.
Description of the three twin pairs and corresponding sequencing statistics
Twin pair 1
Twin pair 2
Twin pair 3
Age of onset in affected twin
Year of birth
% of genome/exome covered:
Total number of variants
Total number of SNVs
Total number of InDels
Variants in UTR
Damaging SNV predictions
All three twin pairs were female and of German ancestry and co-twins grew up together. In all affected twins, the disease showed ileocaecal localization and the diagnosis was confirmed by both endoscopy and histopathology. The first twin pair (TP1) was 63 years old at time of recruitment with an age-of-onset of 45 in the affected individual. The patient did not suffer from fistula or stenosis but presented with arthritis as an extraintestinal manifestation. So far, no surgery was necessary and flares occurred approximately once a year with few hospitalizations. The second twin pair (TP2) was 45 years old at time of recruitment. The affected twin had been suffering from Crohn’s disease since the age of 19 and presented with anal fistula to the skin and arthritis as an extraintestinal manifestation. She underwent emergency gut surgery due to ileal perforation in the ninth year of disease. The patient was hospitalized approximately once per year since disease onset and has in total spent more than 6 months in the hospital. Since surgery, she has no longer been suffering from regular flares. The third twin pair (TP3) was 32 years old at time of first participation in the study and showed the earliest age-of-onset of the three twin pairs with only twelve years. The patient presented with fistula (both anal and inguinal) and had been admitted to the hospital less than every two years for a total of one to three months since disease onset. The frequency of flares was one to three per year with few hospital stays necessary. In all twin pairs, the healthy twin showed no signs of intestinal inflammation.
For TP1, four genomes and four exomes of blood-derived DNA as well as DNA from bowel biopsies were sequenced. From TP2 and TP3, blood samples of both twins were exome sequenced, resulting in a total of four genomes and eight exomes from three twin pairs. All samples were sequenced to a minimum average coverage of 36× (Table 1). In total, more than 500Gb of sequences were uniquely aligned for the four genomes, resulting in coverage of more than 93% of the whole human genome reference. This value represents the upper limit of what can possibly be covered, since the female human genome reference hg19 (excluding chromosome Y) that was used consists to 6.6% of masked regions. These results are also consistent with previously reported genomes [15, 19]. For the exomes, on average over 96% of the targeted regions were covered. The genomes yielded a mean value of 3,063,616 variants, which is in agreement with an estimated human nucleotide diversity of 0.1% . The expected transition/transversion (Ti/Tv) ratio for genome-wide single nucleotide variants (SNVs) is 2.1 and approximately 2.8 for exons  and can be used as an indicator for the quality of SNV detection. Our genome-wide SNVs accurately matched these values, while those for exome-wide SNVs ranged between 2.6 and 2.8, and were therefore between the recommended values for genomes and exomes. This is to be expected, as the enriched targets do not exclusively consist of exons, but to a large part also include additional regions such as the untranslated regions (UTRs) of numerous genes. As an additional quality control for our variant calls, we calculated the SNV concordance in pairwise comparisons for all twin pairs. Greater than 93% concordance was achieved for the genomes of TP1 and greater than 99% for the corresponding HiSeq exomes. The overlapping genome and exome SNVs were more than 93% concordant. The exomes of TP2 and TP3 showed more than 96% pairwise concordance. These results additionally confirmed the monozygosity of all analyzed twin pairs and suggest an adequate data quality. Detailed numbers for the concordance in all datasets are illustrated in Additional file 1: Figure S1. The ratio of nonsynonymous to synonymous changes among rare (frequency <0.5%) variants typically ranges between 1 and 2, and among common variants between 0.5 and 1.5 . For a typical whole genome resequencing project the estimated ratio is between 0.8 and 1.0 . Our observed nonsynonymous to synonymous ratios between 0.8 and 0.9 (Table 1) confirm these observations. The numbers of non-frameshift and frameshift InDels and nonsense SNVs detected in our study average below those of the 1000 genomes project (estimation of 130 to 178 non-frameshift and 192 to 280 frameshift InDels and 67 to 100 nonsense SNVs) while we detected a slightly higher number of splice-site variants (28 to 45 splice-site variants estimated). These deviations may be attributable to differences in data generation or more stringent variant calling. We estimated the number of potentially deleterious and disease-related variants in our dataset by computing the number of variants classified as disease-causing (DM) in the Human Gene Mutation Database (HGMD Professional, version 2013.3). Even a healthy genome contains between 50 and 100 DM variants  and we observed no excess in our dataset. We detected between 630 and 767 missense variants predicted to be “deleterious” by SIFT  or “probably damaging” by PolyPhen2  in each of our genomes and exomes. Missense variants with a Grantham score above 100, representing amino acid changes defined as moderately radical (>100) and radical (>150 ) have a higher likelihood of clinical consequences. We found 17 to 18% of all missense variants to fall into this category. A high PhyloP value illustrates a strong evolutionary conservation of the affected genomic position and most pathogenic missense variants have been shown to have a PhyloP score above 2.5 . With approximately 7,700 highly conserved variants the genomes only display approximately four times the number found in the exomes, despite magnitude differences in total size. This is attributable to the higher conservation of exonic regions. As even the genome of a healthy person contains at least 100 loss-of-function variants , it is difficult to establish a connection between the large number of potentially deleterious variants and a higher susceptibility to CD.
For TP1 we also performed differential copy number variation analysis. The genomes showed 128 CNVs potentially present only in the affected twin after filtering. These were visually inspected in the alignments and compared to the CNVs called for the exomes. The pairwise comparison of exome CNVs from blood samples did not show any region above the filtering cutoffs while the biopsies showed a deletion on chromosome 11 for the exome data. However, this region does not show up in the comparison of blood and biopsy of either twin, nor does it show up in the blood comparison or the genomic CNVs, and there are not sufficient SNVs to support a loss of heterozygosity (LOH) in the region. Therefore, none of the observed potential copy number losses or gains differing between twins or tissues showed sufficient support to be deemed of interest for follow-up analysis.
We here report the first whole genome and exome sequences of MZ twin pairs discordant for CD. We detected no reproducible differences within the twin pairs. Based on the good data quality and a high genotype concordance between samples and different methods, our data provided a solid foundation for the detection of somatic mutations between the co-twins. The low overlap between the three used tools for the detection of differences argues against the presence of somatic mutations. We therefore assume a high probability of the analyzed genomes and exomes being truly genetically identical. However, in spite of adequate data quality and the thorough analysis methods, the probability remains that we overlooked existing differences. The samples sequenced for TP1 consisted of blood samples as well as bowel biopsies for genomes and exomes. We decided on this approach since it allows for an exceptionally thorough look at the genetic data and offers the unique possibility of identifying tissue-specific somatic mosaicisms, especially in the tissue of interest for the particular disease. But the analysis of biopsy-derived DNA also involved a certain trade-off. The low yield of DNA extracted from a single biopsy required the pooling of DNA from several biopsies for the sequencing. A somatic mosaicism may not have been present in all biopsies used for sequencing and it is unlikely that we would have been able to detect mutations present in only a small fraction of the sequenced samples and if so, validation through Sanger sequencing would present another challenge. On the other hand, mutations present only in some regions of the bowel would be rather unlikely to have an effect on the development of Crohn’s disease. In the future, this problem may best be handled through the application of single cell sequencing , where the resulting genome truly represents the genome of a single cell and not the pool of what may possibly be several mosaics. This also applies to tumor genome resequencing where similar single-cell-based approaches are currently envisioned. For TP1, the biopsies were taken from affected regions in the colon, however the primarily ileocaecal disease localization in the affected twin may be an issue for finding somatic mutations. It is questionable if the affected twin from pair 1 represents an ideal candidate for a genetic study. With an age of onset of 45, only a few short hospital stays and no surgery needed so far, the course of the disease is comparatively mild and hence the genetic contribution may be low. However, the availability of the bowel biopsies from both twins considerably limited the number of possible twin pairs. In turn, we chose a lower age of onset and more severe disease course for TP2 and TP3, for whom only blood samples were available and subjected to sequencing. The three twin pairs chosen for this project therefore constitute the best possible trade-off between sample suitability and availability.
The theoretical probability that an offspring of a diamniotic MZ twin father carries a germline mutation that can be detected by allele-specific PCR in a sperm sample of their biological father, but not of his twin brother was defined at approximately 83% . This thought experiment assumes circumstances clearly differing from the situation with our three female twin pairs, but nonetheless this estimated value may give an idea of the probability of finding differences in our twin pairs. Essentially, an experimental test of this theory in one male twin pair did detect five differences in genomes from sperm-derived DNA . Only one of these SNVs however was also detected in the according blood samples, suggesting much variation in the detectable number of differences, also depending on the tissue type sequenced. In this study we focused solely on the detection of genetic differences between MZ twins for finding novel candidates for CD susceptibility. However, several other mechanisms may also be involved in disease discordance. Bowel biopsies of MZ twins discordant for ulcerative colitis, from the same cohort as in this study, were shown to differ in their DNA methylation patterns and exhibit differential gene expression , suggesting a role in disease development and yielding several novel candidate genes. For psoriasis, another disease involving chronic inflammation, DNA methylation and gene expression data in MZ discordant twins showed no differences between co-twins when analyzed separately. However, a combined analysis identified genes where differences in DNA methylation between unaffected and affected twins correlated with differences in gene expression  and hereby identified several known and novel susceptibility genes. In contrast, sequencing of the genomes of one twin pair and transcriptome sequencing, SNP chip and methylation chip data of three discordant twin pairs revealed no differences between co-twins discordant for multiple sclerosis .
Our results are in accordance with previously published results and suggest that it is unlikely that somatic mutations have a substantial impact on the development of Crohn’s disease, yet we cannot completely exclude the possibility of having missed existing somatic mutations in the three twin pairs examined. Moreover, our study provides an analytical example framework to perform mosaicism detection and hints at potential weaknesses of existing variant detection tools. In the future, systematic transcriptome, methylome and microbiome analyses on MZ discordant twins using Next Generation Sequencing technologies are the way forward and can further elucidate the role of differential gene expression and differences in the microbial composition of the bowel in disease discordance.
The monozygotic twin pairs sequenced in this study were recruited as part of the cohort described in Spehlmann et al.. The study setup was approved by the Bioethical Committee of the University of Kiel. All patients gave written informed consent before data and biomaterials were collected. Biopsies were taken endoscopically from a defined area of the colon, and immediately snap-frozen in liquid nitrogen. DNA was extracted from the biopsies using the QIAamp Tissue DNA preparation kit (Qiagen, Hilden, Germany). Genomic DNA was extracted from blood using the Invitek kit (Invitek, Berlin, Germany) and each DNA sample was evaluated by gel electrophoresis for the presence of high-molecular weight DNA.
The four genomes of TP1 were sequenced on an Applied Biosystems SOLiD v.3+ on four to five whole slides per sample using a combination of paired-end (50 and 25 or 35 bp read length) and 1 kb insert mate-pair libraries (50 and 35 bp read length) (Additional file 1: Table S1). The four exomes of TP1 were enriched using Illumina’s TruSeq exome kit (Illumina, San Diego, CA, USA) according to the manufacturer’s instructions and sequenced together on one lane of an Illumina HiSeq2000 with 100 bp paired-end reads. The four exomes of TP2 and TP3 were enriched using the SureSelect Human All Exon kit v.2 (Agilent Technologies, Santa Clara, CA, USA) and sequenced with 50 and 35 bp paired-end reads on one quarter slide of the SOLiD v3+ each.
Mapping and variant calling
The SOLiD reads were mapped using ABI’s Bioscope v1.2 software, for the Illumina HiSeq data we used BWA . Further processing involved formatting with SAMtools  duplicate removal, local realignment around InDels, base quality score recalibration and coverage calculation using Picard (http://picard.sourceforge.net), BEDtools  and GATK , which was also used for variant detection. We applied our in-house tool snpActs (http://snpacts.ikmb.uni-kiel.de/) for variant annotation, computation of concordance and their illustration in Venn diagrams.
Detection of somatic mutations
For the genomes and exomes of TP1, three pairwise comparisons were carried out: CD biopsy vs. CD blood, CD blood vs. healthy blood and CD biopsy vs. healthy biopsy. We used SomaticSniper v0.7.4  for all genomes and exomes, filtering reads with mapping quality below 10. The somatic score calculated by SomaticSniper represents the phred-scaled probability that the two genotypes are different and ranges between 0 and 255. We manually checked those with a somatic score above 100 (representing a probability of 1-10(100/-10) for a difference between samples). For the comparison using pibase v.1.4.5  we used all variant positions called for the respective exomes, while for the genomes we reduced the number of query positions by using the variants not called in all four and not called in both biopsy or blood samples and with a maximum 1000 genomes frequency of 20%. We then used pibase_bamref to query the positions in the bam files, pibase_consensus to compute the genotypes and pibase_fisherdiff for the pairwise comparison of samples using a Fisher’s exact test. All variants with p-values below 0.01 were manually inspected. The third tool for the detection of differences was VarScan v2.3.2. . However, the results for the genome data indicated problems with the compatibility with SOLiD data mapped with BioScope, most likely due to the library types used, including different read lengths and insert sizes. We therefore excluded the VarScan results for the genomes. For the remaining comparisons we visually inspected somatic SNVs with p-values below 0.01 and no occurrence of the variant allele in the second sample. Additional file 1: Table S2 summarizes the number of inspected sites for all comparisons with the three tools and the concordances between the three methods are shown as Venn diagrams in Additional file 1: Figure S2 a-c.
CNVs in the genome data were detected using RDXplorer  and filtered with a customized pipeline. We narrowed down our selection the CNVs called in both samples of the affected twin but not in those of the healthy twin to find differences. We used BEDtools to detect CNVs reciprocally overlapping by at least 90% in the blood sample and biopsy of the affected (18,680 CNVs remaining), but not overlapping CNVs in the two samples from the healthy twin (1,235 remaining). We removed likely false-positive copy number loss calls locating to regions often experiencing low coverage due to mappability issues in non-unique genomic regions included in the “Mappability or Uniqueness of Reference Genome from ENCODE” track from UCSC  calculated for the uniqueness of the reference in 35 bp Windows (ftp://hgdownload.soe.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeMapability/). For the remaining 128 CNVs potentially present only in the affected twin we added annotations from CCDS genes and known CNVs from the Database of Genomic Variants (DGV ) followed by manual inspection in the alignments in the Integrative Genomics Viewer (IGV ). For the CNVs in the exome data of TP1 we applied the in-house CNV calling pipeline, clinCNV (S. Ossowski, personal communication). This tool leverages the depth of coverage from multiple samples, and from paired samples (e.g. normal/tumor pairs) to identify copy number variable regions. Potential CNV regions were filtered based on number of 100 bp windows per region, percentage of windows with a positive loss or gain call, and average ratio against 58 healthy background control samples (20 baits, 80% support, somatic ratio <0.62 for losses or >1.38 for gains).
Estimation of CD susceptibility
The susceptibility estimation was based on data from the Immunochip project of the International IBD Genetics Consortium (IIBDGC; http://www.ibdgenetics.org). The genotypes of 1,920 healthy German controls were extracted from the data set release 5 (MSControl_QC_Version3) and filtered down to the SNVs associated with CD by Jostins el al.  using the PLINK software . The original 163 SNVs displaying association with CD, UC and IBD were based on release 4 of the data set. We excluded those exclusively associated with UC and another seven were removed in release 5 due to quality control filtering. For our dataset, the remaining 133 SNVs were used for TP1, of which 27 were exclusively associated with CD. For TP2 and TP3 the control dataset was reduced to those SNVs where genotypes were available from the exomes, resulting in a total of 44 and 40 SNVs, respectively (Additional file 1: Table S3). The risk alleles and odds ratios (OR) calculated by Jostins et al. were used for the allelic carriership based model: For each occurring risk allele in the twin pairs and the controls the logarithmic ORs were summed up. The distribution of the sum was plotted using R scripts , and the particular location for the analyzed twin pair was marked.
Availability of supporting data
The genome and exome sequencing data supporting the results of this article are available upon request.
This work was in part supported by the DFG Cluster of Excellence “Inflammation at Interfaces” and received funding under the EU Seventh Framework Programme [FP7/2007-2013, grant number 262055, ESGI]. The authors would like to thank Melanie Friskovec, Catharina von der Lancken, Sandra Greve, Tanja Henke, Stefanie Arndt, Dorina Ölsner and Ilona Urbach for technical assistance and the twins for taking part in this study.
- Jostins L, Ripke S, Weersma RK, Duerr RH, McGovern DP, Hui KY, Lee JC, Schumm LP, Sharma Y, Anderson CA, Essers J, Mitrovic M, Ning K, Cleynen I, Theatre E, Spain SL, Raychaudhuri S, Goyette P, Wei Z, Abraham C, Achkar J-P, Ahmad T, Amininejad L, Ananthakrishnan AN, Andersen V, Andrews JM, Baidoo L, Balschun T, Bampton PA, Bitton A, et al: Host-microbe interactions have shaped the genetic architecture of inflammatory bowel disease. Nature. 2012, 491: 119-124.PubMed CentralPubMedView ArticleGoogle Scholar
- Glocker E-O, Frede N, Perro M, Sebire N, Elawad M, Shah N, Grimbacher B: Infant colitis–it’s in the genes. Lancet. 2010, 376: 1272-PubMedView ArticleGoogle Scholar
- Glocker E-O, Kotlarz D, Boztug K, Gertz EM, Schäffer A a, Noyan F, Perro M, Diestelhorst J, Allroth A, Murugan D, Hätscher N, Pfeifer D, Sykora K-W, Sauer M, Kreipe H, Lacher M, Nustede R, Woellner C, Baumann U, Salzer U, Koletzko S, Shah N, Segal AW, Sauerbrey A, Buderus S, Snapper SB, Grimbacher B, Klein C: Inflammatory bowel disease and mutations affecting the interleukin-10 receptor. N Engl J Med. 2009, 361: 2033-2045.PubMed CentralPubMedView ArticleGoogle Scholar
- Worthey E a, Mayer AN, Syverson GD, Helbling D, Bonacci BB, Decker B, Serpe JM, Dasu T, Tschannen MR, Veith RL, Basehore MJ, Broeckel U, Tomita-Mitchell A, Arca MJ, Casper JT, Margolis D a, Bick DP, Hessner MJ, Routes JM, Verbsky JW, Jacob HJ, Dimmock DP: Making a definitive diagnosis: successful clinical application of whole exome sequencing in a child with intractable inflammatory bowel disease. Genet Med. 2011, 13: 255-262.PubMedView ArticleGoogle Scholar
- Zeissig Y, Petersen B-S, Milutinovic S, Bosse E, Mayr G, Peuker K, Hartwig J, Keller A, Kohl M, Laass MW, Billmann-Born S, Brandau H, Feller AC, Röcken C, Schrappe M, Rosenstiel P, Reed JC, Schreiber S, Franke A, Zeissig S: XIAP variants in male Crohn’s disease. Gut. 2014, Epub ahead of printGoogle Scholar
- Spehlmann ME, Begun AZ, Saroglou E, Hinrichs F, Tiemann U, Raedler A, Schreiber S: Risk factors in German twins with inflammatory bowel disease: results of a questionnaire-based survey. J Crohns Colitis. 2012, 6: 29-42.PubMedView ArticleGoogle Scholar
- Häsler R, Feng Z, Bäckdahl L, Spehlmann ME, Franke A, Teschendorff A, Rakyan VK, Down TA, Wilson G a, Feber A, Beck S, Schreiber S, Rosenstiel P: A functional methylome map of ulcerative colitis. Genome Res. 2012, 22: 2130-2137.PubMed CentralPubMedView ArticleGoogle Scholar
- Lepage P, Häsler R, Spehlmann ME, Rehman A, Zvirbliene A, Begun A, Ott S, Kupcinskas L, Doré J, Raedler A, Schreiber S: Twin study indicates loss of interaction between Microbiota and Mucosa of patients With Ulcerative Colitis. Gastroenterology. 2011, 141: 227-236.PubMedView ArticleGoogle Scholar
- Silva S, Martins Y, Matias A, Blickstein I: Why are monozygotic twins different?. J Perinat Med. 2011, 39: 195-202.PubMedGoogle Scholar
- Youssoufian H, Pyeritz RE: Mechanisms and consequences of somatic mosaicism in humans. Nat Rev Genet. 2002, 3: 748-758.PubMedView ArticleGoogle Scholar
- Nieuwint A, Van Zalen-Sprock R, Hummel P, Pals G, Van Vugt J, Van Der Harten H, Heins Y, Madan K: “Identical” twins with discordant karyotypes. Prenat Diagn. 1999, 19: 72-76.PubMedView ArticleGoogle Scholar
- Kondo S, Schutte BC, Richardson RJ, Bjork BC, Knight AS, Watanabe Y, Howard E, de Lima RLLF, Daack-Hirsch S, Sander A, McDonald-McGinn DM, Zackai EH, Lammer EJ, Aylsworth AS, Ardinger HH, Lidral AC, Pober BR, Moreno L, Arcos-Burgos M, Valencia C, Houdayer C, Bahuau M, Moretti-Ferreira D, Richieri-Costa A, Dixon MJ, Murray JC: Mutations in IRF6 cause Van der Woude and popliteal pterygium syndromes. Nat Genet. 2002, 32: 285-289.PubMed CentralPubMedView ArticleGoogle Scholar
- Bruder CEG, Piotrowski A, Gijsbers AACJ, Andersson R, Erickson S, Diazde Ståhl T, Menzel U, Sandgren J, von Tell D, Poplawski A, Crowley M, Crasto C, Partridge EC, Tiwari H, Allison DB, Komorowski J, van Ommen G-JB, Boomsma DI, Pedersen NL, den Dunnen JT, Wirdefeldt K, Dumanski JP: Phenotypically concordant and discordant monozygotic twins display different DNA copy-number-variation profiles. Am J Hum Genet. 2008, 82: 763-771.PubMed CentralPubMedView ArticleGoogle Scholar
- Mansilla MA, Kimani J, Mitchell LE, Christensen K, Boomsma DI, Daack-Hirsch S, Nepomucena B, Wyszynski DF, Felix TM, Martin NG, Murray JC: Discordant MZ twins with cleft lip and palate: a model for identifying genes in complex traits. Twin Res Hum Genet. 2005, 8: 39-46.PubMedView ArticleGoogle Scholar
- Roach JC, Glusman G, Smit AF a, Huff CD, Hubley R, Shannon PT, Rowen L, Pant KP, Goodman N, Bamshad M, Shendure J, Drmanac R, Jorde LB, Hood L, Galas DJ: Analysis of genetic inheritance in a family quartet by whole-genome sequencing. Science. 2010, 328: 636-639.PubMed CentralPubMedView ArticleGoogle Scholar
- Nachman MW, Crowell SL: Estimate of the mutation rate per nucleotide in humans. Genetics. 2000, 156: 297-304.PubMed CentralPubMedGoogle Scholar
- Baranzini SE, Mudge J, van Velkinburgh JC, Khankhanian P, Khrebtukova I, Miller N a, Zhang L, Farmer AD, Bell CJ, Kim RW, May GD, Woodward JE, Caillier SJ, McElroy JP, Gomez R, Pando MJ, Clendenen LE, Ganusova EE, Schilkey FD, Ramaraj T, Khan O a, Huntley JJ, Luo S, Kwok P-Y, Wu TD, Schroth GP, Oksenberg JR, Hauser SL, Kingsmore SF: Genome, epigenome and RNA sequences of monozygotic twins discordant for multiple sclerosis. Nature. 2010, 464: 1351-1356.PubMed CentralPubMedView ArticleGoogle Scholar
- Spehlmann ME, Begun AZ, Burghardt J, Lepage P, Raedler A, Schreiber S: Epidemiology of inflammatory bowel disease in a German twin cohort: results of a nationwide study. Inflamm Bowel Dis. 2008, 14: 968-976.PubMedView ArticleGoogle Scholar
- Bentley DR, Balasubramanian S, Swerdlow HP, Smith GP, Milton J, Brown CG, Hall KP, Evers DJ, Barnes CL, Bignell HR, Boutell JM, Bryant J, Carter RJ, Keira Cheetham R, Cox AJ, Ellis DJ, Flatbush MR, Gormley N a, Humphray SJ, Irving LJ, Karbelashvili MS, Kirk SM, Li H, Liu X, Maisinger KS, Murray LJ, Obradovic B, Ost T, Parkinson ML, Pratt MR, et al: Accurate whole human genome sequencing using reversible terminator chemistry. Nature. 2008, 456: 53-59.PubMed CentralPubMedView ArticleGoogle Scholar
- Jorde LB, Wooding SP: Genetic variation, classification and “race”. Nat Genet. 2004, 36 (11 Suppl): S28-S33.PubMedView ArticleGoogle Scholar
- 1000 Genomes Project Consortium T: A map of human genome variation from population-scale sequencing. Nature. 2010, 467: 1061-1073.View ArticleGoogle Scholar
- Abecasis GR, Auton A, Brooks LD, DePristo MA, Durbin RM, Handsaker RE, Kang HM, Marth GT, McVean GA: An integrated map of genetic variation from 1,092 human genomes. Nature. 2012, 491: 56-65.PubMedView ArticleGoogle Scholar
- Kumar P, Henikoff S, Ng PC: Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm. Nat Protoc. 2009, 4: 1073-1081.PubMedView ArticleGoogle Scholar
- Adzhubei IA, Schmidt S, Peshkin L, Ramensky VE, Gerasimova A, Bork P, Kondrashov AS, Sunyaev SR: A method and server for predicting damaging missense mutations. Nat Methods. 2010, 7: 248-249.PubMed CentralPubMedView ArticleGoogle Scholar
- Botstein D, Risch N: Discovering genotypes underlying human phenotypes: past successes for mendelian disease, future approaches for complex disease. Nat Genet. 2003, 33 (Suppl): 228-237.PubMedView ArticleGoogle Scholar
- Gilissen C, Hoischen A, Brunner HG, Veltman JA: Disease gene identification strategies for exome sequencing. Eur J Hum Genet. 2012, 20: 490-497.PubMed CentralPubMedView ArticleGoogle Scholar
- MacArthur DG, Tyler-Smith C: Loss-of-function variants in the genomes of healthy humans. Hum Mol Genet. 2010, 19: R125-R130.PubMed CentralPubMedView ArticleGoogle Scholar
- Larson DE, Harris CC, Chen K, Koboldt DC, Abbott TE, Dooling DJ, Ley TJ, Mardis ER, Wilson RK, Ding L: SomaticSniper: identification of somatic point mutations in whole genome sequencing data. Bioinformatics. 2012, 28: 311-317.PubMed CentralPubMedView ArticleGoogle Scholar
- Koboldt DC, Zhang Q, Larson DE, Shen D, McLellan MD, Lin L, Miller CA, Mardis ER, Ding L, Wilson RK: VarScan 2: Somatic mutation and copy number alteration discovery in cancer by exome sequencing. Genome Res. 2012, 22: 568-576.PubMed CentralPubMedView ArticleGoogle Scholar
- Forster M, Forster P, Elsharawy A, Hemmrich G, Kreck B, Wittig M, Thomsen I, Stade B, Barann M, Ellinghaus D, Petersen B-S, May S, Melum E, Schilhabel MB, Keller A, Schreiber S, Rosenstiel P, Franke A: From next-generation sequencing alignments to accurate comparison and validation of single-nucleotide variants: the pibase software. Nucleic Acids Res. 2012, 41: e16-PubMed CentralPubMedView ArticleGoogle Scholar
- Rashid M, Robles-Espinoza CD, Rust AG, Adams DJ: Cake: a bioinformatics pipeline for the integrated analysis of somatic variants in cancer genomes. Bioinformatics. 2013, 29: 2208-2210.PubMed CentralPubMedView ArticleGoogle Scholar
- Robinson JT, Thorvaldsdóttir H, Winckler W, Guttman M, Lander ES, Getz G, Mesirov JP: Integrative genomics viewer. Nat Biotechnol. 2011, 29: 24-26.PubMed CentralPubMedView ArticleGoogle Scholar
- Eberwine J, Sul J-Y, Bartfai T, Kim J: The promise of single-cell sequencing. Nat Methods. 2013, 11: 25-27.View ArticleGoogle Scholar
- Krawczak M, Cooper DN, Fändrich F, Engel W, Schmidtke J: How to distinguish genetically between an alleged father and his monozygotic twin: a thought experiment. Forensic Sci Int Genet. 2012, 6: e129-e130.PubMedView ArticleGoogle Scholar
- Weber-Lehmann J, Schilling E, Gradl G, Richter DC, Wiehler J, Rolf B: Finding the needle in the haystack: Differentiating “identical” twins in paternity testing and forensics by ultra-deep next generation sequencing. Forensic Sci Int Genet. 2014, 9: 42-46.PubMedView ArticleGoogle Scholar
- Gervin K, Vigeland MD, Mattingsdal M, Hammerø M, Nygård H, Olsen AO, Brandt I, Harris JR, Undlien DE, Lyle R: DNA methylation and gene expression changes in monozygotic twins discordant for psoriasis: identification of epigenetically dysregulated genes. PLoS Genet. 2012, 8: e1002454-PubMed CentralPubMedView ArticleGoogle Scholar
- Li H, Durbin R: Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009, 25: 1754-1760.PubMed CentralPubMedView ArticleGoogle Scholar
- Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R: The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009, 25: 2078-2079.PubMed CentralPubMedView ArticleGoogle Scholar
- Quinlan AR, Hall IM: BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010, 26: 841-842.PubMed CentralPubMedView ArticleGoogle Scholar
- McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, DePristo MA: The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010, 20: 1297-1303.PubMed CentralPubMedView ArticleGoogle Scholar
- Yoon S, Xuan Z, Makarov V, Ye K, Sebat J: Sensitive and accurate detection of copy number variants using read depth of coverage. Genome Res. 2009, 19: 1586-1592.PubMed CentralPubMedView ArticleGoogle Scholar
- Karolchik D, Hinrichs AS, Furey TS, Roskin KM, Sugnet CW, Haussler D, Kent WJ: The UCSC Table Browser data retrieval tool. Nucleic Acids Res. 2004, 32 (Database issue): D493-D496.PubMed CentralPubMedView ArticleGoogle Scholar
- Macdonald JR, Ziman R, Yuen RKC, Feuk L, Scherer SW: The Database of Genomic Variants: a curated collection of structural variation in the human genome. Nucleic Acids Res. 2014, 42: D986-D992.PubMed CentralPubMedView ArticleGoogle Scholar
- Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, Maller J, Sklar P, de Bakker PIW, Daly MJ, Sham PC: PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007, 81: 559-575.PubMed CentralPubMedView ArticleGoogle Scholar
- R Development Core Team: R: A language and environment for statistical computing. 2008, Vienna, Austria: R Foundation for Statistical Computing, URL http://www.R-project.org, ISBN 3-900051-07-0Google Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.