- Research article
- Open Access
Complexity of a small non-protein coding sequence in chromosomal region 22q11.2: presence of specialized DNA secondary structures and RNA exon/intron motifs
© Delihas. 2015
- Received: 26 January 2015
- Accepted: 28 September 2015
- Published: 14 October 2015
DiGeorge Syndrome is a genetic abnormality involving a ~3 Mb deletion in a region of human chromosome 22 termed 22q11.2. To better understand the non-coding regions of 22q11.2, a small 10,000 bp non-protein-coding sequence close to the DiGeorge Critical Region 6 gene (DGCR6) was chosen for analysis of functional entities, as the homologous sequence in the chimpanzee genome could be aligned and used for comparisons.
The GenBank database provided genomic sequences. In silico computer programs were used to find homologous DNA sequences in human and chimpanzee genomes, generate random sequences, determine DNA sequence alignments, sequence comparisons and nucleotide repeat copies, and to predict DNA secondary structures.
At its 5′ half, the 10,000 bp sequence has three distinct sections that represent phylogenetically variable sequences. These Variable Regions contain biased mutations with a very high A + T content, multiple copies of the motif TATAATATA and sequences that fold into long A:T-base-paired stem loops. The 3′ half of the 10,000 bp unit, highly conserved between human and chimpanzee, has sequences representing exons of lncRNA genes and segments of introns of protein genes. Central to the 10,000 bp unit are the multiple copies of a sequence that originates from the flanking 5′ end of the translocation breakpoint Type A sequence. This breakpoint flanking sequence carries the exon and intron motifs. The breakpoint Type A sequence seems to be a major player in the proliferation of these RNA motifs, as well as the proliferation of Variable Regions in the 10,000 bp segment and other regions within 22q11.2.
The data indicate that a non-coding region of the chromosome may be reserved for highly biased mutations that lead to formation of specialized sequences and DNA secondary structures. On the other hand, the highly conserved nucleotide sequence of the non-coding region may form storage sites for RNA motifs.
- Chromosome region 22q11.2
- Translocation breakpoint sequences
- Biased mutations
- DNA secondary structure
- RNA exons
DiGeorge Syndrome is part of a group of genetic disorders that occur in humans, termed 22q11.2 deletion syndrome. The DiGeorge disorder involves a ~3 Mb deletion of the chromosomal 22q11.2 region that results in loss of approximately 40 protein genes. Clinical manifestations of a 22q11.2 deletion are pleiotropic and can include congenital heart disease, developmental problems, neurological abnormalities, immune system abnormalities, learning disabilities and psychiatric problems that also include autism [1–4].
Deletions in chromosomal region 22q11.2 can occur from an unequal crossover between low copy repeats (LCRs) present in this region of chromosome 22. LCRs can contain very long A + T-rich stem loops that are termed palindromic A + T-rich repeats (PATRR) [5–11]. These have been cloned and sequenced and appear to be central to translocation. PATRRs can form cruciforms and are prone to breakage that can lead to chromosomal translocation. The stem loop sequences are also referred to as translocation breakpoint hot spots. Some of these have flanking sequences that are conserved or partially conserved between closely related PATRR elements.
To better understand the nature of non-coding genomic regions in a 22q11.2 deletion, an analysis was made of a 10,000 bp non-protein coding region (GRCh38 primary assembly, coordinates: 18890337–18900336), which is just upstream of the DiGeorge Critical Region 6 gene (DGCR6). DGCR6 is homologous to a Drosophila melanogaster gene, whose protein product may participate in gonadal cell development and other cell functions. The 10,000 bp region, albeit representing a small fraction of the 3 Mb region, was chosen for analysis because most of the homologous chimpanzee sequence has been determined. Importantly, the DGCR6 gene has been annotated in both species, and this provided the guidepost for alignment of sequences between species. In addition, alignment was possible even with the presence of redundant sequences (low-complexity sequences), and thus comparisons that reveal high sequence identity and/or variation could be made. Central to this region and other regions of 22q11.2 is the presence of multiple copies of segments of the Translocation Breakpoint Type A sequence (GenBank, AB261997). We find numerous segments of translocation breakpoint flanking sequence insertions within the 10,000 bp region, but these lack the breakpoint hot spot stem loop sequences (PATRRs). Several of the breakpoint flanking sequences are not present in the homologous chimpanzee region. This enabled us to delineate three sequence Variable Regions, #1-#3, which are in the 5′ half of the 10,000 bp unit.
The three phylogenetically Variable Regions are very A + T-rich in both humans and chimpanzee. These regions mutate very rapidly; in humans, however, mutations significantly accelerate the formation of a nine bp sequence, TATAATATA, and there are also sequences that fold into long A:T-rich stem loops, which are better formed and longer in humans relative to the chimpanzee. Of importance, the 22q11.2 region is one of the most susceptible regions of the human genome to undergo genetic recombination, although the 10,000 bp region is not in the region observed to undergo repeated translocation, which is the LCR-B chromosomal region.
To add to the complexity of the 10,000 bp region, we find that translocation breakpoint flanking sequences are carriers of a specific exon sequence that is present in long non-coding RNAs (lncRNAs) and, in addition, of segments of mRNA intron sequences; these are abundantly found in the 10,000 bp unit. The motifs may represent reservoirs for use in possible formation of new lncRNA and protein genes.
This paper presents new and multifaceted concepts in genomics, which include the spread of RNA exon and intron motifs within the genome by duplication of translocation breakpoint sequences, genomic regions reserved for translocation breakpoint sequences that may participate in formation of highly biased mutations in A + T-rich regions, and phylogenetically stable storage regions in the genome that contain both lncRNA exon and mRNA intron sequences.
Alignment at the 5′ half of the human 10,000 bp segment shows three significant variable sequence blocks, termed Variable Regions #1-#3 (Fig. 1a). The inserts are breakpoint Type A flanking sequences, and these are a major component of the 5′ half of the human 10,000 bp unit. However, the nucleotide sequence lengths within each Variable Region differ when the human sequence is compared with the homologous chimpanzee sequence. This is not due to breakpoint sequence additions, but to other added sequences that are only present in the human segments of the Variable Regions. There are no additions in the comparable chimpanzee regions, e.g., see human positions 911–1953, Additional file 1: Figure S1.
On the other hand, the 3′ half of the 10,000 bp segment shows a very high nucleotide sequence identity (97 %) between human and chimpanzee sequences. There are, however, two partial breakpoint flanking sequences present in both the human and chimpanzee regions, and they are also conserved phylogenetically. Figure 1b shows the locations of intron and exon sequences that are present in the 10,000 bp unit.
Translocation breakpoint sequences and secondary structures
Collaborations between Japanese and American investigators resulted in pioneering work on the characterization of translocation breakpoint hot spot secondary structures and the functions of these structures in genetic exchange [10, 17, 18]. In addition, another group, using biophysical calculations, has shown that translocation frequency is very closely related to the ability of a stem loop to form DNA cruciform structures. One palindromic sequence found on chromosomal segment 22q11.2 is Type A (NCBI GenBank: AB261997.1). Two variations, Types B and C, are also known. They have minor sequence changes; however, Type A has a repeat of the first 363 bp of the 5′ end sequence at its 3′ end sequence. In addition, breakpoint sequences present on chromosome 11 have also been well characterized. PATRR breakpoint hot spot sequences fold into very long stem loops [7, 10, 20, 21]. A total of twelve PATRR sequences and their translocation frequencies have been described. We analyzed secondary structures of the twelve PATRR sequences; however, only the two PATRRs that exhibit extreme examples of translocation frequency are described here. These two differ by over a factor of 100 in translocation frequency. The Chr 22 Type C sequence (accession number AB538237.2) shows a near perfect 294 bp stem and two TATAATATA motifs on the stem situated close to the top apex loop, but has an internal bulge with 3 nt on both sides of the stem (Additional file 2: Figure S2). The PATRR structure Type C from Chr 22 exhibits one of the highest translocation frequencies. In contrast, a PATRR sequence from Chr 11 (accession number AF391128) is one of four PATRRs that exhibit a very low frequency of translocation. Its predicted secondary structure shows a more imperfect stem with two large looped out regions and a much smaller stem (87 bp), but it does have a TATAATATA motif close to the top apex loop on the 3′ side (Additional file 3: Figure S3).
A comparison of secondary structures and translocation frequencies of the twelve PATRR sequences suggests the following for PATRRs that display a high frequency of translocation. A near perfect long stem consisting of ~200 bp or more, a small top apex loop (<5 nt), greater than 90 % A:T base pairs in the upper third of the stem and a moderate abundance (~40 %) of G:C pairs in the bottom 2/3 of the stem appear to be important. Small internal loops in the stem are tolerated, but the presence of large internal loops, protruding stem loops, or short stems does not appear to be conducive to high frequency translocation. The TATAATATA sequence close to the top apex loop is common to most PATRRs. However, the motif is not found in all PATRR structures that exhibit translocation, e.g., the PATRR stem loop in the sequence of NCBI GenBank accession #AB235190, albeit this example displays a low frequency of translocation.
The translocation breakpoint Type A, in addition to carrying the A + T palindromic breakpoint sequence, surprisingly contains two unrelated motifs; these reside on the 5′ flanking side of the breakpoint hot spot sequence (Additional file 4: Figure S4). These are RNA motifs that consist of an exon sequence found as exon 1 in different lncRNA transcripts with a high sequence identity, and a partial sequence of an intron found in different mRNA transcripts, also with a high identity.
10,000 bp Unit Variable Regions
TATAATATA copies in DNA variable regions
[Table: Properties of stem loops, Variable Region #3. Recovered labels: TATAATATA near loop; Total Variable #3 Region; Breakpoint Type C; Number of G:C bonds.]
Although approaching a breakpoint hot spot stem loop (Type C) structure, the human Variable Region #3 secondary structure lacks important signatures of Type C and other PATRR stem loops. Thus, a further maturation is needed to form a structure that is more analogous to a PATRR.
Random sequences of the same length and A + T content as the human Variable Region #3 stem loop 2 fold into imperfect stem loops with significant numbers of “mini-stem loops” protruding from the main stem (e.g., see Fig. 3b). A total of forty random sequences were generated and analyzed for secondary structure. They display widely different types of structures, but importantly, seven out of the forty random sequences display very long stem loops (Additional file 5: Table S1). On the other hand, these structures show many imperfections in stems with bulged and looped out bases and mini-stem loops. One stem loop from a random sequence that best simulates a PATRR-type structure is 123 bp in length (Additional file 6: Figure S5, stem loop 1). Thus, there is a probability that random mutations may play a role in building the secondary structures seen in the Variable Regions.
10,000 bp unit, translocation breakpoint inserts and RNA motifs
An analysis of base pair changes in the exon sequence alignment from lncRNAs (Fig. 6) shows that out of 212 bp, one bp change (position 101) occurs specifically in the human 10,000 bp region but none occur in the translocation breakpoint sequence, whereas other changes occur amongst the lncRNAs themselves. To a first approximation, this may indicate that the translocation breakpoint exon sequence is the original source of exon sequences found in the lncRNAs. The exon sequence found in the human 10,000 bp region shown in Fig. 6 is present in the reverse complement and is at positions 3993–4204, which is in a breakpoint sequence insert region that is not present in the comparable chimpanzee region.
The mRNA intron sequence fragment (133 bp) present in the breakpoint flanking sequence is in both the human (positions 5928–6060) and chimpanzee (2801–2730) genomic regions. This sequence is found repeated in six different protein genes at ~91 % identity compared to the breakpoint sequence (Fig. 7a, b). The intron motif (here termed intron #1) is also found repeated many times in different introns of individual mRNAs, e.g., the intron #1 motif is repeated 41 times in the ankyrin 2, neuronal (ANK2) gene (NCBI accession NG_009006). What stands out in the intron sequence alignment is the deletion at position 123 found only in the translocation breakpoint sequence, the human 10,000 bp sequence and the homologous chimpanzee sequence (Fig. 7a). A deletion may have occurred in the breakpoint Type A sequence before incorporation into the chimpanzee genomic region and before branching of the human species. The intron #1 sequence may have a common molecular function, as it is abundantly found in several mRNAs and in multiple introns within an mRNA; however, it does not exhibit a special secondary structure or display a significant open reading frame.
Phylogenetically conserved region
Sequences of positions 5757–9694 of the human 10,000 bp segment are highly conserved between human and chimpanzee with 97 % identity over 3937 bp of the human sequence. This raises the question as to why this non-protein coding region is so well conserved. However, there are two mRNA intron sequences in this region, one of which originates from a breakpoint flanking sequence insertion. The region also displays a high nt sequence identity to an lncRNA exon (exon 4) (Fig. 1b).
One intron (intron #1) is a fragment of an mRNA intron and is at positions 5928–6060 of the 10,000 bp unit; this has been discussed above. The sequence is conserved between human and chimpanzee to ~92 % and slightly less between human and the six mRNA introns (~87–90 %) (Fig. 7b).
Within the phylogenetically conserved region of the 10,000 bp unit, at human positions 8161–9421, there is a 1260 bp sequence that has an identity of 97 % with a segment of a Vega annotated lncRNA gene, AP003900.6 ENSG00000271308 (Chr. 21: 11,169,720-11,184,046). This 1260 bp sequence includes the last exon sequence (exon 4) and short segments of flanking intron sequences of the AP003900.6 lncRNA gene. Exon 4 is also well conserved in the homologous chimpanzee region. This exon sequence adds to the variety of RNA motifs found in the 10,000 bp unit.
Positions 9423–9694 (272 bp) are well conserved between human, chimpanzee and other primates such as Rhesus Macaque but this sequence has similarities to a LINE1 element as determined by RepeatMasker [http://www.repeatmasker.org/cgi-bin/WEBRepeatMasker].
It should be noted that Alu SINE elements have been previously found at breakpoint regions and these are associated with gene repeats within LCRs on 22q11.2 [24, 25]. No duplication of the highly conserved sequences (positions 5757–9694) of the human 3′ region of the 10,000 bp unit has been observed in human chromosome 22, but one copy of the 10,000 bp sequence containing the entire conserved sequence is present in each of chromosomes 13 and 21 (ND, unpublished data).
3′ end of 10,000 bp segment
Translocation breakpoint sequences and linked A + T-rich regions are present in different locations of 22q11.2
In a similar vein to the 10,000 bp region, the high A + T-containing region has 58 copies of the sequence TATAATATA, whereas twenty-five random sample sequences of the same length and A + T content average 1.56 copies. The p-value in this case is also <0.0001.
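The comparison above rests on counting copies of a short motif in a genomic segment and in matched random sequences. A minimal Python sketch of such a count follows. It is illustrative only: the function names are hypothetical, the count is made on the given strand, and overlapping copies are counted separately, which is an assumption since the text does not state how overlaps were treated.

```python
def count_motif(sequence: str, motif: str = "TATAATATA") -> int:
    """Count copies of `motif` on the given strand, allowing overlaps.

    Assumption: overlapping copies count separately; the article does not
    say how overlaps were handled.
    """
    sequence = sequence.upper()
    motif = motif.upper()
    count = 0
    start = sequence.find(motif)
    while start != -1:
        count += 1
        # Advance by one base so that overlapping copies are also found.
        start = sequence.find(motif, start + 1)
    return count


def mean_motif_count(sequences, motif: str = "TATAATATA") -> float:
    """Average motif count over a collection of sequences
    (e.g. random sequences of matched length and A + T content)."""
    counts = [count_motif(s, motif) for s in sequences]
    return sum(counts) / len(counts)
```

With real data, `count_motif` would be applied to the chromosomal segment and `mean_motif_count` to the set of random sequences, giving the two figures that are compared in the text.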
Breakpoint sequences have been extensively duplicated in 22q11.2, and we hypothesize that they may carry information to form adjacent highly biased A + T regions that mutate extensively and are found in several regions of 22q11.2.
Aside from protein coding regions, most of the human genome has not been defined in terms of function, although it is fascinating that much of the genome is transcribed into RNA and a large number of non-coding RNA genes have been characterized [26, 27]. In the analysis of a minute part of chromosome 22, the 10,000 bp region, there may be functions independent of regulation, expression of protein products, or RNA transcripts. With respect to RNA transcripts, no transcript with 100 % identity to annotated lncRNAs has yet been shown to originate from the 10,000 bp region (L. Wilming, Wellcome Trust Sanger Institute, personal communication). However, we show here that the 10,000 bp unit has segments with highly variable sequences that appear to be reserved for biased genetic mutational processes. We hypothesize that these may lead to formation of molecular functions, e.g., PATRR formation. On the other hand, the highly conserved sequence at the 3′ half of the 10,000 bp region may be reserved for the storage of RNA motifs. The pattern, i.e., a 5′ end segment reserved for mutations and a 3′ end containing phylogenetically conserved sequences, is conserved between chimpanzee and human genomes, which supports the idea of functional roles.
The three sequence Variable Regions of the 10,000 bp unit, consisting of approximately 1000 bp each, have mutations that may be viewed as a trial-and-error process with a strong partiality to maintaining a very high A + T content and to forming and maintaining many copies of the TATAATATA motif. Conceptually, this can be viewed as a process analogous to a “biased random walk” as in chemotaxis [28–30]. The analogy is that some mutations eliminate a TATAATATA unit in the human segment (e.g., see Fig. 2) but a larger number of mutations are involved in forming the unit (Table 1). As an accurate alignment of sequences within most of the Variable Regions is not possible due to the presence of low complexity sequences, we cannot determine the specific number of TATAATATA units added by base pair mutations or by sequence addition, or the number lost relative to the chimpanzee sequence. Although a function for the conserved TATAATATA motif in translocation has not been determined, what is shown here is a strong bias towards forming and/or maintaining the motif, with significantly more copies created in the human genome relative to that in the chimpanzee. Cairns and co-workers originally proposed the concept of non-random or directed mutations. Recently, Martincorena, Luscombe and coworkers [32, 33], using phylogenetic and population genetic methods, provided evidence for non-random mutation rates. The proposal here on Variable Region mutations is consistent with these previously formed concepts, but as a biased random process. A major question remains: what is the molecular mechanism that drives the mutational process towards this putative biased random pattern? The variable regions are all linked to translocation breakpoint flanking sequences; thus, it is possible that the information lies in these flanking sequences, or in the variable sequences themselves, perhaps with a mechanism analogous to microsatellite replication.
The translocation breakpoint 5′ end sequence flanking the stem loop hot spot seems to be a major player in carrying and spreading exon and intron motifs in the genome. Jacob intuitively suggested decades ago that building blocks are used over and over in biological processes to create new functions. What we see here are exon and intron sequences that are used by many different genes and are found in different and multiple lncRNAs and mRNAs, respectively.
Some intron segments are found repeated in different introns of the same mRNA, such as in ryanodine receptor 1 (skeletal) (RYR1) mRNA, which has 36 repeats of the same sequence. This intron sequence is also carried by the breakpoint sequence. A question is why many different genes carry the same sequence. What may be special about these intron and exon sequences is that they perform a common function in different RNAs, or within the same mRNA.
Sequence Variable Regions at the 5′ end of the 10,000 bp non-coding segment of 22q11.2 show a biased random pattern of mutations that produces a very high A + T content, sequences that fold into long stem loops, and a high abundance of the nine nucleotide sequence TATAATATA; these Variable Sequence Regions are invariably linked to translocation breakpoint Type A 5′ flanking segments. We hypothesize that with further stem loop development these may function in DNA translocation. The 3′ half of the 10,000 bp region consists of sequences that are highly conserved between human and chimpanzee genomes, and this region contains various RNA motifs: both protein gene intron fragments and lncRNA gene exons, and some are carried by translocation breakpoint flanking sequences. As these motifs are well conserved between the two primate species and are found in multiple lncRNA or protein genes, they may be stored for future use in synthesis of new molecular functions. To our knowledge this is the first observation of translocation breakpoint sequences carrying and spreading RNA motifs and we suggest they may be reservoirs for use in formation of new lncRNA and protein genes.
In conclusion, this study defines a genomic region that is proposed to function independently of encoding protein or RNA genes, i.e., sections reserved for biased mutations and the storage of RNA motifs.
Genomic sequence searches
To find DNA nt sequences corresponding to the 10,000 bp human sequence (Homo sapiens chromosome 22, GRCh38 Primary Assembly) and chimpanzee homologous sequence (Pan troglodytes chromosome 22, Pan_troglodytes-2.1.4), the NCBI-Blast program was used (http://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastHome).
The human and chimp blast pages were employed.
Alignment of DNA sequences was by one of three methods: 1. EMBL Clustal W2 (http://www.ebi.ac.uk/Tools/msa/clustalw2/); 2. ExPaSy LALIGN (http://embnet.vital-it.ch/software/LALIGN_form.html); and 3. EMBOSS Needle (http://www.ebi.ac.uk/Tools/psa/emboss_needle/nucleotide.htm), part of EMBOSS: the European Molecular Biology Open Software Suite. The default settings were used in each case. All three programs gave a similar overall alignment between the human and chimpanzee 10,000 bp sequences; however, it was not possible to obtain a consistent alignment within Variable Regions, with the exception of some small segments.
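Identity figures such as the 97 % reported for the conserved 3′ region are derived from pairwise alignments of this kind. The sketch below shows one common way to compute percent identity from two already-aligned, gapped sequences: identical columns divided by the number of alignment columns. This definition is an assumption made for illustration; alignment programs differ in how they treat gaps, and the function name is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns in which both sequences carry the same base.

    Both inputs are rows of one pairwise alignment, with '-' marking gaps.
    Columns where a base faces a gap count as mismatches; columns that are
    gaps in both rows are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    columns = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # not a real alignment column
        columns += 1
        if a == b:
            matches += 1
    if columns == 0:
        raise ValueError("alignment has no comparable columns")
    return 100.0 * matches / columns
```

For example, the rows `ACGT-A` and `ACCTTA` share four identical columns out of six, so the function returns about 66.7.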
Alignment of two or more sequences with reverse complement identities was by NCBI-Blast Align Sequences Nucleotide BLAST (http://blast.ncbi.nlm.nih.gov/Blast.cgi?AGE_TYPE=BlastSearch&BLAST_SPEC=blast2seq&LINK_LOC=align2seq). The NCBI Blast program was used to find exon and intron homologs to those found in the translocation breakpoint flanking sequence and 10,000 bp unit.
The search for lncRNAs in the 22q11.2 region of chromosome 22 was with three web servers: 1. Vega/Sanger website (http://vega.sanger.ac.uk/Homo_sapiens/Location/Chromosome?r=22), 2. Ensembl/Wellcome Trust/Sanger Institute/European Bioinformatics Institute (http://useast.ensembl.org/Homo_sapiens/Location/View?db=core;g=ENSG00000197421;r=22:18782111–18812514;t=ENST00000430306), and 3. UCSC Genome Browser (http://genome-euro.ucsc.edu/cgi-bin/hgTracks?db=hg19&position=chr22%3A18890337-18900336&hgsid=198277878_3neHZpMdbjv5a37IhFXAN7DDTgGb).
Generated random sequences
Random sequences were generated with the Molbiol.ru random sequence tool by specifying the nucleotide length and % GC-content (http://molbiol.ru/eng/scripts/01_16.html). Reverse complement sequences were obtained with the Sequence Manipulation Suite (http://www.bioinformatics.org/sms2/rev_comp.html).
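For readers who want to reproduce this step offline, the following Python sketch generates a random sequence of a chosen length and GC content and computes a reverse complement. It does not reproduce the algorithms of the web tools named above, which are not described here. The choice to fix the base composition exactly and to split G/C and A/T evenly is an assumption, and the function names are hypothetical.

```python
import random


def random_sequence(length: int, gc_percent: float, rng: random.Random) -> str:
    """Random DNA of the given length whose GC content is gc_percent,
    rounded to the nearest whole base.

    Assumption: G and C (and likewise A and T) are split as evenly as
    possible, then all bases are shuffled.
    """
    gc_total = round(length * gc_percent / 100.0)
    g_count = gc_total // 2
    c_count = gc_total - g_count
    at_total = length - gc_total
    a_count = at_total // 2
    t_count = at_total - a_count
    bases = list("G" * g_count + "C" * c_count + "A" * a_count + "T" * t_count)
    rng.shuffle(bases)
    return "".join(bases)


_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(sequence: str) -> str:
    """Complement every base, then reverse the strand."""
    return sequence.translate(_COMPLEMENT)[::-1]
```

Passing a seeded `random.Random` instance, for example `random_sequence(1000, 20, random.Random(0))`, makes a set of random sequences repeatable across runs.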
DNA secondary structure modeling
Folding of DNA sequences for secondary structure features was with the mFold Web Server (http://www.bioinformatics.org/sms2/rev_comp.html). Standard conditions (default settings) of folding temperature, ionic conditions and constraint values were employed [36, 37]. For each sequence, Structure 1, which displays the lowest delta G value, was chosen.
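mFold predicts structures from nearest-neighbor thermodynamics, which is not reproduced here. As a much cruder illustration of what a palindromic stem loop is, the sketch below pairs a sequence with itself from both ends inward and reports how many consecutive Watson-Crick pairs form and what fraction of them are A:T. It finds only a perfect outer stem, tolerates no bulges or internal loops, and is not a substitute for thermodynamic folding; the function name and the minimum loop size of 3 nt are assumptions.

```python
WATSON_CRICK = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def outer_stem(sequence: str, min_loop: int = 3):
    """Return (stem_pairs, at_fraction) for the perfect stem formed by
    pairing the 5' end with the 3' end and moving inward.

    Pairing stops at the first non-complementary pair, or when fewer than
    `min_loop` unpaired bases would remain in the apex loop.
    """
    seq = sequence.upper()
    i, j = 0, len(seq) - 1
    stem_pairs = 0
    at_pairs = 0
    while j - i - 1 >= min_loop and (seq[i], seq[j]) in WATSON_CRICK:
        stem_pairs += 1
        if seq[i] in "AT":
            at_pairs += 1
        i += 1
        j -= 1
    at_fraction = at_pairs / stem_pairs if stem_pairs else 0.0
    return stem_pairs, at_fraction
```

Applied to `ATATATGAAAATATAT`, the function reports a 6 bp stem made entirely of A:T pairs closing a 4 nt loop.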
Alu SINE and LINE-1 searches
RepeatMasker [http://www.repeatmasker.org/cgi-bin/WEBRepeatMasker] was used to determine the presence of the non-LTR retrotransposable elements Alu and LINE-1 in the 10,000 bp region of 22q11.2.
p-values for the determined copy number of TATAATATA units in a chromosomal segment vs. the copy numbers in randomly generated sequences were determined by a conservative nonparametric test, Wilcoxon’s signed rank test.
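The exact set-up of the test is not spelled out here, so the following is only a sketch under a stated assumption: the motif counts from the random sequences are treated as one sample and tested against the single observed chromosomal count with a two-sided, exact Wilcoxon signed rank test. Tied absolute differences receive average ranks and zero differences are dropped. The function name is hypothetical, and the input numbers in the usage note are made up for illustration.

```python
from collections import Counter


def signed_rank_p_value(random_counts, observed: int) -> float:
    """Two-sided exact Wilcoxon signed rank p-value for the hypothesis that
    the random-sequence counts are centred on `observed`.

    Assumption: one-sample form of the test; zero differences are discarded.
    """
    diffs = [c - observed for c in random_counts if c != observed]
    n = len(diffs)
    if n == 0:
        raise ValueError("all counts equal the observed value")

    # Average ranks of |difference|, stored doubled so they stay integers.
    order = sorted(range(n), key=lambda k: abs(diffs[k]))
    doubled_rank = [0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            doubled_rank[order[k]] = i + j + 2  # 2 * mean of ranks i+1..j+1
        i = j + 1

    w_plus = sum(r for r, d in zip(doubled_rank, diffs) if d > 0)
    total = sum(doubled_rank)

    # Null distribution: every difference is positive or negative with
    # probability 1/2, so enumerate the rank sums of all sign patterns.
    ways = Counter({0: 1})
    for r in doubled_rank:
        updated = Counter()
        for s, w in ways.items():
            updated[s] += w
            updated[s + r] += w
        ways = updated

    w_min = min(w_plus, total - w_plus)
    tail = sum(w for s, w in ways.items() if s <= w_min)
    return min(1.0, 2 * tail / 2 ** n)
```

When every random count lies below the observed count, only one sign pattern is as extreme, so the p-value is 2 / 2^n; with ten such made-up counts this gives 2 / 1024, roughly 0.002, and with twenty-five it falls well below 0.0001.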
I am grateful to Dr. Laurens Wilming of the Wellcome Trust/Sanger Institute Havana manual annotation group for current information and clarifications of Vega annotated lncRNA genes, Dr. Ian Dunham of the Birney Research Group at EMBL-EBI for background information on DGCR6L and to Dr. Rory Johnson and Juna Carlevaro-Fita of Centre for Genomic Regulation Barcelona, Spain for searching the 10,000 bp unit for lncRNA transcripts. Dr. Beverly Emanuel, Perelman School of Medicine, University of Pennsylvania read the preliminary manuscript and provided criticism and suggestions. Dr. Jie Yang, Department of Family, Population and Preventive Medicine, Biostatistical Consulting Core, School of Medicine, Stony Brook University kindly provided p-values.
Partial funding provided by the Department of Molecular Genetics and Microbiology, School of Medicine, Stony Brook University, Stony Brook, NY.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
1. Antshel KM, Kates WR, Roizen N, Fremont W, Shprintzen RJ. 22q11.2 deletion syndrome: genetics, neuroanatomy and cognitive/behavioral features keywords. Child Neuropsychol. 2005;11(1):5–19.
2. Ryan AK, Goodship JA, Wilson DI, Philip N, Levy A, Seidel H, et al. Spectrum of clinical features associated with interstitial chromosome 22q11 deletions: a European collaborative study. J Med Genet. 1997;34(10):798–804.
3. Swillen A, Vogels A, Devriendt K, Fryns JP. Chromosome 22q11 deletion syndrome: update and review of the clinical features, cognitive-behavioral spectrum, and psychiatric complications. Am J Med Genet. 2000;97(2):128–35.
4. Fine SE, Weissman A, Gerdes M, Pinto-Martin J, Zackai EH, McDonald-McGinn DM, et al. Autism spectrum disorders and symptoms in children with molecularly confirmed 22q11.2 deletion syndrome. J Autism Dev Disord. 2005;35(4):461–70.
5. Shaikh TH, Kurahashi H, Saitta SC, O’Hare AM, Hu P, Roe BA, et al. Chromosome 22-specific low copy repeats and the 22q11.2 deletion syndrome: genomic organization and deletion endpoint analysis. Hum Mol Genet. 2000;9(4):489–501.
6. Spiteri E, Babcock M, Kashork CD, Wakui K, Gogineni S, Lewis DA, et al. Frequent translocations occur between low copy repeats on chromosome 22q11.2 (LCR22s) and telomeric bands of partner chromosomes. Hum Mol Genet. 2003;12(15):1823–37.
7. Kurahashi H, Inagaki H, Yamada K, Ohye T, Taniguchi M, Emanuel BS, et al. Cruciform DNA structure underlies the etiology for palindrome-mediated human chromosomal translocations. J Biol Chem. 2004;279(34):35377–83.
8. Babcock M, Yatsenko S, Hopkins J, Brenton M, Cao Q, de Jong P, et al. Hominoid lineage specific amplification of low-copy repeats on 22q11.2 (LCR22s) associated with velo-cardio-facial/digeorge syndrome. Hum Mol Genet. 2007;16(21):2560–71.
9. Shaikh TH, O’Connor RJ, Pierpont ME, McGrath J, Hacker AM, Nimmakayalu M, et al. Low copy repeats mediate distal chromosome 22q11.2 deletions: sequence analysis predicts breakpoint mechanisms. Genome Res. 2007;17(4):482–91.
10. Kato T, Kurahashi H, Emanuel BS. Chromosomal translocations and palindromic AT-rich repeats. Curr Opin Genet Dev. 2012;22(3):221–8.
11. Inagaki H, Ohye T, Kogo H, Tsutsumi M, Kato T, Tong M, et al. Two sequential cleavage reactions on cruciform DNA structures cause palindrome-mediated chromosomal translocations. Nat Commun. 2013;4:1592. doi:10.1038/ncomms2595.
12. Kurahashi H, Inagaki H, Hosoba E, Kato T, Ohye T, Kogo H, et al. Molecular cloning of a translocation breakpoint hotspot in 22q11. Genome Res. 2007;17(4):461–9.
13. Edelmann L, Stankiewicz P, Spiteri E, Pandita RK, Shaffer L, Lupski JR, et al. Two functional copies of the DGCR6 gene are present on human chromosome 22q11 due to a duplication of an ancestral locus. Genome Res. 2001;11(2):208–17.
14. Higgins DG, Thompson JD, Gibson TJ. Using CLUSTAL for multiple sequence alignments. Methods Enzymol. 1996;266:383–402.
15. Huang X, Miller W. A time-efficient linear-space local similarity algorithm. Adv Appl Math. 1991;12:337–57.
16. Rice P, Longden I, Bleasby A. EMBOSS: the European Molecular Biology Open Software Suite. Trends Genet. 2000;16(6):276–7.
17. Kogo H, Inagaki H, Ohye T, Kato T, Emanuel BS, Kurahashi H. Cruciform extrusion propensity of human translocation-mediating palindromic AT-rich repeats. Nucleic Acids Res. 2007;35(4):1198–208.
18. Tong M, Kato T, Yamada K, Inagaki H, Kogo H, Ohye T, et al. Polymorphisms of the 22q11.2 breakpoint region influence the frequency of de novo constitutional t(11;22)s in sperm. Hum Mol Genet. 2010;19(13):2630–7.
19. Zhabinskaya D, Benham CJ. Competitive superhelical transitions involving cruciform extrusion. Nucleic Acids Res. 2013;41(21):9610–21.
20. Kato T, Inagaki H, Tong M, Kogo H, Ohye T, Yamada K, et al. DNA secondary structure is influenced by genetic variation and alters susceptibility to de novo translocation. Mol Cytogenet. 2011;4:18. doi:10.1186/1755-8166-4-18.
21. Gotter AL, Nimmakayalu MA, Jalali GR, Hacker AM, Vorstman J, Conforto Duffy D, et al. A palindrome-driven complex rearrangement of 22q11.2 and 8q24.1 elucidated using novel technologies. Genome Res. 2007;17(4):470–81.
22. Wilcoxon F. Individual comparisons by ranking methods. Biometrics Bulletin. 1945;1(6):80–3.
23. Thomas G, Jacobs KB, Kraft P, Yeager M, Wacholder S, Cox DG, et al. A multistage genome-wide association study in breast cancer identifies two new risk alleles at 1p11.2 and 14q24.1 (RAD51L1). Nat Genet. 2009;41(5):579–84.
24. Babcock M, Pavlicek A, Spiteri E, Kashork CD, Ioshikhes I, Shaffer LG, et al. Shuffling of genes within low-copy repeats on 22q11 (LCR22) by Alu-mediated recombination events during evolution. Genome Res. 2003;13(12):2519–32.
25. Babcock M, Yatsenko S, Stankiewicz P, Lupski JR, Morrow BE. AT-rich repeats associated with chromosome 22q11.2 rearrangement disorders shape human genome architecture on Yq12. Genome Res. 2007;17(4):451–60.
26. Derrien T, Johnson R, Bussotti G, Tanzer A, Djebali S, Tilgner H, et al. The GENCODE v7 catalog of human long noncoding RNAs: analysis of their gene structure, evolution, and expression. Genome Res. 2012;22(9):1775–89.
27. Kellis M, Wold B, Snyder MP, Bernstein BE, Kundaje A, Marinov K, et al. Defining functional DNA elements in the human genome. Proc Natl Acad Sci U S A. 2014;111(17):6131–8.
28. Adler J, Tso WW. “Decision”-making in bacteria: chemotactic response of Escherichia coli to conflicting stimuli. Science. 1974;184(4143):1292–4.
29. Macnab RM, Koshland Jr DE. The gradient-sensing mechanism in bacterial chemotaxis. Proc Nat Acad Sci USA. 1972;69(9):2509–12.
30. Angelani L, Di Leonardo R. Geometrically biased random walks in bacteria-driven micro-shuttles. New Journal of Physics. 2010;12:113017.
31. Cairns J, Overbaugh J, Miller S. The origin of mutants. Nature. 1988;335(6186):142–5.
32. Martincorena I, Seshasayee AS, Luscombe NM. Evidence of non-random mutation rates suggests an evolutionary risk management strategy. Nature. 2012;485(7396):95–8.
33. Martincorena I, Luscombe NM. Non-random mutation: the evolution of targeted hypermutation and hypomutation. Bioessays. 2013;35(2):123–30.
34. Richard GF, Kerrest A, Dujon B. Comparative genomics and molecular dynamics of DNA repeats in eukaryotes. Microbiol Mol Biol Rev. 2008;72(4):686–727.
35. Jacob F. Evolution and tinkering. Science. 1977;196(4295):1161–6.
36. Zuker M. Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res. 2003;31(13):3406–15.
37. SantaLucia Jr J. A unified view of polymer, dumbbell, and oligonucleotide DNA nearest-neighbor thermodynamics. Proc Natl Acad Sci USA. 1998;95(4):1460–5.