Characterization of the past and current duplication activities in the human 22q11.2 region
© Guo et al; licensee BioMed Central Ltd. 2011
Received: 15 October 2010
Accepted: 26 January 2011
Published: 26 January 2011
Segmental duplications (SDs) on 22q11.2 (LCR22) serve as substrates for meiotic non-allelic homologous recombination (NAHR) events that result in several clinically significant genomic disorders.
To understand the duplication activity that led to the complex SD structure of this region, we applied the A-Bruijn graph algorithm to decompose the 22q11.2 SDs into 523 fundamental duplication sequences, termed subunits. Cross-species syntenic analysis of primate genomes demonstrates that many of these LCR22 subunits emerged very recently, especially those implicated in human genomic disorders. Some subunits have expanded more actively than others, and young Alu SINEs are associated much more frequently with duplicated sequences that have undergone active expansion, confirming their role in mediating recombination events. Many copy number variations (CNVs) exist on 22q11.2, some flanked by SDs. Interestingly, both breakpoints of 13 CNVs (mean length 65 kb) are located in paralogous subunits, providing direct evidence that SD subunits could contribute to CNV formation. Sequence analysis of PACs and BACs identified additional CNVs, specifically 10 insertions and 18 deletions within 22q11.2; four were more than 10 kb in size, and most contained young AluYs at their breakpoints.
Our study indicates that AluYs are implicated in past and current duplication events, and further suggests that DNA rearrangements in 22q11.2 genomic disorders may not occur randomly but instead involve both actively expanded duplication subunits and Alu elements.
Segmental duplications (SDs) or low copy repeats (LCRs), defined as continuous non-repetitive DNA sequences found at two or more genomic locations, comprise ~5% of the human genome [1–3]. SDs can mediate meiotic unequal non-allelic homologous recombination (NAHR) events, resulting in genomic rearrangements and sometimes altered gene dosage within the intervening regions, such as those on 22q11.2. Segmental duplications or low copy repeats on 22q11.2 (often referred to as LCR22s) [4, 5] are of great interest because they have been associated with four different human disorders: velo-cardio-facial/DiGeorge syndrome (VCFS/DGS) (MIM #192349 or MIM #188400 [8, 9]), the reciprocal duplication syndrome [5, 10], der(22) syndrome, and cat-eye syndrome. Despite extensive molecular studies in the past decade, the precise positions of the breakpoints within the two LCR22s associated with most of these syndromes, LCR22-2 and -4, remain largely elusive. This is due to their high sequence similarity (97%-99%) and to the presence of eight related LCR22 blocks on 22q11.2, comprising over 11% of the region, which makes it difficult to identify paralogous sequences unique to one block or another [5, 13, 14]. Interestingly, characterization of the genomic sequence within and near LCR22s demonstrated that both Alu repeat elements and AT-rich repeats were enriched and were likely involved in many of the past unequal crossover duplications that shuffled DNA among blocks and gave rise to the current complex genomic architecture of LCR22s. This is consistent with findings from genome-wide analyses of human SDs [15, 16].
Segmental duplications on 22q11.2 and their gene content
Segmental duplications (SDs), or low copy repeats, directly contribute to the genome dynamics and meiotic instability of human 22q11.2. In this study, we began by surveying the content and extent of duplications within the 22q11.2 region using SDs with sequence identity ≥ 90% and length ≥ 1 kb that have been annotated by Dr. Eichler's group through Whole Genome Assembly Comparison (WGAC) and Whole Genome Shotgun Sequence Detection (WSSD) [1, 2]. In total, 202 pairs of SD sequences (often termed duplicons) were located on 22q11.2 (chr22:17,000,000-24,000,000, NCBI36/hg18), accounting for ~1.8 Mb (26%) of its bases. The remaining DNA sequence between SDs is referred to here as "unique sequence" (and the corresponding locations as "unique regions"), as it was not explicitly involved in duplications within this region. The sequence divergence of the SDs on 22q11.2 was relatively low, with a median of 3.6% and 58% of them diverging < 4% (Additional file 1, Figure S1). By comparison, the average sequence divergence for all human SDs was 6%, whereas 25% of them diverged < 4%. The average length of SDs in 22q11.2 was 13 kb (range 1 to 162 kb), and a significant negative correlation between length and divergence was observed for these SDs (r = -0.46, p < 6 × 10⁻¹²; Additional file 1, Figure S1), suggesting either that many old duplicated sequences have experienced significant nucleotide loss since their emergence or that recent duplication events mainly produced large SDs.
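The length-divergence correlation above can be illustrated with a short sketch. The text reports r = -0.46 but does not state which correlation coefficient was used; the example assumes a Pearson coefficient, and the `pearson_r` helper and the five SD records are hypothetical illustrations, not data from this study.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical SD records (length in kb, divergence in %); NOT data from this study.
sds = [(162.0, 0.5), (40.0, 1.2), (13.0, 3.6), (5.0, 6.0), (1.5, 8.0)]
lengths = [length for length, _ in sds]
divergences = [div for _, div in sds]
r = pearson_r(lengths, divergences)
print(f"r = {r:.2f}")  # negative: longer SDs are less diverged in this toy set
```

With real data, length is often log-transformed first because SD lengths are highly skewed; whether the authors did so is not stated.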
Decomposition of 22q11.2 SDs to duplication subunits and blocks
Summary of LCR22 blocks and their subunits. Columns: number of duplicated subunits; mean size of subunits (standard deviation); subunits present only in human (%); subunits in chimp (%); subunits in orangutan (%); subunits in macaque (%).
As shown in Figure 2, the eight LCR22' blocks are separated and flanked by unique (i.e., non-SD) sequences. The clustering of subunits in these blocks suggests that some sequence features may have made them targets of duplication "hotspots". The macro-scale similarity in subunit arrangement shared among LCR22-2', LCR22-3a' and LCR22-4' suggests that these three blocks may have arisen from a few large-scale duplication events, each involving many adjacent subunits simultaneously [26, 27]. Note that part of LCR22-3a' remains a gap in the reference human genome sequence. In contrast, LCR22-3b', LCR22-5', LCR22-6', LCR22-7', and LCR22-8' display a different architectural pattern, consisting mostly of small, discretely arranged subunits. For example, the average size of subunits in LCR22-4' is 5,420 bp, approximately twice that of LCR22-5' (2,593 bp) and more than four times that of LCR22-7' (1,171 bp). Analysis of the relative abundance of large subunits, defined using thresholds of >5 kb, >10 kb or >20 kb (data not shown), also showed that large subunits were predominantly located in LCR22-2', LCR22-3a' and LCR22-4'. Since large SDs overall show low sequence divergence, as described above (Additional file 1, Figure S1), this result further suggests that the bulk of the DNA constituting these three blocks was likely generated more recently during evolution than that of the other LCR22 blocks. The sporadic presence of a few small subunits specific to each of the three "young" blocks indicates that new micro-scale duplications have occurred since their initial formation.
Occurrence of LCR22s in other primate genomes
As mentioned above, the majority of SDs on 22q11.2 show < 4% sequence divergence. Accordingly, assuming a rate of 3% divergence between duplicated sequences per 10 million years, we estimated that the majority (>58%) of duplicated sequences emerged within roughly the last 10-20 million years, after the divergence of the human and macaque lineages. To further explore the evolutionary history of these SDs, we surveyed and characterized the syntenic regions of human LCR22' sequences in the chimpanzee, orangutan, and macaque genomes.
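The age estimate rests on a simple linear molecular clock. The sketch below assumes that divergence between paralogs accumulates at a constant 3% per 10 million years, as stated in the text, and applies no correction for multiple substitutions; the function name is hypothetical. Under this assumption, the 4% cutoff corresponds to roughly 13 million years.

```python
# Assumed constant clock: 3% divergence between paralogs per 10 Myr (rate from the text).
RATE_PCT_PER_MYR = 3.0 / 10.0


def duplication_age_myr(divergence_pct: float, rate: float = RATE_PCT_PER_MYR) -> float:
    """Linear molecular-clock estimate of time since a duplication, in Myr.

    No multiple-hit correction is applied, so ages of old, highly diverged
    pairs are underestimated.
    """
    if divergence_pct < 0:
        raise ValueError("divergence must be non-negative")
    return divergence_pct / rate


for d in (1.0, 3.6, 4.0):
    print(f"{d:.1f}% divergence -> ~{duplication_age_myr(d):.1f} Myr")
# prints ~3.3, ~12.0 and ~13.3 Myr respectively
```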
This is also supported, first, by FISH mapping experiments using probes to four well-characterized genes in the LCR22 region, which detected signals indicating the presence of LCR22-6, LCR22-7 and LCR22-8 sequences in chimpanzee, orangutan and macaque. Second, phylogenetic analysis of the BCR, GGT, GGTLA and USP18 genes or pseudogenes in human LCR22s also indicated that LCR22-2, LCR22-3a and LCR22-4 were evolutionarily close to one another but distant from the other LCR22 blocks. Third, a previous comparative analysis of SDs in four primate genomes indicated, in particular, that SDs located in the distal halves of LCR22-2 and LCR22-4 were generated more recently (Figure 3B). This finding is probably relevant to the fact that most of the genomic disorders on 22q11.2 map to the LCR22-2, LCR22-3a and LCR22-4 regions, although ascertainment of deletions that remove critical dosage-sensitive genes associated with known syndromes introduces a significant bias.
Re-construction of large duplication events
One of the main goals of our study is to identify and characterize subunits that are implicated in frequent duplications, as such subunits may host recombination "hotspots" of genomic disorders in 22q11.2. We started by grouping subunits into paralogous families (see hypothetical example in Figure 1) to capture their putative intra-LCR22 duplication relationships. A total of 122 subunit families were assembled from the 523 subunits (Additional file 2, Table S1). The sizes of these families range from 2 to 16 subunits, with one third of these families having fewer than six members.
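One minimal way to group subunits into paralogous families is single-linkage clustering over pairwise identities, implemented here with a union-find structure. This is an assumption: the Methods state only that families were defined by sequence homology (>90% identity), and the `group_families` helper, the subunit IDs, and the identities in the example are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


class UnionFind:
    """Disjoint-set forest with path halving."""

    def __init__(self, items: Iterable[str]) -> None:
        self.parent: Dict[str, str] = {x: x for x in items}

    def find(self, x: str) -> str:
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def group_families(
    subunits: List[str],
    pairs: List[Tuple[str, str, float]],
    min_identity: float = 90.0,
) -> List[List[str]]:
    """Single-linkage families: subunits joined by any pair with identity > min_identity (%)."""
    uf = UnionFind(subunits)
    for a, b, identity in pairs:
        if identity > min_identity:
            uf.union(a, b)
    families: Dict[str, List[str]] = {}
    for s in subunits:
        families.setdefault(uf.find(s), []).append(s)
    # Largest families first; ties broken by member names for a stable order.
    return sorted(families.values(), key=lambda fam: (-len(fam), fam))


# Hypothetical subunits and pairwise identities (%).
subunits = ["a1", "a2", "a3", "b1", "b2"]
pairs = [("a1", "a2", 99.1), ("a2", "a3", 95.0), ("b1", "b2", 97.5), ("a3", "b1", 85.0)]
print(group_families(subunits, pairs))  # [['a1', 'a2', 'a3'], ['b1', 'b2']]
```

Single linkage is transitive: a3 joins a1's family through a2 even if a1 and a3 were never compared directly, which may merge families more aggressively than a stricter all-pairs criterion would.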
As illustrated schematically in Figure 1, both SD duplicons and subunits need to be appropriately merged and aligned in order to correctly identify past duplication events among multiple candidates. We therefore first merged physically overlapping SD duplicons to obtain duplication loci (see Methods for details; a duplication locus is defined here as a genomic region containing one or more overlapping duplicons not disrupted by unique sequence). We identified a total of 147 duplication loci from the 22q11.2 SDs, and these were further separated into 33 groups based on the sharing of paralogous subunits (Figure 2B). Finally, aligning all duplication loci to the largest locus of their respective group, as illustrated in Figure 1D, yielded a hierarchical structure representing the putative duplication relationships of all SDs in 22q11.2 (Figure 2B). At the top level, 33 distinct duplication loci were identified (top row in Figure 2B); these accounted for 47% of all SD sequences in 22q11.2, suggesting that the remaining 53% of SD sequences might have arisen from these 33 loci.
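Merging physically overlapping duplicons into duplication loci is a standard interval-merging step. The sketch below assumes half-open [start, end) coordinates and also merges duplicons that abut exactly, since no unique base separates them; the function name and the coordinates in the example are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) chromosome coordinates


def merge_duplicons(duplicons: List[Interval]) -> List[Interval]:
    """Merge overlapping or directly abutting duplicons into duplication loci."""
    loci: List[Interval] = []
    for start, end in sorted(duplicons):
        if loci and start <= loci[-1][1]:
            # No unique base separates this duplicon from the current locus: extend it.
            loci[-1] = (loci[-1][0], max(loci[-1][1], end))
        else:
            loci.append((start, end))
    return loci


# Hypothetical duplicon coordinates.
duplicons = [(400, 900), (100, 500), (900, 1200), (2000, 2500)]
print(merge_duplicons(duplicons))  # [(100, 1200), (2000, 2500)]
```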
One feature emerging from the data in Figure 2B is that some subunit families are frequently located at the ends (i.e., breakpoints) of putative duplications, suggesting that they might have been highly active in mediating past duplication events. For example, at least six duplication events might have been mediated by subunits in the family that includes a member at the 5' end of LCR22-2' (leftmost arrow in Figure 2B). Somewhat surprisingly, analysis of the subunit families significantly enriched at duplication boundaries (arrows in Figure 2B) revealed that 46% of the subunits implicated in frequent past duplication events harbored or were adjacent to Alu elements.
As shown in Figure 2B, most of the putative duplications in 22q11.2 involved relatively small (<10 kb) duplicons; these were not pursued further because our approach is limited in its ability to resolve the donor and acceptor of such duplication events. However, at least three duplication events involved large duplicons (boxes in Figure 2B). The largest one (involving 40 subunits and 162 kb of sequence) occurred between LCR22-2' and LCR22-4' (first box in Figure 2B). The other two large-scale duplications involved the largest subunit (~64 kb, second box in Figure 2): one occurred between LCR22-3a' and LCR22-4', and the other at the distal half of LCR22-4' (Figure 3C).
Further syntenic analysis (described above) showed that the duplication between LCR22-3a' and LCR22-4' occurred after the split of the macaque lineage, whereas the duplication at LCR22-4' might have occurred earlier (Figure 3A, C). However, a previous complementary analysis comparing SDs in four primate genomes indicated that both duplications occurred in the ancestral lineage of human and chimpanzee (Figure 3B). Future experiments, such as FISH mapping, are needed to reconcile these two findings. Interestingly, it appears that two independent duplications inserted two subunits (blue and green subunits in Figure 3C) in front of the distal cyan subunit of LCR22-4' before the resulting sequence was duplicated to LCR22-3a'. Alternatively, some subunits in LCR22-3a' may have originated from the proximal part of LCR22-4'. This uncertainty could not be resolved from sequence similarity, as the pairwise sequence identities for the duplication events marked by the blue and red arrows are 99.5% and 99.6%, respectively, highlighting the challenge of accurately reconstructing past duplication events.
Repeat elements and duplications on 22q11.2
To search for sequence features potentially mediating active duplications, we characterized the short sequences immediately adjacent (±10 bp) to duplicated subunits in 22q11.2. First, we calculated the abundance of different sequence features, e.g., Alu, LINE-1, genes, and pseudogenes, and correlated them with the "duplication activity" of individual subunit families, measured as the number of blocks in which a family resides. The results showed that Alu/SINE elements were associated much more frequently with subunit families that have undergone active expansion (Figure 4B), suggesting that Alu-mediated duplications might be responsible for most of the inter-block duplications. Interestingly, a direct survey of the sequences adjacent to all 22q11.2 SD subunits also found that Alu was the most prevalent repeat, present next to 32% of the 523 subunits. Of the 339 Alu repeats next to subunits, 30% were AluSx and 28% were AluY, followed by AluJo (11%), AluSq (8%), and AluSg (6%). A random simulation indicated that the associations of subunits with each of these Alu types were significantly more frequent than expected (p < 0.001). These findings are consistent with previous reports that Alu-mediated recombination events actively shuffled genes within LCR22 blocks and that young Alu elements (AluY and AluS) are enriched at the ends of SDs [15, 30].
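The "duplication activity" measure can be sketched as the number of distinct LCR22' blocks containing members of each family, reported alongside the fraction of members with an Alu in their ±10 bp flanks. The `Subunit` record, its field names, and the toy data are hypothetical; the paper does not describe its data structures.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Subunit(NamedTuple):
    subunit_id: str
    family: str
    block: str          # LCR22' block containing the subunit, e.g. "LCR22-4'"
    alu_adjacent: bool  # True if an Alu lies within the +/-10 bp flanks


def family_activity(subunits: List[Subunit]) -> Dict[str, Tuple[int, float]]:
    """Map each family to (number of distinct blocks, fraction of Alu-flanked members)."""
    blocks: Dict[str, set] = defaultdict(set)
    alu_flags: Dict[str, List[bool]] = defaultdict(list)
    for s in subunits:
        blocks[s.family].add(s.block)
        alu_flags[s.family].append(s.alu_adjacent)
    return {
        fam: (len(blocks[fam]), sum(alu_flags[fam]) / len(alu_flags[fam]))
        for fam in blocks
    }


# Hypothetical subunits from two families.
toy = [
    Subunit("s1", "F1", "LCR22-2'", True),
    Subunit("s2", "F1", "LCR22-4'", True),
    Subunit("s3", "F1", "LCR22-3a'", False),
    Subunit("s4", "F2", "LCR22-5'", False),
    Subunit("s5", "F2", "LCR22-5'", False),
]
for fam, (activity, alu_frac) in sorted(family_activity(toy).items()):
    print(fam, activity, round(alu_frac, 2))
```

In this toy set, the family spread across three blocks also has the higher Alu-flanked fraction, which is the kind of association the text describes.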
CNVs flanked by paralogous subunits
A total of 13 previously detected CNVs (from the Database of Genomic Variants) with breakpoints located in paralogous subunits (see Figure 7A); table fields include the subunit family ID and paired-end mapping.
CNVs flanked by Alu elements
A total of 28 CNVs derived from the alignment of clones to 22q11.2 (see Figure 7B, C for illustration); table fields include the IDs of the corresponding clones and the breakpoint feature (one feature for a gain; left and right features for a loss), e.g., Subunit/GSTTP (22673143-22673594, 22727394-22727845).
Segmental duplications on 22q11.2 are among the most complex SDs in the human genome, and understanding their structure and variation among humans is valuable because they form substrates for NAHR events that can lead to gene dosage imbalance and genomic disorders. Major disorders of this kind include VCFS/DGS, the reciprocal duplication of the same interval, and cat-eye syndrome, a partial tetrasomy [5–12]. Patients with VCFS/DGS and, separately, those with the duplication syndrome have been ascertained for physical malformations, including craniofacial and cardiac defects, and also for cognitive or neurobehavioral disorders such as learning disabilities or schizophrenia (reviewed in [33, 34]).
To understand the structural features of 22q11.2 and the mechanisms by which its SDs confer genetic variation and CNV formation, we decomposed the SDs into fundamental duplication subunits to investigate sequence shuffling within LCR22s, past duplication events, and current duplications (i.e., CNVs). Our key finding is that some subunits are highly active in duplications and that AluY, a young repeat that emerged very recently during primate evolution, is significantly associated with them, suggesting that subunits next to AluY, or AluY itself, may be responsible for historic and current genomic rearrangements in SDs on 22q11.2. We also found that LCR22-2', LCR22-3a', and LCR22-4', the three young blocks most frequently implicated in genomic disorders, contain duplicated sequences that emerged more recently than the other SDs on 22q11.2. The high sequence identity among these three blocks may explain why most pathogenic deletions map to them. Alternatively, because deletions are ascertained through phenotypes arising from genes sensitive to altered copy number, the observed distribution of these deletions may reflect an ascertainment bias.
Overall, SDs on 22q11.2 share many known features with other SDs in the human genome in addition to the Alu enrichment discussed above. The clustering of SDs into eight blocks is consistent with "preferential attachment" or "duplication shadowing", whereby unique regions next to SDs are ~10 times more likely to be duplicated than random regions. Our syntenic analysis also suggests that most (~70%) 22q11.2 SDs are shared between human and chimpanzee but only a small proportion are shared between human and macaque, a finding consistent with the previous proposal that SD activity increased after the divergence of the African great apes (chimpanzee, gorilla and human) from the Asian great ape (orangutan) [29, 36]. Because our study used SDs in human chromosome 22q11.2 as the reference, sequence amplifications occurring specifically in the chimpanzee, gorilla and macaque genomes were not analyzed, although previous studies have observed amplifications specific to those lineages [24, 29].
We found that Alu elements, especially young AluY, were enriched in the regions immediately adjacent to frequently duplicated sequences (subunits, duplication loci, and CNVs). Our results thus extend previous findings showing that Alu elements are present at the endpoints of SDs at a higher frequency than expected by chance (24% vs 10%) [15, 16] and, more specifically, a three-fold enrichment of Alu at the junctions of LCR16. In addition, we have previously shown that both simple and complex Alu-mediated duplications, stimulated by crossovers at the ends of Alu elements, may have contributed to the formation of unprocessed pseudogenes from the four LCR22 genes [14, 26]. Figure 7 shows that most of the breakpoints lie at the ends of Alu elements, while a subset lie in their middle, suggesting that homology-based alignment is essential for CNV and SD formation but that two distinct molecular mechanisms are likely responsible: L1 endonuclease-mediated retrotransposition and NAHR events.
Taken together, these results suggest that de novo and disease-implicated recombination events between LCR22-2 and LCR22-4 may not occur randomly but rather more frequently at Alu-containing subsequences, an interesting hypothesis that deserves further investigation. In this regard, it is worth noting that the common breakpoints found in rearrangements of distal LCR22 blocks in two patients are located in one of the highly active subunits identified in this study (the red subunit at the end of the first duplication group in Figure 2B), suggesting that our map of duplications could be useful for studying human genomic disorders. Interesting directions for further exploring our findings include: (i) Alu sequences may be preferential sites of double-strand breakage after homology-based alignment, and (ii) chromatin modification in the vicinity of Alu sequences may make a region prone to duplication, as local chromatin structure (e.g., accessibility) is an important factor influencing DNA duplication and its subsequent evolution [19, 39]. Along the same lines, AluY insertion sites have been reported to show elevated recombination rates. Furthermore, we found that the recently discovered recombination hotspot motif (CCNCCNTNNCCNC) was significantly enriched at the breakpoints of both SD subunits (1.6-fold enrichment, p = 0.026) and CNVs (2.3-fold enrichment, p = 0.016; using data derived from our BAC mapping analysis) in 22q11.2.
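Scanning for the degenerate hotspot motif reduces to a regular-expression search in which each N matches any base. The sketch below is an assumption about how such a scan could be done, not the authors' procedure: it reports overlapping hits on both strands by also searching for the motif's reverse complement, and all function names are hypothetical. Computing fold enrichment would additionally require a background expectation, for example from randomized breakpoints.

```python
import re
from typing import List

MOTIF = "CCNCCNTNNCCNC"  # degenerate 13-mer quoted in the text; N = any base
_BASE_CLASS = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def motif_regex(motif: str) -> re.Pattern:
    # Zero-width lookahead so that overlapping occurrences are all reported.
    return re.compile("(?=" + "".join(_BASE_CLASS[b] for b in motif) + ")")


def find_motif(seq: str, motif: str = MOTIF) -> List[int]:
    """0-based forward-strand start positions of motif hits on either strand."""
    seq = seq.upper()
    forward = motif_regex(motif)
    reverse = motif_regex(reverse_complement(motif))  # motif read on the minus strand
    hits = {m.start() for m in forward.finditer(seq)}
    hits |= {m.start() for m in reverse.finditer(seq)}
    return sorted(hits)


print(find_motif("AAACCACCATAACCACTTT"))  # [3]
```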
In this report we describe our efforts to interpret large duplication events and to use cross-genome comparison to narrow down their potential evolutionary periods. Our work provides some important insights but also highlights the complexity of the challenges ahead. It is difficult to identify and date the individual duplication events that have produced the mosaic genome architecture of LCR22s shown in Figures 2 and 3. During our study, we explored other complementary approaches, but without significant success. For example, we examined the phylogenetic relationships of orthologous and paralogous subunits using both human and syntenic sequences from other primates, but the resulting phylogenetic trees were often difficult to interpret or could resolve the precise emergence times of only a limited number of duplication subunits, suggesting that extensive gene conversion may have occurred among paralogous subunits. It is clear that, to gain more from cross-species comparison, we would need more great ape genomes, sampled more densely across evolutionary time. Otherwise, it is like trying to reconstruct primate evolution with too many missing fossils. Additionally, we believe that low-coverage (e.g., 1-4x) sequencing will provide limited help. As an example, the syntenic regions of human LCR22-2' and LCR22-4' in the reference chimpanzee genome contain some large gaps, so we were unable to extract critical evolutionary information for certain important duplication events in these two blocks; their synteny was considered ambiguous in our analysis. Our study suggests that special care must be taken when comparing duplicated regions across genomes, to resolve the ambiguity between overlapping alleles and duplicated paralogs with a high level of sequence identity (>98%).
To employ comparative genomics to study the molecular mechanisms of genomic disorders involving complex duplicated regions, we propose that an alternative and probably more effective strategy is to establish a good reference of common CNVs for those regions. In the case of LCR22s, this would mean specifically targeting LCR22s for deep sequencing in a large number of human samples. One critical challenge is how to distinguish duplications (i.e., CNVs) with a high level of sequence identity (>98%) from overlapping alleles when sequence reads are too short to be aligned uniquely or assembled correctly.
Our detailed analysis of the human 22q11.2 region showed that many of its duplicated sequences emerged recently through both small- and large-scale duplications. We also found a large number of copy number variations in 22q11.2, some of which may be generated by DNA recombination mediated by paralogous subunits or by the young SINE AluY. Our results suggest that genomic rearrangements in 22q11.2 do not occur randomly, and that actively duplicated subunits, subunits adjacent to Alus, and AluY elements all play a role in making some sequences better substrates for recombination.
Segmental duplications, subunit identification and classification
A total of 202 pairs of segmental duplications from human chr22:17,000,000-24,000,000 (hg18) were obtained from the segmental duplication track in the UCSC genome browser (http://genome.ucsc.edu). SDs involving sequences outside of 22q11.2 were not included, as our goal was to study the intra-LCR22 duplications implicated in human disorders. The pairwise alignment information for these SDs was provided to the program RepeatGluer to decompose the 404 SDs into 523 non-overlapping duplication subunits. This approach was motivated by previous studies [21, 23]. A comparison of our subunits with those defined previously by Jiang et al. using the same algorithm found that 69 subunits were defined only by us, although the breakpoints of 84% of the common subunits differed by <200 bp between the two definitions (see Additional file 4, Figure S3 for details). These data suggest that including SDs outside 22q11.2 could have some impact on our results.
Individual subunit sequences were classified into 122 families based on their sequence homology (>90% identity) (Figure 1) and segregated into eight duplication blocks based on their physical distance. In the latter analysis, we took the previous definition of eight LCR22 blocks as a guideline and assigned adjacent subunits (<500 kb apart) to the same block. The 500-kb threshold was chosen to include as many SDs as possible in the eight blocks while using one consistent threshold for all blocks. As a result, some non-SD sequences embedded among SDs were included in LCR22-5', -6', and -7'.
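The distance rule for assigning subunits to blocks amounts to single-linkage clustering along the chromosome. The sketch below applies only the 500-kb threshold; it ignores the prior eight-block definition that the authors used as a guideline, and the tuple layout and coordinates are hypothetical.

```python
from typing import List, Tuple

SubunitPos = Tuple[str, int, int]  # (subunit ID, start, end) on chr22


def assign_blocks(subunits: List[SubunitPos], max_gap: int = 500_000) -> List[List[str]]:
    """Chain subunits into blocks while the gap to the current block is < max_gap bp."""
    blocks: List[List[str]] = []
    block_end = 0
    for sid, start, end in sorted(subunits, key=lambda s: (s[1], s[2])):
        if blocks and start - block_end < max_gap:
            blocks[-1].append(sid)
            block_end = max(block_end, end)
        else:
            blocks.append([sid])
            block_end = end
    return blocks


# Hypothetical subunit coordinates (hg18-style, chr22).
subs = [
    ("s1", 17_100_000, 17_105_000),
    ("s2", 17_300_000, 17_302_000),
    ("s3", 18_000_000, 18_010_000),
    ("s4", 18_200_000, 18_201_000),
]
print(assign_blocks(subs))  # [['s1', 's2'], ['s3', 's4']]
```

Because the gap is measured from the furthest end reached so far, a long subunit can bridge later, shorter subunits into the same block.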
Constructing hierarchical map of putative duplication events
As shown in Figure 1, SD pairwise alignment data are a good summary of the paralogous relationships within a pair or group of SDs, but they do not directly reveal the underlying historical duplication events. Our overall strategy for reconstructing past intra-LCR22 duplication events was to first merge overlapping duplicons to form individual duplication loci, then group duplication loci based on SD pairwise relationships, and finally project all duplication loci in each group onto the largest duplication locus of that group based on sequence alignment (Figure 1C). The resulting alignment map illustrates the hierarchical order of putative duplications (Figure 2), as the donor and acceptor of a duplication event would have the same subunits arranged in identical order unless disruption had occurred. Because duplications can sometimes produce merged sequences, and the uncertainty of donor assignment increases for shorter SDs, we interpreted this map only for duplications involving long sequences and multiple subunits, i.e., those at the top of the hierarchy.
Syntenic analysis of SDs on 22q11.2
Multiple genome alignment data from the Ensembl site (http://www.ensembl.org/info/docs/api/compara/index.html) were used to search for sequences syntenic to human 22q11.2 SD sequences. More specifically, the pairwise alignment data were generated by the program Blastz-net between human and each of the other three primate species: chimp, orangutan and macaque. We first analyzed the synteny of unique sequences in human 22q11.2 and used the result to establish a global syntenic reference map. Then, the occurrence of aligned sequence at the expected syntenic location on this map was considered evidence that a human duplication subunit (or entire SD) had a syntenic partner in the subject species. To account for confounding factors in sequencing and assembling duplicated sequences, we also considered a syntenic sequence present if the syntenic location of a human sequence was a stretch of "N" nucleotides and a homologous sequence was located in unassembled contigs (e.g., chr_random). Furthermore, we used the syntenic locations of the two unique sequences (i.e., landmarks) bracketing each human duplicated sequence to help identify missing syntenic relationships: the synteny of a human sequence was considered absent if no aligned sequence was found at the expected syntenic location and the distance between the two adjacent landmarks in the target genome was less than half of that in human 22q11.2. These measures cannot fully account for the draft nature of the non-human genomes, so some degree of uncertainty is expected in our syntenic analysis. Nevertheless, we found supporting evidence for our syntenic results from all 15 chimp BACs available in GenBank and mapped onto LCR22. Specifically, 9 of these 15 chimp BACs were in regions where 22 subunits were found missing in chimp, and all 22 subunits were confirmed to be absent from their respective BAC sequences.
We also carried out similar syntenic analyses using either in-house global alignments constructed with BLASTZ or cross-species liftOver data from the UCSC browser and obtained similar results, which are therefore not discussed here. The comparative data on segmental duplications in human, chimp, orangutan and macaque based on Whole Genome Shotgun Sequence Detection were obtained from a previous study.
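The decision rules for calling a human sequence present or absent in another primate can be written as a small classifier. The sketch below encodes the three rules stated above; reading "2x shorter" as "less than half the human landmark distance" and returning `AMBIGUOUS` when no rule applies are assumptions, and all names are hypothetical.

```python
from enum import Enum


class Synteny(Enum):
    PRESENT = "present"
    ABSENT = "absent"
    AMBIGUOUS = "ambiguous"


def classify_synteny(
    aligned_at_expected: bool,
    expected_is_n_gap: bool,
    homolog_in_unplaced_contig: bool,
    human_landmark_dist: int,
    target_landmark_dist: int,
) -> Synteny:
    """Apply the presence/absence rules to one human duplicated sequence."""
    # Rule 1: an aligned sequence sits at the expected syntenic location.
    if aligned_at_expected:
        return Synteny.PRESENT
    # Rule 2: the expected location is a run of N and a homolog lies in an
    # unassembled contig (e.g. chr_random); treated as present.
    if expected_is_n_gap and homolog_in_unplaced_contig:
        return Synteny.PRESENT
    # Rule 3: no alignment, and the bracketing landmarks are less than half as
    # far apart in the target genome as in human; called absent.
    if 2 * target_landmark_dist < human_landmark_dist:
        return Synteny.ABSENT
    # Assumption: anything else is left unresolved.
    return Synteny.AMBIGUOUS


print(classify_synteny(False, False, False, 100_000, 40_000))  # Synteny.ABSENT
```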
Analysis of copy number variations in LCR22s
Previously annotated CNVs were collected from three sources: the Database of Genomic Variants (http://projects.tcag.ca/variation/, downloaded in August 2009) and references [31, 43]. In addition, 191 BACs or PACs were downloaded from NCBI GenBank and aligned to the human reference genome using the FASTA software package. Gaps of > 200 bp in the alignments were defined as CNVs and annotated as insertions or deletions with respect to the reference genome.
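Calling CNVs from clone-to-reference alignments reduces to finding long gap runs in a pairwise alignment. The sketch below assumes that the alignment has already been parsed from the FASTA package output into two equal-length gapped rows (the parsing step is not shown) and reports gap runs longer than 200 columns with 1-based reference coordinates; the function name and tuple format are hypothetical.

```python
from typing import List, Tuple

Call = Tuple[str, int, int]  # (type, reference position, length in bp)


def call_indel_cnvs(ref_row: str, clone_row: str, ref_start: int, min_len: int = 200) -> List[Call]:
    """Report gap runs longer than min_len in a gapped pairwise alignment.

    ref_start is the 1-based reference coordinate of the first reference base
    in ref_row. Insertions are reported at the reference base that follows
    them; deletions at their first deleted reference base.
    """
    if len(ref_row) != len(clone_row):
        raise ValueError("alignment rows must have equal length")
    calls: List[Call] = []
    ref_pos, i, n = ref_start, 0, len(ref_row)
    while i < n:
        if ref_row[i] == "-":
            # Clone bases with no reference counterpart: insertion w.r.t. the reference.
            j = i
            while j < n and ref_row[j] == "-":
                j += 1
            if j - i > min_len:
                calls.append(("insertion", ref_pos, j - i))
            i = j
        elif clone_row[i] == "-":
            # Reference bases missing from the clone: deletion w.r.t. the reference.
            j = i
            while j < n and clone_row[j] == "-":
                j += 1
            if j - i > min_len:
                calls.append(("deletion", ref_pos, j - i))
            ref_pos += j - i
            i = j
        else:
            ref_pos += 1
            i += 1
    return calls


ref = "ACGT" + "A" * 250 + "ACGT" + "-" * 300 + "ACGT"
clone = "ACGT" + "-" * 250 + "ACGT" + "C" * 300 + "ACGT"
print(call_indel_cnvs(ref, clone, ref_start=1000))
# [('deletion', 1004, 250), ('insertion', 1258, 300)]
```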
Analysis of repetitive elements
In searching for sequence features associated with either subunits or CNVs, we annotated the ±10 bp sequences adjacent to breakpoints, as suggested previously [15, 45]. Our annotation included micro-homology detection and the presence of repetitive elements as defined by RepeatMasker. Here, micro-homology was defined as > 80% identity between 10-bp sequences. To assess the statistical significance of Alu enrichment in subunits (or CNVs), we randomly placed these subunits in the human genome and calculated the expected number of subunits with Alu elements. After repeating this procedure 1,000 times, we derived an empirical p-value for Alu enrichment. This method was also used to assess the significance of gene and pseudogene enrichment in SD regions.
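Both breakpoint tests can be sketched briefly. The micro-homology check compares two 10-bp flanks column by column, and the permutation test re-places each subunit uniformly at random on a toy genome (preserving its length) and counts how often the number of Alu-flanked subunits reaches the observed value. The toy genome, the assumption that Alu intervals are sorted and non-overlapping, the add-one p-value convention, and all function names are illustrative assumptions; the paper does not specify these details.

```python
import bisect
import random
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end)


def has_microhomology(a: str, b: str, min_identity: float = 0.8) -> bool:
    """True if two equal-length flanks (e.g. 10 bp) share > min_identity identity."""
    if len(a) != len(b) or not a:
        raise ValueError("flanks must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return matches / len(a) > min_identity


def _overlaps(alus: List[Interval], starts: List[int], lo: int, hi: int) -> bool:
    # Assumes alus are sorted and non-overlapping, so ends increase with starts.
    k = bisect.bisect_left(starts, hi)  # number of Alus starting before hi
    return k > 0 and alus[k - 1][1] > lo


def _alu_flanked(iv: Interval, alus: List[Interval], starts: List[int], flank: int) -> bool:
    s, e = iv
    return _overlaps(alus, starts, s - flank, s + flank) or _overlaps(alus, starts, e - flank, e + flank)


def alu_permutation_test(subunits, alus, genome_len, n_iter=1000, flank=10, seed=0):
    """Return (observed count, empirical p) for Alu-flanked subunits vs random placement."""
    alus = sorted(alus)
    starts = [a[0] for a in alus]

    def count(intervals):
        return sum(_alu_flanked(iv, alus, starts, flank) for iv in intervals)

    observed = count(subunits)
    rng = random.Random(seed)
    at_least = 0
    for _ in range(n_iter):
        placed = []
        for s, e in subunits:
            new_s = rng.randrange(0, genome_len - (e - s) + 1)  # keep the subunit length
            placed.append((new_s, new_s + (e - s)))
        if count(placed) >= observed:
            at_least += 1
    # Add-one convention so the estimated p-value is never exactly zero.
    return observed, (at_least + 1) / (n_iter + 1)


# Toy genome: nine 300-bp Alus, each followed immediately by a hypothetical subunit.
toy_alus = [(i * 100_000, i * 100_000 + 300) for i in range(1, 10)]
toy_subunits = [(alu_end, alu_end + 2_000) for _, alu_end in toy_alus]
print(has_microhomology("ACGTACGTAC", "ACGTACGTAA"))  # True (9/10 identical)
print(alu_permutation_test(toy_subunits, toy_alus, genome_len=1_000_000, n_iter=200))
```

A genome-wide version would also need to exclude assembly gaps and possibly match GC content when placing subunits; those refinements are omitted here.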
Abbreviations: LCR, low copy repeat; CNV, copy number variation.
We would like to acknowledge many helpful discussions with Melanie Babcock. This work was supported by NIH MH083121 and startup funds from Albert Einstein College of Medicine.
- Bailey JA, Gu Z, Clark RA, Reinert K, Samonte RV, Schwartz S, Adams MD, Myers EW, Li PW, Eichler EE: Recent segmental duplications in the human genome. Science. 2002, 297 (5583): 1003-1007. doi:10.1126/science.1072047.
- She X, Jiang Z, Clark RA, Liu G, Cheng Z, Tuzun E, Church DM, Sutton G, Halpern AL, Eichler EE: Shotgun sequence assembly and recent segmental duplications within the human genome. Nature. 2004, 431 (7011): 927-930. doi:10.1038/nature03062.
- Bailey JA, Yavor AM, Massa HF, Trask BJ, Eichler EE: Segmental duplications: organization and impact within the current human genome project assembly. Genome Res. 2001, 11 (6): 1005-1017. doi:10.1101/gr.GR-1871R.
- Halford S, Wadey R, Roberts C, Daw SC, Whiting JA, O'Donnell H, Dunham I, Bentley D, Lindsay E, Baldini A, et al: Isolation of a putative transcriptional regulator from the region of 22q11 deleted in DiGeorge syndrome, Shprintzen syndrome and familial congenital heart disease. Hum Mol Genet. 1993, 2 (12): 2099-2107. doi:10.1093/hmg/2.12.2099.
- Edelmann L, Pandita RK, Morrow BE: Low-copy repeats mediate the common 3-Mb deletion in patients with velo-cardio-facial syndrome. Am J Hum Genet. 1999, 64 (4): 1076-1086. doi:10.1086/302343.
- McDermid HE, Morrow BE: Genomic disorders on 22q11. Am J Hum Genet. 2002, 70 (5): 1077-1088. doi:10.1086/340363.
- Shprintzen RJ, Goldberg RB, Lewin ML, Sidoti EJ, Berkman MD, Argamaso RV, Young D: A new syndrome involving cleft palate, cardiac anomalies, typical facies, and learning disabilities: velo-cardio-facial syndrome. Cleft Palate J. 1978, 15 (1): 56-62.
- DiGeorge AM, Harley RD: The association of aniridia, Wilms's tumor, and genital abnormalities. Trans Am Ophthalmol Soc. 1965, 63: 64-69.
- Hassed SJ, Hopcus-Niccum D, Zhang L, Li S, Mulvihill JJ: A new genomic duplication syndrome complementary to the velocardiofacial (22q11 deletion) syndrome. Clin Genet. 2004, 65 (5): 400-404. doi:10.1111/j.0009-9163.2004.0212.x.
- Ensenauer RE, Adeyinka A, Flynn HC, Michels VV, Lindor NM, Dawson DB, Thorland EC, Lorentz CP, Goldstein JL, McDonald MT, et al: Microduplication 22q11.2, an emerging syndrome: clinical, cytogenetic, and molecular analysis of thirteen patients. Am J Hum Genet. 2003, 73 (5): 1027-1040. doi:10.1086/378818.
- Zackai EH, Emanuel BS: Site-specific reciprocal translocation, t(11;22) (q23;q11), in several unrelated families with 3:1 meiotic disjunction. Am J Med Genet. 1980, 7 (4): 507-521. doi:10.1002/ajmg.1320070412.
- Knoll JH, Asamoah A, Pletcher BA, Wagstaff J: Interstitial duplication of proximal 22q: phenotypic overlap with cat eye syndrome. Am J Med Genet. 1995, 55 (2): 221-224. doi:10.1002/ajmg.1320550214.
- Shaikh TH, Kurahashi H, Saitta SC, O'Hare AM, Hu P, Roe BA, Driscoll DA, McDonald-McGinn DM, Zackai EH, Budarf ML, et al: Chromosome 22-specific low copy repeats and the 22q11.2 deletion syndrome: genomic organization and deletion endpoint analysis. Hum Mol Genet. 2000, 9 (4): 489-501. doi:10.1093/hmg/9.4.489.
- Babcock M, Pavlicek A, Spiteri E, Kashork CD, Ioshikhes I, Shaffer LG, Jurka J, Morrow BE: Shuffling of genes within low-copy repeats on 22q11 (LCR22) by Alu-mediated recombination events during evolution. Genome Res. 2003, 13 (12): 2519-2532. doi:10.1101/gr.1549503.
- Bailey JA, Liu G, Eichler EE: An Alu transposition model for the origin and expansion of human segmental duplications. Am J Hum Genet. 2003, 73 (4): 823-834. doi:10.1086/378594.
- Bailey JA, Eichler EE: Primate segmental duplications: crucibles of evolution, diversity and disease. Nat Rev Genet. 2006, 7 (7): 552-564. doi:10.1038/nrg1895.
- Ohno S: Evolution by gene duplication. 1970, London: George Allen and Unwin.
- Lynch M, Conery JS: The evolutionary fate and consequences of duplicate genes. Science. 2000, 290 (5494): 1151-1155. doi:10.1126/science.290.5494.1151.
- Zheng D: Asymmetric histone modifications between the original and derived loci of human segmental duplications. Genome Biol. 2008, 9 (7): R105. doi:10.1186/gb-2008-9-7-r105.
- Harrow J, Denoeud F, Frankish A, Reymond A, Chen CK, Chrast J, Lagarde J, Gilbert JG, Storey R, Swarbreck D, et al: GENCODE: producing a reference annotation for ENCODE. Genome Biol. 2006, 7 (Suppl 1): S4 1-9. 10.1186/gb-2006-7-s1-s4.View ArticleGoogle Scholar
- Jiang Z, Tang H, Ventura M, Cardone MF, Marques-Bonet T, She X, Pevzner PA, Eichler EE: Ancestral reconstruction of segmental duplications reveals punctuated cores of human genome evolution. Nat Genet. 2007, 39 (11): 1361-1368. 10.1038/ng.2007.9.PubMedView ArticleGoogle Scholar
- Pevzner PA, Tang H, Tesler G: De novo repeat classification and fragment assembly. Genome Res. 2004, 14 (9): 1786-1796. 10.1101/gr.2395204.PubMed CentralPubMedView ArticleGoogle Scholar
- Jiang Z, Hubley R, Smit A, Eichler EE: DupMasker: a tool for annotating primate segmental duplications. Genome Res. 2008, 18 (8): 1362-1368. 10.1101/gr.078477.108.PubMed CentralPubMedView ArticleGoogle Scholar
- Babcock M, Yatsenko S, Hopkins J, Brenton M, Cao Q, de Jong P, Stankiewicz P, Lupski JR, Sikela JM, Morrow BE: Hominoid lineage specific amplification of low-copy repeats on 22q11.2 (LCR22s) associated with velo-cardio-facial/digeorge syndrome. Hum Mol Genet. 2007, 16 (21): 2560-2571. 10.1093/hmg/ddm197.PubMedView ArticleGoogle Scholar
- Edelmann L, Pandita RK, Spiteri E, Funke B, Goldberg R, Palanisamy N, Chaganti RS, Magenis E, Shprintzen RJ, Morrow BE: A common molecular basis for rearrangement disorders on chromosome 22q11. Hum Mol Genet. 1999, 8 (7): 1157-1167. 10.1093/hmg/8.7.1157.PubMedView ArticleGoogle Scholar
- Pavlicek A, House R, Gentles AJ, Jurka J, Morrow BE: Traffic of genetic information between segmental duplications flanking the typical 22q11.2 deletion in velo-cardio-facial syndrome/DiGeorge syndrome. Genome Res. 2005, 15 (11): 1487-1495. 10.1101/gr.4281205.PubMed CentralPubMedView ArticleGoogle Scholar
- Bailey JA, Yavor AM, Viggiano L, Misceo D, Horvath JE, Archidiacono N, Schwartz S, Rocchi M, Eichler EE: Human-specific duplication and mosaic transcripts: the recent paralogous structure of chromosome 22. Am J Hum Genet. 2002, 70 (1): 83-100. 10.1086/338458.PubMed CentralPubMedView ArticleGoogle Scholar
- Hubbard TJ, Aken BL, Ayling S, Ballester B, Beal K, Bragin E, Brent S, Chen Y, Clapham P, Clarke L: Ensembl 2009. Nucleic Acids Res. 2009, 37 (Database issue): D690-697. 10.1093/nar/gkn828.
- Marques-Bonet T, Kidd JM, Ventura M, Graves TA, Cheng Z, Hillier LW, Jiang Z, Baker C, Malfavon-Borja R, Fulton LA, et al: A burst of segmental duplications in the genome of the African great ape ancestor. Nature. 2009, 457 (7231): 877-881. 10.1038/nature07744.
- Bennett EA, Keller H, Mills RE, Schmidt S, Moran JV, Weichenrieder O, Devine SE: Active Alu retrotransposons in the human genome. Genome Res. 2008, 18 (12): 1875-1883. 10.1101/gr.081737.108.
- Conrad DF, Pinto D, Redon R, Feuk L, Gokcumen O, Zhang Y, Aerts J, Andrews TD, Barnes C, Campbell P, et al: Origins and functional impact of copy number variation in the human genome. Nature. 2009, 464 (7289): 704-712. 10.1038/nature08516.
- Iafrate AJ, Feuk L, Rivera MN, Listewnik ML, Donahoe PK, Qi Y, Scherer SW, Lee C: Detection of large-scale variation in the human genome. Nat Genet. 2004, 36 (9): 949-951. 10.1038/ng1416.
- Bassett AS, Scherer SW, Brzustowicz LM: Copy Number Variations in Schizophrenia: Critical Review and New Perspectives on Concepts of Genetics and Disease. Am J Psychiatry. 2010, 167 (8): 899-914. 10.1176/appi.ajp.2009.09071016.
- Karayiorgou M, Simon TJ, Gogos JA: 22q11.2 microdeletions: linking DNA structural variation to brain dysfunction and schizophrenia. Nat Rev Neurosci. 2010, 11 (6): 402-416. 10.1038/nrn2841.
- Kim PM, Lam HY, Urban AE, Korbel JO, Affourtit J, Grubert F, Chen X, Weissman S, Snyder M, Gerstein MB: Analysis of copy number variants and segmental duplications in the human genome: Evidence for a change in the process of formation in recent evolutionary history. Genome Res. 2008, 18 (12): 1865-1874. 10.1101/gr.081422.108.
- Marques-Bonet T, Girirajan S, Eichler EE: The origins and impact of primate segmental duplications. Trends Genet. 2009, 25 (10): 443-454. 10.1016/j.tig.2009.08.002.
- Johnson ME, Cheng Z, Morrison VA, Scherer S, Ventura M, Gibbs RA, Green ED, Eichler EE: Recurrent duplication-driven transposition of DNA during hominoid evolution. Proc Natl Acad Sci USA. 2006, 103 (47): 17626-17631. 10.1073/pnas.0605426103.
- Shaikh TH, O'Connor RJ, Pierpont ME, McGrath J, Hacker AM, Nimmakayalu M, Geiger E, Emanuel BS, Saitta SC: Low copy repeats mediate distal chromosome 22q11.2 deletions: sequence analysis predicts breakpoint mechanisms. Genome Res. 2007, 17 (4): 482-491. 10.1101/gr.5986507.
- Zheng D: Gene duplication in the epigenomic era: Roles of chromatin modifications. Epigenetics. 2008, 3 (5): 250-253. 10.4161/epi.3.5.6991.
- Witherspoon DJ, Watkins WS, Zhang Y, Xing J, Tolpinrud WL, Hedges DJ, Batzer MA, Jorde LB: Alu repeats increase local recombination rates. BMC Genomics. 2009, 10: 530-10.1186/1471-2164-10-530.
- Myers S, Freeman C, Auton A, Donnelly P, McVean G: A common sequence motif associated with recombination hot spots and genome instability in humans. Nat Genet. 2008, 40 (9): 1124-1129. 10.1038/ng.213.
- Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W: Human-mouse alignments with BLASTZ. Genome Res. 2003, 13 (1): 103-107. 10.1101/gr.809403.
- Kidd JM, Cooper GM, Donahue WF, Hayden HS, Sampas N, Graves T, Hansen N, Teague B, Alkan C, Antonacci F, et al: Mapping and sequencing of structural variation from eight human genomes. Nature. 2008, 453 (7191): 56-64. 10.1038/nature06862.
- Pearson WR: Flexible sequence similarity searching with the FASTA3 program package. Methods Mol Biol. 2000, 132: 185-219.
- Korbel JO, Urban AE, Affourtit JP, Godwin B, Grubert F, Simons JF, Kim PM, Palejev D, Carriero NJ, Du L, et al: Paired-end mapping reveals extensive structural variation in the human genome. Science. 2007, 318 (5849): 420-426. 10.1126/science.1149504.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.