Association of microsatellite pairs with segmental duplications in insect genomes
© Behura and Severson; licensee BioMed Central Ltd. 2013
Received: 17 July 2013
Accepted: 16 December 2013
Published: 21 December 2013
Segmental duplications (SDs), also known as low-copy repeats, are DNA sequences longer than 1 kb that are duplicated with a high degree of sequence identity (greater than 90%) and can cause genomic instability. SDs are generally found in the genome as mosaics of duplicated sequences generated by a two-step process: first, multiple duplicated sequences aggregate at specific genomic regions, and then these primary duplications undergo multiple secondary duplications. However, the mechanism by which duplicated sequences aggregate in the first place is not well understood.
A genome-wide analysis of the distribution of microsatellite sequences in twenty insect species showed that pairs of microsatellites, along with the intervening sequences, were duplicated multiple times in each genome. These duplicated loci were classified as low-copy repeats or segmental duplications when they were longer than 1 kb and shared more than 90% sequence similarity. A sliding-window genomic analysis of the numbers of paired microsatellites and segmental duplications showed that regions rich in repetitive paired microsatellites tend to be richer in segmental duplications, suggesting a “rich-gets-richer” mode of aggregation of duplicated loci in specific regions of the genome. The results further show that the relationship between the number of paired microsatellites and the number of segmental duplications among species is independent of the known phylogeny, suggesting that the association of microsatellites with segmental duplications may be a species-specific evolutionary process. The repetitive microsatellite pairs were also associated with gene duplications, but these sequences are rarely retained in orthologous genes between species. Although some of the duplicated sequences with microsatellites as termini were found within transposable elements (TEs) of Drosophila, most of the duplications are located in TE-free and gene-free regions of the genome.
The study clearly suggests that microsatellites are instrumental in extensive sequence duplications that may contribute to species-specific evolution of genome plasticity in insects.
Keywords: Segmental duplication; Genome dynamics; Microsatellite; Insect genomes; Duplication shadowing; Gene duplication
Microsatellites are tandem repeats of simple sequences (usually motifs of 1–6 bp) that are found ubiquitously across eukaryotic genomes. Microsatellite loci are generally assumed to be selectively neutral [1–3]. However, increasing evidence suggests that microsatellites play important roles in genome structure and evolution and are often subject to selective pressure [4–10].
Moreover, non-random genomic distributions of microsatellites are well documented in eukaryotes [4, 11–15]. Kofler et al. found that as many as 25% of the microsatellites in different eukaryotic genomes are located close to each other, generally within 10 bp, and that such clustering occurs more frequently than expected under a random genomic distribution. In addition, simple sequence coding repeats are unevenly distributed across genomes, as shown by an analysis of 25 insect species. The Mouse Genome Sequencing Consortium also found that the ends of chromosome arms in mice have a higher density of microsatellites than other chromosomal regions. However, the functional and evolutionary relevance of the non-random genomic distribution of microsatellites is poorly understood.
Studies have indicated a possible association of microsatellites with segmentally duplicated sequences in some organisms [17–22]. Segmental duplications (SDs), also known as low-copy repeats, are generally defined as DNA sequences longer than 1 kb that are duplicated with a high degree of sequence identity (>90%). SDs are important genomic features because they can contribute to genomic instability and disease, as documented in humans. SDs are generally found in the genome as mosaics of duplicated sequences, which are generated by a two-step process. In the first step, multiple duplicated sequences aggregate at specific genomic regions. In the second step, these primary duplications undergo multiple secondary duplications. However, the mechanism by which duplicated sequences aggregate in the first place is not clear.
The present study is a systematic investigation of the distribution of microsatellite sequences in segmental duplications of different insect genomes (n = 20). Although microsatellites are extensively used as genetic markers in population and ecological studies of insects, their relationship with segmental duplications has not been established despite the availability of several insect genome sequences. Here we show that specific microsatellite pairs, along with the intervening sequences, are repeated with different frequencies in the genome, and that many of the low-copy repeats of these loci are segmental duplications, hereafter called microsatellite-associated SDs (mSDs). The results further show that these repetitive microsatellite pairs (rMPs) tend to aggregate in different genomic regions together with the segmentally duplicated sequences, suggesting a role for microsatellites in segmental duplication in insect genomes.
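The SD criteria used throughout (a duplicated copy longer than 1 kb sharing more than 90% identity) amount to a simple filter on each aligned pair of copies. The Python sketch below is illustrative only: the function names are hypothetical, it assumes the two copies have already been aligned, and it uses one common percent-identity convention (identical non-gap columns divided by all alignment columns), which may differ from the identity definitions offered by the Biostrings package used in this study.

```python
def percent_identity(aln_a: str, aln_b: str) -> float:
    """Percent identity of two gapped, equal-length aligned sequences.

    Convention (an assumption): identical non-gap columns divided by all
    alignment columns, so gap columns count as mismatches.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    if not aln_a:
        return 0.0
    matches = sum(
        1 for x, y in zip(aln_a.upper(), aln_b.upper()) if x == y and x != "-"
    )
    return 100.0 * matches / len(aln_a)


def is_msd(copy_length_bp: int, identity_pct: float,
           min_length_bp: int = 1000, min_identity_pct: float = 90.0) -> bool:
    """True if a duplicated rMP locus passes the SD thresholds (> 1 kb, > 90%)."""
    return copy_length_bp > min_length_bp and identity_pct > min_identity_pct


# Hypothetical usage: a 1,200 bp copy at 95% identity qualifies as an mSD.
print(is_msd(1200, 95.0))
```

Both thresholds are strict inequalities, matching the "greater than 1 kb" and "greater than 90%" wording of the definition.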
A total of 20 insect genomes were investigated in this study: twelve Drosophila species [D. melanogaster, D. simulans, D. sechellia, D. yakuba, D. erecta, D. ananassae, D. pseudoobscura, D. persimilis, D. willistoni, D. grimshawi, D. virilis, D. mojavensis], three mosquito species [Aedes aegypti (A. aegypti), Anopheles gambiae (A. gambiae), Culex quinquefasciatus (C. quinquefasciatus)], the wasp (Nasonia vitripennis), the honey bee (Apis mellifera), the beetle (Tribolium castaneum), the silkworm (Bombyx mori) and the pea aphid (Acyrthosiphon pisum). Throughout the text and illustrations, insect names are abbreviated as the first letter of the genus followed by the first three letters of the species name. The genome sequences of the twelve Drosophila species were downloaded from FlyBase (http://www.flybase.org). The genome assembly version was r1.3 for each species except D. melanogaster (r5.27), D. pseudoobscura (r2.10) and D. virilis (r1.2). The genome sequences of the three mosquitoes were downloaded from VectorBase (http://www.vectorbase.org). The A. mellifera genome sequence was downloaded from http://hymenopteragenome.org/. The Nasonia genome sequence (N. vitripennis_OGS_v1.2) was obtained from http://www.hgsc.bcm.tmc.edu. The aphid genome sequence was obtained from AphidBase (http://www.aphidbase.com/aphidbase/). The silkworm genome sequences were obtained from SilkDB (http://www.silkdb.org/silkdb/). The genome sequence of T. castaneum was obtained from BeetleBase (http://beetlebase.org/).
Non-random association of microsatellite pairs
The SciRoKo software was used to identify mono-, di-, tri-, tetra-, penta- and hexa-nucleotide simple sequence repeats (SSRs), or microsatellites, in each genome. Both perfect and imperfect SSRs were detected using the default parameters, with a fixed penalty of 5 for mismatches between motif sequences. From the SciRoKo output files (which give each microsatellite sequence and its start and end coordinates on chromosomes, scaffolds or supercontigs), the distances between neighboring microsatellites were calculated in each species. When two microsatellites with the same repeat motifs were separated by the same intervening distance at more than one location, they were counted as a repetitive microsatellite pair (rMP). Our null hypothesis was that the occurrence of microsatellite pairs with the same motifs and the same intervening distance at multiple locations in a genome is due to random chance. This null hypothesis was tested using the hypergeometric probability, as follows. First, the number of microsatellite pairs associated with the same intervening distance but different SSRs (n1) and the number of pairs associated with the same SSR pairs but different intervening distances (n2) were determined in each genome. The total number of possible combinations for these two groups of SSR pairs was calculated as C(n, n1) × C(n, n2), where n is the total number of microsatellites identified in the genome minus one (i.e. the number of intervals between adjacent microsatellites) and C denotes the number of combinations. Thus, C(n, n1) is the number of ways of choosing the n1 pairs from all the microsatellite pairs detected in a genome. Of these, C(n1, n3) is the number of ways of choosing the n3 pairs that have both the same SSR pair and the same intervening distance, and C(n − n1, n2 − n3) is the number of ways of choosing the remaining pairs that have the same SSR pair but a different intervening distance.
From these, the cumulative hypergeometric probability of SSR pairs sharing the same intervening distance was calculated as follows (the factor C(n, n1), common to the numbers of favorable and total combinations, cancels):

$$p = \sum_{i=0}^{n_3} \frac{C(n_1, i)\, C(n - n_1, n_2 - i)}{C(n, n_2)}$$

The value 1 − p then provided the statistical significance for rejecting, or not rejecting, the null hypothesis. Individual P values were adjusted for multiple testing using the Bonferroni correction, and adjusted values below 0.05 were considered statistically significant unless stated otherwise. The association was further tested in shuffled sequences of A. aegypti supercontigs. Here we assumed that the distribution of SSR pairs was independent of the structure of the genomic sequences, so that sequence shuffling would not affect it. To test this assumption, the supercontig sequences were shuffled and sampled (n = 1,000 sequences, each 1 kb long) using the R code ‘ShuffleAndExtract’ (http://tata-box-blog.blogspot.com/search/label/R). The sequences generated in three independent shuffling experiments were then analyzed separately for the distribution of rMPs using the hypergeometric test described above.
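A minimal Python sketch of the rMP counting and the significance test described above follows, with n, n1, n2 and n3 as defined in the text. It is not the authors' code: it assumes a hypothetical, already-parsed input of (scaffold, start, end, motif) tuples with 1-based inclusive coordinates rather than SciRoKo's actual output format, measures the intervening distance as the number of bases strictly between two neighboring SSRs, and reports the inclusive upper tail P(X ≥ n3); that tail convention is an assumption.

```python
from collections import Counter
from math import comb


def find_rmps(ssrs):
    """Return {(motif_a, motif_b, gap_bp): count} for neighboring SSR pairs seen > 1 time.

    ssrs: iterable of (scaffold, start, end, motif), 1-based inclusive coordinates
    (hypothetical pre-parsed input; SciRoKo's own output format is not reproduced).
    """
    by_scaffold = {}
    for scaffold, start, end, motif in ssrs:
        by_scaffold.setdefault(scaffold, []).append((start, end, motif))
    pair_counts = Counter()
    for loci in by_scaffold.values():
        loci.sort()  # order SSRs by start position along the scaffold
        for (_, end1, motif1), (start2, _, motif2) in zip(loci, loci[1:]):
            gap = start2 - end1 - 1  # bases strictly between the two SSRs
            pair_counts[(motif1, motif2, gap)] += 1
    # an rMP is a motif pair + distance combination found at more than one location
    return {key: count for key, count in pair_counts.items() if count > 1}


def hypergeom_upper_tail(n, n1, n2, n3):
    """P(X >= n3) for X ~ Hypergeometric(population n, n1 'successes', n2 draws)."""
    total = comb(n, n2)
    top = min(n1, n2)
    favorable = sum(comb(n1, i) * comb(n - n1, n2 - i) for i in range(n3, top + 1))
    return favorable / total


def bonferroni(p_values):
    """Bonferroni-adjusted p values, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical toy input: (AT)n and (CA)n separated by 10 bp at three locations.
ssrs = [("s1", 1, 10, "AT"), ("s1", 21, 30, "CA"), ("s1", 101, 110, "AT"),
        ("s1", 121, 130, "CA"), ("s2", 5, 14, "AT"), ("s2", 25, 34, "CA")]
print(find_rmps(ssrs))
```

For genome-scale counts, `scipy.stats.hypergeom.sf(n3 - 1, n, n1, n2)` returns the same inclusive upper tail without summing large binomial coefficients by hand.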
A canonical correlation test was performed on the numbers of rMPs in different intervening-distance classes (< 10 bp, ≥ 10 bp but < 100 bp, ≥ 100 bp but < 1 kb, ≥ 1 kb but < 5 kb, ≥ 5 kb but < 10 kb, and ≥ 10 kb but < 50 kb) among the 20 species. Euclidean distances were used in the correlation test, and the significance of the correlation was determined by a permutation test (9,999 random permutations) following the methods of Anderson and Willis.
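The six intervening-distance classes can be turned into a per-species count profile, and the Euclidean distances between those profiles are the kind of input a distance-based analysis works on. The Python sketch below covers only this preprocessing step, under hypothetical function names; it does not reimplement the canonical analysis or the permutation test of Anderson and Willis.

```python
from bisect import bisect_right
from math import dist

# Exclusive upper bounds (bp) of the six intervening-distance classes:
# <10, 10-<100, 100-<1k, 1k-<5k, 5k-<10k, 10k-<50k.
CLASS_EDGES = [10, 100, 1_000, 5_000, 10_000, 50_000]


def distance_profile(gaps):
    """Count rMP intervening distances (bp) per class; negatives and >= 50 kb are dropped."""
    counts = [0] * len(CLASS_EDGES)
    for gap in gaps:
        if gap < 0:
            continue
        idx = bisect_right(CLASS_EDGES, gap)  # index of the first edge > gap
        if idx < len(CLASS_EDGES):
            counts[idx] += 1
    return counts


def euclidean_matrix(profiles):
    """Pairwise Euclidean distances between species profiles given as {name: counts}."""
    names = sorted(profiles)
    matrix = [[dist(profiles[a], profiles[b]) for b in names] for a in names]
    return names, matrix


# Hypothetical gaps for one species.
print(distance_profile([0, 9, 10, 99, 150, 4_999, 7_000, 20_000, 60_000]))
```

Using `bisect_right` makes each lower bound inclusive (a 10 bp gap falls in the 10–100 bp class), matching the "≥ … but <" class definitions.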
Intervening sequences of paired microsatellites
The intervening DNA sequences of the paired microsatellites were extracted from the genome sequences using the coordinates of the microsatellite ends, with the R package SeqINR or the GALAXY server (https://main.g2.bx.psu.edu/). Pairwise alignments of duplicated sequences and their percent sequence identities were computed using the R package ‘Biostrings’. Phylogenetic analyses were conducted using the Neighbor-Joining method in MEGA4. Evolutionary distances were computed using the maximum composite likelihood method and are in units of the number of base substitutions per site. Average evolutionary divergences between different groups of rMP loci (e.g. genic versus non-genic) were also estimated with MEGA4. All sequence polymorphism analyses, including the total number of mutations, the number of polymorphic sites, the average number of nucleotide differences among duplicated sequences, and the significance of Tajima's D, were conducted with DnaSP v5.10.
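DnaSP computes these statistics directly; to show what they measure, the Python sketch below computes the number of segregating sites (S), the average number of pairwise nucleotide differences (π) and Tajima's D for a set of aligned duplicated copies, using Tajima's standard formulas. Skipping every alignment column that contains a gap is an assumption made for simplicity, the function name is hypothetical, and the sketch does not assess the significance of D. At least four sequences are required because the variance term of D is zero for fewer copies.

```python
from collections import Counter
from math import sqrt


def polymorphism_stats(alignment):
    """Return (S, pi, D) for aligned duplicated copies (equal-length strings).

    Columns containing a gap ('-') are skipped (an assumption). D is None when S == 0.
    """
    n = len(alignment)
    if n < 4:
        raise ValueError("Tajima's D needs at least four sequences")
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    columns = [col for col in zip(*(seq.upper() for seq in alignment)) if "-" not in col]

    s = sum(1 for col in columns if len(set(col)) > 1)  # segregating sites
    n_pairs = n * (n - 1) / 2
    # pi: per column, count sequence pairs that differ; sum and average over all pairs
    pi = sum(n_pairs - sum(k * (k - 1) / 2 for k in Counter(col).values())
             for col in columns) / n_pairs

    if s == 0:
        return s, pi, None
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    d = (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))
    return s, pi, d


# Hypothetical alignment of four duplicated copies.
print(polymorphism_stats(["AAAA", "AAAT", "AATT", "ATTT"]))
```

Counting differing pairs column by column gives the same π as averaging all pairwise sequence differences, without building every sequence pair explicitly.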
To determine the genomic distribution patterns of paired microsatellites, genome assemblies in which sequences have been assigned to chromosomes were divided into bins, and the total number of rMPs and the total number of rMPs forming mSDs were counted in each bin. Bin sizes and numbers varied with chromosome length, but most bins spanned megabases (Mb). For example, each A. gambiae chromosome was binned as < 1 Mb, 1–5 Mb, 5–10 Mb, 10–20 Mb, 20–30 Mb, 30–40 Mb, 40–50 Mb and > 50 Mb. A Spearman rank-order correlation test between the total numbers of rMPs and of mSDs across the bins was used to determine whether regions rich in paired microsatellites accumulated more segmental duplications than regions poor in paired microsatellites. P < 0.05 was considered significant.
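The bin-level correlation can be sketched in Python as follows. Bin edges, positions and function names are hypothetical; ties receive average ranks, and ρ is computed as the Pearson correlation of those ranks. The sketch returns only ρ; in practice `scipy.stats.spearmanr` returns both ρ and a p-value.

```python
from bisect import bisect_right
from math import sqrt


def bin_counts(positions, edges):
    """Count positions in bins [edges[i], edges[i + 1]); use float('inf') for an open last bin."""
    counts = [0] * (len(edges) - 1)
    for pos in positions:
        i = bisect_right(edges, pos) - 1
        if 0 <= i < len(counts):
            counts[i] += 1
    return counts


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank of the tie block i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("rho is undefined for a constant input")
    return cov / (sx * sy)


# Hypothetical per-bin counts of rMPs and mSDs.
print(spearman_rho([12, 30, 7, 45], [1, 4, 0, 6]))
```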
Association of mSD sequences with gene duplications and transposons
The genomic positions of mSDs were used to determine whether they were located in genic regions. The start and end coordinates of annotated genes (both coding and non-coding) in each genome (Biomart dataset: Ensembl Metazoa 16) were used to determine whether mSDs were located within, or overlapped, genes. The gene ontology (GO) terms (downloaded from Biomart) of the genes associated with mSDs were analyzed, and their rank orders were used to identify the top-ranking functions of these genes. The orthology and paralogy relationships of insect genes were obtained from the Biomart (Metazoa) database. Nearly identical paralogs were identified based on sequence identity between paralogous copies. To determine the association of mSDs with transposable elements, the transposable element (TE) sequences annotated in D. melanogaster were analyzed. The TE sequences were downloaded from ftp://ftp.flybase.net/genomes/aaa/transposable_elements/ReAS/v1/consensus_fasta/. The start and end coordinates of TEs relative to those of mSDs were used to identify TE-mSD associations.
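Classifying an mSD as lying within, or overlapping, a gene (or, likewise, a TE) reduces to comparing interval endpoints. The Python sketch below assumes 1-based inclusive coordinates on the same sequence and uses hypothetical names and records. Its all-against-all loop is fine for illustration, but genome-scale data would call for sorted intervals or an interval tree.

```python
def classify(msd, feature):
    """Relation of an mSD to a gene/TE; both are (start, end), 1-based inclusive."""
    m_start, m_end = msd
    f_start, f_end = feature
    if f_start <= m_start and m_end <= f_end:
        return "within"
    if m_start <= f_end and f_start <= m_end:
        return "overlapping"
    return None


def annotate(msds, features):
    """Return (msd_id, feature_name, relation) for every mSD that touches a feature.

    msds: (msd_id, seqid, start, end); features: (name, seqid, start, end).
    """
    hits = []
    for msd_id, m_seq, m_start, m_end in msds:
        for name, f_seq, f_start, f_end in features:
            if m_seq != f_seq:
                continue  # intervals on different sequences cannot overlap
            relation = classify((m_start, m_end), (f_start, f_end))
            if relation is not None:
                hits.append((msd_id, name, relation))
    return hits


# Hypothetical records.
genes = [("geneA", "2L", 1000, 5000), ("geneB", "2L", 4800, 9000), ("geneC", "3R", 100, 200)]
msds = [("m1", "2L", 1200, 2400), ("m2", "2L", 4500, 6000), ("m3", "3R", 300, 400)]
print(annotate(msds, genes))
```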
Identification of repetitive paired microsatellites
Total counts and density (counts/Mb) of SSRs in each genome
Frequency and inter-SSR distance of paired microsatellites in different insects (inter-SSR distance classes: < 10 bp, 10–100 bp, 100 bp–1 kb, 1–5 kb, 5–10 kb, 10–50 kb and 50–100 kb)
Non-random distribution of microsatellite pairs
Inter- and intra-chromosomal distribution of rMPs
Sequence polymorphisms among duplicated copies of the mSD [(TG)n-1022 bp-(CA)n] on the sex chromosomes and autosomes of A. gambiae
rMP dependent aggregation of segmental duplications
Number of repetitive microsatellite pairs (rMPs) and the microsatellite pairs associated with segmental duplications (mSDs) in insects
Genic versus non-genic association of mSDs
By mapping the mSD sequences to annotated gene locations, we identified duplicated copies that are located within, or overlap, coding and non-coding genes (Additional file 8). These genes represented different gene ontologies (GO) in different insects (Additional file 9); across species, the top-ranking predicted function was either ‘protein binding’ or ‘nucleus’. Genic duplications accounted for only ~5–15% of the mSDs identified in the different species, indicating that the majority of duplications occur in intergenic regions. The lower abundance of mSDs in genic than in non-genic regions may reflect differences in selection pressure between these regions. For example, the duplications of (TG)n-1022 bp-(CA)n in the A. gambiae genome include genic copies (2L:47741478–47742671, 2R:4134771–4135970, 3L:10536049–10537278, 3R:12823084–12824255, 3R:13862599–13863800 and 3R:33983540–33984801) (see Figure 5). We estimated the evolutionary divergence of the genic versus non-genic duplications (see Methods) and found that the genic copies have a lower average evolutionary divergence than the non-genic copies (0.32 versus 0.58, respectively), suggesting that genic duplications may be under selective constraint.
We also investigated whether A. aegypti genes that overlap with mSDs carry the same mSD copies in their one-to-one orthologs in C. quinquefasciatus and A. gambiae. These mSDs were never retained in orthologous genes (data not shown), indicating possible biased selection against mSD sequences in orthologous genes. This bias is most likely due to purifying selection, as microsatellites in gene-rich segmental duplications are known to be subject to such a selection bias. Furthermore, our analysis indicated that mSDs within protein-coding genes are mostly located in introns (data not shown). Hence, the lack of retention of mSDs in orthologous genes may also be due to the faster evolution of introns than of coding sequences.
To further test whether mSDs are associated with gene duplications, we identified ‘nearly identical paralogous’ genes (NIPs; see Emrich et al. for the definition). We found several NIPs associated with mSDs in the C. quinquefasciatus and D. melanogaster genomes (Additional file 10). However, we did not find any NIP associated with an mSD in A. aegypti, A. mellifera, A. gambiae or T. castaneum. Thus, if microsatellite-mediated SDs have a role in gene duplication in insects, this association is likely to be species-specific.
To determine whether mSDs are associated with transposable elements (TEs), we analyzed the annotated TEs to identify sequence duplications anchored by microsatellite pairs (Additional file 11). The paired microsatellites associated with different TEs in D. melanogaster are listed in Additional file 12. TE-associated mSDs make up only a minor fraction of all mSDs observed in Drosophila. Together with the low proportion of genic mSDs, this suggests that mSDs occur primarily in TE-free and gene-free regions of the genome.
In this study, we identified microsatellites that are repeated as pairs and investigated their association with segmentally duplicated sequences in insect genomes. We adopted a conservative approach to identifying repetitive microsatellite pairs by requiring that each pair have exactly the same intervening distance. However, in some cases the intervening distances among microsatellite pairs were not exactly the same but were similar (± 1 to 20 bp). For example, the microsatellite pair (ATTT)n and (TC)n is repeated 6 times with an intervening distance of exactly 3,807 bp (rMP family #55, Additional file 2), whereas 14 other duplications of the same SSR pair have an intervening distance of ~3,810 bp (Figure 9). Such variation in the intervening distance may result from an increase or decrease in the repeat length of one or both microsatellites, possibly through replication slippage [35, 36]. Slippage creates a loop in one of the strands, which gives rise to an insertion in subsequent replications if the loop forms in the replicating strand, or to a deletion if it forms in the template strand. This increases or decreases the repeat length of the microsatellite. In most of the microsatellite pairs we identified, one microsatellite varied in length while the other remained unchanged. Sequence composition, imperfections in microsatellite motifs and the local mutation rate of microsatellite loci are known to modulate the repeat length of microsatellites, and these factors may account for the variable intervening distances of paired microsatellites. Differential selection of simple sequence coding repeats [10, 38] may also account for the variation in distance between microsatellite pairs.
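The difference between the exact-distance criterion and a relaxed, tolerance-based grouping can be made concrete with a short Python sketch. This is a hypothetical extension, not the procedure used in this study: for each motif pair, intervening distances are sorted and chained into one family while consecutive values differ by at most `tol` bp (single linkage). The counts in the usage example are illustrative, loosely modeled on the (ATTT)n/(TC)n example above.

```python
from collections import defaultdict


def group_by_tolerance(pairs, tol):
    """Single-linkage grouping of microsatellite pairs by intervening distance.

    pairs: iterable of (motif_a, motif_b, gap_bp). For each motif pair, sorted gaps
    are chained into one family while consecutive gaps differ by <= tol bp, so a
    family can span more than tol bp in total.
    Returns a list of ((motif_a, motif_b), [gaps...]).
    """
    by_motifs = defaultdict(list)
    for motif_a, motif_b, gap in pairs:
        by_motifs[(motif_a, motif_b)].append(gap)
    families = []
    for motifs, gaps in by_motifs.items():
        gaps.sort()
        current = [gaps[0]]
        for gap in gaps[1:]:
            if gap - current[-1] <= tol:
                current.append(gap)  # close enough to the previous gap: same family
            else:
                families.append((motifs, current))
                current = [gap]
        families.append((motifs, current))
    return families


# Illustrative counts: 6 copies at 3,807 bp, 14 at 3,810 bp, 1 unrelated copy.
pairs = ([("ATTT", "TC", 3807)] * 6 + [("ATTT", "TC", 3810)] * 14
         + [("ATTT", "TC", 5000)])
print([len(gaps) for _, gaps in group_by_tolerance(pairs, 5)])
```

With `tol=0` the sketch reproduces the exact-distance criterion; a small positive tolerance merges the 3,807 bp and ~3,810 bp copies into one family.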
We also identified groups of more than two microsatellites repeated together in the genome. For example, a cluster of microsatellites [(A)21..66bp..(CTG)16..346bp..(TA)24..253bp..(CA)27..335bp..(AGGA)23..711bp..(AG)21..1299bp..(CGGCA)15..225bp..(A)21] is repeated three times in tandem within the region 2L:9475131–9483718 of the D. melanogaster genome. However, such repeats containing more than two microsatellites were exceptionally rare in the insect genomes (data not shown). In contrast, repeats consisting of only two microsatellites are abundant in each species, as also observed by Kofler et al.
Segmental duplications have been characterized in only a few organisms, mostly in the human and D. melanogaster genomes. They are poorly studied in other species despite the availability of draft genome sequences for many eukaryotes. Our study is a first effort in this direction to identify segmentally duplicated sequences from the genome assemblies of different insects. The segmental duplications reported here represent only the subset of duplications whose sequence ends are anchored by microsatellites. Although a comprehensive discovery of all segmental duplications in these insects was not the aim of the present study, our results show that repetitions of microsatellite pairs are associated with segmental duplications in insects, but with extremely variable frequency. The A. aegypti and N. vitripennis genomes have more than one thousand mSDs, whereas the T. castaneum genome has only seven (Table 4), indicating that microsatellite-anchored segmental duplications may be shaped by species-specific evolutionary processes.
Our results further showed that genomic regions with higher numbers of repetitive microsatellite pairs accumulate more segmental duplications than regions poor in paired microsatellites (Figure 7). This is consistent with a classic ‘rich-gets-richer’ mechanism, in which new segmental duplications tend to occur in regions that already contain more duplicated sequences. Such enrichment of SDs in specific chromosomal regions is relevant to ‘duplication shadowing’ effects in genomes [33, 40]. For example, duplication shadowing in the human genome contributes to an ~10-fold increase in the probability of sequence duplication in specific regions compared with other regions. We observed such a pattern of segmental duplications on chromosome arm 2L (2L:21421283–21534584) of D. melanogaster, where (ATTT)n and (TC)n microsatellites form the sequence ends of each duplication (Figure 9). In this case, each SD contains four histone genes: His1, His2A, His2B and His4. The entire ~3,810 bp segmental duplication maps to a single cDNA (accession # AY119274), suggesting that the duplicated sequence containing the four genes is expressed as a common primary transcript. Duplication shadowing of gene regions may be an evolutionary strategy to modulate the expression of specific genes, as evident in primates. Moreover, Korbel et al. found that segmental duplications of larger sequences enclosing specific protein-coding genes often contribute to the expansion of protein-coding gene families. Although the role of microsatellites in this process is not known, microsatellites in the flanking sequences of genes may have a regulatory role in gene expression. In addition, simple sequence repeats in coding regions can influence translational selection, which can modulate the expression levels of those genes.
These reports suggest that microsatellite-mediated segmental duplications may affect the expression of genes located within the duplicated segments.
The paired microsatellites identified in our investigation may be targets of non-homologous end joining (NHEJ), which is one of the mechanisms of segmental duplication. Such processes are generally mediated by microhomologies (< 25 bp of homology) at the ends of the target sequences, similar to the terminal microsatellites of the mSDs found in this study. Consistent with this, microsatellites have been implicated in genomic rearrangements as well as in segmental duplications. Furthermore, microsatellites have been shown to be enriched at SD breakpoints, suggesting a possible role for microsatellite repeats in the genesis of SDs. Our results therefore further support the view that microsatellites, by repeating as pairs, are likely to have a role in the genesis of SDs in insect genomes.
Mechanisms not involving microsatellites are also likely to contribute to segmental duplications. Non-allelic homologous recombination (NAHR) during meiosis between pre-existing repeat elements (such as Alu repeats) can also lead to segmental duplications. Moreover, several factors, such as the length, orientation, degree of sequence similarity and distance between duplicated copies, may lead to different degrees of genomic rearrangement. The genesis of segmental duplications may also be controlled by the same mechanisms that generate copy number variations (CNVs) in genomes. CNVs are caused by different rearrangement events, including deletions, duplications, inversions and translocations. However, Kim et al. found that only a minor portion (< 10%) of CNVs is associated with segmental duplications in the human genome, suggesting that SDs and CNVs arise largely through different mechanisms.
Our data further suggest that duplications of paired microsatellites are located mostly in non-genic regions. Moreover, paired microsatellites in genic regions are found predominantly in introns (data not shown). We also found several mSDs associated with different transposable elements (TEs) in the D. melanogaster genome (Additional file 12). Therefore, a role for microsatellites in intron evolution and retrotransposition events cannot be ruled out [32, 47]. Given the role of transposition events in genome structure and function [48, 49], it is likely that microsatellites are instrumental in extensive sequence transposition and duplication in the genome.
In this study, we have shown that microsatellites are significantly associated with segmental duplications in insect genomes. Repetitive paired microsatellites tend to accumulate in regions rich in segmental duplications, suggesting a “rich-gets-richer” mode of aggregation of duplicated sequences in the genome. The results further suggest that these repetitive sequences are also associated with gene duplications in specific insect genomes. The study clearly suggests that repetition of paired microsatellites contributes to extensive sequence duplication in insect genomes.
SKB is a Research Assistant Professor in the Department of Biological Sciences and the Eck Institute for Global Health at the University of Notre Dame, Indiana. He has a broad interest in insect genomics and evolution, with emphasis on disease-transmitting vector species. DWS is a Professor of Biological Sciences and the Director of the Eck Institute for Global Health at the University of Notre Dame, Indiana. His work focuses on genetic and genomic analysis of mosquito vector competence for various pathogens, as well as on the development and application of molecular tools to investigate the population biology of mosquitoes.
The authors are thankful to Daine Lovin and Joanne Cunningham for critically reading the manuscript.
- Tachida H, Iizuka M: Persistence of repeated sequences that evolve by replication slippage. Genetics. 1992, 131: 471-478.PubMed CentralPubMedGoogle Scholar
- Awadalla P, Ritland K: Microsatellite variation and evolution in the Mimulus guttatus species complex with contrasting mating systems. Mol Biol Evol. 1997, 14: 1023-1034. 10.1093/oxfordjournals.molbev.a025708.View ArticlePubMedGoogle Scholar
- Schlötterer C, Wiehe T: Microsatellites, a neutral marker to infer selective sweeps. Microsatellites: Evolution and Applications. Edited by: Goldstein DB, Schlötterer C. 1999, Oxford: Oxford University Press, 238-247.Google Scholar
- Li YC, Korol AB, Fahima T, Beiles A, Nevo E: Microsatellites: genomic distribution, putative functions and mutational mechanisms: a review. Mol Ecol. 2002, 11: 2453-2465. 10.1046/j.1365-294X.2002.01643.x.View ArticlePubMedGoogle Scholar
- Rockman MV, Wray GA: Abundant raw material for cis-regulatory evolution in humans. Mol Biol Evol. 2002, 19: 1991-2004. 10.1093/oxfordjournals.molbev.a004023.View ArticlePubMedGoogle Scholar
- Ellegren H: Microsatellites: simple sequences with complex evolution. Nat Rev Genet. 2004, 5: 435-445. 10.1038/nrg1348.View ArticlePubMedGoogle Scholar
- Huntley M, Golding GB: Evolution of simple sequence in proteins. J Mol Evol. 2000, 51: 131-140.PubMedGoogle Scholar
- Kashi Y, King DG: Simple sequence repeats as advantageous mutators in evolution. Trends Genet. 2006, 22: 253-259. 10.1016/j.tig.2006.03.005.View ArticlePubMedGoogle Scholar
- Behura SK: Molecular marker systems in insects: current trends and future avenues. Mol Ecol. 2006, 15: 3087-3113. 10.1111/j.1365-294X.2006.03014.x.View ArticlePubMedGoogle Scholar
- Behura SK, Severson DW: Genome-wide comparative analysis of simple sequence coding repeats among 25 insect species. Gene. 2012, 504: 226-232. 10.1016/j.gene.2012.05.020.PubMed CentralView ArticlePubMedGoogle Scholar
- Scotti I, Magni F, Fink R, Powell W, Binelli G, et al: Microsatellite repeats are not randomly distributed within Norway spruce (Picea abies K.) expressed sequences. Genome. 2000, 43: 41-46.View ArticlePubMedGoogle Scholar
- Toth G, Gaspari Z, Jurka J: Microsatellites in different eukaryotic genomes: survey and analysis. Genome Res. 2000, 10: 967-981. 10.1101/gr.10.7.967.PubMed CentralView ArticlePubMedGoogle Scholar
- Grover A, Aishwarya V, Sharma PC: Biased distribution of microsatellite motifs in the rice genome. Mol Genet Genomics. 2007, 277: 469-480. 10.1007/s00438-006-0204-y.View ArticlePubMedGoogle Scholar
- Kofler R, Schlötterer C, Luschützky E, Lelley T: Survey of microsatellite clustering in eight fully sequenced species sheds light on the origin of compound microsatellites. BMC Genomics. 2008, 9: 612-10.1186/1471-2164-9-612.PubMed CentralView ArticlePubMedGoogle Scholar
- Rhode C, Roodt-Wilding R: Bioinformatic survey of Haliotis midae microsatellites reveals a non-random distribution of repeat motifs. Biol Bull. 2011, 221: 147-154.PubMedGoogle Scholar
- Mouse Genome Sequencing Consortium: Initial sequencing and comparative analysis of the mouse genome. Nature. 2002, 420: 520-562. 10.1038/nature01262.View ArticleGoogle Scholar
- Bailey JA, Gu Z, Clark RA, Reinert K, Samonte RV, et al: Recent segmental duplications in the human genome. Science. 2002, 297: 1003-1007. 10.1126/science.1072047.View ArticlePubMedGoogle Scholar
- Balaresque P, Toupance B, Heyer E, Crouau-Roy B: Evolutionary dynamics of duplicated microsatellites shared by sex chromosomes. J Mol Evol. 2003, 57 (Suppl 1): S128-S137.View ArticlePubMedGoogle Scholar
- Bailey JA, Eichler EE: Primate segmental duplications: crucibles of evolution, diversity and disease. Nat Rev Genet. 2006, 7: 552-564.View ArticlePubMedGoogle Scholar
- Fiston-Lavier AS, Anxolabehere D, Quesneville H: A model of segmental duplication formation in Drosophila melanogaster. Genome Res. 2007, 17: 1458-1470. 10.1101/gr.6208307.PubMed CentralView ArticlePubMedGoogle Scholar
- Kim PM, Lam HY, Urban AE, Korbel JO, Affourtit J, et al: Analysis of copy number variants and segmental duplications in the human genome: evidence for a change in the process of formation in recent evolutionary history. Genome Res. 2008, 18: 1865-1874. 10.1101/gr.081422.108.PubMed CentralView ArticlePubMedGoogle Scholar
- Sharma PC, Roorkiwal M, Grover A: Purifying selection bias against microsatellites in gene rich segmental duplications in the rice genome. Int J Evol Biol. 2012, 2012: 970920-PubMed CentralView ArticlePubMedGoogle Scholar
- Eichler EE: Recent duplication, domain accretion and the dynamic mutation of the human genome. Trends Genet. 2001, 17: 661-669. 10.1016/S0168-9525(01)02492-1.
- Kahn CL, Raphael BJ: Analysis of segmental duplications via duplication distance. Bioinformatics. 2008, 24: i133-i138. 10.1093/bioinformatics/btn292.
- Kofler R, Schlötterer C, Lelley T: SciRoKo: a new tool for whole genome microsatellite search and investigation. Bioinformatics. 2007, 23: 1683-1685. 10.1093/bioinformatics/btm157.
- Anderson MJ, Willis TJ: Canonical analysis of principal coordinates: a useful method of constrained ordination for ecology. Ecology. 2003, 84: 511-525. 10.1890/0012-9658(2003)084[0511:CAOPCA]2.0.CO;2.
- Charif D, Lobry JR: SeqinR 1.0-2: A contributed package to the R project for statistical computing devoted to biological sequences retrieval and analysis. Structural approaches to sequence evolution. Edited by: Bastolla U, Porto M, Roman HE, Vendruscolo M. 2007, Berlin, Heidelberg: Springer Berlin Heidelberg, 207-232.
- Tamura K, Dudley J, Nei M, Kumar S: MEGA4: molecular evolutionary genetics analysis (MEGA) software version 4.0. Mol Biol Evol. 2007, 24: 1596-1599. 10.1093/molbev/msm092.
- Tamura K, Nei M, Kumar S: Prospects for inferring very large phylogenies by using the neighbor-joining method. Proc Natl Acad Sci U S A. 2004, 101: 11030-11035. 10.1073/pnas.0404206101.
- Librado P, Rozas J: DnaSP v5: a software for comprehensive analysis of DNA polymorphism data. Bioinformatics. 2009, 25: 1451-1452. 10.1093/bioinformatics/btp187.
- Emrich SJ, Li L, Wen TJ, Yandeau-Nelson MD, Fu Y, et al: Nearly identical paralogs: implications for maize (Zea mays L.) genome evolution. Genetics. 2007, 175: 429-439.
- Rodríguez-Trelles F, Tarrío R, Ayala FJ: Origins and evolution of spliceosomal introns. Annu Rev Genet. 2006, 40: 47-76. 10.1146/annurev.genet.40.110405.090625.
- Cheng Z, Ventura M, She X, Khaitovich P, Graves T, et al: A genome-wide comparison of recent chimpanzee and human segmental duplications. Nature. 2005, 437: 88-93. 10.1038/nature04000.
- Hughes AL, Friedman R, Ekollu V, Rose JR: Non-random association of transposable elements with duplicated genomic blocks in Arabidopsis thaliana. Mol Phylogenet Evol. 2003, 29: 410-416. 10.1016/S1055-7903(03)00262-8.
- Schlötterer C, Tautz D: Slippage synthesis of simple sequence DNA. Nucleic Acids Res. 1992, 20: 211-215.
- Richard GF, Pâques F: Mini- and microsatellite expansions: the recombination connection. EMBO Rep. 2000, 1: 122-126. 10.1093/embo-reports/kvd031.
- Di Rienzo A, Peterson AC, Garza JC, Valdes AM, Slatkin M, Freimer NB: Mutational processes of simple-sequence repeat loci in human populations. Proc Natl Acad Sci U S A. 1994, 91: 3166-3170. 10.1073/pnas.91.8.3166.
- Huntley MA, Golding GB: Selection and slippage creating serine homopolymers. Mol Biol Evol. 2006, 23: 2017-2025. 10.1093/molbev/msl073.
- Levasseur A, Pontarotti P: The role of duplications in the evolution of genomes highlights the need for evolutionary-based approaches in comparative genomics. Biol Direct. 2011, 6: 11. 10.1186/1745-6150-6-11.
- Kirsch S, Münch C, Jiang Z, Cheng Z, Chen L, et al: Evolutionary dynamics of segmental duplications from human Y-chromosomal euchromatin/heterochromatin transition regions. Genome Res. 2008, 18: 1030-1042. 10.1101/gr.076711.108.
- Korbel JO, Kim PM, Chen X, Urban AE, Weissman S, et al: The current excitement about copy-number variation: how it relates to gene duplications and protein families. Curr Opin Struct Biol. 2008, 18: 366-374. 10.1016/j.sbi.2008.02.005.
- Linardopoulou EV, Williams EM, Fan Y, Friedman C, Young JM, et al: Human subtelomeres are hot spots of interchromosomal recombination and segmental duplication. Nature. 2005, 437: 94-100. 10.1038/nature04029.
- Ugarković D, Plohl M: Variation in satellite DNA profiles–causes and effects. EMBO J. 2002, 21: 5955-5959. 10.1093/emboj/cdf612.
- Bailey JA, Liu G, Eichler EE: An Alu transposition model for the origin and expansion of human segmental duplications. Am J Hum Genet. 2003, 73: 823-834. 10.1086/378594.
- Lee JA, Lupski JR: Genomic rearrangements and gene copy-number alterations as a cause of nervous system disorders. Neuron. 2006, 52: 103-121. 10.1016/j.neuron.2006.09.027.
- Hastings PJ, Lupski JR, Rosenberg SM, Ira G: Mechanisms of change in gene copy number. Nat Rev Genet. 2009, 10: 551-564. 10.1038/nrg2593.
- Roy SW: The origin of recent introns: transposons? Genome Biol. 2004, 5: 251. 10.1186/gb-2004-5-12-251.
- Hurst GD, Werren JH: The role of selfish genetic elements in eukaryotic evolution. Nat Rev Genet. 2001, 2: 597-606. 10.1038/35084545.
- Werren JH: Selfish genetic elements, genetic conflict, and evolutionary innovation. Proc Natl Acad Sci U S A. 2011, 108 (Suppl 2): 10863-10870.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.