Genome-wide characterization of microsatellite DNA in fishes: survey and analysis of their abundance and frequency in genome-specific regions
BMC Genomics volume 22, Article number: 421 (2021)
Microsatellite repeats are ubiquitous in genomes and play important roles in chromatin organization, regulation of gene activity, recombination and DNA replication. Although microsatellite distribution patterns have been studied in most phylogenetic lineages, they remain unclear in fish species.
Here, we present the first systematic examination of microsatellite distribution in coding and non-coding regions of 14 fish genomes. The number and type of microsatellites were nonrandomly distributed in both intragenic and intergenic regions, suggesting that microsatellites have potential roles in transcriptional or translational regulation and that DNA replication slippage theories alone are insufficient to explain the distribution patterns. Microsatellites were predominantly found in non-coding regions. The total number of microsatellites ranged from 78,378 to 1,012,084, and the relative density varied from 4925.76 bp/Mb to 25,401.97 bp/Mb. Overall, (A + T)-rich repeats were dominant. Repeat abundance decreased with the length of the repeated unit (1–6 nt) in a broadly similar way across species, although trinucleotide repeats were more numerous than tetranucleotide repeats in the exonic regions of most species. Moreover, the incidence of different repeat types appeared to be specific to species and genomic regions. These results highlight potential mechanisms for maintaining microsatellite distribution, such as selective forces and mismatch repair systems.
Our data could benefit studies of genome evolution and microsatellite evolutionary dynamics, and facilitate the exploration of microsatellite structure, function and composition, as well as the development of molecular markers in these species.
Microsatellites, also termed simple sequence repeats (SSRs), are short tandemly repeated sequences with motifs of 1 to 6 base pairs (bp) [1, 2]. They are ubiquitous and highly abundant in eukaryote, prokaryote and virus genomes [3,4,5], making up around 3% of the human genome. Microsatellite instability is an important and unique form of mutation that is responsible for, or strongly implicated in, over 40 human neurological, neurodegenerative and neuromuscular disorders, and associations have also been observed in other complex diseases [2, 3, 8, 9]. Microsatellites have attracted considerable attention due to their roles in the organization of chromosome structure, DNA recombination and replication, gene expression and cell cycle dynamics.
Microsatellite analysis is used to address a wide range of biological questions. Unique polymorphism of normal and disease-causing repeats can be used for disease diagnosis and prognosis [11,12,13]. Microsatellite repeats are advantageous as genetic markers due to their high polymorphism, informativeness and co-dominance, and have been used to construct quantitative trait loci (QTL) maps and genetic linkage maps [14,15,16,17,18] and for DNA fingerprinting. These features also provide the foundation for their successful application in other fundamental and applied fields of biology, including population and conservation genetics, genetic dissection of complex traits and marker-assisted breeding programs [10, 20,21,22].
Microsatellite content generally correlates positively with genome size [23,24,25]. The distribution of microsatellites exhibits different properties in genomes with different functionality [26,27,28,29,30,31], contradicting earlier studies stating that they are randomly distributed and simply represent “junk” DNA sequences. Microsatellites are ubiquitously distributed across the entire genome, including protein-coding and non-coding regions [6, 33,34,35]. Previous studies have indicated that microsatellite occurrence differs significantly between coding and non-coding regions, and that some microsatellite types are preferred and often common in genome-specific regions [26, 29]. Microsatellite repeats are highly abundant in non-coding regions of eukaryotic organisms, whereas they are relatively rare in coding regions, ranging between 7 and 10% in higher plants [38, 39] and between 9 and 15% in vertebrates [40,41,42]. Meanwhile, multiple studies have demonstrated that the hotspots of microsatellite distribution may be related to various phenotypic traits [43, 44]. In the genome of Saccharomyces cerevisiae, about 17% of genes contain microsatellite repeats in open reading frames (ORFs) [45, 46], and the repeats are specifically enriched in regulatory genes that encode transcription factors, DNA-RNA binding proteins and chromatin modifiers. Microsatellite repeats occur frequently in cis-regulatory elements and promoters (e.g., ~25% of promoters in yeast contain tandem repeats), where they regulate gene expression [48, 49]. The (TTAGGG)n tracts constitute a substantial portion of the telomeric regions and are recognized by telomerase, which can be related to the stability of chromosomes and nucleolus organizing regions [10, 50, 51].
Microsatellites are inherently unstable, with high mutation rates from about 10⁻⁶ to 10⁻² per locus per generation, resulting from DNA replication slippage [52, 53]. Mutation rates vary with microsatellite type (perfect, compound or interrupted), base composition of the repeat, repeat type (di-, tri- and tetranucleotide) [55, 56], length and heterozygosity [57, 58], and also with chromosome position, cell division, the GC content of flanking DNA and taxonomic group [59,60,61,62]. Microsatellite instability has a strong influence on genomic microsatellite abundance and various functions, and is explained by two mutually exclusive mutational mechanisms. (i) DNA replication slippage theory suggests that during DNA replication the nascent and template strands realign out of register, and if DNA synthesis continues unabated on this molecule the repeat number of the microsatellite is altered [21, 63, 64]. The slipped structure is stabilized by hairpin, triplex, cruciform or quadruplex arrangements of DNA strands [65,66,67,68,69]. (ii) Unequal recombination theory assumes that large-scale contractions and expansions of the repeat array involve unequal DNA recombination, including unequal crossing over and gene conversion, via a number of transposable elements, the best known of which are Alu and other short interspersed elements [64, 71]. Non-reciprocal recombination, random genetic drift and selective forces could have a significant effect on the accumulation of tandem-repetitive sequences in genomes [63, 65, 70].
So far, systematic research on microsatellite variation and characterization has been conducted across phylogenetic lineages, including humans, primates [72,73,74], plants and fungi [36, 75,76,77,78,79,80,81,82], and viruses [83, 84]. Yet microsatellite distribution patterns in fishes, an important branch of biological evolution, remain unclear. Here, we used 14 fish genomes to examine the distribution patterns of microsatellites. The specific aims were 1) to examine the abundance and frequency of microsatellites in several important fish genomes, and 2) to compare the compositional differences of microsatellites among taxa and genome-specific regions. We anticipate that our study will provide foundational knowledge of microsatellite dynamics in fish species, improve our understanding of microsatellite distribution, and support further exploration of genome structure and microsatellite functions.
Materials and methods
Genome sequences from 14 fish species, including model fishes (Danio rerio, Oryzias latipes, Astyanax mexicanus), commercial species (Cyprinus carpio, Oncorhynchus mykiss, Oncorhynchus kisutch, Oreochromis niloticus, Ictalurus punctatus, Esox lucius, Cynoglossus semilaevis), ornamental fishes (Poecilia reticulata, Takifugu rubripes, Nothobranchius furzeri), and a “living fossil” fish species (Lepisosteus oculatus), were used in this study. Most genome sequences were downloaded from the Ensembl Genome Browser (Ensembl. Available online: http://asia.ensembl.org/index.html). The sequences of Cyprinus carpio, Nothobranchius furzeri, Oncorhynchus kisutch, and Oncorhynchus mykiss were obtained from the National Centre for Biotechnology Information (NCBI. Available online: http://www.ncbi.nlm.nih.gov/). We also obtained genome annotations to identify microsatellite locations in the genomes. Only genomic (chromosomal) sequences with complete genome annotations were included in this study. We filtered out the unknown bases (Ns) in the genome sequences using a Perl script and used the resulting valid sequence length for further analysis. The details of the genome sequences are listed in Table 1.
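The N-filtering step can be illustrated with a minimal sketch. The study used a Perl script that is not reproduced in the text, so the Python function below is only an assumed equivalent: the name `valid_lengths` and the toy FASTA records are hypothetical, and the function simply counts, per sequence, the bases that are not N.

```python
from typing import Dict, Iterable


def valid_lengths(fasta_lines: Iterable[str]) -> Dict[str, int]:
    """Return the number of non-N bases for each record in a FASTA stream.

    Hypothetical helper: it only illustrates the idea of a "valid length"
    (sequence length after discarding unknown bases).
    """
    lengths: Dict[str, int] = {}
    name = None
    for raw in fasta_lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the record name.
            fields = line[1:].split()
            name = fields[0] if fields else "unnamed"
            lengths[name] = 0
        elif name is not None:
            # Count every base except the unknown-base symbol (n or N).
            lengths[name] += sum(1 for base in line if base.upper() != "N")
    return lengths


# Toy input (made up for illustration, not real genome data).
toy_fasta = [">chr1 toy record", "ACGTNNNNAC", "GTnnA", ">chr2", "NNNN"]
print(valid_lengths(toy_fasta))
```

Dividing microsatellite counts by this valid length, rather than by the raw assembly length, avoids diluting the per-Mb statistics with assembly gaps.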
Microsatellites were identified from genome sequences using the Krait v0.9.0 program, a robust and ultrafast tool with a user-friendly graphical interface for genome-wide investigation of microsatellites. We employed the perfect search mode of the program to identify all motifs according to minimum repeat number or minimum microsatellite length. In the present study, we defined perfect microsatellites as mononucleotide repeats ≥12 bp, dinucleotide repeats ≥14 bp, trinucleotide repeats ≥15 bp, tetranucleotide repeats ≥16 bp, pentanucleotide repeats ≥20 bp and hexanucleotide repeats ≥24 bp, and the length of the flanking sequence was constrained to 200 bp, as previously described [72, 74]. We mainly examined the distribution of perfect repeats ≥12 bp long. The rationale for choosing this small cutoff value was that microsatellites are often disrupted by single base substitutions [6, 33]. The occurrence of repeats in exons, introns and intergenic regions was identified from the annotations of the 14 fish genome sequences using Perl scripts. The SciRoKo software tool and the NCBI Graphical Sequence Viewer program (https://www.ncbi.nlm.nih.gov/projects/sviewer/) were employed to increase the reliability of the results for the examined microsatellite repeats.
Repeats whose unit patterns were circular permutations and/or reverse complements of each other were grouped together as one type. The total number of non-overlapping types was 501 for 1–6 nt long motifs: 1-nt motifs comprised 2 types, A and C (A = T and C = G); 2-nt motifs comprised 4 types, AT, AG, AC and GC (AT = TA, AG = GA = CT = TC, AC = CA = GT = TG, and GC = CG); and 3-, 4-, 5- and 6-nt motifs comprised 10, 33, 102 and 350 types, respectively [41, 87].
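The length thresholds above can be made concrete with a small sketch. This is not Krait's algorithm; it is a regular-expression approximation written for illustration, and the names `MIN_LENGTH_BP`, `SSR` and `find_perfect_ssrs` are hypothetical. It reports perfect tandem repeats whose total length meets the stated minimum for each motif size and skips motifs that are themselves repeats of a shorter unit (for example, "AA" is treated as a mononucleotide repeat, not a dinucleotide one). Overlaps between hits of different motif sizes are not resolved here.

```python
import re
from typing import List, NamedTuple

# Minimum tract length in bp for each motif size, as stated in the text.
MIN_LENGTH_BP = {1: 12, 2: 14, 3: 15, 4: 16, 5: 20, 6: 24}


class SSR(NamedTuple):
    start: int    # 0-based start position
    end: int      # end position (exclusive)
    motif: str
    repeats: int  # iteration number


def _is_primitive(motif: str) -> bool:
    """True if the motif is not a repetition of a shorter unit."""
    n = len(motif)
    return not any(
        n % k == 0 and motif[:k] * (n // k) == motif for k in range(1, n)
    )


def find_perfect_ssrs(sequence: str) -> List[SSR]:
    """Find perfect SSRs with 1-6 nt motifs that meet the length cutoffs."""
    seq = sequence.upper()
    hits: List[SSR] = []
    for k, min_bp in MIN_LENGTH_BP.items():
        min_repeats = -(-min_bp // k)  # ceiling division
        # One motif copy followed by at least (min_repeats - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if _is_primitive(motif):
                length = match.end() - match.start()
                hits.append(SSR(match.start(), match.end(), motif, length // k))
    return sorted(hits)


# Toy sequence (made up): a 12-bp poly(A), a 14-bp (AC)7 and a 12-bp (AAT)4.
toy = "G" + "A" * 12 + "GG" + "AC" * 7 + "TT" + "AAT" * 4 + "G"
for hit in find_perfect_ssrs(toy):
    print(hit)
```

In the toy sequence the (AAT)4 tract is 12 bp long and therefore falls below the 15-bp trinucleotide cutoff, so only the poly(A) and (AC)7 tracts are reported.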
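The grouping rule can be checked with a short enumeration. The sketch below is an illustration rather than the authors' procedure; the function names are hypothetical. It maps each motif to the lexicographically smallest string among all circular permutations of the motif and of its reverse complement, excludes motifs that are repetitions of a shorter unit, and counts the resulting classes per motif length.

```python
from itertools import product

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(motif: str) -> str:
    return motif.translate(_COMPLEMENT)[::-1]


def is_primitive(motif: str) -> bool:
    """True if the motif is not a repetition of a shorter unit (e.g. not ATAT)."""
    n = len(motif)
    return not any(
        n % k == 0 and motif[:k] * (n // k) == motif for k in range(1, n)
    )


def canonical(motif: str) -> str:
    """Smallest string among rotations of the motif and of its reverse complement."""
    candidates = []
    for m in (motif, reverse_complement(motif)):
        candidates.extend(m[i:] + m[:i] for i in range(len(m)))
    return min(candidates)


def count_types(k: int) -> int:
    """Number of distinct motif types of length k after grouping."""
    motifs = ("".join(p) for p in product("ACGT", repeat=k))
    return len({canonical(m) for m in motifs if is_primitive(m)})


counts = [count_types(k) for k in range(1, 7)]
print(counts, sum(counts))
```

The choice of representative is only a naming convention: this sketch labels the GC = CG class as "CG" because that string sorts first, whereas the text writes it as GC.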
Distribution patterns of microsatellite repeats in the fish genomes
We examined the number, relative frequency (microsatellite numbers per Mb of the sequence), relative density (total microsatellite length per Mb of the sequence), GC content and the coverage degree (percentage of total microsatellite length in the sequence) of microsatellites with motif lengths of 1–6 nucleotides in the 14 fish genomes (Table 2). We assigned 4-letter abbreviations to the 14 species, and these are used henceforth to simplify the results and discussion (e.g. Danio rerio = Drer, O. niloticus = Onil; see Table 2). The total number of microsatellites, ranging from 78,378 (Locu) to 1,012,084 (Drer), differed between fish species, and the coverage degree varied from 0.18% (Locu) to 5.29% (Trub) (Table 2). The lowest relative frequency and relative density of microsatellites were both found in Olat (249.99 loci/Mb and 4925.76 bp/Mb, respectively) (Table 2). The highest relative frequency and relative density of microsatellites were found in Csem (3445.94 loci/Mb) and Ipun (25,401.97 bp/Mb), respectively. The GC content ranged from 10.94% (Trub) to 48.20% (Okis) (Table 2).
The main distribution pattern of di- (dinucleotide SSRs) > mono- (mononucleotide SSRs) > tetra- (tetranucleotide SSRs) > tri- (trinucleotide SSRs) > penta- (pentanucleotide SSRs) > hexanucleotide (hexanucleotide SSRs) was shared by six fish genomes (i.e., Drer, Trub, Ipun, Amex, Eluc, and Omyk), while a mono- > di- > tetra- > tri- > penta- > hexanucleotide pattern was observed in Olat, Ccar and Pret (Table 2). The di- > mono- > tri- > tetra- > penta- > hexanucleotide pattern was shared by Onil and Csem, whereas Nfur exhibited a di- > tetra- > mono- > tri- > penta- > hexanucleotide pattern (Table 2). The 1-nt or 2-nt repeats had a higher percentage motif abundance in the fish genomes than any other motif length, and the 2-nt repeats represented more than 60% motif abundance in Omyk, Eluc and Okis (Table 3). There was an almost equal distribution of motif abundance percentages between the first three motifs (1–3 nt) in Locu, with 3-nt and 1-nt repeats being almost identical (34.92% and 34.99%, respectively). The percentage of 4-nt repeats was remarkably uniform across all taxa except for Drer and Locu, which had marginally greater or lesser percentages of this motif length (Table 3). Microsatellites with longer motifs (5–6 nt) showed lower percentages compared to the short motif repeats (1–4 nt). The 6-nt repeats had the lowest percentages among these motif lengths, ranging from 0.21% (Drer) to 0.85% (Trub) (Table 3).
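The three summary statistics defined above follow directly from the list of microsatellite lengths and the valid sequence length. The sketch below is a minimal illustration under the definitions given in the text; the function name `ssr_summary`, the dictionary keys and the toy numbers are assumptions and do not come from Table 2.

```python
from typing import Dict, Sequence


def ssr_summary(ssr_lengths_bp: Sequence[int], valid_length_bp: int) -> Dict[str, float]:
    """Compute relative frequency, relative density and coverage degree.

    relative frequency = number of microsatellites per Mb of sequence
    relative density   = total microsatellite length (bp) per Mb of sequence
    coverage degree    = total microsatellite length as a percentage of sequence
    """
    if valid_length_bp <= 0:
        raise ValueError("valid_length_bp must be positive")
    megabases = valid_length_bp / 1_000_000
    total_bp = sum(ssr_lengths_bp)
    return {
        "relative_frequency_loci_per_mb": len(ssr_lengths_bp) / megabases,
        "relative_density_bp_per_mb": total_bp / megabases,
        "coverage_percent": 100.0 * total_bp / valid_length_bp,
    }


# Toy values (made up): three microsatellites in a 2-Mb sequence.
print(ssr_summary([12, 14, 24], 2_000_000))
```

Under these definitions the coverage degree in percent equals the relative density in bp/Mb divided by 10,000, which gives a quick consistency check between the two columns of a summary table.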
Motif abundance percentages of mononucleotide repeats within intergenic regions, introns and exons varied across species, with intergenic regions ranging from 0.19% (Okis) to 7.19% (Trub), introns ranging from 9.87% (Nfur) to 45.06% (Olat) and exons ranging from 0.37% (Okis) to 3.58% (Pret) (Table 3). Among the two types of mononucleotide repeats, poly(A/T) was generally far more abundant than poly(C/G) in these fish genomes, except that the reverse was found in the Trub, Omyk and Okis genome sequences (Supplement Table 1, Tables 4 and 5). Drer had the maximum repeat number of A (or T) (192,264) followed by Ccar, Ipun, Amex and Pret. Pret contained the maximum number (4549) of C (or G). Although poly(A/T) tracts were clearly more abundant than poly(C/G) in exons (Table 4), this difference was not consistently observed in introns (Table 5) and intergenic regions (Table 6).
Among genome-specific regions, there was a lower percentage of dimer repeats (AT, AC, AG, CG) in exons compared to non-coding regions, ranging from 0.37% (Ccar) to 1.77% (Onil). Within the non-coding regions, intronic regions had a higher proportion of dinucleotide repeats than intergenic regions (Table 3). We found that (AC)n repeats were generally more numerous in specific genomic regions, except that Amex had greater numbers of (AG)n repeats in exonic regions and Okis had greater numbers of (AT)n repeats in intronic regions (Tables 4 and 5). The number of (AT)n repeats showed the greatest variation between genome-specific regions and species. For example, intronic or intergenic regions of Drer had similar numbers of (AT)n and (AG)n repeats, whereas the number of (AT)n repeats in exons was considerably lower than that of (AG)n repeats. Olat had more (AT)n repeats than (AG)n repeats in exons, but the opposite was found in other genomes. Finally, (CG)n repeats were very infrequent or absent in these genomes.
Motif abundance percentages of trinucleotide repeats in exons were greater than in intergenic regions in six fish species, including Olat (1.90%), Csem (1.81%), Pret (1.63%), Onil (1.53%) and Eluc (0.78%) (Table 3). Meanwhile, motif abundance percentages of trinucleotide repeats in the exons of Locu, Csem and Olat were greater than those of other motif lengths (e.g. mono-, di-). Among the different trinucleotide repeats, (AAT)n repeats were generally the most numerous in intronic and intergenic regions of different taxa (Tables 5 and 6), except for Okis, where (ACT)n repeats were the most numerous in intergenic regions (Table 6). No single trinucleotide repeat was consistently the most numerous in exonic regions across the different fish species. For example, (AAT)n repeats were most numerous in Drer and Ipun, (ATC)n repeats in Ccar and Nfur, and (AGG)n repeats in the 10 remaining species (Table 4). Repeats such as ACT, AGC, ACG and CCG were generally present in low numbers in each specific genomic region. Furthermore, CCG repeats were absent in the intergenic regions of Eluc, Omyk, Ccar and Okis (Table 6).
Tetranucleotide repeats were frequent in each genomic region and were generally dependent on the base composition of the repeat unit (Tables 7, 8, 9 and Supplement Table 1). Overall, repeats with > 50% of A + T (e.g. AAAT, ATAG and AATC repeats) were more abundant in the studied fish genomes (Supplement Table 1). There were, however, a few notable exceptions. For example, (ACAG)n repeats were the most numerous in Eluc, Omyk and Okis (Supplement Table 1). We found that (AAAB)n repeats (where B denotes any base other than A) were most numerous in exonic regions in five fish species (i.e. Olat, Drer, Onil, Ccar and Ipun), (ACAG)n repeats were numerous in Eluc, Omyk and Okis, and (ATCC)n repeats were most common in the remaining four fish species (Table 7). Similar to exons, the most common tetranucleotide repeat in intergenic regions was (AAAB)n, except for (ATCC)n in Olat, (AATC)n in Eluc and (ATAG)n in Amex (Table 9). In introns, (AATB)n or (ACAG)n were the most common tetranucleotide repeats in the studied fish (Table 8). We also found that some repeats with > 50% of C + G (e.g. ACGC, AGGG and AGCG repeats) were in the top 50% of tetranucleotide repeats in specific genome regions (Tables 7, 8, 9).
As expected, pentanucleotide repeats occurred less frequently than tetranucleotide repeats in the different genome regions. We found a general distribution pattern of pentanucleotide repeats for all species, where (A + T)-rich repeats were the most abundant. Yet we still found notable exceptions where (C + G)-rich repeats were dominant in specific genomic locations, including AGAGG and ACTGG in introns or intergenic regions of Trub, Csem and Okis, and ACTGC in exons of Eluc (Tables 7, 8, 9). Although AGAGG repeats in introns and exons were relatively abundant in Csem, it was also the only species in this study that lacked this repeat in intergenic regions (Supplement Table 1). We also found that CpG-containing repeats were present in the top 50% of pentanucleotide repeats, including (ATACG)n or (CCCGG)n tracts in intronic regions of Eluc and Locu, (CCCGG)n, (AATCG)n or (ACCGG)n tracts in exonic regions of Trub, Amex and Pret, and (ATACG)n or (ACCGG)n tracts in intergenic regions of Eluc and Pret (Tables 7, 8, 9).
Hexanucleotide repeats were the least numerous in specific genomic regions, except for the exons of Trub (Table 3). In exonic and intronic regions, a dominance of (C + G)-rich repeats was found in the majority of the genomes (Tables 7 and 8). The repeat motifs present in intergenic regions were highly variable and relatively (A + T)-rich (Table 9). Except in Olat, Onil, Ccar and Okis, CpG-containing repeats were common in the top 50% of hexanucleotide repeats in intronic and exonic regions, and half of the species had CpG-containing repeats in the top 50% of hexanucleotide repeats in intergenic regions (Tables 7, 8, 9). A few telomere-like repeats were found in introns or intergenic regions of all species except Pret. However, the (AATCCC)n and (AACCCT)n tracts were observed in exonic regions of Trub and Omyk, respectively (Table 8).
Iteration number and length distribution of microsatellites in fish genomes
Iteration number and length of microsatellites are both important factors determining microsatellite mutation rates, and they could be extremely important not only for genomic stability, but also for the evolution of additional genomic features such as codon usage. To assess the expandability of the repeats, the iteration number of microsatellites was plotted against microsatellite length over several intervals: <20, 20–50, 50–100, 100–200, 200–300, and >300 (Fig. 1). The details of all iteration numbers and densities of microsatellites in fish genomes are given in Supplement Table 2. In general, microsatellite frequency was concentrated at small iteration numbers. In other words, short microsatellites were observed more frequently in the fish genomes than long microsatellites. Repeat tracts with fewer than 20 iterations comprised more than 83.93, 67.22, 90.38, 88.93, 92.58 and 90.42% of microsatellites with motif lengths from mono- to hexanucleotide (1–6 nt), respectively (Fig. 1 and Supplement Table 2). However, a few special microsatellites were found in which the iteration number exceeded 300, for example 1-nt microsatellites in Csem, Eluc, Amex, Ccar and Okis, 2-nt microsatellites in Drer, Csem, Eluc, Nfur, Amex, Ccar and Okis, 3-nt microsatellites in Nfur, Amex and Ccar, 4-nt microsatellites in Drer, Csem, Nfur, Amex and Ccar, 5-nt microsatellites in Amex and Ccar, and 6-nt microsatellites in Csem, Amex and Ccar (Supplement Table 2).
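The interval tally described above can be sketched as a simple binning step. The text does not say how values falling exactly on a boundary (20, 50, 100, 200, 300) were assigned, so the sketch assumes half-open intervals in which each boundary value belongs to the higher bin; the names `BIN_EDGES`, `BIN_LABELS`, `bin_label` and `bin_percentages` are hypothetical.

```python
from bisect import bisect_right
from collections import Counter
from typing import Dict, Iterable

# Assumed boundaries: [0, 20), [20, 50), [50, 100), [100, 200), [200, 300), [300, inf)
BIN_EDGES = [20, 50, 100, 200, 300]
BIN_LABELS = ["<20", "20-50", "50-100", "100-200", "200-300", ">300"]


def bin_label(iteration_number: int) -> str:
    """Return the interval label for one iteration number."""
    return BIN_LABELS[bisect_right(BIN_EDGES, iteration_number)]


def bin_percentages(iteration_numbers: Iterable[int]) -> Dict[str, float]:
    """Percentage of microsatellites falling in each interval."""
    counts = Counter(bin_label(n) for n in iteration_numbers)
    total = sum(counts.values())
    if total == 0:
        return {label: 0.0 for label in BIN_LABELS}
    return {label: 100.0 * counts[label] / total for label in BIN_LABELS}


# Toy iteration numbers (made up for illustration).
print(bin_percentages([12, 15, 19, 20, 49, 120, 301, 7]))
```

Running this per motif length would give the kind of breakdown summarized in the text, for example the share of tracts with fewer than 20 iterations.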
In this study, we examined microsatellites composed of motifs 1–6 bp long in the entire genomes of 14 fish species and analyzed their distribution and frequency in different genomic regions. Microsatellite occurrence differed significantly, with the coverage degree varying from 0.18 to 5.29%. Comparison of microsatellite repeat occurrence in the genomes of humans (3%), primates (0.83–0.88%) [72,73,74], birds (0.13–0.49%), and plants and fungi (0.04–0.15%) [75, 76, 80, 89, 90] with our data indicates that microsatellite occurrence differs between species, and this might be a general phenomenon across taxa. In fact, differences might even occur between closely related species, such as humans and chimpanzees, and within the genus Drosophila [92, 93].
Another clear trend to emerge from this analysis was that the observed dependence of microsatellite abundance on repeat unit length and iteration number deviated markedly from the expected trend of gradual decrease, which was consistent with a previous study. Our research also indicated that microsatellite density is not strictly positively correlated with genome size. Although it is well known that microsatellite density generally correlates positively with genome size [26, 36, 94], results that contradict this, like ours, have been reported in other studies [72, 83, 88, 95]. Overall, the comparative analysis indicated great variation in microsatellite content across the 14 fish species. This might indicate that differential selective constraints play an important role in microsatellite evolution and result in the accumulated preference for different microsatellite types (Saeed 2016; Ellegren 2004; Schlötterer 2000).
During genome evolution, mutation of microsatellite repeats may provide a molecular mechanism for faster adaptation to environmental stress by increasing the quantity of DNA and providing the raw material for adaptive evolution of organisms. Generally, the instability of dinucleotide repeats is higher than that of trinucleotide, tetranucleotide and pentanucleotide repeats. In other words, the dependence of microsatellite mutation rate on repeat unit length deviates from a trend of gradual decrease. This could explain the high numbers of mono-/dinucleotide microsatellites and the low numbers of penta-/hexanucleotide microsatellites in the genomes. We should note that tetranucleotide repeats were more frequent than trinucleotide repeats in most of the 14 genomes. However, trinucleotide repeats tended to be more frequent than tetranucleotide repeats in exonic regions, and less frequent than tetranucleotide repeats in intronic and intergenic regions of most genomes. We suggest that the lower number of trinucleotide repeats cannot be explained by conservation alone, since they contribute triplet codons that form parts of genes. However, there may be a mechanism (e.g. the mismatch repair system) in the exonic regions that maintains the higher number of trinucleotide repeats.
As is evident from Tables 2, 3, 4, 5, 6, 7, 8 and 9, poly(A/T) tracts were more common than poly(C/G) tracts in these genomes. Poly(A/T) tracts were particularly common in exonic and intergenic regions, but the opposite was true in intronic regions of some taxa (e.g., Trub, Omyk and Okis), as has also been observed in the human genome. The higher frequency of poly(A) tracts can be attributed to the re-integration of processed genes into the genome from mRNA with an attached poly(A) tail, whereas poly(C/G) tracts are not part of this integrative mechanism. An alternative explanation is that a long A-rich tail is known to be necessary for retrotransposons that are universal in eukaryotic genomes, such as Alu and LINE-1 (L1) retrotransposons [97,98,99]. Meanwhile, the formation of pseudogenes may contribute to this higher proportion of (A + T)-rich repeats [36, 100]. Moreover, the mutation mechanism of microsatellite DNA provides a basis for this phenomenon. The variable frequencies of poly(A) and poly(C) could be due to the difference in stability between (GC)n and (AT)n repeats. (GC)n repeats are more stable than (AT)n repeats, and hence it would be more difficult for poly(C) sequences to slip during replication over the course of microsatellite evolution [6, 95, 101]. In the intronic regions, the higher than expected frequencies of poly(C/G) tracts in some species may be due to duplication events of key DNA sequences during evolution, or the integrity of chromosomes may depend on a higher-order DNA sequence organization that includes the presence of poly(C/G) tracts.
In the case of dimeric repeats, we found that the (AC)n tract was common and the (GC)n tract was rare. Assuming that (GC)-rich regions are relatively stable, less replication slippage would occur there to generate the repeated motifs of microsatellites. On a genomic scale, microsatellite sequences are presumably at equilibrium, where (AC)n or (AG)n repeats should be more abundant than (AT)n or (GC)n repeats. However, we found the opposite distribution of microsatellite motifs in the genome of Amex. We suggest that there is interspecific variation in the mechanisms of mutation or repair of specific motifs, or that there might be variation in the selective constraints associated with different microsatellite motifs.
Compared to other microsatellite motifs, the trinucleotide repeat undergoes strict regulation under evolutionary stress. While (AAT)n tracts were common in intronic and intergenic regions of the fish genomes, (AGG)n tracts were typically more numerous than other repeat types in exons. Therefore, different genome fractions may be characterized by different microsatellite abundances resulting from genome evolution and selective constraints. Taken together with the above, the inconsistent distribution patterns, in which (ACT)n tracts were numerous in intergenic regions of Okis and (AAT)n tracts were common in exons of Drer and Ipun, indicated that the distribution of microsatellites reflected the bias of the base composition in the genome fractions. Other biases, such as the (CCG)n tracts in Trub and the (ACC)n tracts in Ccar, suggest that selective forces probably play various roles in specific genomes and differ from each other in a species-specific manner.
It should be noted that (CCG)n and (ACG)n repeats were extremely rare in these genomes. A reasonable explanation for this rarity is the presence of the highly mutable CpG dinucleotide within the motif. The rarity of CpG is almost certainly a consequence of methylation. In vertebrate genomes, the CpG dinucleotide occurs at about one-fifth of the expected frequency [105, 106], because between 60 and 90% of CpGs are methylated at the 5 position on the cytosine ring and the DNA repair mechanism fails to recognize the deamination of 5-methylcytosine, which produces thymine [107, 108]. However, experiments have shown that clusters of non-methylated CpG may contribute to the lack of CpG suppression in the HTF islands, a DNA fraction that accounts for approximately 1% of the total genome in a variety of vertebrates [109, 110]. The HTF fraction is extremely rich in cleavable sites for mCpG-sensitive restriction enzymes, and sequences chosen at random from the HTF fraction belong to islands of DNA several hundred base pairs long that contain CpG at more than 10 times its density in bulk DNA. This would help to explain why (ACG)n or (CCG)n tracts were abundant in introns of all fishes, in contrast to the rarity or absence of this motif in intergenic regions. An alternative explanation is that a specific mechanism exists to maintain the observed level of CpG-containing repeats in introns. The role of cytosine methylation in histone deacetylation, chromatin remodeling and gene silencing may account for this phenomenon.
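The notion of CpG occurring below its "expected frequency" is commonly quantified as an observed/expected ratio. The study does not report this calculation, so the sketch below is only an illustration of the widely used formula, obs/exp = (number of CpG × sequence length) / (number of C × number of G); the function name `cpg_observed_expected` is hypothetical.

```python
def cpg_observed_expected(sequence: str) -> float:
    """Observed/expected CpG ratio for a DNA sequence.

    Expected CpG count is (count of C * count of G) / sequence length, so
    ratio = observed CpG count * length / (count of C * count of G).
    Returns 0.0 when the sequence contains no C or no G.
    """
    seq = sequence.upper()
    c_count = seq.count("C")
    g_count = seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    # "CG" cannot overlap itself, so str.count gives the exact CpG count.
    cpg_count = seq.count("CG")
    return cpg_count * len(seq) / (c_count * g_count)


# Toy sequences (made up for illustration).
print(cpg_observed_expected("CCGG"))      # CpG present at the expected level
print(cpg_observed_expected("CATGCATG"))  # C and G present, but no CpG
```

A ratio near 0.2 would correspond to the roughly one-fifth depletion described for bulk vertebrate DNA, whereas values approaching 1 indicate little or no CpG suppression.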
Among the tetranucleotide microsatellites, (AAAB)n tracts (where B denotes any base other than A) seemed to be the most common, followed by repeats with 25% G + C content, and then by those with 75% and 100% G + C content. Previous studies have indicated that DNA sequence composition could have a profound influence on microsatellite incidence [26, 33]. Kristitin et al. (2002) suggested that the G + C content of microsatellites might influence the mutation rate, because the tetranucleotide repeats with 25% G + C content were not statistically different from each other, but each was significantly different from the repeats with 50% G + C content. Meanwhile, the contribution of selective forces and the DNA mismatch repair system to the distribution patterns cannot be ignored, because several exceptions were observed in our study; for example, (ACAG)n tracts were abundant in Omyk and Okis.
The longer microsatellites (5–6 nt) have the advantage of being more polymorphic than the shorter ones (1–4 nt), as mutation rates generally increase with the number of repeat units [33, 113]. The significant differences in repeat types and motif lengths of microsatellites between the studied fish species seem to be due to their genome-specific characteristics. In conclusion, though it remains unclear why certain repeat motifs are more common than others, or why they vary so much between fish species, several observations presented here suggest that individual genomes and genome-specific regions may be characterized by unique microsatellite profiles. This is also supported by reports of taxon-specific repeats or genome-specific region repeats [6, 36]. The study of microsatellites may help us understand numerous aspects of genome organization and function.
Availability of data and materials
Genome data are available using the links provided by the Ensembl team (Astyanax mexicanus: http://ftp.ensembl.org/pub/release-103/fasta/astyanax_mexicanus/dna/; Danio rerio: http://ftp.ensembl.org/pub/release-103/fasta/danio_rerio/dna/; Cynoglossus semilaevis: http://ftp.ensembl.org/pub/release-103/fasta/cynoglossus_semilaevis/dna/; Esox lucius: http://ftp.ensembl.org/pub/release-103/fasta/esox_lucius/dna/; Ictalurus punctatus: http://ftp.ensembl.org/pub/release-103/fasta/ictalurus_punctatus/dna/; Lepisosteus oculatus: http://ftp.ensembl.org/pub/release-103/fasta/lepisosteus_oculatus/dna/; Oryzias latipes: http://ftp.ensembl.org/pub/release-103/fasta/oryzias_latipes_hni/dna/; Oreochromis niloticus: http://ftp.ensembl.org/pub/release-103/fasta/oreochromis_niloticus/dna/; Poecilia reticulata: http://ftp.ensembl.org/pub/release-103/fasta/poecilia_reticulata/dna/; Takifugu rubripes: http://ftp.ensembl.org/pub/release-103/fasta/takifugu_rubripes/dna/), and by the NCBI team (Cyprinus carpio: https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/951/615/GCF_000951615.1_common_carp_genome/; Nothobranchius furzeri: https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/001/465/895/GCF_001465895.1_Nfu_20140520/; Oncorhynchus kisutch: https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/002/021/735/GCF_002021735.2_Okis_V2/; Oncorhynchus mykiss: https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/013/265/735/GCF_013265735.2_USDA_OmykA_1.1/).
Dieringer D, Schlötterer C. Two distinct modes of microsatellite mutation processes: evidence from the complete genomic sequences of nine species. Genome Res. 2003;13(10):2242–51. https://doi.org/10.1101/gr.1416703.
Zavodna M, Bagshaw A, Brauning R, Gemmell NJ. The effects of transcription and recombination on mutational dynamics of short tandem repeats. Nucleic Acids Res. 2018;46(3):1321–30. https://doi.org/10.1093/nar/gkx1253.
Fungtammasan A, Ananda G, Hile SE, Su MS, Sun C, Harris R, et al. Accurate typing of short tandem repeats from genome-wide sequencing data and its applications. Genome Res. 2015;25(5):736–49. https://doi.org/10.1101/gr.185892.114.
Ahmed MM, Shen C, Khan AQ, Wahid MA, Shaban M, Lin Z. A comparative genomics approach revealed evolutionary dynamics of microsatellite imperfection and conservation in genus Gossypium. Hereditas. 2017;154(1):1–12.
Hatcher E, Wang C, Lefkowitz E. Genome variability and gene content in chordopoxviruses: dependence on microsatellites. Viruses. 2015;7(4):2126–46. https://doi.org/10.3390/v7042126.
Subramanian S, Mishra RK, Singh L. Genome-wide analysis of microsatellite repeats in humans: their abundance and density in specific genomic regions. Genome Biol. 2003;4(2):1–10.
Pearson CE, Edamura KN, Cleary JD. Repeat instability: mechanisms of dynamic mutations. Nat Rev Genet. 2005;6(10):729–42. https://doi.org/10.1038/nrg1689.
Gelsomino F, Barbolini M, Spallanzani A, Pugliese G, Cascinu S. The evolving role of microsatellite instability in colorectal cancer: a review. Cancer Treat Rev. 2016;51:19–26. https://doi.org/10.1016/j.ctrv.2016.10.005.
Hannan AJ. Tandem repeats mediating genetic plasticity in health and disease. Nat Rev Genet. 2018;19(5):286–98. https://doi.org/10.1038/nrg.2017.115.
Chistiakov DA, Hellemans B, Volckaert FAM. Microsatellites and their genomic distribution, evolution, function and applications: a review with special reference to fish genetics. Aquaculture. 2006;255(1):1–29. https://doi.org/10.1016/j.aquaculture.2005.11.031.
Brouwer JR, Willemsen R, Oostra BA. Microsatellite repeat instability and neurological disease. BioEssays. 2009;31(1):71–83. https://doi.org/10.1002/bies.080122.
Gao FB, Richter JD. Microsatellite expansion diseases: repeat toxicity found in translation. Neuron. 2017;93(2):249–51. https://doi.org/10.1016/j.neuron.2017.01.001.
Sinden RR. Origins of instability. Nature. 2001;411(6839):757–8. https://doi.org/10.1038/35081234.
Dib C, Fauré S, Fizames C, Samson D, Drouot N, Vignal A, et al. A comprehensive genetic map of the human genome based on 5,264 microsatellites. Nature. 1996;380(6570):152–4. https://doi.org/10.1038/380152a0.
Dietrich WF, Miller JC, Steen RG, Merchant M, Damron D, Nahf R, et al. A genetic map of the mouse with 4,006 simple sequence length polymorphisms. Nat Genet. 1994;7(2):220–45. https://doi.org/10.1038/ng0694supp-220.
Kaye C, Milazzo J, Rozenfeld S, Lebrun MH, Tharreau D. The development of simple sequence repeat markers for Magnaporthe grisea and their integration into an established genetic linkage map. Fungal Genet Biol. 2003;40(3):207–14. https://doi.org/10.1016/j.fgb.2003.08.001.
Ren P, Peng W, You W, Huang Z, Guo Q, Chen N, et al. Genetic mapping and quantitative trait loci analysis of growth-related traits in the small abalone Haliotis diversicolor using restriction-site-associated DNA sequencing. Aquaculture. 2016;454:163–70. https://doi.org/10.1016/j.aquaculture.2015.12.026.
Campoy JA, Ruiz D, Egea J, Rees DJG, Celton JM, Martínez-Gómez P. Inheritance of flowering time in apricot (Prunus armeniaca L.) and analysis of linked quantitative trait loci (QTLs) using simple sequence repeat (SSR) markers. Plant Mol Biol Rep. 2011;29(2):404–10. https://doi.org/10.1007/s11105-010-0242-9.
Chambers GK, Curtis C, Millar CD, Huynen L, Lambert DM. DNA fingerprinting in zoology: past, present, future. Invest Genet. 2014;5(1):1–11.
Rafiei V, Banihashemi Z, Jiménez-Díaz RM, Navas-Cortés JA, Landa BB, Jiménez-Gasco MM, et al. Comparison of genotyping by sequencing and microsatellite markers for unravelling population structure in the clonal fungus Verticillium dahliae. Plant Pathol. 2018;67(1):76–86. https://doi.org/10.1111/ppa.12713.
Bhargava A, Fuentes FF. Mutational dynamics of microsatellites. Mol Biotechnol. 2010;44(3):250–66. https://doi.org/10.1007/s12033-009-9230-4.
Vieira MLC, Santini L, Diniz AL, Munhoz CDF. Microsatellite markers: what they mean and why they are so useful. Genet Mol Biol. 2016;39(3):312–28. https://doi.org/10.1590/1678-4685-GMB-2016-0027.
Garner TWJ. Genome size and microsatellites: the effect of nuclear size on amplification potential. Genome. 2002;45(1):212–5. https://doi.org/10.1139/g01-113.
Hancock J. Microsatellites and other simple sequences: genomic context and mutational mechanisms. New York: Oxford University Press; 1999.
Primmer CR, Raudsepp T, Chowdhary BP, Moller AP, Ellegren H. Low frequency of microsatellites in the avian genome. Genome Res. 1997;7(5):471–82. https://doi.org/10.1101/gr.7.5.471.
Katti MV, Ranjekar PK, Gupta VS. Differential distribution of simple sequence repeats in eukaryotic genome sequences. Mol Biol Evol. 2001;18(7):1161–7. https://doi.org/10.1093/oxfordjournals.molbev.a003903.
Karlin S, Brocchieri L, Bergman A, Mrázek J, Gentles AJ. Amino acid runs in eukaryotic proteomes and disease associations. Proc Natl Acad Sci U S A. 2002;99(1):333–8. https://doi.org/10.1073/pnas.012608599.
Rockman MV, Wray GA. Abundant raw material for cis-regulatory evolution in humans. Mol Biol Evol. 2002;19(11):1991–2004. https://doi.org/10.1093/oxfordjournals.molbev.a004023.
Li YC, Korol AB, Fahima T, Nevo E. Microsatellites within genes: structure, function, and evolution. Mol Biol Evol. 2004;21(6):991–1007. https://doi.org/10.1093/molbev/msh073.
Hause RJ, Pritchard CC, Shendure J, Salipante SJ. Classification and characterization of microsatellite instability across 18 cancer types. Nat Med. 2016;22(11):1342–50. https://doi.org/10.1038/nm.4191.
Ranathunge C, Wheeler GL, Chimahusky ME, Kennedy MM, Morrison JI, Baldwin BS, et al. Transcriptome profiles of sunflower reveal the potential role of microsatellites in gene expression divergence. Mol Ecol. 2018;27(5):1188–99. https://doi.org/10.1111/mec.14522.
Orgel LE, Crick FHC. Selfish DNA: the ultimate parasite. Nature. 1980;284(5757):604–7. https://doi.org/10.1038/284604a0.
Ellegren H. Microsatellites: simple sequences with complex evolution. Nat Rev Genet. 2004;5(6):435–45. https://doi.org/10.1038/nrg1348.
Rajendrakumar P, Biswal AK, Balachandran SM, Srinivasarao K, Sundaram RM. Simple sequence repeats in organellar genomes of rice: frequency and distribution in genic and intergenic regions. Bioinformatics. 2007;23(1):1–4. https://doi.org/10.1093/bioinformatics/btl547.
Kim CK, Lee GS, Mo JS, Bae SH, Lee TH. Molecular marker database for efficient use in agricultural breeding programs. Bioinformation. 2015;11(9):444–6. https://doi.org/10.6026/97320630011444.
Tóth G, Gáspári Z, Jurka J. Microsatellites in different eukaryotic genomes: survey and analysis. Genome Res. 2000;10(7):967–81. https://doi.org/10.1101/gr.10.7.967.
Metzgar D, Bytof J, Wills C. Selection against frameshift mutations limits microsatellite expansion in coding DNA. Genome Res. 2000;10(1):72–80.
Wang Z, Weber JL, Zhong G, Tanksley SD. Survey of plant short tandem DNA repeats. Theor Appl Genet. 1994;88(1):1–6. https://doi.org/10.1007/BF00222386.
Varshney RK, Thiel T, Stein N, Langridge P, Graner A. In silico analysis on frequency and distribution of microsatellites in ESTs of some cereal species. Cell Mol Biol Lett. 2002;7(2A):537–46.
Moran C. Microsatellite repeats in pig (Sus domestica) and chicken (Gallus domesticus) genomes. J Hered. 1993;84(4):274–80. https://doi.org/10.1093/oxfordjournals.jhered.a111339.
Jurka J, Pethiyagoda C. Simple repetitive DNA sequences from primates: compilation and analysis. J Mol Evol. 1995;40(2):120–6. https://doi.org/10.1007/BF00167107.
Lith HA, Zutphen LFM. Characterization of rabbit DNA microsatellites extracted from the EMBL nucleotide sequence database. Anim Genet. 1996;27(6):387–95. https://doi.org/10.1111/j.1365-2052.1996.tb00505.x.
Hammock EAD, Young LJ. Microsatellite instability generates diversity in brain and sociobehavioral traits. Science. 2005;308(5728):1630–4. https://doi.org/10.1126/science.1111427.
Gylfe AE, Tuupanen S, Hänninen U, Kondelin J, Ristolainen H, Katainen R, et al. Abstract 5193: novel candidate oncogenes with mutation hot spots in microsatellite unstable colorectal cancer. Cancer Res. 2014;74(19):5193.
Gemayel R, Vinces MD, Legendre M, Verstrepen KJ. Variable tandem repeats accelerate evolution of coding and regulatory sequences. Annu Rev Genet. 2010;44(1):445–77. https://doi.org/10.1146/annurev-genet-072610-155046.
Faux NG, Bottomley SP, Lesk AM, Irving JA, Morrison JR, de la Banda MG, et al. Functional insights from the distribution and role of homopeptide repeat-containing proteins. Genome Res. 2005;15(4):537–51. https://doi.org/10.1101/gr.3096505.
Mularoni L, Ledda A, Toll-Riera M, Albà MM. Natural selection drives the accumulation of amino acid tandem repeats in human proteins. Genome Res. 2010;20(6):745–54. https://doi.org/10.1101/gr.101261.109.
Gemayel R, Cho J, Boeynaems S, Verstrepen KJ. Beyond junk-variable tandem repeats as facilitators of rapid evolution of regulatory and coding sequences. Genes. 2012;3(3):461–80. https://doi.org/10.3390/genes3030461.
Vinces MD, Legendre M, Caldara M, Hagihara M, Verstrepen KJ. Unstable tandem repeats in promoters confer transcriptional evolvability. Science. 2009;324(5931):1213–6. https://doi.org/10.1126/science.1170097.
Morin GB. The human telomere terminal transferase enzyme is a ribonucleoprotein that synthesizes TTAGGG repeats. Cell. 1989;59(3):521–9. https://doi.org/10.1016/0092-8674(89)90035-4.
Casas-Vila N, Scheibe M, Freiwald A, Kappei D, Butter F. Identification of TTAGGG-binding proteins in Neurospora crassa, a fungus with vertebrate-like telomere repeats. BMC Genomics. 2015;16(1):1–9.
Sand L, Szuhai K, Hogendoorn P. Sequencing overview of Ewing sarcoma: a journey across genomic, epigenomic and transcriptomic landscapes. Int J Mol Sci. 2015;16(7):16176–215. https://doi.org/10.3390/ijms160716176.
Lai Y, Sun F. The relationship between microsatellite slippage mutation rate and the number of repeat units. Mol Biol Evol. 2003;20(12):2123–31. https://doi.org/10.1093/molbev/msg228.
Bachtrog D, Agis M, Imhof M, Schlötterer C. Microsatellite variability differs between dinucleotide repeat motifs—evidence from Drosophila melanogaster. Mol Biol Evol. 2000;17(9):1277–85. https://doi.org/10.1093/oxfordjournals.molbev.a026411.
Chakraborty R, Kimmel M, Stivers DN, Davison LJ, Deka R. Relative mutation rates at di-, tri-, and tetranucleotide microsatellite loci. Proc Natl Acad Sci U S A. 1997;94(3):1041–6. https://doi.org/10.1073/pnas.94.3.1041.
Schug MD, Hutter CM, Wetterstrand KA, Gaudette MS, Mackay TF, Aquadro CF. The mutation rates of di-, tri- and tetranucleotide repeats in Drosophila melanogaster. Mol Biol Evol. 1998;15(12):1751–60. https://doi.org/10.1093/oxfordjournals.molbev.a025901.
Amos W, Flint J, Xu X. Heterozygosity increases microsatellite mutation rate, linking it to demographic history. BMC Genet. 2008;9(1):1–10.
Amos W. Heterozygosity increases microsatellite mutation rate. Biol Lett. 2016;12(1):20150902.
Primmer CR, Ellegren H, Saino N, Møller AP. Directional evolution in germline microsatellite mutations. Nat Genet. 1996;13(4):391–3. https://doi.org/10.1038/ng0896-391.
Ellegren H. Heterogeneous mutation processes in human microsatellite DNA sequences. Nat Genet. 2000;24(4):400–2. https://doi.org/10.1038/74249.
Whittaker JC, Harbord RM, Boxall N, Mackay I, Dawson G, Sibly RM. Likelihood-based estimation of microsatellite mutation rates. Genetics. 2003;164(2):781–7.
Seyfert AL, Cristescu MEA, Frisse L, Schaack S, Thomas WK, Lynch M. The rate and spectrum of microsatellite mutation in Caenorhabditis elegans and Daphnia pulex. Genetics. 2008;178(4):2113–21. https://doi.org/10.1534/genetics.107.081927.
Schlötterer C. Evolutionary dynamics of microsatellite DNA. Chromosoma. 2000;109(6):365–71. https://doi.org/10.1007/s004120000089.
Noble L. Microsatellites — evolution and applications. Heredity. 1999;83(5):633–4. https://doi.org/10.1038/sj.hdy.6886482.
Madesis P, Ganopoulos I, Tsaftaris A. Microsatellites: evolution and contribution. In: Kantartzi SK, editor. Microsatellites: Methods and Protocols. New York: Humana Press; 2013. p. 1–13.
Saeed AF, Wang R, Wang S. Microsatellites in pursuit of microbial genome evolution. Front Microbiol. 2016;6:1462.
Weber JL, Wong C. Mutation of human short tandem repeats. Hum Mol Genet. 1993;2(8):1123–8. https://doi.org/10.1093/hmg/2.8.1123.
Pearson CE, Sinden RR. Trinucleotide repeat DNA structures: dynamic mutations from dynamic DNA. Curr Opin Struct Biol. 1998;8(3):321–30. https://doi.org/10.1016/S0959-440X(98)80065-1.
Sinden RR. Biological implications of the DNA structures associated with disease-causing triplet repeats. Am J Hum Genet. 1999;64(2):346–53. https://doi.org/10.1086/302271.
Richard GF, Pâques F. Mini- and microsatellite expansions: the recombination connection. EMBO Rep. 2000;1(2):122–6. https://doi.org/10.1093/embo-reports/kvd031.
Charlesworth B, Sniegowski P, Stephan W. The evolutionary dynamics of repetitive DNA in eukaryotes. Nature. 1994;371(6494):215–20. https://doi.org/10.1038/371215a0.
Liu S, Hou W, Sun T, Xu Y, Li P, Yue B, et al. Genome-wide mining and comparative analysis of microsatellites in three macaque species. Mol Gen Genomics. 2017;292(3):537–50. https://doi.org/10.1007/s00438-017-1289-1.
Xu Y, Li W, Hu Z, Zeng T, Shen Y, Liu S, et al. Genome-wide mining of perfect microsatellites and tetranucleotide orthologous microsatellites estimates in six primate species. Gene. 2018;643:124–32. https://doi.org/10.1016/j.gene.2017.12.008.
Xu Y, Hu Z, Wang C, Zhang X, Li J, Yue B. Characterization of perfect microsatellite based on genome-wide and chromosome level in rhesus monkey (Macaca mulatta). Gene. 2016;592(2):269–75. https://doi.org/10.1016/j.gene.2016.07.016.
Karaoglu H, Lee CMY, Meyer W. Survey of simple sequence repeats in completed fungal genomes. Mol Biol Evol. 2005;22(3):639–49. https://doi.org/10.1093/molbev/msi057.
Li C-Y, Liu L, Yang J, Li J-B, Su Y, Zhang Y, et al. Genome-wide analysis of microsatellite sequence in seven filamentous fungi. Interdiscip Sci: Comput Life Sci. 2009;1(2):141–50. https://doi.org/10.1007/s12539-009-0014-5.
Lim S, Notley-McRobb L, Lim M, Carter DA. A comparison of the nature and abundance of microsatellites in 14 fungal genomes. Fungal Genet Biol. 2004;41(11):1025–36. https://doi.org/10.1016/j.fgb.2004.08.004.
Murat C, Riccioni C, Belfiori B, Cichocki N, Labbé J, Morin E, et al. Distribution and localization of microsatellites in the Perigord black truffle genome and identification of new molecular markers. Fungal Genet Biol. 2011;48(6):592–601. https://doi.org/10.1016/j.fgb.2010.10.007.
Ohm RA, de Jong JF, Lugones LG, Aerts A, Kothe E, Stajich JE, et al. Genome sequence of the model mushroom Schizophyllum commune. Nat Biotechnol. 2010;28(9):957–63. https://doi.org/10.1038/nbt.1643.
Qian J, Xu H, Song J, Xu J, Zhu Y, Chen S. Genome-wide analysis of simple sequence repeats in the model medicinal mushroom Ganoderma lucidum. Gene. 2013;512(2):331–6. https://doi.org/10.1016/j.gene.2012.09.127.
Zhao X, Tan Z, Feng H, Yang R, Li M, Jiang J, et al. Microsatellites in different potyvirus genomes: survey and analysis. Gene. 2011;488(1):52–6. https://doi.org/10.1016/j.gene.2011.08.016.
Mrázek J, Guo X, Shah A. Simple sequence repeats in prokaryotic genomes. Proc Natl Acad Sci U S A. 2007;104(20):8472–7. https://doi.org/10.1073/pnas.0702412104.
Burranboina K, Abraham S, Murugan K, Bayyappa M, Yogisharadhya R, Raghavendra G. Genome wide identification and analysis of microsatellite repeats in the largest DNA viruses (Poxviridae family): an insilico approach. Annu Res Rev Biol. 2018;22(1):1–11. https://doi.org/10.9734/ARRB/2018/38367.
Zhou L, Deng L, Fu Y, Wu X, Zhao X, Chen Y, et al. Comparative analysis of microsatellites and compound microsatellites in T4-like viruses. Gene. 2016;575(2):695–701. https://doi.org/10.1016/j.gene.2015.09.053.
Du L, Zhang C, Liu Q, Zhang X, Yue B. Krait: an ultrafast tool for genome-wide survey of microsatellites and primer design. Bioinformatics. 2018;34(4):681–3. https://doi.org/10.1093/bioinformatics/btx665.
Kofler R, Schlötterer C, Lelley T. SciRoKo: a new tool for whole genome microsatellite search and investigation. Bioinformatics. 2007;23(13):1683–5. https://doi.org/10.1093/bioinformatics/btm157.
Luo W, Nie Z, Zhan F, Wei J, Wang W, Gao Z. Rapid development of microsatellite markers for the endangered fish Schizothorax biddulphi (Günther) using next generation sequencing and cross-species amplification. Int J Mol Sci. 2012;13(11):14946–55. https://doi.org/10.3390/ijms131114946.
Huang J, Li W, Jian Z, Yue B, Yan Y. Genome-wide distribution and organization of microsatellites in six species of birds. Biochem Syst Ecol. 2016;67:95–102. https://doi.org/10.1016/j.bse.2016.05.023.
Cai G, Leadbetter CW, Muehlbauer MF, Molnar TJ, Hillman BI. Genome-wide microsatellite identification in the fungus Anisogramma anomala using Illumina sequencing and genome assembly. PLoS One. 2013;8(11):e82408. https://doi.org/10.1371/journal.pone.0082408.
Wang Y, Chen M, Wang H, Wang JF, Bao D. Microsatellites in the genome of the edible mushroom, Volvariella volvacea. Biomed Res Int. 2014;2014:1–10.
Webster MT, Smith NGC, Ellegren H. Microsatellite evolution inferred from human–chimpanzee genomic sequence alignments. Proc Natl Acad Sci U S A. 2002;99(13):8748–53. https://doi.org/10.1073/pnas.122067599.
Pascual M, Schug MD, Aquadro CF. High density of long dinucleotide microsatellites in Drosophila subobscura. Mol Biol Evol. 2000;17(8):1259–67. https://doi.org/10.1093/oxfordjournals.molbev.a026409.
Schlötterer C, Harr B. Drosophila virilis has long and highly polymorphic microsatellites. Mol Biol Evol. 2000;17(11):1641–6. https://doi.org/10.1093/oxfordjournals.molbev.a026263.
Hancock JM. Simple sequences in a ‘minimal’ genome. Nat Genet. 1996;14(1):14–5. https://doi.org/10.1038/ng0996-14.
Qi WH, Jiang XM, Du LM, Xiao GS, Hu TZ, Yue BS, et al. Genome-wide survey and analysis of microsatellite sequences in bovid species. PLoS One. 2015;10(7):e0133667. https://doi.org/10.1371/journal.pone.0133667.
Perinchery G, Nojima D, Goharderakhshan R, Tanaka Y, Alonzo J, Dahiya R. Microsatellite instability of dinucleotide tandem repeat sequences is higher than trinucleotide, tetranucleotide and pentanucleotide repeat sequences in prostate cancer. Int J Oncol. 2000;16(6):1203–9.
Borodulina OR, Golubchikova JS, Ustyantsev IG, Kramerov DA. Polyadenylation of RNA transcribed from mammalian SINEs by RNA polymerase III: complex requirements for nucleotide sequences. Biochim Biophys Acta. 2016;1859(2):355–65. https://doi.org/10.1016/j.bbagrm.2015.12.003.
Kaessmann H, Vinckenbosch N, Long M. RNA-based gene duplication: mechanistic and evolutionary insights. Nat Rev Genet. 2009;10(1):19–31. https://doi.org/10.1038/nrg2487.
Richardson SR, Morell S, Faulkner GJ. L1 retrotransposons and somatic mosaicism in the brain. Annu Rev Genet. 2014;48(1):1–27. https://doi.org/10.1146/annurev-genet-120213-092412.
Prasad MD. Survey and analysis of microsatellites in the silkworm, Bombyx mori: frequency, distribution, mutations, marker potential and their conservation in heterologous species. Genetics. 2005;169(1):197–214. https://doi.org/10.1534/genetics.104.031005.
Gur-Arie R, Cohen CJ, Eitan Y, Shelef L, Hallerman EM, Kashi Y. Simple sequence repeats in Escherichia coli: abundance, distribution, composition, and polymorphism. Genome Res. 2000;10(1):62–71.
Murray V. The frequency of poly(G) tracts in the human genome and their use as a sensor of DNA damage. Comput Biol Chem. 2015;54:13–7. https://doi.org/10.1016/j.compbiolchem.2014.11.006.
Schlötterer C, Tautz D. Slippage synthesis of simple sequence DNA. Nucleic Acids Res. 1992;20(2):211–5. https://doi.org/10.1093/nar/20.2.211.
Morgante M, Hanafey M, Powell W. Microsatellites are preferentially associated with nonrepetitive DNA in plant genomes. Nat Genet. 2002;30(2):194–200. https://doi.org/10.1038/ng822.
Russell GJ, Walker PMB, Elton RA, Subak-Sharpe JH. Doublet frequency analysis of fractionated vertebrate nuclear DNA. J Mol Biol. 1976;108(1):1–20. https://doi.org/10.1016/S0022-2836(76)80090-3.
Swartz MN, Trautner TA, Kornberg A. Enzymatic synthesis of deoxyribonucleic acid. XI. Further studies on nearest neighbor base sequences in deoxyribonucleic acids. J Biol Chem. 1962;237(6):1961–7. https://doi.org/10.1016/S0021-9258(19)73967-2.
Coulondre C, Miller JH, Farabaugh PJ, Gilbert W. Molecular basis of base substitution hotspots in Escherichia coli. Nature. 1978;274(5673):775–80. https://doi.org/10.1038/274775a0.
Bird AP. DNA methylation and the frequency of CpG in animal DNA. Nucleic Acids Res. 1980;8(7):1499–504. https://doi.org/10.1093/nar/8.7.1499.
Cooper DN, Taggart MH, Bird AP. Unmethylated domains in vertebrate DNA. Nucleic Acids Res. 1983;11(3):647–58. https://doi.org/10.1093/nar/11.3.647.
Bird A, Taggart M, Frommer M, Miller OJ, Macleod D. A fraction of the mouse genome that is derived from islands of nonmethylated, CpG-rich DNA. Cell. 1985;40(1):91–9. https://doi.org/10.1016/0092-8674(85)90312-5.
Razin A. CpG methylation, chromatin structure and gene silencing: a three-way connection. EMBO J. 1998;17(17):4905–8. https://doi.org/10.1093/emboj/17.17.4905.
Eckert KA, Yan G, Hile SE. Mutation rate and specificity analysis of tetranucleotide microsatellite DNA alleles in somatic human cells. Mol Carcinog. 2002;34(3):140–50. https://doi.org/10.1002/mc.10058.
Wierdl M, Dominska M, Petes TD. Microsatellite instability in yeast: dependence on the length of the microsatellite. Genetics. 1997;146(3):769–79. https://doi.org/10.1093/genetics/146.3.769.
Acknowledgements
We would like to thank Lianming Du for assistance with microsatellite extraction.
Funding
This work was supported by the Yalong River Hydropower Development Company, Ltd. (No. YLDC-ZBA-2018116).
Ethics approval and consent to participate
Consent for publication
Competing interests
The authors declare that they have no conflict of interest.
Lei, Y., Zhou, Y., Price, M. et al. Genome-wide characterization of microsatellite DNA in fishes: survey and analysis of their abundance and frequency in genome-specific regions. BMC Genomics 22, 421 (2021). https://doi.org/10.1186/s12864-021-07752-6
Keywords
- Genomic regions
- Distribution patterns
- Fish species