Genome-wide characterization of microsatellite DNA in fishes: survey and analysis of their abundance and frequency in genome-specific regions

Lei, Yi; Zhou, Yu; Price, Megan; Song, Zhaobin

doi:10.1186/s12864-021-07752-6

Research
Open access
Published: 07 June 2021

BMC Genomics volume 22, Article number: 421 (2021)

Abstract

Background

Microsatellite repeats are ubiquitous in genomes and play important roles in chromatin organization, regulation of gene activity, recombination and DNA replication. Although microsatellite distribution patterns have been studied in most phylogenetic lineages, they remain unclear in fish species.

Results

Here, we present the first systematic examination of microsatellite distribution in coding and non-coding regions of 14 fish genomes. The number and type of microsatellites were nonrandomly distributed in both intragenic and intergenic regions, suggesting that microsatellites have potential roles in transcriptional or translational regulation and that DNA replication slippage theory alone is insufficient to explain the distribution patterns. Microsatellites were most abundant in non-coding regions. The total number of microsatellites ranged from 78,378 to 1,012,084, and the relative density varied from 4925.76 bp/Mb to 25,401.97 bp/Mb. Overall, (A + T)-rich repeats were dominant. Repeat abundance decreased with the length of the repeated unit (1–6 nt) in a broadly similar way across species, although trinucleotide repeats outnumbered tetranucleotide repeats in the exonic regions of most species. Moreover, the incidence of different repeat types appeared to be specific to both species and genomic region. These results highlight potential mechanisms for maintaining microsatellite distribution, such as selective forces and mismatch repair systems.

Conclusions

Our data could benefit studies of genome evolution and microsatellite evolutionary dynamics, and facilitate the exploration of microsatellite structure, function and composition, as well as the development of molecular markers in these species.

Background

Microsatellites, also termed simple sequence repeats (SSRs), are short tandemly repeated sequences with motifs of 1 to 6 base pairs (bp) [1, 2]. They are ubiquitous and highly abundant in eukaryotic, prokaryotic and viral genomes [3,4,5], making up around 3% of the human genome [6]. Microsatellite instability is an important and unique form of mutation that is responsible for, or strongly implicated in, over 40 human neurological, neurodegenerative and neuromuscular disorders [7], and associations have also been observed with other complex diseases [2, 3, 8, 9]. Microsatellites have attracted considerable attention due to their roles in the organization of chromosome structure, DNA recombination and replication, gene expression and cell cycle dynamics [10].

Microsatellite analysis is used to address a wide range of biological questions. Polymorphism of normal and disease-causing repeats can be used for disease diagnosis and prognosis [11,12,13]. Microsatellite repeats are advantageous as genetic markers due to their high polymorphism, informativeness and co-dominance, and have been used to construct quantitative trait loci (QTL) maps and genetic linkage maps [14,15,16,17,18] and for DNA fingerprinting [19]. These features also provide the foundation for their successful application in other fundamental and applied fields of biology, including population and conservation genetics, genetic dissection of complex traits and marker-assisted breeding programs [10, 20,21,22].

Microsatellite content generally correlates positively with genome size [23,24,25]. The distribution of microsatellites exhibits different properties in genomes with different functionality [26,27,28,29,30,31], contradicting earlier studies stating that they are randomly distributed and simply represent “junk” DNA sequences [32]. Microsatellites are distributed across the entire genome, including protein-coding and non-coding regions [6, 33,34,35]. Previous studies have indicated that microsatellite occurrence differs significantly between coding and non-coding regions [36], and that some microsatellite types are preferred and often common in genome-specific regions [26, 29]. Microsatellite repeats are abundant in non-coding regions of eukaryotic organisms [37], whereas they are relatively rare in coding regions, ranging between 7 and 10% in higher plants [38, 39] and between 9 and 15% in vertebrates [40,41,42]. Meanwhile, multiple studies have demonstrated that hotspots of microsatellite distribution may be related to various phenotypic traits [43, 44]. In the genome of Saccharomyces cerevisiae, about 17% of genes contain microsatellite repeats in open reading frames (ORFs) [45, 46], and the repeats are specifically enriched in regulatory genes that encode transcription factors, DNA-RNA binding proteins and chromatin modifiers [47]. Microsatellite repeats occur frequently in cis-regulatory elements and promoters (e.g., ~25% of promoters in yeast contain tandem repeats), where they regulate gene expression [48, 49]. The (TTAGGG)n tracts constitute a substantial portion of the telomeric regions and are recognized by telomerase, which can be related to the stability of chromosomes and nucleolus organizing regions [10, 50, 51].

Microsatellites are inherently unstable, with high mutation rates from about 10^−6 to 10^−2 per locus per generation, resulting from DNA replication slippage [52, 53]. Mutation rates vary with microsatellite type (perfect, compound or interrupted), base composition of the repeat [54], repeat type (di-, tri- and tetranucleotide) [55, 56], length [21] and heterozygosity [57, 58], and also with chromosome position, cell division, the GC content of flanking DNA and taxonomic group [59,60,61,62]. Microsatellite instability has a strong influence on genomic microsatellite abundance and various functions and is explained by two mutually exclusive mutational mechanisms: (i) DNA replication slippage theory suggests that during DNA replication, the nascent and template strands realign out of register, and if DNA synthesis continues unabated on this molecule the repeat number of the microsatellite is altered [21, 63, 64]. The slipped structure is stabilized by hairpin, triplex, cruciform or quadruplex arrangements of DNA strands [65,66,67,68,69]. (ii) Unequal recombination theory assumes that large-scale contractions and expansions of the repeat array involve unequal DNA recombination, including unequal crossing over and gene conversion [70], via a number of transposable elements, of which the best known are Alu and other short interspersed elements [64, 71]. Non-reciprocal recombination, random genetic drift and selective forces could have a significant effect on the accumulation of tandem-repetitive sequences in genomes [63, 65, 70].

So far, systematic research on microsatellite variation and characterization has been conducted in several phylogenetic lineages, including humans [41], primates [72,73,74], plants and fungi [36, 75,76,77,78,79,80,81,82], and viruses [83, 84]. Yet microsatellite distribution patterns in fishes, an important branch of biological evolution, remain unclear. Here, we used 14 fish genomes to examine microsatellite distribution patterns. The specific aims were 1) to examine the abundance and frequency of microsatellites in several important fish genomes, and 2) to compare the compositional differences of microsatellites in different taxa and genome-specific regions. We anticipate that our study will provide foundational knowledge of microsatellite dynamics in fish species, helping us to better understand microsatellite distribution, and provide strong support for further exploration of genome structure and microsatellite functions.

Materials and methods

Genomic sequences

Genome sequences from 14 fish species, including model fishes (Danio rerio, Oryzias latipes, Astyanax mexicanus), commercial species (Cyprinus carpio, Oncorhynchus mykiss, Oncorhynchus kisutch, Oreochromis niloticus, Ictalurus punctatus, Esox lucius, Cynoglossus semilaevis), ornamental fishes (Poecilia reticulata, Takifugu rubripes, Nothobranchius furzeri), and a “living fossil” fish species (Lepisosteus oculatus), were used in this study. Most genome sequences were downloaded from the Ensembl Genome Browser (Ensembl. Available online: http://asia.ensembl.org/index.html). The sequences of Cyprinus carpio, Nothobranchius furzeri, Oncorhynchus kisutch, and Oncorhynchus mykiss were obtained from the National Centre for Biotechnology Information (NCBI. Available online: http://www.ncbi.nlm.nih.gov/). We also obtained genome annotations to identify microsatellite locations in the genomes. Only genomic (chromosomal) sequences that had complete genome annotations were included in this study. We filtered the unknown bases (Ns) from the genome sequences using a Perl script and obtained the valid length of the sequences for further analysis. The details of the genome sequences are listed in Table 1.
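
The N-filtering step can be illustrated with a short sketch. The Perl script used in the study is not reproduced in the article, so the Python below is only an assumed equivalent: the function name `valid_length` and the simple FASTA parsing are hypothetical, and it reports the number of non-N bases per sequence rather than reproducing the authors' pipeline.

```python
from typing import Dict, Iterable


def valid_length(fasta_lines: Iterable[str]) -> Dict[str, int]:
    """Return, per FASTA record, the number of bases that are not N (hypothetical helper)."""
    lengths: Dict[str, int] = {}
    name = None
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line: use the first whitespace-delimited token as the record name.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            lengths[name] = 0
        elif name is not None:
            seq = line.upper()
            # Count every base except the unknown base N (case-insensitive).
            lengths[name] += len(seq) - seq.count("N")
    return lengths
```

Summing the returned values over all chromosomes would give a genome-wide valid length of the kind used as the denominator for per-Mb statistics.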

Table 1 Data source and genome sizes of the 14 fish species studied in the present study

Microsatellite identification

Microsatellites were identified from genome sequences using the Krait v0.9.0 program, a robust and ultrafast tool with a user-friendly graphical interface for genome-wide investigation of microsatellites [85]. We employed the perfect search model of the program to investigate all motifs according to the minimum repeats or minimum length of microsatellites. In the present study, we defined perfect microsatellites as mononucleotide repeats ≥12 bp, dinucleotide repeats ≥14 bp, trinucleotide repeats ≥15 bp, tetranucleotide repeats ≥16 bp, pentanucleotide repeats ≥20 bp and hexanucleotide repeats ≥24 bp, and the length of the flanking sequence was constrained to 200 bp, as previously described [72, 74]. We mainly examined the distribution of perfect repeats ≥12 bp long. The rationale for choosing this small cutoff value was that microsatellites are often disrupted by single base substitutions [6, 33]. The occurrence of repeats in exons, introns and intergenic regions was identified from the annotations of the 14 fish genome sequences using Perl scripts. The SciRoKo software tool [86] and the NCBI Graphical Sequence Viewer program (https://www.ncbi.nlm.nih.gov/projects/sviewer/) were employed to increase the reliability of the results for the examined microsatellite repeats.
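
The length thresholds above can be made concrete with a minimal sketch. This is not Krait's algorithm; it is a plain regular-expression scan written for illustration, and the names `MIN_LENGTH_BP`, `is_primitive` and `find_perfect_ssrs` are hypothetical. It assumes that a tract counts only if its motif is not itself a repeat of a shorter unit, so that a poly(A) run is reported once as a mononucleotide repeat.

```python
import re
from typing import List, Tuple

# Minimum tract length in bp for each motif length (1-6 nt), taken from the text.
MIN_LENGTH_BP = {1: 12, 2: 14, 3: 15, 4: 16, 5: 20, 6: 24}


def is_primitive(motif: str) -> bool:
    """True if the motif is not a repeat of a shorter unit (e.g. 'ATAT' is not primitive)."""
    n = len(motif)
    return not any(n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n))


def find_perfect_ssrs(seq: str) -> List[Tuple[int, int, str, int]]:
    """Return (start, end, motif, repeats) per perfect tract; 0-based, end-exclusive."""
    seq = seq.upper()
    hits = []
    for k, min_bp in MIN_LENGTH_BP.items():
        min_repeats = -(-min_bp // k)  # ceiling division
        # One motif copy followed by at least (min_repeats - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):
                hits.append((m.start(), m.end(), motif, (m.end() - m.start()) // k))
    return sorted(hits)
```

Because each motif length is scanned independently and matches are non-overlapping within a scan, this simplified version can shorten a tract that directly abuts a longer run of a non-primitive unit; a dedicated tool handles such boundaries more carefully.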

Repeats whose unit patterns were circular permutations and/or reverse complements of each other were grouped together as one type. The total number of non-overlapping types was 501 for 1–6-nt motifs: 1-nt motifs comprised 2 types, A and C (A = T and C = G); 2-nt motifs comprised 4 types, AT, AG, AC and GC (AT = TA, AG = GA = CT = TC, AC = CA = GT = TG, and GC = CG); and 3-, 4-, 5- and 6-nt motifs comprised 10, 33, 102 and 350 types, respectively [41, 87].
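
The grouping rule can be checked with a small enumeration. The sketch below is illustrative only; the function names are hypothetical, and choosing the alphabetically smallest rotation as the representative is an assumption made for convenience. It maps every motif to one representative among its rotations and the rotations of its reverse complement, then counts the distinct representatives per motif length.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_motif(motif: str) -> str:
    """Smallest string among all rotations of the motif and of its reverse complement."""
    motif = motif.upper()
    rev_comp = motif.translate(COMPLEMENT)[::-1]
    rotations = [s[i:] + s[:i] for s in (motif, rev_comp) for i in range(len(s))]
    return min(rotations)


def is_primitive(motif: str) -> bool:
    """True if the motif is not a repeat of a shorter unit (e.g. 'AA' or 'ATAT')."""
    n = len(motif)
    return not any(n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n))


def count_motif_types(k: int) -> int:
    """Number of distinct motif types of length k after grouping."""
    motifs = ("".join(p) for p in product("ACGT", repeat=k))
    return len({canonical_motif(m) for m in motifs if is_primitive(m)})
```

Calling `count_motif_types(k)` for k = 1 to 6 gives 2, 4, 10, 33, 102 and 350, which sum to the 501 types stated above; for example, `canonical_motif("TG")` returns `"AC"`.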

Results

Distribution patterns of microsatellite repeats in the fish genomes

We examined the number, relative frequency (microsatellite numbers per Mb of the sequence), relative density (total microsatellite length per Mb of the sequence), GC content and coverage degree (percentage of total microsatellite length in the sequence) of microsatellites with motif lengths of 1–6 nucleotides in the 14 fish genomes (Table 2). We assigned 4-letter name abbreviations to the 14 species, and these are used henceforth to simplify the results and discussion (e.g. Danio rerio = Drer, O. niloticus = Onil; see Table 2). The total number of microsatellites, ranging from 78,378 (Locu) to 1,012,084 (Drer), differed between fish species, and the coverage degree varied from 0.18% (Locu) to 5.29% (Trub) (Table 2). The lowest relative frequency and relative density of microsatellites were both found in Olat (249.99 loci/Mb and 4925.76 bp/Mb, respectively) (Table 2). The highest relative frequency and relative density of microsatellites were found in Csem (3445.94 loci/Mb) and Ipun (25,401.97 bp/Mb), respectively. The GC content ranged from 10.94% (Trub) to 48.20% (Okis) (Table 2).
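
The summary statistics defined above can be written out as a short sketch. It assumes that 1 Mb means 10^6 bp of valid (non-N) sequence and that GC content refers to the microsatellite tracts themselves; neither detail is spelled out in the text, and the names `SsrSummary` and `summarise` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class SsrSummary:
    count: int
    relative_frequency: float  # loci per Mb
    relative_density: float    # bp of microsatellite per Mb
    coverage_percent: float    # % of the analysed sequence covered by microsatellites
    gc_percent: float          # GC content of the microsatellite tracts (assumed meaning)


def summarise(ssr_tracts: Iterable[str], valid_length_bp: int) -> SsrSummary:
    """Compute per-genome statistics from tract sequences and the valid sequence length."""
    if valid_length_bp <= 0:
        raise ValueError("valid_length_bp must be positive")
    tracts = [t.upper() for t in ssr_tracts]
    total_bp = sum(len(t) for t in tracts)
    gc_bp = sum(t.count("G") + t.count("C") for t in tracts)
    megabases = valid_length_bp / 1_000_000  # assumption: 1 Mb = 10**6 bp
    return SsrSummary(
        count=len(tracts),
        relative_frequency=len(tracts) / megabases,
        relative_density=total_bp / megabases,
        coverage_percent=100.0 * total_bp / valid_length_bp,
        gc_percent=100.0 * gc_bp / total_bp if total_bp else 0.0,
    )
```

With these definitions, relative density and coverage degree carry the same information in different units: a density of d bp/Mb corresponds to a coverage of d/10,000 percent.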

Table 2 Microsatellite distribution as frequency, density and GC content of different fish genomes

The main distribution pattern of di- (dinucleotide SSRs) > mono- (mononucleotide SSRs) > tetra- (tetranucleotide SSRs) > tri- (trinucleotide SSRs) > penta- (pentanucleotide SSRs) > hexanucleotide (hexanucleotide SSRs) was shared by six fish genomes (i.e., Drer, Trub, Ipun, Amex, Eluc, and Omyk), while a mono- > di- > tetra- > tri- > penta- > hexanucleotide pattern was observed in Olat, Ccar and Pret (Table 2). The di- > mono- > tri- > tetra- > penta- > hexanucleotide pattern was shared by Onil and Csem, whereas Nfur exhibited a di- > tetra- > mono- > tri- > penta- > hexanucleotide pattern (Table 2). The 1-nt or 2-nt repeats had a higher percentage motif abundance in the fish genomes than any other motif length, and the 2-nt repeats represented more than 60% of motif abundance in Omyk, Eluc and Okis (Table 3). There was an almost equal distribution of motif abundance percentages between the first three motif lengths (1–3 nt) in Locu, with 3-nt and 1-nt repeats being almost identical (34.92% and 34.99%, respectively). The percentage of 4-nt repeats was remarkably uniform across all taxa except for Drer and Locu, which had marginally greater or lesser percentages of this motif length (Table 3). Microsatellites with longer motifs (5–6 nt) showed lower percentages than the short motif repeats (1–4 nt). The 6-nt repeats had the lowest percentages among these motif lengths, ranging from 0.21% (Drer) to 0.85% (Trub) (Table 3).
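
The ranking patterns and percent motif abundances described above follow directly from the per-motif-length counts. The sketch below is a hypothetical illustration: the function names are not from the article, and the counts in the usage note are invented numbers rather than values from Table 2 or Table 3.

```python
from typing import Dict

PREFIX = {1: "mono", 2: "di", 3: "tri", 4: "tetra", 5: "penta", 6: "hexa"}


def percent_abundance(counts: Dict[int, int]) -> Dict[int, float]:
    """Share (%) of all microsatellites contributed by each motif length."""
    total = sum(counts.values())
    return {k: 100.0 * n / total for k, n in counts.items()}


def abundance_pattern(counts: Dict[int, int]) -> str:
    """Rank motif lengths from most to least numerous, e.g. 'di- > mono- > ...'."""
    ranked = sorted(counts, key=lambda k: counts[k], reverse=True)
    return " > ".join(PREFIX[k] + "-" for k in ranked)
```

For the invented counts `{1: 300, 2: 400, 3: 100, 4: 150, 5: 40, 6: 10}`, `abundance_pattern` returns `"di- > mono- > tetra- > tri- > penta- > hexa-"` and the dinucleotide share from `percent_abundance` is 40.0%.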

Table 3 Microsatellite distribution as percent motif abundance (%) among 14 genomes

Mononucleotide repeats

Motif abundance percentages of mononucleotide repeats within intergenic regions, introns and exons varied across species, with intergenic regions ranging from 0.19% (Okis) to 7.19% (Trub), introns ranging from 9.87% (Nfur) to 45.06% (Olat) and exons ranging from 0.37% (Okis) to 3.58% (Pret) (Table 3). Among the two types of mononucleotide repeats, poly(A/T) was generally far more abundant than poly(C/G) in these fish genomes, except that the reverse was found in the Trub, Omyk and Okis genome sequences (Supplement Table 1, Tables 4 and 5). Drer had the maximum repeat number of A (or T) (192,264) followed by Ccar, Ipun, Amex and Pret. Pret contained the maximum number (4549) of C (or G). Although poly(A/T) tracts were clearly more abundant than poly(C/G) in exons (Table 4), this difference was not consistently observed in introns (Table 5) and intergenic regions (Table 6).

Table 4 Total numbers of Mono-, Di-, and Trinucleotide repeats in exons among 14 fish genomes

Table 5 Total numbers of Mono-, Di-, and Trinucleotide repeats in introns among 14 fish genomes

Table 6 Total numbers of Mono-, Di-, and Trinucleotide repeats in intergenic regions among 14 fish genomes

Dinucleotide repeats

Among genome-specific regions, there was a lower percentage of dimer repeats (AT, AC, AG, CG) in exons than in non-coding regions, ranging from 0.37% (Ccar) to 1.77% (Onil). Within the non-coding regions, intronic regions had a higher proportion of dinucleotide repeats than intergenic regions (Table 3). We found that (AC)n repeats were generally the most numerous in specific genomic regions, except that Amex had greater numbers of (AG)n repeats in exonic regions and Okis had greater numbers of (AT)n repeats in intronic regions (Tables 4 and 5). The number of (AT)n repeats showed the greatest variation between genome-specific regions and species. For example, intronic and intergenic regions of Drer had similar numbers of (AT)n and (AG)n repeats, whereas in exons (AT)n repeats were considerably less numerous than (AG)n repeats. Olat had more (AT)n repeats than (AG)n repeats in exons, but the opposite was found in other genomes. Finally, (CG)n repeats were very infrequent or absent in these genomes.

Trinucleotide repeats

Motif abundance percentages of trinucleotide repeats in the exons of six fish species were greater than in intergenic regions, these six species being Olat (1.90%), Csem (1.81%), Pret (1.63%), Onil (1.53%) and Eluc (0.78%) (Table 3). Meanwhile, motif abundance percentages of trinucleotide repeats in the exons of Locu, Csem and Olat were greater than those of other motif lengths (e.g. mono-, di-). Among the different trinucleotide repeats, (AAT)n repeats were generally the most numerous in intronic and intergenic regions of different taxa (Tables 5 and 6), except for Okis, where (ACT)n repeats were the most numerous in intergenic regions (Table 6). No single trinucleotide repeat was consistently the most numerous in exonic regions across the different fish species. For example, (AAT)n repeats were most numerous in Drer and Ipun, (ATC)n repeats in Ccar and Nfur, and (AGG)n repeats in the 10 remaining species (Table 4). Repeats such as ACT, AGC, ACG and CCG were generally found in low numbers in each specific genomic region. Furthermore, CCG repeats were absent from the intergenic regions of Eluc, Omyk, Ccar and Okis (Table 6).

Tetranucleotide repeats

Tetranucleotide repeats were frequent in each genomic region and their abundance generally depended on the base composition of the repeat unit (Tables 7, 8, 9 and Supplement Table 1). Overall, repeats with > 50% A + T (e.g. AAAT, ATAG and AATC repeats) were more abundant in the studied fish genomes (Supplement Table 1). There were, however, a few notable exceptions. For example, (ACAG)n repeats were the most numerous in Eluc, Omyk and Okis (Supplement Table 1). We found that the (AAAB)n repeats (where B denotes any base other than A) were most numerous in exonic regions in five fish species (i.e. Olat, Drer, Onil, Ccar and Ipun), the (ACAG)n repeats were numerous in Eluc, Omyk and Okis, and (ATCC)n repeats were most common in the remaining four fish species (Table 7). Similar to exons, the most common tetranucleotide repeat in intergenic regions was (AAAB)n, except for (ATCC)n in Olat, (AATC)n in Eluc and (ATAG)n in Amex (Table 9). In introns, (AATB)n or (ACAG)n were the most common tetranucleotide repeats in the studied fish (Table 8). We also found that some repeats with > 50% C + G (e.g. ACGC, AGGG and AGCG repeats) were in the top 50% of tetranucleotide repeats in specific genome regions (Tables 7, 8, 9).

Table 7 The most frequent Tetra-, Penta-, and Hexanucleotide repeats in introns^a

Table 8 The most frequent Tetra-, Penta-, and Hexanucleotide repeats in exons^a

Table 9 The most frequent Tetra-, Penta-, and Hexanucleotide repeats in intergenic regions^a

Pentanucleotide repeats

As expected, pentanucleotide repeats occurred less often than tetranucleotide repeats in the different genome regions. We found a general distribution pattern of pentanucleotide repeats for all species, in which (A + T)-rich repeats were the most abundant. Yet we still found notable exceptions where (C + G)-rich repeats were dominant in specific genomic locations, including AGAGG and ACTGG in introns or intergenic regions of Trub, Csem and Okis, and ACTGC in exons of Eluc (Tables 7, 8, 9). Although AGAGG repeats were relatively abundant in the introns and exons of Csem, it was also the only species in this study that lacked this repeat in intergenic regions (Supplement Table 1). We also found that CpG-containing repeats were present in the top 50% of pentanucleotide repeats, including (ATACG)n or (CCCGG)n tracts in intronic regions of Eluc and Locu, (CCCGG)n, (AATCG)n or (ACCGG)n tracts in exonic regions of Trub, Amex and Pret, and (ATACG)n or (ACCGG)n tracts in intergenic regions of Eluc and Pret (Tables 7, 8, 9).

Hexanucleotide repeats

Hexanucleotide repeats were the least numerous in specific genomic regions, except for the exons of Trub (Table 3). In exonic and intronic regions, a dominance of (C + G)-rich repeats was found in the majority of the genomes (Tables 7 and 8). The repeat motifs present in intergenic regions were highly variable and relatively (A + T)-rich (Table 9). Except in Olat, Onil, Ccar and Okis, CpG-containing repeats were common in the top 50% of hexanucleotide repeats in intronic and exonic regions, and half of the species had CpG-containing repeats in the top 50% of hexanucleotide repeats in intergenic regions (Tables 7, 8, 9). A few telomere-like repeats were found in introns or intergenic regions of all species except Pret. However, the (AATCCC)n and (AACCCT)n tracts were observed in exonic regions of Trub and Omyk, respectively (Table 8).

Iteration number and length distribution of microsatellites in fish genomes

Iteration number and length of microsatellites are both important factors determining microsatellite mutation rates, and they could be extremely important not only for genomic stability but also for the evolution of additional genomic features such as codon usage. To assess the expandability of the repeats, the iteration numbers of microsatellites of each motif length were grouped into the following intervals and plotted: <20, 20–50, 50–100, 100–200, 200–300, and >300 (Fig. 1). The details of all iteration numbers and densities of microsatellites in the fish genomes are given in Supplement Table 2. In general, microsatellite frequency was concentrated at small iteration numbers. In other words, short microsatellites were observed more frequently in the fish genomes than long microsatellites. Repeat tracts with fewer than 20 iterations comprised more than 83.93%, 67.22%, 90.38%, 88.93%, 92.58% and 90.42% of the mono- to hexanucleotide (1–6 nt) microsatellites, respectively (Fig. 1 and Supplement Table 2). However, a few exceptional microsatellites were found in which the iteration number exceeded 300, for example 1-nt microsatellites in Csem, Eluc, Amex, Ccar and Okis; 2-nt microsatellites in Drer, Csem, Eluc, Nfur, Amex, Ccar and Okis; 3-nt microsatellites in Nfur, Amex and Ccar; 4-nt microsatellites in Drer, Csem, Nfur, Amex and Ccar; 5-nt microsatellites in Amex and Ccar; and 6-nt microsatellites in Csem, Amex and Ccar (Supplement Table 2).
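
The interval grouping can be sketched as follows. The article does not say how values that fall exactly on an interval boundary (20, 50, 100, 200, 300) were assigned, so this example assumes inclusive upper bounds with "<20" and ">300" taken literally; the names `UPPER_BOUNDS`, `bin_iterations` and `percent_below_20` are hypothetical.

```python
from bisect import bisect_left
from collections import Counter
from typing import Dict, Iterable

# Inclusive upper bounds of the first five intervals (assumed boundary handling).
UPPER_BOUNDS = [19, 50, 100, 200, 300]
BIN_LABELS = ["<20", "20-50", "50-100", "100-200", "200-300", ">300"]


def bin_iterations(repeat_counts: Iterable[int]) -> Dict[str, int]:
    """Count loci per iteration-number interval."""
    tally = Counter(BIN_LABELS[bisect_left(UPPER_BOUNDS, n)] for n in repeat_counts)
    return {label: tally.get(label, 0) for label in BIN_LABELS}


def percent_below_20(repeat_counts: Iterable[int]) -> float:
    """Share (%) of loci with fewer than 20 iterations."""
    counts = list(repeat_counts)
    if not counts:
        return 0.0
    return 100.0 * sum(n < 20 for n in counts) / len(counts)
```

Applied separately to the loci of each motif length, `percent_below_20` corresponds to the kind of per-motif-length percentage quoted in the paragraph above.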

Discussion

In this study, we examined microsatellites composed of motifs 1–6 bp long in the entire genomes of 14 fish species and analyzed their distribution and frequency in different genomic regions. Microsatellite occurrence differed significantly, with the coverage degree varying from 0.18 to 5.29%. Comparison of our data with microsatellite repeat occurrence in the genomes of humans (3%) [6], primates (0.83–0.88%) [72,73,74], birds (0.13–0.49%) [88], and plants and fungi (0.04–0.15%) [75, 76, 80, 89, 90] indicates that microsatellite occurrence differs between species and that this might be a general phenomenon across taxa [33]. In fact, differences might even occur between closely related species such as humans and chimpanzees [91], and within the genus Drosophila [92, 93].

Another clear trend to emerge from this analysis was that the observed dependence of microsatellite abundance on repeat unit length and iteration number deviated markedly from the expected trend of gradual decrease, which is consistent with a previous study [36]. Our research also indicated that microsatellite density is not strictly positively correlated with genome size. Although microsatellite density is generally reported to correlate positively with genome size [26, 36, 94], results that contradict this relationship, as ours do, have been found in other studies [72, 83, 88, 95]. Overall, the comparative analysis indicated that there was great variation in microsatellite content across the 14 fish species. This may indicate that differential selective constraints play an important role in microsatellite evolution and result in the accumulated preference for different microsatellite types (Saeed 2016; Ellegren 2004; Schlötterer 2000).

During genome evolution, mutation of microsatellite repeats may provide a molecular mechanism for faster adaptation to environmental stress by increasing the quantity of DNA and providing the raw material for adaptive evolution. Generally, the instability of dinucleotide repeats is higher than that of trinucleotide, tetranucleotide and pentanucleotide repeats [96]. In other words, the dependence of the microsatellite mutation rate on repeat unit length deviates from a trend of gradual decrease. This could explain the high numbers of mono- and dinucleotide microsatellites and the low numbers of penta- and hexanucleotide microsatellites in the genomes. We should note that tetranucleotide repeats were more frequent than trinucleotide repeats in most of the 14 genomes. However, trinucleotide repeats tended to be more frequent than tetranucleotide repeats in exonic regions, and less frequent than tetranucleotide repeats in intronic and intergenic regions, of most genomes. We suggest that the lower number of trinucleotide repeats cannot be explained by conservation alone, since these repeats contribute triplet codons that form parts of genes. However, there may be a mechanism (e.g. the mismatch repair system) in exonic regions that maintains the higher number of trinucleotide repeats.

As is evident from Tables 2–9, poly(A/T) tracts were more common than poly(C/G) tracts in these genomes. Poly(A/T) tracts were particularly common in exonic and intergenic regions, but the opposite was found in intronic regions of some taxa (e.g., Trub, Omyk and Okis), as has also been observed in the human genome [6]. The higher frequency of poly(A) tracts can be attributed to the re-integration of processed genes into the genome from mRNA with an attached poly(A) tail, whereas poly(C/G) tracts are not part of this integrative mechanism. An alternative explanation is that a long A-rich tail is known to be necessary for retrotransposons that are widespread in eukaryotic genomes, such as Alu and LINE-1 (L1) retrotransposons [97,98,99]. Meanwhile, the formation of pseudogenes may contribute to this higher proportion of (A + T)-rich repeats [36, 100]. The mutation mechanism of microsatellite DNA also provides a basis for this phenomenon. The variable frequencies of poly(A) and poly(C) could be due to the difference in stability between (GC)n and (AT)n repeats. (GC)n repeats are more stable than (AT)n repeats, and hence poly(C) sequences would slip less readily during replication over the course of microsatellite evolution [6, 95, 101]. In the intronic regions, the higher than expected frequencies of poly(C/G) tracts in some species may be due to duplication events of key DNA sequences during evolution; alternatively, the integrity of chromosomes may depend on a higher-order DNA sequence organization that includes the presence of poly(C/G) tracts [102].

In the case of dimeric repeats, we found that the (AC)n tract was common and the (GC)n tract was rare. If (GC)-rich regions are relatively stable, less replication slippage occurs to generate the repeated motifs of microsatellites [103]. On a genomic scale, microsatellite sequences are presumably at equilibrium, where (AC)n or (AG)n repeats should be more abundant than (AT)n or (GC)n repeats. However, we found the opposite distribution of microsatellite motifs in the genome of Amex. We suggest that there is interspecific variation in the mechanisms of mutation or repair of specific motifs [63], or that there might be variation in the selective constraints associated with different microsatellite motifs [33].

Compared to other microsatellite motifs, trinucleotide repeats undergo strict regulation under evolutionary stress. While the (AAT)n tracts were common in intronic and intergenic regions of the fish genomes, (AGG)n tracts were typically more numerous than other repeat types in exons. Therefore, different genome fractions may be characterized by different microsatellite abundances resulting from genome evolution and selective constraints [104]. Combined with the above, inconsistent distribution patterns, in which (ACT)n tracts were numerous in intergenic regions of Okis and (AAT)n tracts were common in exons of Drer and Ipun, indicated that the distribution of microsatellites reflected the base composition bias of the genome fractions. Other biases, such as the (CCG)n tracts in Trub and the (ACC)n tracts in Ccar, suggest that selective forces probably play various roles in specific genomes and differ from each other in a species-specific manner [36].

It should be noted that (CCG)n and (ACG)n repeats were extremely rare in these genomes. A reasonable explanation for this rarity is the presence of the highly mutable CpG dinucleotide within the motif. Rarity of CpG is almost certainly a consequence of methylation. In vertebrate genomes, CpG occurs at about one-fifth of the expected frequency [105, 106] because between 60 and 90% of CpGs are methylated at the 5 position on the cytosine ring, and the DNA repair mechanism fails to recognize the deamination of 5-methylcytosine to thymine [107, 108]. However, experiments have shown that clusters of non-methylated CpG may account for the lack of CpG suppression in HTF islands, a DNA fraction that makes up approximately 1% of the total genome in a variety of vertebrates [109, 110]. The HTF fraction is extremely rich in cleavable sites for mCpG-sensitive restriction enzymes, and sequences chosen at random from the HTF fraction belong to islands of DNA several hundred base pairs long that contain CpG at more than 10 times its density in bulk DNA. This would help to explain why (ACG)n or (CCG)n tracts were abundant in introns of all fishes, in contrast to the rarity or absence of this motif in intergenic regions. An alternative explanation is that a specific mechanism exists to maintain the observed level of CpG-containing repeats in introns. The role of cytosine methylation in histone deacetylation, chromatin remodeling and gene silencing may account for this phenomenon [111].
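
The idea of CpG occurring below its expected frequency can be illustrated with the widely used observed/expected ratio, (number of CpG × sequence length) / (number of C × number of G). The article does not state which definition underlies the figure it cites, so this formula and the function name `cpg_obs_exp` are assumptions made for illustration.

```python
def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: n_CpG * L / (n_C * n_G); 0.0 if C or G is absent."""
    seq = seq.upper()
    n_c = seq.count("C")
    n_g = seq.count("G")
    if n_c == 0 or n_g == 0:
        return 0.0
    n_cpg = seq.count("CG")  # CpG dinucleotides on the given strand
    return n_cpg * len(seq) / (n_c * n_g)
```

A ratio near 1 means CpG is about as frequent as base composition alone predicts, whereas a value of about 0.2 corresponds to the one-fifth depletion described above.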

Among tetranucleotide microsatellites, the (AAAB)n tracts (B denotes any base other than A) seem to be the most common, followed by repeats with 25% G + C content, and then those with 75% and 100% G + C content. Previous studies have indicated that DNA sequence composition could have a profound influence on microsatellite incidence [26, 33]. Kristitin et al. (2002) suggested that the G + C content of microsatellites might have influenced the mutation rate because the tetranucleotide repeats with 25% G + C content were not statistically different from each other, but each was significantly different from the repeats with 50% G + C content [112]. Meanwhile, the contribution of selective forces and the DNA mismatch repair system to the distribution patterns cannot be ignored, because several exceptions were observed in our study; for example, (ACAG)n tracts were abundant in Omyk and Okis.

The longer microsatellites (5–6 nt) have the advantage of being more polymorphic than the shorter ones (1–4 nt), as mutation rates generally increase with the number of repeat units [33, 113]. The significant differences in repeat types and motif lengths of microsatellites between the studied fish species seem to be due to their genome-specific characteristics. In conclusion, though it remains unclear why certain repeat motifs are more common than others, or why they vary so much between different fish species, several observations presented here suggest that individual genomes and genome-specific regions may be characterized by unique microsatellite profiles. This is also supported by reports of taxon-specific repeats and genome-region-specific repeats [6, 36]. The study of microsatellites may help us understand numerous aspects of genome organization and function.

Availability of data and materials

References

Dieringer D, Schlötterer C. Two distinct modes of microsatellite mutation processes: evidence from the complete genomic sequences of nine species. Genome Res. 2003;13(10):2242–51. https://doi.org/10.1101/gr.1416703.
Zavodna M, Bagshaw A, Brauning R, Gemmell NJ. The effects of transcription and recombination on mutational dynamics of short tandem repeats. Nucleic Acids Res. 2018;46(3):1321–30. https://doi.org/10.1093/nar/gkx1253.
Fungtammasan A, Ananda G, Hile SE, Su MS, Sun C, Harris R, et al. Accurate typing of short tandem repeats from genome-wide sequencing data and its applications. Genome Res. 2015;25(5):736–49. https://doi.org/10.1101/gr.185892.114.
Ahmed MM, Shen C, Khan AQ, Wahid MA, Shaban M, Lin Z. A comparative genomics approach revealed evolutionary dynamics of microsatellite imperfection and conservation in genus Gossypium. Hereditas. 2017;154(1):1–12.
Hatcher E, Wang C, Lefkowitz E. Genome variability and gene content in chordopoxviruses: dependence on microsatellites. Viruses. 2015;7(4):2126–46. https://doi.org/10.3390/v7042126.
Subramanian S, Mishra RK, Singh L. Genome-wide analysis of microsatellite repeats in humans: their abundance and density in specific genomic regions. Genome Biol. 2003;4(2):1–10.
Pearson CE, Edamura KN, Cleary JD. Repeat instability: mechanisms of dynamic mutations. Nat Rev Genet. 2005;6(10):729–42. https://doi.org/10.1038/nrg1689.
Gelsomino F, Barbolini M, Spallanzani A, Pugliese G, Cascinu S. The evolving role of microsatellite instability in colorectal cancer: a review. Cancer Treat Rev. 2016;51:19–26. https://doi.org/10.1016/j.ctrv.2016.10.005.
Hannan AJ. Tandem repeats mediating genetic plasticity in health and disease. Nat Rev Genet. 2018;19(5):286–98. https://doi.org/10.1038/nrg.2017.115.
Chistiakov DA, Hellemans B, Volckaert FAM. Microsatellites and their genomic distribution, evolution, function and applications: a review with special reference to fish genetics. Aquaculture. 2006;255(1):1–29. https://doi.org/10.1016/j.aquaculture.2005.11.031.
Brouwer JR, Willemsen R, Oostra BA. Microsatellite repeat instability and neurological disease. BioEssays. 2009;31(1):71–83. https://doi.org/10.1002/bies.080122.
Gao FB, Richter JD. Microsatellite expansion diseases: repeat toxicity found in translation. Neuron. 2017;93(2):249–51. https://doi.org/10.1016/j.neuron.2017.01.001.
Sinden RR. Origins of instability. Nature. 2001;411(6839):757–8. https://doi.org/10.1038/35081234.
Dib C, Fauré S, Fizames C, Samson D, Drouot N, Vignal A, et al. A comprehensive genetic map of the human genome based on 5,264 microsatellites. Nature. 1996;380(6570):152–4. https://doi.org/10.1038/380152a0.
Dietrich WF, Miller JC, Steen RG, Merchant M, Damron D, Nahf R, et al. A genetic map of the mouse with 4,006 simple sequence length polymorphisms. Nat Genet. 1994;7(2):220–45. https://doi.org/10.1038/ng0694supp-220.
Kaye C, Milazzo J, Rozenfeld S, Lebrun MH, Tharreau D. The development of simple sequence repeat markers for Magnaporthe grisea and their integration into an established genetic linkage map. Fungal Genet Biol. 2003;40(3):207–14. https://doi.org/10.1016/j.fgb.2003.08.001.
Ren P, Peng W, You W, Huang Z, Guo Q, Chen N, et al. Genetic mapping and quantitative trait loci analysis of growth-related traits in the small abalone Haliotis diversicolor using restriction-site-associated DNA sequencing. Aquaculture. 2016;454:163–70. https://doi.org/10.1016/j.aquaculture.2015.12.026.
Campoy JA, Ruiz D, Egea J, Rees DJG, Celton JM, Martínez-Gómez P. Inheritance of flowering time in apricot (Prunus armeniaca L.) and analysis of linked quantitative trait loci (QTLs) using simple sequence repeat (SSR) markers. Plant Mol Biol Rep. 2011;29(2):404–10. https://doi.org/10.1007/s11105-010-0242-9.
Chambers GK, Curtis C, Millar CD, Huynen L, Lambert DM. DNA fingerprinting in zoology: past, present, future. Invest Genet. 2014;5(1):1–11.
Rafiei V, Banihashemi Z, Jiménez-Díaz RM, Navas-Cortés JA, Landa BB, Jiménez-Gasco MM, et al. Comparison of genotyping by sequencing and microsatellite markers for unravelling population structure in the clonal fungus Verticillium dahliae. Plant Pathol. 2018;67(1):76–86. https://doi.org/10.1111/ppa.12713.
Bhargava A, Fuentes FF. Mutational dynamics of microsatellites. Mol Biotechnol. 2010;44(3):250–66. https://doi.org/10.1007/s12033-009-9230-4.
Vieira MLC, Santini L, Diniz AL, Munhoz CDF. Microsatellite markers: what they mean and why they are so useful. Genet Mol Biol. 2016;39(3):312–28. https://doi.org/10.1590/1678-4685-GMB-2016-0027.
Garner TWJ. Genome size and microsatellites: the effect of nuclear size on amplification potential. Genome. 2002;45(1):212–5. https://doi.org/10.1139/g01-113.
Hancock J. Microsatellites and other simple sequences: genomic context and mutational mechanisms. New York: Oxford University Press; 1999.
Primmer CR, Raudsepp T, Chowdhary BP, Moller AP, Ellegren H. Low frequency of microsatellites in the avian genome. Genome Res. 1997;7(5):471–82. https://doi.org/10.1101/gr.7.5.471.
Katti MV, Ranjekar PK, Gupta VS. Differential distribution of simple sequence repeats in eukaryotic genome sequences. Mol Biol Evol. 2001;18(7):1161–7. https://doi.org/10.1093/oxfordjournals.molbev.a003903.
Karlin S, Brocchieri L, Bergman A, Mrázek J, Gentles AJ. Amino acid runs in eukaryotic proteomes and disease associations. Proc Natl Acad Sci U S A. 2002;99(1):333–8. https://doi.org/10.1073/pnas.012608599.
Rockman MV, Wray GA. Abundant raw material for cis-regulatory evolution in humans. Mol Biol Evol. 2002;19(11):1991–2004. https://doi.org/10.1093/oxfordjournals.molbev.a004023.
Li YC, Korol AB, Fahima T, Nevo E. Microsatellites within genes: structure, function, and evolution. Mol Biol Evol. 2004;21(6):991–1007. https://doi.org/10.1093/molbev/msh073.
Hause RJ, Pritchard CC, Shendure J, Salipante SJ. Classification and characterization of microsatellite instability across 18 cancer types. Nat Med. 2016;22(11):1342–50. https://doi.org/10.1038/nm.4191.
Ranathunge C, Wheeler GL, Chimahusky ME, Kennedy MM, Morrison JI, Baldwin BS, et al. Transcriptome profiles of sunflower reveal the potential role of microsatellites in gene expression divergence. Mol Ecol. 2018;27(5):1188–99. https://doi.org/10.1111/mec.14522.
Orgel LE, Crick FHC. Selfish DNA: the ultimate parasite. Nature. 1980;284(5757):604–7. https://doi.org/10.1038/284604a0.
Ellegren H. Microsatellites: simple sequences with complex evolution. Nat Rev Genet. 2004;5(6):435–45. https://doi.org/10.1038/nrg1348.
Rajendrakumar P, Biswal AK, Balachandran SM, Srinivasarao K, Sundaram RM. Simple sequence repeats in organellar genomes of rice: frequency and distribution in genic and intergenic regions. Bioinformatics. 2007;23(1):1–4. https://doi.org/10.1093/bioinformatics/btl547.
Kim CK, Lee GS, Mo JS, Bae SH, Lee TH. Molecular marker database for efficient use in agricultural breeding programs. Bioinformation. 2015;11(9):444–6. https://doi.org/10.6026/97320630011444.
Tóth G, Gáspári Z, Jurka J. Microsatellites in different eukaryotic genomes: survey and analysis. Genome Res. 2000;10(7):967–81. https://doi.org/10.1101/gr.10.7.967.
Metzgar D, Bytof J, Wills C. Selection against frameshift mutations limits microsatellite expansion in coding DNA. Genome Res. 2000;10(1):72–80.
Wang Z, Weber JL, Zhong G, Tanksley SD. Survey of plant short tandem DNA repeats. Theor Appl Genet. 1994;88(1):1–6. https://doi.org/10.1007/BF00222386.
Varshney RK, Thiel T, Stein N, Langridge P, Graner A. In silico analysis on frequency and distribution of microsatellites in ESTs of some cereal species. Cell Mol Biol Lett. 2002;7(2A):537–46.
Moran C. Microsatellite repeats in pig (Sus domestica) and chicken (Gallus domesticus) genomes. J Hered. 1993;84(4):274–80. https://doi.org/10.1093/oxfordjournals.jhered.a111339.
Jurka J, Pethiyagoda C. Simple repetitive DNA sequences from primates: compilation and analysis. J Mol Evol. 1995;40(2):120–6. https://doi.org/10.1007/BF00167107.
Lith HA, Zutphen LFM. Characterization of rabbit DNA microsatellites extracted from the EMBL nucleotide sequence database. Anim Genet. 1996;27(6):387–95. https://doi.org/10.1111/j.1365-2052.1996.tb00505.x.
Hammock EAD, Young LJ. Microsatellite instability generates diversity in brain and sociobehavioral traits. Science. 2005;308(5728):1630–4. https://doi.org/10.1126/science.1111427.
Gylfe AE, Tuupanen S, Hänninen U, Kondelin J, Ristolainen H, Katainen R, et al. Abstract 5193: novel candidate oncogenes with mutation hot spots in microsatellite unstable colorectal cancer. Cancer Res. 2014;74(19):5193.
Gemayel R, Vinces MD, Legendre M, Verstrepen KJ. Variable tandem repeats accelerate evolution of coding and regulatory sequences. Annu Rev Genet. 2010;44(1):445–77. https://doi.org/10.1146/annurev-genet-072610-155046.
Faux NG, Bottomley SP, Lesk AM, Irving JA, Morrison JR, de la Banda MG, et al. Functional insights from the distribution and role of homopeptide repeat-containing proteins. Genome Res. 2005;15(4):537–51. https://doi.org/10.1101/gr.3096505.
Mularoni L, Ledda A, Toll-Riera M, Albà MM. Natural selection drives the accumulation of amino acid tandem repeats in human proteins. Genome Res. 2010;20(6):745–54. https://doi.org/10.1101/gr.101261.109.
Gemayel R, Cho J, Boeynaems S, Verstrepen KJ. Beyond junk-variable tandem repeats as facilitators of rapid evolution of regulatory and coding sequences. Genes. 2012;3(3):461–80. https://doi.org/10.3390/genes3030461.
Vinces MD, Legendre M, Caldara M, Hagihara M, Verstrepen KJ. Unstable tandem repeats in promoters confer transcriptional evolvability. Science. 2009;324(5931):1213–6. https://doi.org/10.1126/science.1170097.
Morin GB. The human telomere terminal transferase enzyme is a ribonucleoprotein that synthesizes TTAGGG repeats. Cell. 1989;59(3):521–9. https://doi.org/10.1016/0092-8674(89)90035-4.
Casas-Vila N, Scheibe M, Freiwald A, Kappei D, Butter F. Identification of TTAGGG-binding proteins in Neurospora crassa, a fungus with vertebrate-like telomere repeats. BMC Genomics. 2015;16(1):1–9.
Sand L, Szuhai K, Hogendoorn P. Sequencing overview of Ewing sarcoma: a journey across genomic, epigenomic and transcriptomic landscapes. Int J Mol Sci. 2015;16(7):16176–215. https://doi.org/10.3390/ijms160716176.
Lai Y, Sun F. The relationship between microsatellite slippage mutation rate and the number of repeat units. Mol Biol Evol. 2003;20(12):2123–31. https://doi.org/10.1093/molbev/msg228.
Bachtrog D, Agis M, Imhof M, Schlötterer C. Microsatellite variability differs between dinucleotide repeat motifs—evidence from Drosophila melanogaster. Mol Biol Evol. 2000;17(9):1277–85. https://doi.org/10.1093/oxfordjournals.molbev.a026411.
Chakraborty R, Kimmel M, Stivers DN, Davison LJ, Deka R. Relative mutation rates at di-, tri-, and tetranucleotide microsatellite loci. Proc Natl Acad Sci U S A. 1997;94(3):1041–6. https://doi.org/10.1073/pnas.94.3.1041.
Schug MD, Hutter CM, Wetterstrand KA, Gaudette MS, Mackay TF, Aquadro CF. The mutation rates of di-, tri- and tetranucleotide repeats in Drosophila melanogaster. Mol Biol Evol. 1998;15(12):1751–60. https://doi.org/10.1093/oxfordjournals.molbev.a025901.
Amos W, Flint J, Xu X. Heterozygosity increases microsatellite mutation rate, linking it to demographic history. BMC Genet. 2008;9(1):1–10.
Amos W. Heterozygosity increases microsatellite mutation rate. Biol Lett. 2016;12(1):20150902.
Primmer CR, Ellegren H, Saino N, Møller AP. Directional evolution in germline microsatellite mutations. Nat Genet. 1996;13(4):391–3. https://doi.org/10.1038/ng0896-391.
Ellegren H. Heterogeneous mutation processes in human microsatellite DNA sequences. Nat Genet. 2000;24(4):400–2. https://doi.org/10.1038/74249.
Whittaker JC, Harbord RM, Boxall N, Mackay I, Dawson G, Sibly RM. Likelihood-based estimation of microsatellite mutation rates. Genetics. 2003;164(2):781–7.
Seyfert AL, Cristescu MEA, Frisse L, Schaack S, Thomas WK, Lynch M. The rate and spectrum of microsatellite mutation in Caenorhabditis elegans and Daphnia pulex. Genetics. 2008;178(4):2113–21. https://doi.org/10.1534/genetics.107.081927.
Schlötterer C. Evolutionary dynamics of microsatellite DNA. Chromosoma. 2000;109(6):365–71. https://doi.org/10.1007/s004120000089.
Noble L. Microsatellites — evolution and applications. Heredity. 1999;83(5):633–4. https://doi.org/10.1038/sj.hdy.6886482.
Madesis P, Ganopoulos I, Tsaftaris A. Microsatellites: evolution and contribution. In: Kantartzi SK, editor. Microsatellites: Methods and Protocols. New York: Humana Press; 2013. p. 1–13.
Saeed AF, Wang R, Wang S. Microsatellites in pursuit of microbial genome evolution. Front Microbiol. 2016;6:1462.
Weber JL, Wong C. Mutation of human short tandem repeats. Hum Mol Genet. 1993;2(8):1123–8. https://doi.org/10.1093/hmg/2.8.1123.
Pearson CE, Sinden RR. Trinucleotide repeat DNA structures: dynamic mutations from dynamic DNA. Curr Opin Struct Biol. 1998;8(3):321–30. https://doi.org/10.1016/S0959-440X(98)80065-1.
Sinden RR. Biological implications of the DNA structures associated with disease-causing triplet repeats. Am J Hum Genet. 1999;64(2):346–53. https://doi.org/10.1086/302271.
Richard GF, Pâques F. Mini- and microsatellite expansions: the recombination connection. EMBO Rep. 2000;1(2):122–6. https://doi.org/10.1093/embo-reports/kvd031.
Charlesworth B, Sniegowski P, Stephan W. The evolutionary dynamics of repetitive DNA in eukaryotes. Nature. 1994;371(6494):215–20. https://doi.org/10.1038/371215a0.
Liu S, Hou W, Sun T, Xu Y, Li P, Yue B, et al. Genome-wide mining and comparative analysis of microsatellites in three macaque species. Mol Gen Genomics. 2017;292(3):537–50. https://doi.org/10.1007/s00438-017-1289-1.
Xu Y, Li W, Hu Z, Zeng T, Shen Y, Liu S, et al. Genome-wide mining of perfect microsatellites and tetranucleotide orthologous microsatellites estimates in six primate species. Gene. 2018;643:124–32. https://doi.org/10.1016/j.gene.2017.12.008.
Xu Y, Hu Z, Wang C, Zhang X, Li J, Yue B. Characterization of perfect microsatellite based on genome-wide and chromosome level in rhesus monkey (Macaca mulatta). Gene. 2016;592(2):269–75. https://doi.org/10.1016/j.gene.2016.07.016.
Karaoglu H, Lee CMY, Meyer W. Survey of simple sequence repeats in completed fungal genomes. Mol Biol Evol. 2005;22(3):639–49. https://doi.org/10.1093/molbev/msi057.
Li C-Y, Liu L, Yang J, Li J-B, Su Y, Zhang Y, et al. Genome-wide analysis of microsatellite sequence in seven filamentous fungi. Interdiscip Sci: Comput Life Sci. 2009;1(2):141–50. https://doi.org/10.1007/s12539-009-0014-5.
Lim S, Notley-McRobb L, Lim M, Carter DA. A comparison of the nature and abundance of microsatellites in 14 fungal genomes. Fungal Genet Biol. 2004;41(11):1025–36. https://doi.org/10.1016/j.fgb.2004.08.004.
Murat C, Riccioni C, Belfiori B, Cichocki N, Labbé J, Morin E, et al. Distribution and localization of microsatellites in the Perigord black truffle genome and identification of new molecular markers. Fungal Genet Biol. 2011;48(6):592–601. https://doi.org/10.1016/j.fgb.2010.10.007.
Ohm RA, de Jong JF, Lugones LG, Aerts A, Kothe E, Stajich JE, et al. Genome sequence of the model mushroom Schizophyllum commune. Nat Biotechnol. 2010;28(9):957–63. https://doi.org/10.1038/nbt.1643.
Qian J, Xu H, Song J, Xu J, Zhu Y, Chen S. Genome-wide analysis of simple sequence repeats in the model medicinal mushroom Ganoderma lucidum. Gene. 2013;512(2):331–6. https://doi.org/10.1016/j.gene.2012.09.127.
Zhao X, Tan Z, Feng H, Yang R, Li M, Jiang J, et al. Microsatellites in different potyvirus genomes: survey and analysis. Gene. 2011;488(1):52–6. https://doi.org/10.1016/j.gene.2011.08.016.
Mrázek J, Guo X, Shah A. Simple sequence repeats in prokaryotic genomes. Proc Natl Acad Sci U S A. 2007;104(20):8472–7. https://doi.org/10.1073/pnas.0702412104.
Burranboina K, Abraham S, Murugan K, Bayyappa M, Yogisharadhya R, Raghavendra G. Genome wide identification and analysis of microsatellite repeats in the largest DNA viruses (Poxviridae family): an insilico approach. Annu Res Rev Biol. 2018;22(1):1–11. https://doi.org/10.9734/ARRB/2018/38367.
Zhou L, Deng L, Fu Y, Wu X, Zhao X, Chen Y, et al. Comparative analysis of microsatellites and compound microsatellites in T4-like viruses. Gene. 2016;575(2):695–701. https://doi.org/10.1016/j.gene.2015.09.053.
Du L, Zhang C, Liu Q, Zhang X, Yue B. Krait: an ultrafast tool for genome-wide survey of microsatellites and primer design. Bioinformatics. 2018;34(4):681–3. https://doi.org/10.1093/bioinformatics/btx665.
Kofler R, Schlötterer C, Lelley T. SciRoKo: a new tool for whole genome microsatellite search and investigation. Bioinformatics. 2007;23(13):1683–5. https://doi.org/10.1093/bioinformatics/btm157.
Luo W, Nie Z, Zhan F, Wei J, Wang W, Gao Z. Rapid development of microsatellite markers for the endangered fish Schizothorax biddulphi (Günther) using next generation sequencing and cross-species amplification. Int J Mol Sci. 2012;13(11):14946–55. https://doi.org/10.3390/ijms131114946.
Huang J, Li W, Jian Z, Yue B, Yan Y. Genome-wide distribution and organization of microsatellites in six species of birds. Biochem Syst Ecol. 2016;67:95–102. https://doi.org/10.1016/j.bse.2016.05.023.
Cai G, Leadbetter CW, Muehlbauer MF, Molnar TJ, Hillman BI. Genome-wide microsatellite identification in the fungus Anisogramma anomala using Illumina sequencing and genome assembly. PLoS One. 2013;8(11):e82408. https://doi.org/10.1371/journal.pone.0082408.
Wang Y, Chen M, Wang H, Wang JF, Bao D. Microsatellites in the genome of the edible mushroom, Volvariella volvacea. Biomed Res Int. 2014;2014:1–10.
Webster MT, Smith NGC, Ellegren H. Microsatellite evolution inferred from human–chimpanzee genomic sequence alignments. Proc Natl Acad Sci U S A. 2002;99(13):8748–53. https://doi.org/10.1073/pnas.122067599.
Pascual M, Schug MD, Aquadro CF. High density of long dinucleotide microsatellites in Drosophila subobscura. Mol Biol Evol. 2000;17(8):1259–67. https://doi.org/10.1093/oxfordjournals.molbev.a026409.
Schlötterer C, Harr B. Drosophila virilis has long and highly polymorphic microsatellites. Mol Biol Evol. 2000;17(11):1641–6. https://doi.org/10.1093/oxfordjournals.molbev.a026263.
Hancock JM. Simple sequences in a ‘minimal’ genome. Nat Genet. 1996;14(1):14–5. https://doi.org/10.1038/ng0996-14.
Qi WH, Jiang XM, Du LM, Xiao GS, Hu TZ, Yue BS, et al. Genome-wide survey and analysis of microsatellite sequences in bovid species. PLoS One. 2015;10(7):e0133667. https://doi.org/10.1371/journal.pone.0133667.
Perinchery G, Nojima D, Goharderakhshan R, Tanaka Y, Alonzo J, Dahiya R. Microsatellite instability of dinucleotide tandem repeat sequences is higher than trinucleotide, tetranucleotide and pentanucleotide repeat sequences in prostate cancer. Int J Oncol. 2000;16(6):1203–9.
Borodulina OR, Golubchikova JS, Ustyantsev IG, Kramerov DA. Polyadenylation of RNA transcribed from mammalian SINEs by RNA polymerase III: complex requirements for nucleotide sequences. Biochim Biophys Acta. 2016;1859(2):355–65. https://doi.org/10.1016/j.bbagrm.2015.12.003.
Kaessmann H, Vinckenbosch N, Long M. RNA-based gene duplication: mechanistic and evolutionary insights. Nat Rev Genet. 2009;10(1):19–31. https://doi.org/10.1038/nrg2487.
Richardson SR, Morell S, Faulkner GJ. L1 retrotransposons and somatic mosaicism in the brain. Annu Rev Genet. 2014;48(1):1–27. https://doi.org/10.1146/annurev-genet-120213-092412.
Prasad MD. Survey and analysis of microsatellites in the silkworm, Bombyx mori: frequency, distribution, mutations, marker potential and their conservation in heterologous species. Genetics. 2005;169(1):197–214. https://doi.org/10.1534/genetics.104.031005.
Gur-Arie R, Cohen CJ, Eitan Y, Shelef L, Hallerman EM, Kashi Y. Simple sequence repeats in Escherichia coli: abundance, distribution, composition, and polymorphism. Genome Res. 2000;10(1):62–71.
Murray V. The frequency of poly(G) tracts in the human genome and their use as a sensor of DNA damage. Comput Biol Chem. 2015;54:13–7. https://doi.org/10.1016/j.compbiolchem.2014.11.006.
Schlötterer C, Tautz D. Slippage synthesis of simple sequence DNA. Nucleic Acids Res. 1992;20(2):211–5. https://doi.org/10.1093/nar/20.2.211.
Morgante M, Hanafey M, Powell W. Microsatellites are preferentially associated with nonrepetitive DNA in plant genomes. Nat Genet. 2002;30(2):194–200. https://doi.org/10.1038/ng822.
Russell GJ, Walker PMB, Elton RA, Subak-Sharpe JH. Doublet frequency analysis of fractionated vertebrate nuclear DNA. J Mol Biol. 1976;108(1):1–20. https://doi.org/10.1016/S0022-2836(76)80090-3.
Swartz MN, Trautner TA, Kornberg A. Enzymatic synthesis of deoxyribonucleic acid. XI. Further studies on nearest neighbor base sequences in deoxyribonucleic acids. J Biol Chem. 1962;237(6):1961–7. https://doi.org/10.1016/S0021-9258(19)73967-2.
Coulondre C, Miller JH, Farabaugh PJ, Gilbert W. Molecular basis of base substitution hotspots in Escherichia coli. Nature. 1978;274(5673):775–80. https://doi.org/10.1038/274775a0.
Bird AP. DNA methylation and the frequency of CpG in animal DNA. Nucleic Acids Res. 1980;8(7):1499–504. https://doi.org/10.1093/nar/8.7.1499.
Cooper DN, Taggart MH, Bird AP. Unmethylated domains in vertebrate DNA. Nucleic Acids Res. 1983;11(3):647–58. https://doi.org/10.1093/nar/11.3.647.
Bird A, Taggart M, Frommer M, Miller OJ, Macleod D. A fraction of the mouse genome that is derived from islands of nonmethylated, CpG-rich DNA. Cell. 1985;40(1):91–9. https://doi.org/10.1016/0092-8674(85)90312-5.
Razin A. CpG methylation, chromatin structure and gene silencing—a three-way connection. EMBO J. 1998;17(17):4905–80. https://doi.org/10.1093/emboj/17.17.4905.
Eckert KA, Yan G, Hile SE. Mutation rate and specificity analysis of tetranucleotide microsatellite DNA alleles in somatic human cells. Mol Carcinog. 2002;34(3):140–50. https://doi.org/10.1002/mc.10058.
Wierdl M, Dominska M, Petes TD. Microsatellite instability in yeast: dependence on the length of the microsatellite. Genetics. 1997;146(3):769–79. https://doi.org/10.1093/genetics/146.3.769.

Acknowledgements

We would like to thank Lianming Du for assistance with microsatellite extraction.

Funding

This work was supported by the Yalong River Hydropower Development Company, Ltd. (No. YLDC-ZBA-2018116).

Author information

Authors and Affiliations

Sichuan Key Laboratory of Conservation Biology on Endangered Wildlife, College of Life Sciences, Sichuan University, Chengdu, 610065, People’s Republic of China
Yi Lei, Yu Zhou, Megan Price & Zhaobin Song
Key Laboratory of Bio-Resources and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, 610065, People’s Republic of China
Zhaobin Song

Contributions

Zhaobin Song proposed the research and directed the study. Yi Lei and Yu Zhou analyzed the data and wrote the paper. Megan Price wrote and revised the paper. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Zhaobin Song.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no conflict of interest.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1: Supplement Table 1.

Total numbers of different microsatellite repeats in 14 fish species.

Additional file 2: Supplement Table 2.

Total numbers of mono-, di-, tri-, tetra-, penta-, and hexanucleotide repeats in 14 fish species.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Lei, Y., Zhou, Y., Price, M. et al. Genome-wide characterization of microsatellite DNA in fishes: survey and analysis of their abundance and frequency in genome-specific regions. BMC Genomics 22, 421 (2021). https://doi.org/10.1186/s12864-021-07752-6

Received: 08 March 2021
Accepted: 24 May 2021
Published: 07 June 2021
DOI: https://doi.org/10.1186/s12864-021-07752-6
