Genic non-coding microsatellites in the rice genome: characterization, marker design and use in assessing genetic and evolutionary relationships among domesticated groups
© Parida et al; licensee BioMed Central Ltd. 2009
Received: 26 July 2008
Accepted: 31 March 2009
Published: 31 March 2009
Completely sequenced plant genomes provide scope for designing a large number of microsatellite markers, which are useful in various aspects of crop breeding and genetic analysis. With the objective of developing genic but non-coding microsatellite (GNMS) markers for the rice (Oryza sativa L.) genome, we characterized the frequency and relative distribution of microsatellite repeat-motifs in 18,935 predicted protein coding genes including 14,308 putative promoter sequences.
We identified 19,555 perfect GNMS repeats with densities ranging from 306.7/Mb in chromosome 1 to 450/Mb in chromosome 12 with an average of 357.5 GNMS per Mb. The average microsatellite density was maximum in the 5' untranslated regions (UTRs) followed by those in introns, promoters, 3'UTRs and minimum in the coding sequences (CDS). Primers were designed for 17,966 (92%) GNMS repeats, including 4,288 (94%) hypervariable class I types, which were bin-mapped on the rice genome. The GNMS markers were most polymorphic in the intronic region (73.3%) followed by markers in the promoter region (53.3%) and least in the CDS (26.6%). The robust polymerase chain reaction (PCR) amplification efficiency and high polymorphic potential of GNMS markers over genic coding and random genomic microsatellite markers suggest their immediate use in efficient genotyping applications in rice. A set of these markers could assess genetic diversity and establish phylogenetic relationships among domesticated rice cultivar groups. We also demonstrated the usefulness of orthologous and paralogous conserved non-coding microsatellite (CNMS) markers, identified in the putative rice promoter sequences, for comparative physical mapping and understanding of evolutionary and gene regulatory complexities among rice and other members of the grass family. The divergence between long-grained aromatics and subspecies japonica was estimated to be more recent (0.004 Mya) compared to short-grained aromatics from japonica (0.006 Mya) and long-grained aromatics from subspecies indica (0.014 Mya).
Our analyses showed that GNMS markers, with their high polymorphic potential, would be preferred candidate functional markers in various marker-based applications in rice genetics, genomics and breeding. The CNMS markers showed promise for use in comparative genome mapping and for understanding evolutionary complexities in rice and other members of the grass family.
Microsatellites, or simple sequence repeats, are tandemly repeated 1–6 base-pair (bp) nucleotide motifs distributed across the genomes of many prokaryotes and eukaryotes. An increasing number of microsatellites have been characterized in protein coding sequences (CDSs) and non-coding untranslated regions (UTRs) of genes for several plant species. Alterations in these microsatellite sequences are thought to have significant consequences for gene function. Variation in the length of microsatellite motifs in non-coding sequences of genes (i.e. promoters, UTRs and introns) may affect transcription and translation through slippage, gene silencing and pre-mRNA splicing, as has been observed for many diseases in humans, including cancers and neuronal disorders [3–9]. Microsatellite markers based on such sequence motifs would be useful as "functional genetic markers" for various applications in genomics and crop breeding. However, the identification and characterization of such microsatellites has been limited in plants.
Completely sequenced genomes provide scope for designing a large number of gene-based microsatellite markers. Rice (Oryza sativa L.) is the first cereal with a completely sequenced genome, which has enabled the development of a large number of microsatellite markers. Recently, Zhang et al. developed 52,485 microsatellite markers polymorphic between indica and japonica. It is difficult to choose useful and informative microsatellite markers from such large marker data-sets for genotyping applications in rice. This can be overcome by constructing a smaller, informative microsatellite marker database comprising markers located in potentially functional genic sequences with relatively high polymorphic potential. However, genic microsatellite markers derived from protein-coding sequences are constrained by purifying selection and thus have less potential for revealing polymorphism, particularly at the intra-specific level. In contrast, markers derived from non-coding sequence components (i.e. 5'UTRs, introns and 3'UTRs) are under moderate selection pressure and are thus expected to be more polymorphic as genetic markers. Previous studies have shown non-random and distinct patterns of microsatellite distribution in non-coding sequence components of rice genes predicted in the completely sequenced rice genome. In view of the excellent genetic attributes and higher informativeness expected for genic non-coding microsatellite (GNMS) markers, development of such markers from the protein coding genes predicted in the rice genome would be of practical significance.
A comparative analysis of non-coding sequences, known as phylogenetic footprinting [15–19], has provided useful inferences about conserved non-coding microsatellite (CNMS) repeat-containing regulatory sequence elements and their significance in gene regulation in plant-specific pathways. These studies have suggested the use of completely finished and recently annotated rice genomic sequences for intra- and inter-genomic phylogenetic footprinting to detect a large number of paralogous and orthologous CNMS motifs, respectively, in the 5' non-coding promoter regions of genes among cereals and Arabidopsis thaliana. Identification of such CNMS motifs would help in understanding the pattern of regulatory or non-coding promoter sequence evolution in plant genomes.
We undertook this study to characterize the frequency and relative distribution of GNMS repeat-motifs in different sequence components of protein coding rice genes; design primers flanking the GNMS repeat-motifs; physically locate the markers on rice chromosomes, and evaluate their efficiency in the assessment of molecular diversity; detect and characterize CNMS motifs in the putative promoter regions of rice genes using intra- and inter-genomic phylogenetic footprinting; and evaluate markers for their utility in comparative physical mapping and establishing molecular phylogenetic relationships among different rice cultivar groups.
Results and discussion
Frequency and relative abundance of GNMS
Table 1. Nature, frequency and relative distribution of microsatellites in the four non-coding and coding sequence components of rice genes distributed over 12 chromosomes
Characters under study: Range of values observed for individual rice chromosomes
Number of sequences examined: 5799 (Chr 10) to 18954 (Chr 1)
Size (bp) of examined sequences: 2786342 (Chr 9) to 8429186 (Chr 1)
Number (%) of identified perfect microsatellites: 5920 (31.2, Chr 1) to 2475 (42.4, Chr 11)
Number (%) of mononucleotides: 2263 (12, Chr 1) to 1348 (23, Chr 11)
Number (%) of dinucleotides: 1299 (21.9, Chr 1) to 742 (27.6, Chr 12)
Number (%) of trinucleotides: 298 (12.8, Chr 10) to 625 (21.5, Chr 8)
Number (%) of tetranucleotides: 38 (1, Chr 4) to 70 (2.4, Chr 8)
Number (%) of pentanucleotides: 9 (0.24, Chr 4) to 16 (0.59, Chr 12)
Number (%) of hexanucleotides: 1 (0.04, Chr 1) to 10 (0.2, Chr 2)
Number (%) of perfect microsatellites excluding mononucleotides: 1087 (18, Chr 9) to 2585 (13.6, Chr 1)
Number (%) of primer-pairs for perfect microsatellites: 938 (86.3, Chr 9) to 2395 (92.6, Chr 1)
Perfect microsatellite counts per Mb sequences: 450 (Chr 12) to 306.7 (Chr 1)
Number (%) of compound microsatellites: 217 (3.6, Chr 9) to 585 (3, Chr 1)
Compound microsatellite counts per Mb sequences: 99.8 (Chr 12) to 68.5 (Chr 4)
Number (%) of perfect class I microsatellites: 229 (21, Chr 10) to 598 (24.4, Chr 3)
Number (%) of primer-pairs for perfect class I microsatellites: 216 (94.3, Chr 10) to 571 (95.5, Chr 3)
Perfect class I microsatellite counts per Mb sequences: 101.4 (Chr 12) to 67.7 (Chr 1)
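The per-Mb densities in the table above can be reproduced from the counts and sequence sizes. The following is a minimal Python sketch; the function name `density_per_mb` is hypothetical, and pairing the chromosome 1 count of perfect microsatellites excluding mononucleotides with the chromosome 1 sequence size is our reading of the table rather than a stated procedure.

```python
def density_per_mb(repeat_count: int, sequence_bp: int) -> float:
    """Return the number of repeats per megabase of examined sequence."""
    if sequence_bp <= 0:
        raise ValueError("sequence_bp must be positive")
    return repeat_count / (sequence_bp / 1_000_000)


# Chromosome 1 values taken from the table: 2,585 perfect microsatellites
# (mononucleotides excluded) in 8,429,186 bp of examined sequence.
chr1_density = density_per_mb(2585, 8_429_186)
print(round(chr1_density, 1))  # 306.7, the chromosome 1 density in the table
```

Under this reading, the reported densities appear to be based on perfect repeats with mononucleotides excluded.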
Nature and distribution of GNMS
The GC-rich trinucleotide GNMS repeat-motifs were the most prevalent class of microsatellites in the regulatory regions (i.e. 5'UTRs and promoters), whereas the AT-rich trinucleotide repeats were distributed evenly across all the coding and non-coding sequence components. Among the non-coding components, the proportion of GC-rich trinucleotide motifs was highest in the 5'UTRs (65.5%), followed by putative promoters (50.5%), 3'UTRs (42.8%) and introns (28.7%), compared with 96.2% in the CDS (Table 1, see Additional file 2). This trend corresponds to GC-rich microsatellites being frequently detected in the regions downstream of the transcription start site (TSS), possibly due to the higher GC content at the 5' end of rice genes. The GC-rich trinucleotide GNMS repeat-motifs in the 5'UTRs and promoters perhaps serve as binding sites for nuclear proteins that are essential for regulating translation and gene expression, and are thus expected to occur more frequently in these sequences. The high frequency of trinucleotide repeat-motifs in the coding regions could be due to selection against frameshift mutations, which limits expansion of non-triplet microsatellites. These results agree well with earlier observations on the relative abundance of GC-rich trinucleotide repeat-motifs in the expressed sequence tags and unigene sequences of cereal genomes [25, 26]. The dinucleotide and tetranucleotide repeat-motifs were predominant in the intronic and 3'UTR sequences (see Additional file 2). The AT-rich dinucleotide repeat-motifs were most frequent in intronic sequences (64.5%), followed by 3'UTRs (49.7%), whereas the proportion of AT-rich tetranucleotide repeats was highest in 3'UTRs (6.7%; Table 1). The purine-rich dinucleotide microsatellites, such as (GA)n, were abundant in 5'UTRs (28.6%), followed by promoters (18.4%), compared with 28% in the CDS (see Additional file 1).
Our observations are comparable to those from earlier studies on the abundance of GA-rich dinucleotide repeat-motifs in the coding [25, 26] and 5'-end flanking regions, and of AT-rich dinucleotide motifs in the intronic sequences of rice genes. The promoter sequences of rice genes frequently (22.3%) contained pyrimidine-rich microsatellites, especially (CT)n dinucleotides (see Additional file 1), possibly due to their potential role in activation of promoters for transcription initiation.
Microsatellites with longer repeat tracts are expected to be more polymorphic because replication slippage is length dependent. We identified 4,559 class I GNMS repeat-motifs in the protein coding genes predicted in the rice genome, with an overall density of 83.8 GNMS/Mb (Table 1). The density of class I GNMS varied from 67.7/Mb in chromosome 1 to 101.4/Mb in chromosome 12, whereas their proportion ranged from 18.8% (877) in promoters to 26.6% (2394) in intronic sequences (Table 1). Thus, the potential for microsatellite expansion in the genic non-coding sequences of rice genes is not correlated with the frequency of GC-rich trinucleotide repeat-motifs in these regions. Our results revealed a non-random and strongly biased distribution of GNMS repeat-motifs across the regulatory and non-coding regions of the rice genes.
Design and physical location of GNMS markers
Forward and reverse primers were designed from the flanking sequences of the identified microsatellite repeat-motifs in each of the four genic non-coding sequence components of rice genes (i.e. promoters, 5'UTRs, introns and 3'UTRs). The structural organization of the various sequence components and their implications for designing different types of GNMS markers are shown in additional file 3. Primers were designed for 17,966 (92%) genic non-coding sequences – 4,278 (91.8%) in promoters, 4,484 (92.6%) in 5'UTRs, 8,370 (93%) in introns and 834 (81.7%) in 3'UTRs. The primer sequences for 4,288 (94%) hypervariable class I GNMS markers distributed across 12 rice chromosomes are provided in additional file 4 (sequences for the remaining primer-pairs are available on request). Class I markers included 829 in promoters, 953 in 5'UTRs, 2,275 in introns and 231 in 3'UTRs. The GNMS markers were present in the rice genes that regulate biological and cellular functions. For example, we identified 51 (7 in promoters, 12 in 5'UTRs, 27 in introns, 5 in 3'UTRs) GNMS markers including 10 class I types in various disease resistance genes predicted in rice chromosome 11 (see Additional file 3) and 25 (4 in promoters, 9 in 5'UTRs, 11 in introns, 1 in 3'UTRs) GNMS markers including 7 class I types in rice chromosome 12 (see Additional file 4). The GNMS markers when genetically associated with the target traits would facilitate gene cloning and marker-assisted breeding, thereby accelerating rice genetic improvement.
Amplification efficiency and polymorphic potential of GNMS markers
Table 2. Comparative evaluation of polymorphic potential of the microsatellite markers designed from the non-coding and coding sequence components of rice genes
Columns: Sequence components of genes; Number of GNMS markers amplified; Number (%) polymorphic GNMS markers; Number of alleles
We detected polymorphism with 11 markers from intronic sequences (73.3%, PIC of 0.72), 8 from promoters (53.3%, 0.64), 6 from 5'UTRs (40%, 0.62), 5 from 3'UTRs (33.3%, 0.58) and 4 from CDS (26.6%, 0.10) (Table 2). Among the intronic GNMS markers, the one based on the ubiquitin gene (see Additional file 8) showed the maximum PIC value (0.86), followed by that on ribosomal protein S35 (0.84). The higher level of polymorphism we observed for the GNMS markers derived from the promoters, UTRs and introns is expected, given the presence of the most abundant and polymorphic class of GA- or AT-rich dinucleotide microsatellite repeat-motifs in these sequence components. Further, a comparative evaluation of the polymorphic potential of 225 GNMS markers distributed over 12 rice chromosomes with that of 600 rice microsatellite (RM) series markers in two parental genotypes (Jaya and NPT-11) of a large mapping population revealed higher efficiency for GNMS markers (32%) over RM markers (19%) in detecting parental polymorphism (unpublished results). The GNMS markers, being more informative than the genic coding and random genomic microsatellite markers developed earlier, would be of immediate use in efficient large-scale genotyping applications in rice. Twenty-six (46.4%) of the 56 GNMS markers showed polymorphism between the indica and japonica genotypes, while 22 (39.3%) revealed polymorphism among the indica genotypes. GNMS markers derived from the intronic sequences showed maximum inter sub-specific polymorphism, as reported for intron length polymorphisms in rice. Sequencing of the amplicons obtained with the 8 GNMS markers, from the genic non-coding and coding sequence components, that showed amplification in all 18 rice genotypes confirmed the presence of the target repeat motifs (see Additional file 9).
Assessment of molecular genetic diversity among domesticated rice cultivar groups
Evolutionary significance of CNMS containing rice promoters
Using inter-genomic phylogenetic footprinting we detected 112 CNMS repeats (14.6% of the 767 microsatellites containing promoter sequences of rice genes) in the putative promoter sequences of orthologous genes among 5 cereal genomes (viz. barley, maize, rice, Sorghum, wheat; see Additional file 11) and 67 (8.7%) CNMS repeats in the promoters of orthologous genes of rice and A. thaliana (see Additional file 12). With intra-genomic phylogenetic footprinting we identified 45 (5.9%) CNMS markers in the promoters of paralogous rice genes (see Additional file 13). The CNMS markers identified among the 5 cereal genomes included 43 in the promoters between orthologous rice and maize genes, 26 between rice and barley, 28 between rice and wheat and 15 between rice and Sorghum (see Additional file 11). Results from intra- and inter-genomic phylogenetic footprinting comparisons showed that 11 CNMS motifs were conserved in the promoters of both orthologous and paralogous rice genes (see Additional file 14). The CT-rich dinucleotide repeat-motifs were the predominant microsatellite classes in the CNMS, which is consistent with the characteristics of pyrimidine-rich repeat distribution in the promoter regions of the rice genes as observed in our study. A low frequency of CNMS motifs indicated a relatively rapid evolution of rice promoters having such sequences, which could be due to functional constraints and rapid adaptive changes in the regulatory regions of homologous genes for imparting specific roles in gene regulation. This may be why most of the identified CNMS repeats were located in the orthologous and paralogous genes in the immediate upstream region (1 to 200 bp) of the transcription initiation site with preference for known regulatory binding sites as characterized by PLACE and PlantCARE (see Additional file 15). It is possible that these sequences are involved in gene regulation in response to environmental stimuli . 
For example, the CNMS motif (GA)n in the promoters of rice gene cytochrome P450, contained sequences similar to GAGA (AGAGAGAGA), a known regulatory element, which is involved in light responsive phototransduction regulation in plants. Complementary to (GA)n, the CNMS motif (CT)n contained a different regulatory element, C2C2-GATA (TCTCTCTCTCT), controlling similar light responsive gene regulation in the serine/threonine protein kinase gene. A comparative physical mapping of CNMS markers of rice chromosome 1 and homeologous chromosomes of 4 other cereal species (barley, maize, Sorghum, wheat) and one dicot species (A. thaliana) detected several collinear regions with complex chromosomal syntenic relationships (see Additional file 16), which provide clues to the role of identified CNMS markers in comparative genomics and in the understanding of evolutionary complexities in cereals and Arabidopsis.
Intra- and inter-specific molecular dating of divergence of CNMS containing rice promoter sequences
Columns: Species/domesticated rice cultivar groups; No. of paralogous/orthologous CNMS; Duplication event/time of divergence (Mya)
Rows: Rice from rice; Wheat from rice; Maize from rice; Sorghum from rice; Barley from rice; Arabidopsis from rice; indica vs. japonica; indica vs. short-grained aromatics; indica vs. long-grained aromatics; Short-grained vs. long-grained aromatics; Short-grained aromatics vs. japonica; Long-grained aromatics vs. japonica
We studied the relative distribution of microsatellites in different sequence components of protein coding rice genes, designed 17,966 GNMS markers, including 4,288 hypervariable class I types, from the promoter, 5'UTR, intronic and 3'UTR sequences, and determined their occurrence and organization on the 12 rice chromosomes. The class I markers were bin-mapped to guide the selection of markers with genome-wide distribution for various genotyping applications in rice. We demonstrated the utility of GNMS markers through their robust PCR amplification efficiency and higher potential for detecting polymorphism than genic coding and random genomic microsatellite markers, which supports their immediate use in rice genetics, genomics and breeding. The unrooted phylogenetic tree constructed from the molecular diversity values of a set of GNMS markers in rice genotypes clearly established molecular genetic relationships among the domesticated rice cultivar groups, suggesting their utility in defining varietal identity in commerce. The orthologous and paralogous CNMS markers identified in the rice promoters would be useful for comparative genome mapping and phylogenetic analysis in rice and other members of the grass family.
Accessing the genic non-coding sequences of the rice genome
The latest annotated 28,763 non-transposable element (TE)-related rice genes (individually for each of the 12 rice chromosomes) were acquired in FASTA format from the TIGR rice genome annotation database release 5.0 (24 January 2007) via an FTP server. Of these, 25,447 genes were found to contain defined UTR sequences. A set of 6,512 of the 25,447 rice genes that had alternatively spliced isoforms was excluded from our analysis. To determine the density of microsatellites accurately, we retained the remaining 18,935 rice gene models, each representing only one splice form with defined UTRs, CDS and introns, for further analyses.
Identification and characterization of promoter sequences
To identify and characterize the putative promoter sequences, we analysed the genomic FASTA sequences 1000 bp upstream of the transcription start site of each of the 18,935 rice genes, chromosome by chromosome, using the TSSP (SoftBerry) plant promoter prediction program. The results from 16,738 rice genes containing defined promoter sequences, with a description of putative cis-regulatory elements, were stored separately for the 12 rice chromosomes. These predicted promoter sequences were BLAST searched chromosome-wise against the 13,046 annotated rice promoters of the eukaryotic promoter database (EPD) and compared with the major databases PLACE and PlantCARE for the identification of transcription factor binding sites and cis-regulatory elements. Based on the BLAST results (matching E value = 0 and bit score ≥ 500), 14,308 robust promoter sequences were finally identified in the whole rice genome for further analyses.
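The BLAST cut-offs stated above (E value = 0 and bit score ≥ 500) can be applied as a simple filter over tabular BLAST output. The sketch below assumes the standard 12-column tabular format (NCBI BLAST+ `-outfmt 6`), in which the E value and bit score are the 11th and 12th columns; the source does not state which output format was used. The function name and the example identifiers are hypothetical.

```python
import csv
import io


def filter_blast_hits(tabular_text, max_evalue=0.0, min_bitscore=500.0):
    """Return the set of query IDs with at least one hit passing both cut-offs."""
    passing = set()
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue <= max_evalue and bitscore >= min_bitscore:
            passing.add(row[0])
    return passing


# Two made-up hits: only the first satisfies E value = 0 and bit score >= 500.
example = (
    "prom_A\tepd_1\t100.0\t1000\t0\t0\t1\t1000\t1\t1000\t0.0\t1847\n"
    "prom_B\tepd_2\t92.5\t200\t15\t0\t1\t200\t1\t200\t3e-80\t290\n"
)
print(filter_blast_hits(example))
```

Keeping the thresholds as parameters makes it easy to examine how the number of retained promoters changes under less stringent cut-offs.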
Mining of microsatellites and primer design
The genic non-coding sequences of 18,935 rice genes, including 14,308 putative promoter sequences, were searched for microsatellites as described earlier and compared with those in the CDS in each of the 12 rice chromosomes. The nature, frequency and relative abundance of various repeat-motif classes, including hypervariable class I (≥ 20 nucleotides) and potentially variable class II (12 to 20 nucleotides) types, were determined individually for promoters, 5'UTRs, CDS, introns and 3'UTRs of the rice genes. We designed primers from the flanking sequences of the identified repeat-motifs in each of these 5 sequence components of rice genes as described earlier.
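The classification into class I and class II repeats can be illustrated with a small regular-expression scan for perfect repeats. This is only a sketch and not the search tool used in the study: the minimum repeat numbers in `MIN_REPEATS` are hypothetical values chosen so that every reported repeat spans at least 12 nucleotides, and tracts of exactly 20 nucleotides are assigned to class I.

```python
import re

# Hypothetical minimum repeat numbers per motif length (1-6 bp).
MIN_REPEATS = {1: 12, 2: 6, 3: 4, 4: 3, 5: 3, 6: 2}


def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter unit."""
    n = len(motif)
    return not any(
        n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n)
    )


def find_perfect_ssrs(seq):
    """Return (start, motif, repeats, length, class) tuples for perfect repeats."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if not is_primitive(motif):
                continue  # e.g. GAGA is reported as GA, not as a tetranucleotide
            length = m.end() - m.start()
            ssr_class = "I" if length >= 20 else "II"
            hits.append((m.start(), motif, length // k, length, ssr_class))
    return sorted(hits)


example_seq = "TTCC" + "GA" * 11 + "CCGT" + "CGG" * 4 + "AT"
for hit in find_perfect_ssrs(example_seq):
    print(hit)
```

In this made-up sequence the (GA)11 tract of 22 nucleotides falls in class I and the (CGG)4 tract of 12 nucleotides in class II.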
Distribution of GNMS markers in the rice genome
The physical location of the class I GNMS markers designed from the promoters, 5'UTRs, introns and 3'UTRs of rice genes was determined from their annotated physical positions (bp) on the rice chromosomes, as provided in the latest released TIGR rice pseudomolecule 5.0 database. Individual rice chromosomes were divided into bins of 1 Mb, and the class I GNMS markers were plotted separately for each of the 12 rice chromosomes in ascending order of physical location (bp).
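Bin-mapping of this kind amounts to integer division of each marker position by the bin size. A minimal sketch follows; the marker positions are hypothetical, and the zero-based bin numbering (bin 0 covering positions below 1 Mb) is an assumption made for illustration.

```python
from collections import Counter

BIN_SIZE_BP = 1_000_000  # 1 Mb bins, as described in the text


def bin_markers(positions_bp, bin_size=BIN_SIZE_BP):
    """Count markers per bin; bin 0 covers 0 to <1 Mb, bin 1 covers 1 to <2 Mb, etc."""
    counts = Counter(pos // bin_size for pos in positions_bp)
    return dict(sorted(counts.items()))


# Hypothetical physical positions (bp) of markers on one chromosome.
positions = [2_450_112, 120_345, 987_001, 1_000_000, 2_999_999]
print(bin_markers(positions))  # {0: 2, 1: 1, 2: 2}
```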
Evaluation of amplification efficiency and polymorphic potential
The potential of GNMS markers to amplify the target sequence and detect polymorphism was evaluated using 15 markers designed from the flanking sequences in each of the 5 sequence components (promoters, 5'UTRs, introns, 3'UTRs and CDS) of the rice genes. Genomic DNA was isolated from a set of 18 diverse rice genotypes (7 indica, 9 aromatic and 2 japonica; see Additional file 17) and used in PCR to amplify the 75 GNMS markers. The amplified fragments were resolved in 10% native polyacrylamide gel using 0.5× TBE buffer (4 h at 220 V) and visualized under UV light after staining with GelStar (CAMBREX BioScience, USA). We used the allelic data to estimate the number, range and distribution of amplified alleles, average polymorphic alleles per primer, percent polymorphism and PIC for all the amplified GNMS markers. The PIC value was calculated using the formula $\mathrm{PIC}_i = 1 - \sum_j P_{ij}^2$, where $P_{ij}$ is the frequency of the jth allele for the ith locus, summed across all alleles for the locus. Cluster analysis among the 18 rice genotypes was based on the Nei and Li similarity coefficient, using the unweighted pair group method with arithmetic mean (UPGMA) in the TREECON software package. We determined the confidence limits of the clusters with 500 bootstrap replicates and constructed an unrooted phylogenetic tree as a 50% majority rule bootstrap consensus. To confirm that the GNMS markers amplified the expected microsatellite repeat-motifs, amplicons of 8 markers from each of the promoter, 5'UTR, intron, 3'UTR and CDS regions of rice genes that amplified in all the 18 rice genotypes were purified and sequenced. The high quality sequences were aligned and examined for the presence of the predicted repeat motifs.
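The two statistics named above can be sketched in a few lines of Python. The PIC function follows the formula given in the text; the similarity function uses the usual form of the Nei and Li coefficient, twice the number of shared bands divided by the total number of bands in the two genotypes. Function names and the example allele frequencies and band sets are hypothetical.

```python
def pic(allele_frequencies):
    """PIC for one locus: 1 minus the sum of squared allele frequencies."""
    total = sum(allele_frequencies)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in allele_frequencies)


def nei_li_similarity(bands_a, bands_b):
    """Nei and Li coefficient: 2 * shared bands / (bands in a + bands in b)."""
    a, b = set(bands_a), set(bands_b)
    if not a and not b:
        raise ValueError("at least one genotype must have a scored band")
    return 2 * len(a & b) / (len(a) + len(b))


# A locus with three alleles at frequencies 0.5, 0.25 and 0.25.
print(pic([0.5, 0.25, 0.25]))  # 0.625

# Two genotypes sharing two of their three scored alleles.
print(nei_li_similarity({"a1", "a2", "a3"}, {"a2", "a3", "a4"}))
```

A matrix of such pairwise similarities is the usual input to UPGMA clustering.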
Detection and characterization of CNMS containing rice promoter sequences through intra- and inter-genomic phylogenetic footprinting
The predicted microsatellite-containing promoter sequences of rice were BLAST searched against each other and against the 5' non-coding sequence regions of genes/expressed sequence tags annotated on completely sequenced bacterial artificial chromosomes anchored on the chromosomes and/or bins of maize, Sorghum, wheat, barley and Arabidopsis. The matching sequences were aligned using the VISTA sequence alignment program [56, 57] for identification and characterization of paralogous and orthologous CNMS containing promoters. A minimum nucleotide identity threshold of 70% and a minimum length of 20 bp were considered significant in VISTA for our analyses. The matching orthologous and paralogous CNMS containing rice promoter sequences were further characterized for known functional promoter regulatory elements using the PLACE and PlantCARE software tools. The candidate CNMS containing rice promoters for cereal and A. thaliana genomes were identified. For comparative physical mapping, the physical positions (bp) of putative CNMS motifs on rice chromosome 1 (which carried more CNMS than other chromosomes) were determined and their physical order compared with that on homeologous chromosomes of 4 other cereals and A. thaliana.
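The conservation criterion (at least 70% identity over at least 20 bp) can be illustrated with a sliding-window scan over a pairwise alignment. This is a simplified sketch and not the VISTA algorithm itself; it assumes the two sequences are already aligned to equal length with `-` for gaps, and the function name and example sequences are hypothetical.

```python
def conserved_windows(aligned_a, aligned_b, window=20, min_identity_pct=70):
    """Return start columns of alignment windows meeting the identity cut-off."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # 1 where the two sequences carry the same base (gap columns never match).
    matches = [
        int(x == y and x != "-")
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
    ]
    starts = []
    for start in range(len(matches) - window + 1):
        identical = sum(matches[start:start + window])
        # Integer comparison avoids floating-point issues at the threshold.
        if 100 * identical >= min_identity_pct * window:
            starts.append(start)
    return starts


# A shared (CT)10 tract followed by ten unrelated bases in each sequence.
seq_a = "CT" * 10 + "ACGTACGTAC"
seq_b = "CT" * 10 + "TGCATGCATG"
print(conserved_windows(seq_a, seq_b))  # [0, 1, 2, 3, 4, 5, 6]
```

Windows starting at columns 0 to 6 still contain at least 14 identical positions out of 20 and therefore pass the 70% threshold.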
Estimation of intra- and inter-specific CNMS divergence
The CNMS containing promoter sequences of orthologous and paralogous rice genes were pooled into alignments of 100–200 bp on average and used as inputs in the baseml program within the PAML version of PAL2NAL software package for estimating nucleotide substitution rates among the CNMS sequences of cereals and A. thaliana. For estimating substitution rates among the indica and japonica cultivar groups, the CNMS repeat-motif containing high quality promoter sequences of 8 rice genes that amplified in all the 18 rice genotypes were analyzed as described above. The modal nucleotide substitution rates obtained for the CNMS containing rice promoter sequences were used to estimate the time (T) since divergence among the 5 cereals and among the indica and japonica cultivar groups as T (Mya) = Ks/2λ, where λ is the mean rate of synonymous substitutions, equal to 1.243 synonymous substitutions per $10^{9}$ years.
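The divergence-time formula can be evaluated directly. The sketch below interprets λ as 1.243 × 10^-9 substitutions per site per year (an assumption about the units of the rate quoted above) and converts the result from years to million years; the function name and the example Ks value are hypothetical.

```python
# Assumed reading of the rate in the text: 1.243 substitutions per site
# per 10^9 years, i.e. 1.243e-9 per site per year.
LAMBDA_PER_SITE_PER_YEAR = 1.243e-9


def divergence_time_mya(ks, rate=LAMBDA_PER_SITE_PER_YEAR):
    """Time since divergence in million years, T = Ks / (2 * lambda)."""
    if ks < 0:
        raise ValueError("Ks must be non-negative")
    years = ks / (2.0 * rate)
    return years / 1e6


# A hypothetical Ks of 0.002486 corresponds to about 1 Mya under this rate.
print(round(divergence_time_mya(0.002486), 3))
```

The factor of 2 reflects substitutions accumulating independently along both diverging lineages.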
The authors are thankful to the Project Director, National Research Centre on Plant Biotechnology, IARI, New Delhi for providing the facilities. This work was partly funded by the Department of Biotechnology, Government of India. The authors also thank the Institute for Genomic Research (TIGR) for making available their databases and the Institute of Plant Genetics and Crop Research (IPK) for the availability of microsatellite search tool MISA.
- Hancock JM: Microsatellites and other simple sequences: genomic context and mutational mechanisms. Microsatellite: evolution and applications. Edited by: Goldstein DB, Schlotterer C. 1999, Oxford University Press, Oxford, U.K., 1-9.
- Young ET, Sloan JS, Riper KV: Trinucleotide repeats are clustered in regulatory genes in Saccharomyces cerevisiae. Genetics. 2000, 154: 1053-1068.
- Kim GP, Colangelo L, Allegra C, Glebov O, Parr A, Hooper S, Williams J, Paik SM, Eaton L, King W, Wolmark N, Wieand HS, Ilan R: Prognostic role of microsatellite instability in colon cancer. Proc Am Soc Clin Oncol. 2001, 20: 1666.
- Li YC, Korol AB, Fahima T, Nevo E: Microsatellites within genes: Structure, Function, and Evolution. Mol Biol Evol. 2004, 21: 991-1007. 10.1093/molbev/msh073.
- Streelman JT, Kocher D: Microsatellite variation associated with prolactin expression and growth of salt challenged Tilapia. Physiol Genomics. 2002, 9: 1-4.
- Kenneson A, Zhang F, Hagedorn CH, Warren ST: Reduced FMRP and increased FMR1 transcription is proportionally associated with CGG repeat number in intermediate-length and premutation carriers. Hum Mol Genet. 2001, 10: 1449-1454. 10.1093/hmg/10.14.1449.
- Cummings CJ, Zoghbi HY: Trinucleotide repeats: mechanisms and pathophysiology. Hum Genet. 2000, 1: 281-328.
- Tidow N, Boecker A, Schmidt H, Agelopoulos K, Boecker W, Buerger H, Brandt B: Distinct amplification of an untranslated regulatory sequence in the egfr gene contributes to early steps in breast cancer development. Cancer Res. 2003, 63: 1172-1178.
- Liquori CL, Ricker K, Moseley ML, Jacobsen JF, Kress W, Naylor SL, Day JW, Ranum LP: Myotonic dystrophy type 2 caused by a CCTG expansion in intron 1 of ZNF9. Science. 2001, 293: 864-867. 10.1126/science.1062125.
- IRGSP: The map based sequence of rice genome. Nature. 2005, 436: 793-800. 10.1038/nature03895.
- Zhang Z, Deng Y, Tan J, Hu S, Yu J, Xue Q: A genome-wide microsatellite polymorphism database for the indica and japonica rice. DNA Res. 2007, 14: 37-45. 10.1093/dnares/dsm005.
- Cho YG, Ishii T, Temnykh S, Chen X, Lipovich L, Park WD, Ayres N, Cartinhour S, McCouch SR: Diversity of microsatellites derived from genomic libraries and GenBank sequences in rice (Oryza sativa L.). Theor Appl Genet. 2000, 100: 713-722. 10.1007/s001220051343.
- Chabane K, Ablett GA, Cordeiro GM, Valkoun J, Henry RJ: EST versus genomic derived microsatellite markers for genotyping wild and cultivated barley. Genet Resour Crop Evol. 2005, 52: 903-909. 10.1007/s10722-003-6112-7.
- Lawson MJ, Zhang L: Distinct patterns of SSR distribution in the Arabidopsis thaliana and rice genome. Genome Biol. 2006, 7: R14. 10.1186/gb-2006-7-2-r14.
- Santi L, Wang Y, Stile MR, Berendzen K, Wanke D, Roig C, Pozzi C, Müller K, Müller J, Rohde W, Salamini F: The GA octodinucleotide repeat binding factor BBR participates in the transcriptional regulation of the homeobox gene Bkn3. Plant J. 2003, 34: 813-826. 10.1046/j.1365-313X.2003.01767.x.
- Tagle DA, Koop BF, Goodman F, Slightom JL, Hess DL, Jones RT: Embryonic epsilon and gamma globin genes of a prosimian primate (Galago crassicaudatus): Nucleotide and aminoacid sequences, developmental regulation and phylogenetic footprinting. J Mol Biol. 1988, 203: 439-455. 10.1016/0022-2836(88)90011-3.
- Levy S, Hannenhalli S, Workman C: Enrichment of regulatory signals in the conserved non-coding genomic sequences. Bioinformatics. 2001, 17: 871-877. 10.1093/bioinformatics/17.10.871.
- Guo H, Moose SP: Conserved non-coding sequences among cultivated cereal genomes identify candidate regulatory sequence elements and patterns of promoter evolution. Plant Cell. 2003, 15: 1143-1158. 10.1105/tpc.010181.
- Colinas J, Birnbaum K, Benfey PN: Using cauliflower to find conserved non-coding regions in Arabidopsis. Plant Physiol. 2002, 129: 451-454. 10.1104/pp.002501.
- Zhang L, Zuo K, Zhang F, Cao Y, Wang J, Zhang Y, Sun X, Tang K: Conservation of non-coding microsatellites in plants: implication for gene regulation. BMC Genomics. 2006, 7: 323. 10.1186/1471-2164-7-323.
- Fujimori S, Washio T, Higo K, Ohtomo Y, Murakami K, Matsubara K, Kawai J, Carninci P, Hayashizaki Y, Kikuchi S, Tomita M: A novel feature of microsatellites in plants: a distribution gradient along the direction of transcription. FEBS Lett. 2003, 554: 17-22. 10.1016/S0014-5793(03)01041-X.
- Yu J, Hu S, Wang J, Wong GK-S, Li S, Liu B, Deng Y, Dai L, Zhou Y, Zhang X, Cao M, Liu J, Sun J, Tang J, Chen Y, Huang X, Lin W, Ye C, Tong W, Cong L, Geng J, Han Y, Li L, Li W, Hu G, Huang X, Li W, Li J, Liu Z, Li L, Liu J, Qi Q, Liu J, Li L, Li T, Wang X, Lu H, Wu T, Zhu M, Ni P, Han H, Dong W, Ren X, Feng X, Cui P, Li X, Wang H, Xu X, Zhai W, Xu Z, Zhang J, He S, Zhang J, Xu J, Zhang K, Zheng X, Dong J, Zeng W, Tao L, Ye J, Tan J, Ren X, Chen X, He J, Liu D, Tian W, Tian C, Xia H, Bao Q, Li G, Gao H, Cao T, Wang J, Zhao W, Li P, Chen W, Wang X, Zhang Y, Hu J, Wang J, Liu S, Yang J, Zhang G, Xiong Y, Li Z, Mao L, Zhou C, Zhu Z, Chen R, Hao B, Zheng W, Chen S, Guo W, Li G, Liu S, Tao M, Wang J, Zhu L, Yuan L, Yang H: A draft sequence of the rice genome (Oryza sativa L. ssp. indica). Science. 2002, 296: 79-92. 10.1126/science.1068037.
- Stallings RL: Distribution of trinucleotide microsatellites in different categories of mammalian genomic sequence: implications for human genetic diseases. Genomics. 1994, 21: 116-121. 10.1006/geno.1994.1232.
- Metzgar D, Bytof J, Wills C: Selection against frameshift mutations limits microsatellite expansion in coding DNA. Genome Res. 2000, 10: 72-80.
- Varshney RK, Graner A, Sorrells ME: Genic microsatellite markers in plants: features and applications. Trends Biotechnol. 2005, 23: 48-55. 10.1016/j.tibtech.2004.11.005.
- Parida SK, Rajkumar KA, Dalal V, Singh NK, Mohapatra T: Unigene derived microsatellite markers for the cereal genomes. Theor Appl Genet. 2006, 112: 808-817. 10.1007/s00122-005-0182-1.
- Temnykh S, DeClerck G, Lukashova A, Lipovich L, Cartinhour S, McCouch SR: Computational and experimental analysis of microsatellites in rice (Oryza sativa L.): frequency, length-variation, transposon associations and genetic marker potential. Genome Res. 2001, 11: 1441-1452. 10.1101/gr.184001.
- Xu G, Goodridge AG: A CT repeat in the promoter of the chicken malic enzyme gene is essential for function at an alternative transcription site. Arch Biochem Biophys. 1998, 358: 83-91. 10.1006/abbi.1998.0852.
- Yu JK, Dake TM, Singh S, Benscher D, Li W, Gill B, Sorrells ME: Development and mapping of EST derived simple sequence repeat (SSR) markers for hexaploid wheat. Genome. 2004, 47: 805-818. 10.1139/g04-057.
- Feltus FA, Wan J, Schulze SR, Estill JC, Jiang N, Paterson AH: An SNP resource for rice genetics and breeding based on subspecies indica and japonica genome alignments. Genome Res. 2004, 14: 1812-1819. 10.1101/gr.2479404.
- Shen YJ, Jiang H, Jin JP, Zhang ZB, Xi B, He YY, Wang G, Wang C, Qian L, Li X, Yu QB, Liu HJ, Chen DH, Gao JH, Huang H, Shi TL, Yang ZN: Development of genome wide DNA polymorphism database for map-based cloning of rice genes. Plant Physiol. 2004, 135: 1198-1205. 10.1104/pp.103.038463.
- Ni J, Colowit PM, Mackill DJ: Evaluation of genetic diversity in rice subspecies using microsatellite markers. Crop Sci. 2002, 42: 601-607.
- Nagaraju J, Kathirvel M, Kumar R, Siddiq EA, Hasnain SE: Genetic analysis of traditional and evolved Basmati and non-Basmati rice varieties by using fluorescence based ISSR-PCR and SSR markers. Proc Natl Acad Sci. 2002, 99: 5836-5841. 10.1073/pnas.042099099.
- Garris AJ, Tai TH, Coburn J, Kresovich S, McCouch SR: Genetic structure and diversity in Oryza sativa L. Genetics. 2005, 169: 1631-1638. 10.1534/genetics.104.035642.
- McCouch SR, Teytelman L, Xu Y, Lobos KB, Clare K, Stein L: Development of 2,240 new SSR markers for rice (Oryza sativa L.). DNA Res. 2002, 9: 199-207. 10.1093/dnares/9.6.199.
- Pessoa-Filho M, Belo A, Alcochete AAN, Rangel PHN, Ferreira ME: A set of multiplex panels of microsatellite markers for rapid molecular characterization of rice accessions. BMC Plant Biol. 2007, 7: 23. 10.1186/1471-2229-7-23.
- Wang X, Zhao X, Zhu J, Wu W: Genome-wide investigation of intron length polymorphisms and their potential as molecular markers in rice (Oryza sativa L.). DNA Res. 2005, 12: 417-427. 10.1093/dnares/dsi019.
- Caicedo AL, Williamson SH, Hernandez RD, Boyko A, Fledel-Alon A, York TL, Polato NR, Olsen KM, Nielsen R, McCouch SR, Bustamante CD, Purugganan MD: Genome-wide patterns of nucleotide polymorphism in domesticated rice. PLoS Genet. 2007, 3: e163. 10.1371/journal.pgen.0030163.
- Wolfe KH, Gouy M, Yang YW, Sharp PM, Li WH: Date of the monocot-dicot divergence estimated from chloroplast DNA sequence data. Proc Natl Acad Sci. 1989, 86: 6201-6205. 10.1073/pnas.86.16.6201.
- Sang T, Ge S: The puzzle of rice domestication. J Integr Plant Biol. 2007, 49: 760-768. 10.1111/j.1744-7909.2007.00510.x.
- Normile D: Archaeology: Yangtze seen as earliest rice site. Science. 1997, 275: 309. 10.1126/science.275.5298.309.
- Khush GS: Origin, dispersal, cultivation and variation of rice. Plant Mol Biol. 1997, 35: 25-34. 10.1023/A:1005810616885.
- TIGR Rice Genome Database. [http://rice.plantbiology.msu.edu/]
- TSSP plant promoter prediction program. [http://www.softberry.com]
- Schmid CD, Praz V, Delorenzi M, Perier R, Bucher P: The eukaryotic promoter database EPD: the impact of in silico primer extension. Nucl Acids Res. 2004, 32: D82-D85. 10.1093/nar/gkh122.
- PLACE. [http://www.dna.affrc.go.jp/PLACE/]
- PlantCARE. [http://bioinformatics.psb.ugent.be/webtools/plantcare/html/]
- Anderson JA, Churchill GA, Autrique JE, Tanksley SD, Sorrells ME: Optimizing parental selection for genetic linkage maps. Genome. 1993, 36: 181-186. 10.1139/g93-024.
- Nei M, Li WH: Mathematical model for studying genetic variation in terms of restriction endonucleases. Proc Natl Acad Sci. 1979, 76: 5269-5273. 10.1073/pnas.76.10.5269.
- Van de Peer Y, De Wachter R: TREECON for Windows: a software package for construction and drawing of evolutionary trees for the Microsoft Windows environment. Comput Applic Biosci. 1994, 10: 569-570.
- Maize genome project. [http://www.maizesequence.org]
- TIGR BLAST server. [http://rice.plantbiology.msu.edu/blast.shtml]
- GrainGenes database. [http://wheat.pw.usda.gov]
- Barley Genomics. [http://barleygenomics.wsu.edu]
- The Arabidopsis Information Resource. [http://www.arabidopsis.org]
- VISTA alignment algorithm program. [http://www.gsd.lbl.gov/vista]
- Mayor C, Brudno M, Schwartz JR, Poliakov A, Rubin EM, Frazer KA, Pachter LS, Dubchak I: VISTA: Visualizing global DNA sequence alignments of arbitrary length. Bioinformatics. 2000, 16: 1046-1047. 10.1093/bioinformatics/16.11.1046.
- Suyama M, Torrents D, Bork P: Robust conversion of protein sequence alignments into the corresponding codon alignments. Nucl Acids Res. 2006, 34: W609-612. 10.1093/nar/gkl315.
- Muse SV: Examining rates and patterns of nucleotide substitution in plants. Plant Mol Biol. 2000, 42: 25-43. 10.1023/A:1006319803002.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.