Conservation of noncoding microsatellites in plants: implication for gene regulation
© Zhang et al; licensee BioMed Central Ltd. 2006
Received: 21 August 2006
Accepted: 25 December 2006
Published: 25 December 2006
Microsatellites are extremely common in plant genomes, and in particular, they are significantly enriched in the 5' noncoding regions. Although some 5' noncoding microsatellites involved in gene regulation have been described, the general properties of microsatellites as regulatory elements are still unknown. To address the question of microsatellites associated with regulatory elements, we have analyzed the conserved noncoding microsatellite sequences (CNMSs) in the 5' noncoding regions by inter- and intragenomic phylogenetic footprinting in the Arabidopsis and Brassica genomes.
We identified 247 Arabidopsis-Brassica orthologous and 122 Arabidopsis paralogous CNMSs, representing 491 CT/GA and CTT/GAA repeats, which accounted for 10.6% of these types located in the 500-bp regions upstream of coding sequences in the Arabidopsis genome. Among these identified CNMSs, 18 microsatellites show high conservation in the regulatory regions of both orthologous and paralogous genes, and some of them also appear in the corresponding positions of more distant homologs in Arabidopsis, as well as in other plants. A computational scan of CNMSs for known cis-regulatory elements showed that light responsive elements were clustered in the region of CT/GA repeats, as well as salicylic acid responsive elements in the (CTT)n/(GAA)n sequences. Patterns of gene expression revealed that 70–80% of CNMS (CTT)n/(GAA)n associated genes were regulated by salicylic acid, which was consistent with the prediction of regulatory elements in silico.
Our analyses showed that some noncoding microsatellites were conserved in plants and appeared to be ancient. These CNMSs served as regulatory elements involved in light and salicylic acid responses. Our findings might have implications in the common features of the over-represented microsatellites for gene regulation in plant-specific pathways.
Microsatellites, one of the major classes of repeats, are extremely common in eukaryotic genomes. They are generally thought to arise from replication slippage. Unlike animal microsatellites, which originate from repetitive DNA, plant microsatellites show a significant association with nonrepetitive DNA. They are abundant within or near genes in plant genomes, and some types in particular are significantly enriched within the 5' noncoding regions of plant genes [5–7]. In Arabidopsis thaliana, for example, this enrichment is mostly attributable to CT/GA and CTT/GAA repeats, which are found more frequently in 5' flanks than in other genomic regions, suggesting that they may function in regulating gene expression.
For a long time, microsatellites were considered only as genetic markers for DNA fingerprinting and diversity studies because of their extensive length polymorphism. However, recent findings show that some of them act as cis-regulatory elements that are recognized by transcription factors [8, 9]. The so-called GAGA elements, which comprise the dinucleotide repeat sequence (GA)n, are well known to be present in promoters regulating numerous developmental genes in animals [10, 11]. Similarly, the (GA)n sequences in the regulatory regions of some plant genes can be recognized by GAGA-binding factors [12–14], and more generally, the GA-rich element, a more complex (GA)n repeat based on a 9-base-pair (bp) unit, has been shown to have protein-binding affinity. Another major plant microsatellite, the trinucleotide repeat (GAA)n, is present within the 5'UTR of ntp303 and was found to be important in modulating transcription and translation efficiency. Furthermore, some unusual phenotypic variations have been associated with the length of 5' noncoding microsatellites. In a typical example, Bao and colleagues reported that variation in the number of CT/GA repeats in the 5'UTR of the waxy gene was correlated with amylose content in rice. Although the mechanism is still unclear, the microsatellite length polymorphism is thought to affect the expression of genes involved in amylose synthesis.
Regions of DNA involved in gene regulation are expected to exhibit sequence conservation between related species over evolutionary time due to functional constraints. It has been recognized that comparative analyses of noncoding DNA sequences in multiple species, known as phylogenetic footprinting, can help identify conserved putative regulatory elements . Successful identification of conserved noncoding sequences in comparisons among different grass genomes and cruciferous species, as well as between closely related genomic sequences from Arabidopsis and Brassica species has provided some good references for discovery of Conserved Noncoding Microsatellite Sequences (CNMSs) by phylogenetic footprinting in plants [19–22].
If microsatellites are important for regulating gene expression, they should be conserved in the homologous promoters through gene duplication or speciation during plant evolution. To address the question of microsatellites associated with gene regulatory elements, we used inter- and intragenomic phylogenetic footprinting to analyze the dominant microsatellites in the 5' noncoding regions of Arabidopsis and Brassica oleracea genes for CNMSs. About 10% of 5' noncoding CT/GA and CTT/GAA repeats are conserved in the Arabidopsis genome, and they are preferentially involved in gene regulation in plant-specific pathways.
Distribution of microsatellites in different genomic regions
Conservation of microsatellites in Arabidopsis
Regulatory sequence elements within promoter DNA are often short and orientation independent, and they frequently contain gaps of variable size. We therefore defined conserved noncoding microsatellite sequences as candidate regulatory elements using the following criterion: the corresponding microsatellites had to overlap by at least 6 bp in the aligned sequences. By this criterion, we identified 247 Arabidopsis-Brassica orthologous CNMSs and 122 Arabidopsis paralogous CNMSs [see Additional file 1], together involving 491 CT/GA and CTT/GAA repeats (Table 1), which accounted for 10.6% of the repeats of these types located in the 500-bp regions upstream of coding sequences in the Arabidopsis genome. These CNMSs are not randomly distributed across the noncoding regions; they tend to occur more frequently near the initiation codon (Figure 2A, 2B).
Summary of Arabidopsis-Brassica and Arabidopsis-Arabidopsis CNMSs
Arabidopsis CNMS genes
Evolution of conserved microsatellites in Arabidopsis
Ultra-CNMSs in Arabidopsis-Brassica orthologs and Arabidopsis paralogs
WD-40 repeat protein
hydroxyproline-rich glycoprotein protein
ACT domain protein
TSO1-like CXC domain protein
phosphatidylinositol-4-phosphate 5-kinase-related
protein phosphatase 2C
heat shock protein
vacuolar ATP synthase
amino acid permease
signal peptidase subunit
Conservation of microsatellites in plants
Annotation enrichment and depletion of CNMS associated genes
CNMSs as regulatory elements in plants
In silico prediction of CNMSs as regulatory elements
part of a light responsive element [35,37]
Binding site for GAGA-binding factor, and Gbp is a light-responsive gene .
salicylic acid responsive element 
part of a light responsive element [34,35].
CT-rich motif found in a 60-nt region downstream of the transcription start site of the CaMV 35S RNA; Can enhance gene expression .
salicylic acid responsive element 
Microsatellites (CT)n/(GA)n and (CTT)n/(GAA)n are well represented in the Arabidopsis genome and are preferentially located within the 5' noncoding regions. In this study, we identified 491 conserved CT/GA and CTT/GAA repeats as candidate regulatory elements by inter- and intragenomic phylogenetic footprinting. These CNMSs tend to occur near the initiation codon, with a preference for CT and CTT motifs, which is consistent with the known distribution of pyrimidine-rich repeats in these regions [5, 7]. Another striking feature of the CNMS distribution is that CNMSs are rarely found in the peri-centromeric regions; in contrast, their associated genes are always clustered on the chromosome arms (data not shown). The reason for the absence of CNMSs from peri-centromeric regions is still unclear, but the clustering of CNMS associated genes on chromosome arms is probably attributable to co-expression.
Microsatellites generally evolve rapidly, but about 10% of the 5' noncoding CT/GA and CTT/GAA repeats are highly conserved and appear to be ancient. In particular, the Ultra-CNMSs have been under purifying selection for more than 42 Myr, and some of them for at least 170 Myr. This conservation may be explained by functional constraint, such that many homologous genes retain the corresponding microsatellite sequences in their regulatory regions. Most microsatellites of the CT/GA and CTT/GAA types seem to have originated through recent mutations under positive selection [4, 7], which has led to the significant over-representation of microsatellites in the 5' noncoding regions compared with other genomic fractions. The reasons for positive selection on some repeats are still unknown. At the least, however, these repeats may provide opportunities for rapid adaptive changes in regulatory regions or play specific roles in gene regulation.
It is well known that intergenomic phylogenetic footprinting is an effective method for discovering regulatory elements in a set of orthologous noncoding regions from multiple species [18–22]. In plants, intragenomic phylogenetic footprinting is another powerful strategy for detecting regulatory elements, because most plant genomes are rich in duplicated genes and a large fraction of these gene pairs share transcriptional characteristics. Although this approach cannot detect the full complement of cis-elements, because individual regulatory elements may be gained or lost between duplicated promoters, it can readily identify specific regulatory elements that are highly conserved in duplicated genes. Using this approach, we identified 122 Arabidopsis CNMSs as candidate regulatory elements of plant-specific function. Most paralogous CNMSs originated from the recent polyploidization event that preceded the divergence of Arabidopsis and Brassica, implying that they might be conserved with their counterparts in Brassica. We compared the data generated by inter- and intragenomic phylogenetic footprinting and found 18 CNMSs that were highly conserved in both orthologous and paralogous sequences. The number of Ultra-CNMSs identified may be an underestimate because of the incomplete Brassica reference genome sequence or falsely assigned orthologous relationships. The occurrence of these conserved microsatellites among three or more homologous genes provides stronger evidence that these CNMSs are likely to be significant in gene regulation.
Functional annotation showed that CNMS associated genes were clearly depleted for DNA metabolism categories, such as DNA replication, DNA recombination and DNA repair. It is possible that genes essential for survival lack CNMSs within their 5' noncoding regions because they do not require such specific regulatory elements. In contrast, CNMS genes are preferentially associated with the regulation of transcription in plants. CNMSs appear to serve as regulatory elements, and their associated genes can respond to one or more forms of environmental stimuli (Table 3). These functional biases imply that proteins encoded by CNMS associated genes (e.g. transmembrane receptor kinase genes and transcription factor genes) are involved in the upstream pathways of defense responses in plants.
Although GAGA elements are known to be involved in the regulation of numerous developmental genes in animals [10, 11], we believe that (CT)n/(GA)n CNMSs are likely to be associated with transcriptional regulation in light signaling pathways in plants. These CNMSs are often found in light-regulated genes [12, 41]. Although the expression of most (CT)n/(GA)n CNMS associated genes did not change significantly with light/dark transitions, three Ultra-CNMS related genes (At5g52430, At1g21920 and At3g62650) were clearly induced by longer periods of darkness according to microarray expression data for 7800 unique Arabidopsis genes. This was consistent with a whole-genome expression analysis in Arabidopsis seedlings, in which about 9% of these CNMS genes were significantly down-regulated by light, while only 2% were up-regulated. It is possible that the expression changes of most (CT)n/(GA)n CNMS associated genes are not obvious under light because these genes act upstream in the related pathways. Nevertheless, at least some (CT)n/(GA)n CNMSs may be binding sites for trans-acting regulators involved in light signaling pathways, and their associated genes can be induced under darkness.
Salicylic acid is well known as an important signaling molecule involved in both locally and systemically induced disease resistance responses. Many salicylic acid responsive genes have been found in plant defense pathways. The (CTT)n/(GAA)n CNMS associated genes show distinct expression patterns under salicylic acid treatment, implying that they may be associated with a range of different stresses. The regulatory effect of (CTT)n/(GAA)n CNMSs on gene expression in salicylic acid signaling pathways appears to be associated with repeat number. They may not act as isolated transcription factor binding sites to regulate gene expression. Instead, they are likely to co-operate with other elements to perform complex regulatory functions in transcription. Some of them may also play roles in RNA interference by forming RNA duplexes with complementary antisense microsatellite sequences, which could explain why the transcripts of quite a few CNMS genes are undetectable in Arabidopsis leaves.
Microsatellites (CT)n/(GA)n and (CTT)n/(GAA)n are preferentially associated with the 5' noncoding regions in the Arabidopsis genome. Some of them are conserved among homologous genes and appear to be ancient. The computational prediction and gene expression analysis indicated that (CT)n/(GA)n and (CTT)n/(GAA)n CNMSs act as regulatory elements involved in light and salicylic acid responses. Our analysis suggests that the presence of CT/GA and CTT/GAA repeats in regulatory regions may be a particularly useful guide for further experiments on plant regulatory networks that respond to environmental stimuli.
The Arabidopsis plants were grown in soil in a growth chamber at 20°C with 8 hours of light for 40 days. Plants were sprayed to run-off with 1 mM salicylic acid in 0.5% dimethyl sulfoxide (DMSO). Leaves were cut and harvested at 1, 4, 12 and 48 hours post-treatment, quick-frozen in liquid nitrogen, and then stored at -80°C. Total RNA was later extracted using the Plant RNA Mini Kit (Watson Biotechnologies INC., China).
Sequence data sources
The annotated sequences of the five chromosomes of Arabidopsis (accession numbers: NC_003070, NC_003071, NC_003074, NC_003075, and NC_003076, updated 25-JAN-2005) were downloaded from the Genomes Division of GenBank [45, 46]. Intergenic regions were defined as the DNA between the end of the last exon of one gene and the beginning of the first exon of the following gene. A set of 16223 full-length Arabidopsis cDNA sequences containing both 5' and 3'UTRs was extracted from the TAIR database. The preliminary sequences of the Brassica genome were obtained from The Institute for Genomic Research website.
Identification of orthologous and paralogous gene pairs
To identify putative Arabidopsis-Brassica orthologous gene sets, each preliminary Brassica sequence was searched using BLASTN against 1-kb sequences (fragments from position -500 to +500 relative to the translation initiation site) of all Arabidopsis genes, and the Brassica fragments were then clustered according to their best-matching gene in the Arabidopsis genome. Conversely, each 1-kb Arabidopsis gene sequence was searched against the Brassica contigs. Two sequences were defined as orthologs if each was the best hit of the other in the aligned regions and the expect value (E) was < 1e-10. A list of the Arabidopsis-Brassica orthologs identified in this study is provided as supplementary data [see Additional file 2].
To identify paralogous gene pairs derived from a recent common ancestor in the Arabidopsis genome, each annotated coding sequence was searched against all other coding sequences using BLASTN. A pair was considered significant if each sequence was the best hit of the other and the expect value was < 1e-10. The list of paralogous gene pairs is included as supplementary data [see Additional file 3]. To avoid spurious microsatellite conservation resulting from insufficient time for randomizing mutations to accumulate, tandemly repeated gene pairs separated by fewer than 25 intervening genes were excluded from further analysis.
Microsatellites were detected using the modified Sputnik repeat-finder. Di- and trinucleotide repeats were recorded when their total length was at least 12 bp, allowing up to about 10% deviation from a perfect repeat. Repeat motifs that differ only in frame (e.g. GAA, AGA and AAG) were regarded as the same type of repeat.
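The reciprocal-best-hit rule used for both orthologs and paralogs can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tuple layout (query, subject, E-value, bit score), the function names, and the use of bit score to rank hits are hypothetical choices, not details taken from the study.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical hit record: (query_id, subject_id, e_value, bit_score)
Hit = Tuple[str, str, float, float]

def best_hits(hits: Iterable[Hit], max_evalue: float = 1e-10) -> Dict[str, str]:
    """Map each query to its highest-scoring subject among hits with E < max_evalue."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, evalue, score in hits:
        # Skip weak hits and self-hits (relevant for the all-vs-all paralog search).
        if evalue >= max_evalue or query == subject:
            continue
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}

def reciprocal_best_hits(forward: Iterable[Hit], reverse: Iterable[Hit],
                         max_evalue: float = 1e-10) -> List[Tuple[str, str]]:
    """Return pairs (a, b) where b is the best hit of a and a is the best hit of b."""
    fwd = best_hits(forward, max_evalue)
    rev = best_hits(reverse, max_evalue)
    return sorted((a, b) for a, b in fwd.items() if rev.get(b) == a)

# Toy example with made-up identifiers.
forward = [("At1", "Bo1", 1e-50, 200.0), ("At1", "Bo2", 1e-20, 90.0),
           ("At2", "Bo2", 1e-5, 40.0)]
reverse = [("Bo1", "At1", 1e-48, 195.0), ("Bo2", "At1", 1e-20, 90.0)]
pairs = reciprocal_best_hits(forward, reverse)
```

In the toy data only At1 and Bo1 are mutual best hits; At2 is dropped because its only hit fails the E-value cutoff.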
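As a rough illustration of the detection rule, the following Python sketch finds di- and trinucleotide repeats of at least 12 bp and labels them with a frame-independent motif. It is not the Sputnik algorithm: it detects perfect repeats only and ignores the roughly 10% deviation allowed in the study, and the function names are hypothetical.

```python
import re
from typing import List, Tuple

def canonical_motif(motif: str) -> str:
    """Smallest rotation of the motif, so GAA, AGA and AAG share one label."""
    return min(motif[i:] + motif[:i] for i in range(len(motif)))

def find_perfect_repeats(seq: str, min_len: int = 12) -> List[Tuple[int, int, str]]:
    """Return (start, end, motif) for perfect di-/trinucleotide runs of >= min_len bp.

    Coordinates are 0-based and half-open; mismatches are NOT tolerated.
    """
    seq = seq.upper()
    found = []
    for unit in (2, 3):
        min_copies = -(-min_len // unit)  # ceiling division
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, min_copies - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if len(set(motif)) == 1:  # skip mononucleotide runs such as AAAAAA
                continue
            found.append((match.start(), match.end(), canonical_motif(motif)))
    return sorted(found)

# Toy sequence: a (CT)6 run and a (GAA)4 run separated by unique sequence.
example = "AAG" + "CT" * 6 + "ACGT" + "GAA" * 4 + "C"
repeats = find_perfect_repeats(example)
```

Taking the smallest rotation as the label is one simple way to treat motifs in different frames as the same repeat type.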
Identification of CNMSs
Because the Brassica gene fragments were derived from preliminary contigs with no annotated open reading frames, each pair of Arabidopsis-Brassica sequences was aligned using DiAlign2 with the translation option to identify the 5' noncoding sequences and coding regions in the Brassica orthologs. The 5' noncoding sequence pairs were then aligned using DiAlign2 to find conserved microsatellites. To exclude nonspecific alignments, a stringent threshold parameter of 3 was used. A CNMS was called when the corresponding loci had at least 6 bp of overlap between the aligned microsatellite sequences.
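The 6-bp overlap criterion can be made concrete with a small Python sketch. It assumes a pairwise alignment given as two gapped strings of equal length (with "-" for gaps) and microsatellite coordinates given as half-open intervals on the ungapped sequences; this input format and the function names are assumptions for illustration and do not reproduce DiAlign2 output handling.

```python
from typing import Set, Tuple

Interval = Tuple[int, int]  # half-open [start, end) in ungapped coordinates

def columns_of(aligned: str, interval: Interval) -> Set[int]:
    """Alignment columns occupied by the residues of `interval` in a gapped sequence."""
    start, end = interval
    columns, position = set(), 0
    for column, char in enumerate(aligned):
        if char == "-":
            continue  # gaps do not consume sequence positions
        if start <= position < end:
            columns.add(column)
        position += 1
    return columns

def is_conserved(aligned_a: str, aligned_b: str,
                 repeat_a: Interval, repeat_b: Interval,
                 min_overlap: int = 6) -> bool:
    """True if the two repeats share at least `min_overlap` alignment columns."""
    shared = columns_of(aligned_a, repeat_a) & columns_of(aligned_b, repeat_b)
    return len(shared) >= min_overlap

# Toy alignment: (CT)6 in sequence A against (CT)5 in sequence B.
aln_a = "AAGCTCTCTCTCTCTACG"
aln_b = "A--CTCTCTCTCT--ACG"
conserved = is_conserved(aln_a, aln_b, (3, 15), (1, 11))
```

Here the two repeats share ten alignment columns, so the pair would pass the 6-bp threshold.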
Selection of random data sets
To ensure that the CNMSs did not occur by chance, we used two datasets of random pairs as negative controls. One control dataset contained 1000 random pairs of 500-bp upstream noncoding sequences from the Arabidopsis genome, and the other contained 1000 pairs generated at random from 500-bp fragments of Arabidopsis genomic DNA. A reference dataset of equal size was randomly selected from the 500-bp paralogous noncoding sequence pairs in Arabidopsis. The 1000 paralogous pairs, the 1000 shuffled pairs of noncoding sequences and the 1000 random pairs of genomic sequences are referred to as dataset 1, dataset 2 and dataset 3, respectively, in further analysis. Similarly, three corresponding datasets of Arabidopsis and Brassica sequence pairs were generated with the same size: dataset 1 consisted of 1000 Arabidopsis-Brassica orthologous pairs of 5' noncoding sequences, dataset 2 contained 1000 random pairs of Arabidopsis and Brassica upstream noncoding sequences, and dataset 3 contained 1000 pairs generated at random from Arabidopsis and Brassica genomic DNA sequences. The same CNMS detection criterion was applied in these tests. CNMS occurrences were analyzed in the same manner for 10 different random sets of equal size.
Estimation of duplication and speciation time
We used the level of synonymous substitution (Ks) between CNMS associated coding sequences to date the CNMSs. For each pair of CNMS associated genes, the two protein sequences were aligned with ClustalW, and the resulting alignment was used as a guide to align the nucleotide sequences. After gaps were removed, the level of synonymous substitution was estimated using the yn00 program in PAML. The time of divergence (T) between two sequences was then calculated as T = Ks/2λ, where Ks is the number of synonymous substitutions per synonymous site and λ is the mean rate of synonymous substitution. The estimated value of λ in dicots is 1.5 synonymous substitutions per 10^8 years.
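As a worked illustration of T = Ks/2λ, the short Python sketch below converts a Ks value into a divergence time. It reads the stated dicot rate as 1.5 × 10^-8 synonymous substitutions per site per year, which is an interpretation of the text; the function name and the example Ks value of 0.6 are hypothetical and not taken from the study's data.

```python
def divergence_time_myr(ks: float, rate_per_site_per_year: float = 1.5e-8) -> float:
    """Return T = Ks / (2 * lambda) in millions of years.

    The factor of 2 accounts for substitutions accumulating independently
    on both lineages since their common ancestor.
    """
    if ks < 0:
        raise ValueError("Ks must be non-negative")
    if rate_per_site_per_year <= 0:
        raise ValueError("substitution rate must be positive")
    years = ks / (2.0 * rate_per_site_per_year)
    return years / 1e6

# Hypothetical gene pair with Ks = 0.6: 0.6 / (2 * 1.5e-8) = 2e7 years.
example_age = divergence_time_myr(0.6)
```

With these assumed inputs the pair would be dated to about 20 Myr.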
Estimation of gene expression level
Gene expression level was estimated using data from the Massively Parallel Signature Sequencing (MPSS) database of Arabidopsis [54, 55]. The MPSS data came from three libraries, generated from untreated leaves and from leaves 4 and 52 hours after salicylic acid treatment, respectively. For the three libraries, a total of 9,081,200 17-bp signatures were obtained in multiple sequencing runs and in two sequencing frames. The abundance of each signature was normalized to transcripts per million (TPM) to facilitate comparisons across libraries.
RT-PCR of Arabidopsis CNMS associated genes was conducted using the one-step RNA PCR kit (TaKaRa) with gene-specific primers [see Additional file 4]. Total RNA (0.5 μg) was used as the template and amplified with the following program: an initial 50°C for 30 min and 94°C for 2 min, followed by 25 cycles of 94°C for 30 s, 54°C for 30 s and 72°C for 1 min. The house-keeping gene actin 2 (At3g18780) was used as an internal control in the RT-PCR reaction.
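The TPM normalization mentioned above amounts to scaling each signature count by the library total. The Python sketch below shows that arithmetic for a single library; the function name and the toy counts are hypothetical, and the actual MPSS pipeline (merging of runs and sequencing frames) is not reproduced here.

```python
from typing import Dict

def to_tpm(raw_counts: Dict[str, int]) -> Dict[str, float]:
    """Scale raw signature counts in one library to transcripts per million."""
    total = sum(raw_counts.values())
    if total == 0:
        raise ValueError("library contains no signatures")
    return {signature: count * 1_000_000 / total
            for signature, count in raw_counts.items()}

# Toy library with two made-up signatures.
tpm = to_tpm({"sig_a": 1, "sig_b": 3})
```

Because each library is scaled to the same total, abundances from libraries of different sequencing depth become directly comparable.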
This research was supported by National Sciences Foundation of China (No. 30600348), and China '973' project. Preliminary sequence data was obtained from The Institute for Genomic Research website. Sequencing of Brassica oleracea was funded by the "National Science Foundation".
- Toth G, Gaspari Z, Jurka J: Microsatellites in different eukaryotic genomes: survey and analysis. Genome Res. 2000, 10: 967-981. 10.1101/gr.10.7.967.
- Levinson G, Gutman GA: Slipped-strand mispairing: a major mechanism for DNA sequence evolution. Mol Biol Evol. 1987, 4: 203-221.
- Nadir E, Margalit H, Gallily T, Ben-Sasson SA: Microsatellite spreading in the human genome: evolutionary mechanisms and structural implications. Proc Natl Acad Sci USA. 1996, 93: 6470-6475. 10.1073/pnas.93.13.6470.
- Morgante M, Hanafey M, Powell W: Microsatellites are preferentially associated with nonrepetitive DNA in plant genomes. Nat Genet. 2002, 30: 194-200. 10.1038/ng822.
- Fujimori S, Washio T, Higo K, Ohtomo Y, Murakami K, Matsubara K, Kawai J, Carninci P, Hayashizaki Y, Kikuchi S, Tomita M: A novel feature of microsatellites in plants: a distribution gradient along the direction of transcription. FEBS Lett. 2003, 554: 17-22. 10.1016/S0014-5793(03)01041-X.
- Lawson MJ, Zhang L: Distinct patterns of SSR distribution in the Arabidopsis thaliana and rice genomes. Genome Biol. 2006, 7: R14. 10.1186/gb-2006-7-2-r14.
- Zhang LD, Yuan DJ, Yu SW, Li ZG, Cao YF, Miao ZQ, Qian HM, Tang KX: Preference of simple sequence repeats in coding and non-coding regions of Arabidopsis thaliana. Bioinformatics. 2004, 20: 1081-1086. 10.1093/bioinformatics/bth043.
- Iglesias AR, Kindlund E, Tammi M, Wadelius C: Some microsatellites may act as novel polymorphic cis-regulatory elements through transcription factor binding. Gene. 2004, 341: 149-165. 10.1016/j.gene.2004.06.035.
- Martin P, Makepeace K, Hill SA, Hood DW, Moxon ER: Microsatellite instability regulates transcription factor binding and gene expression. Proc Natl Acad Sci USA. 2004, 102: 3800-3804. 10.1073/pnas.0406805102.
- Bevilacqua A, Fiorenza MT, Mangia F: A developmentally regulated GAGA box-binding factor and Sp1 are required for transcription of the hsp70.1 gene at the onset of mouse zygotic genome activation. Development. 2000, 127: 1541-1551.
- Busturia A, Lloyd A, Bejarano F, Zavortink M, Xin H, Sakonju S: The MCP silencer of the Drosophila Abd-B gene requires both pleiohomeotic and GAGA factor for the maintenance of repression. Development. 2001, 128: 2163-2173.
- Sangwan I, O'Brian MR: Identification of a soybean protein that interacts with GAGA element dinucleotide repeat DNA. Plant Physiol. 2002, 129: 1788-1794. 10.1104/pp.002618.
- Santi L, Wang Y, Stile MR, Berendzen K, Wanke D, Roig C, Pozzi C, Muller K, Muller J, Rohde W, Salamini F: The GA octodinucleotide repeat binding factor BBR participates in the transcriptional regulation of the homeobox gene Bkn3. Plant J. 2003, 34: 813-826. 10.1046/j.1365-313X.2003.01767.x.
- Meister RJ, Williams LA, Monfared MM, Gallagher TL, Kraft EA, Nelson CG, Gasser CS: Definition and interactions of a positive regulatory element of the Arabidopsis INNER NO OUTER promoter. Plant J. 2004, 37: 426-438. 10.1046/j.1365-313X.2003.01971.x.
- Kooiker M, Airoldi CA, Losa A, Manzotti PS, Finzi L, Kater MM, Colombo L: BASIC PENTACYSTEINE1, a GA binding protein that induces conformational changes in the regulatory region of the homeotic Arabidopsis gene SEEDSTICK. Plant Cell. 2005, 17: 722-729. 10.1105/tpc.104.030130.
- Hulzink RJ, de Groot PF, Croes AF, Quaedvlieg W, Twell D, Wullems GJ, Van Herpen MM: The 5'-untranslated region of the ntp303 gene strongly enhances translation during pollen tube growth, but not during pollen maturation. Plant Physiol. 2002, 129: 342-353. 10.1104/pp.001701.
- Bao S, Corke H, Sun M: Microsatellites in starch-synthesizing genes in relation to starch physicochemical properties in waxy rice (Oryza sativa L.). Theor Appl Genet. 2002, 105: 898-905. 10.1007/s00122-002-1049-3.
- Hardison RC: Conserved noncoding sequences are reliable guides to regulatory elements. Trends Genet. 2000, 16: 369-372. 10.1016/S0168-9525(00)02081-3.
- Guo H, Moose SP: Conserved noncoding sequences among cultivated cereal genomes identify candidate regulatory sequence elements and patterns of promoter evolution. Plant Cell. 2003, 15: 1143-1158. 10.1105/tpc.010181.
- Inada DC, Bashir A, Lee C, Thomas BC, Ko C, Goff SA, Freeling M: Conserved noncoding sequences in the grasses. Genome Res. 2003, 13: 2030-2041. 10.1101/gr.1280703.
- Hong RL, Hamaguchi L, Busch MA, Weigel D: Regulatory elements of the floral homeotic gene AGAMOUS identified by phylogenetic footprinting and shadowing. Plant Cell. 2003, 15: 1296-1309. 10.1105/tpc.009548.
- Colinas J, Birnbaum K, Benfey PN: Using cauliflower to find conserved non-coding regions in Arabidopsis. Plant Physiol. 2002, 129: 451-454. 10.1104/pp.002501.
- Yang YW, Lai KN, Tai PY, Li WH: Rates of nucleotide substitution in angiosperm mitochondrial DNA sequences and dates of divergence between Brassica and other angiosperm lineages. J Mol Evol. 1999, 48: 597-604. 10.1007/PL00006502.
- Blanc G, Hokamp K, Wolfe KH: A recent polyploidy superimposed on older large-scale duplications in the Arabidopsis genome. Genome Res. 2003, 13: 137-144. 10.1101/gr.751803.
- Koch MA, Haubold B, Mitchell-Olds T: Comparative evolutionary analysis of chalcone synthase and alcohol dehydrogenase loci in Arabidopsis, Arabis, and related genera (Brassicaceae). Mol Biol Evol. 2000, 17: 1483-1498.
- Reyes JC, Muro-Pastor MI, Florencio FJ: The GATA family of transcription factors in Arabidopsis and rice. Plant Physiol. 2004, 134: 1718-1732. 10.1104/pp.103.037788.
- Soltis PS, Soltis DE, Chase MW: Angiosperm phylogeny inferred from multiple genes as a tool for comparative biology. Nature. 1999, 402: 402-404. 10.1038/46528.
- The Gene Ontology Consortium: Gene Ontology: tool for the unification of biology. Nat Genet. 2000, 25: 25-29. 10.1038/75556.
- Martin D, Brun C, Remy E, Mouren P, Thieffry D, Jacq B: GOToolBox: functional investigation of gene datasets based on Gene Ontology. Genome Biol. 2004, 5: R101. 10.1186/gb-2004-5-12-r101.
- Lescot M, Dehais P, Thijs G, Marchal K, Moreau Y, Van de Peer Y, Rouze P, Rombauts S: PlantCARE, a database of plant cis-acting regulatory elements and a portal to tools for in silico analysis of promoter sequences. Nucleic Acids Res. 2002, 30: 325-327. 10.1093/nar/30.1.325.
- PlantCARE database. [http://bioinformatics.psb.ugent.be/webtools/plantcare/html/]
- Higo K, Ugawa Y, Iwamoto M, Korenaga T: Plant cis-acting regulatory DNA elements (PLACE) database: 1999. Nucleic Acids Res. 1999, 27: 297-300. 10.1093/nar/27.1.297.
- PLACE database. [http://www.dna.affrc.go.jp/PLACE/signalscan.html]
- Bolle C, Kusnetsov VV, Herrmann RG, Oelmuller R: The spinach AtpC and AtpD genes contain elements for light-regulated, plastid-dependent and organ-specific expression in the vicinity of the transcription start sites. Plant J. 1996, 9: 21-30. 10.1046/j.1365-313X.1996.09010021.x.
- Arguello-Astorga GR, Herrera-Estrella LR: Ancestral multipartite units in light-responsive plant promoters have structural features correlating with specific phototransduction pathways. Plant Physiol. 1996, 112: 1151-1166. 10.1104/pp.112.3.1151.
- Pauli S, Rothnie HM, Chen G, He X, Hohn T: The cauliflower mosaic virus 35S promoter extends into the transcribed region. J Virol. 2004, 78: 12120-12128. 10.1128/JVI.78.22.12120-12128.2004.
- Orozco BM, Ogren WL: Localization of light-inducible and tissue-specific regions of the spinach ribulose bisphosphate carboxylase/oxygenase (rubisco) activase promoter in transgenic tobacco plants. Plant Mol Biol. 1993, 23: 1129-1138. 10.1007/BF00042347.
- Goldsbrough AP, Albrecht H, Stratford R: Salicylic acid-inducible binding of a tobacco nuclear protein to a 10 bp sequence which is highly conserved amongst stress-inducible genes. Plant J. 1993, 3: 563-571. 10.1046/j.1365-313X.1993.03040563.x.
- Pastuglia M, Roby D, Dumas C, Cock JM: Rapid induction by wounding and bacterial infection of an S gene family receptor-like kinase gene in Brassica oleracea. Plant Cell. 1997, 9: 49-60. 10.1105/tpc.9.1.49.
- Haberer G, Hindemitt T, Meyers BC, Mayer KF: Transcriptional similarities, dissimilarities, and conservation of cis-elements in duplicated genes of Arabidopsis. Plant Physiol. 2004, 136: 3009-3022. 10.1104/pp.104.046466.
- Teakle GR, Manfield IW, Graham JF, Gilmartin PM: Arabidopsis thaliana GATA factors: organisation, expression and DNA-binding characteristics. Plant Mol Biol. 2002, 50: 43-57. 10.1023/A:1016062325584.
- Schaffer R, Landgraf J, Accerbi M, Simon V, Larson M, Wisman E: Microarray analysis of diurnal and circadian-regulated genes in Arabidopsis. Plant Cell. 2001, 13: 113-123. 10.1105/tpc.13.1.113.
- Ma L, Sun N, Liu X, Jiao Y, Zhao H, Deng XW: Organ-specific expression of Arabidopsis genome during development. Plant Physiol. 2005, 138: 80-91. 10.1104/pp.104.054783.
- Ryals JA, Neuenschwander UH, Willits MG, Molina A, Steiner HY, Hunt MD: Systemic Acquired Resistance. Plant Cell. 1996, 8: 1809-1819. 10.1105/tpc.8.10.1809.
- The Arabidopsis Genome Initiative: Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature. 2000, 408: 796-815. 10.1038/35048692.
- Arabidopsis genome in GenBank. [ftp://ftp.ncbi.nih.gov/genomes/Arabidopsis_thaliana]
- TAIR database. [ftp://ftp.arabidopsis.org/home/tair/home/tair/Sequences/]
- TIGR website. [ftp://ftp.tigr.org/pub/data/b_oleracea/wgs_seq/]
- Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997, 25: 3389-3402. 10.1093/nar/25.17.3389.
- The modified Sputnik repeat-finder. [http://capb.dbi.udel.edu/main/tools.htm]
- Morgenstern B: DIALIGN 2: improvement of the segment-to-segment approach to multiple sequence alignment. Bioinformatics. 1999, 15: 211-218. 10.1093/bioinformatics/15.3.211.
- Thompson JD, Higgins DG, Gibson TJ: CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 1994, 22: 4673-4680. 10.1093/nar/22.22.4673.
- Koch M, Haubold B, Mitchell-Olds T: Molecular systematics of the Brassicaceae: Evidence from coding plastidic matK and nuclear Chs sequences. Am J Bot. 2001, 88: 534-544.
- Meyers BC, Lee DK, Vu TH, Tej SS, Edberg SB, Matvienko M, Tindell LD: Arabidopsis MPSS: an online resource for quantitative expression analysis. Plant Physiol. 2004, 135: 801-813. 10.1104/pp.104.039495.
- Arabidopsis MPSS database. [http://mpss.udel.edu/at]
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.