Identification of plant promoter constituents by analysis of local distribution of short sequences
© Yamamoto et al; licensee BioMed Central Ltd. 2007
Received: 21 November 2006
Accepted: 08 March 2007
Published: 08 March 2007
Plant promoter architecture is important for understanding the regulation and evolution of promoters, but our current knowledge of plant promoter structure, especially with respect to the core promoter, is insufficient. Several promoter elements, including the TATA box and several types of transcriptional regulatory elements, have been found to show local distribution within promoters, and this feature has been successfully utilized for extraction of promoter constituents from the human genome.
LDSS (Local Distribution of Short Sequences) profiles of short sequences along the plant promoter have been analyzed in silico, and hundreds of hexamer and octamer sequences have been identified as having localized distributions within promoters of Arabidopsis thaliana and rice. Based on their localization patterns, the identified sequences could be classified into three groups, pyrimidine patch (Y Patch), TATA box, and REG (Regulatory Element Group). Sequences of the TATA box group are consistent with the ones reported in previous studies. The REG group includes more than 200 sequences, and half of them correspond to known cis-elements. The other REG subgroups, together with about a hundred uncategorized sequences, are suggested to be novel cis-regulatory elements. Comparison of LDSS-positive sequences between Arabidopsis and rice has revealed moderate conservation of elements and common promoter architecture. In addition, a dimer motif named the YR Rule (C/T A/G) has been identified at the transcription start site (-1/+1). This rule also fits both Arabidopsis and rice promoters.
LDSS was successfully applied to plant genomes, and hundreds of putative promoter elements were extracted as LDSS-positive octamers. The identified promoter architectures of monocots and dicots are well conserved, but there are moderate variations in the sequences utilized.
The determination of complete genome sequences has allowed analysis by various statistical methods that have furthered understanding of the function of genomes. Analysis of promoter structure is one of the most important issues. Understanding of promoter structure allows predictions concerning promoter positions and expression profiles, and sheds light on hidden transcriptional networks.
Several functional elements have been identified as promoter constituents for precise and regulated transcriptional initiation: the TATA box, the Initiator (Inr) motif, the Downstream Promoter Element (DPE, found in Drosophila), the TFIIB-Recognition Element (BRE), and so-called cis-regulatory elements [1–3]. In addition, some mammalian promoters are associated with CpG islands [4, 5], which are related to the Sp1 recognition site  and have some relationship with gene regulation by DNA methylation [3, 7]. Human transcriptional regulatory elements are reported to form clusters (modules) at the promoter region as well as at the 3' end of a gene . Transcription start sites (TSS) in plant promoters have a CG-compositional strand bias, or GC-skew, where C is more frequently observed than G in the (+) strand [9, 10]. Some of these features are well understood and some are not, but all of them are useful for understanding individual promoters, and some have been utilized for promoter prediction [11–13]. Although these studies have achieved some success, our current knowledge of promoters is still insufficient .
Availability of microarray data on co-regulated gene expression on a genomic scale has enabled the prediction of novel cis-elements involved in gene regulation. Several approaches have been developed for this purpose: detection of consensus sequences in a co-regulated promoter set (Gibbs Motif Sampling [14, 15], MEME ), and detection of over-represented sequences in co-regulated promoters relative to a set of reference sequences [17, 18]. These approaches are also applicable to chromatin immunoprecipitation (ChIP) data [19, 20]. In addition, identification of conserved promoter sequences by comparative genomics supports the prediction of regulatory elements [21–24].
Studies on plant transcription factors and functional cis-regulatory elements have been summarized in several databases, and the collected information on cis-elements and/or DNA sequences bound by trans-factors is utilized for interpretation of plant promoters (PLACE: , AGRIS: , AthaMap: [27, 28]). These databases are based on published articles reporting analyses of individual promoters or trans-factors, rather than on large-scale genomic analyses. Therefore, the lack of large-scale functional analyses of transcription factors in plant science is reflected in these databases as well.
In contrast to the above fact-based approaches, in silico prediction of plant promoter elements by a survey of the Arabidopsis genome has also been reported. Molina and Grotewold applied the MEME and Gibbs sampling methods to Arabidopsis core promoter regions on a genomic scale, and detected several motifs including a plant TATA motif and microsatellites .
Recent studies on mammalian promoter elements have revealed that some of them have localized appearance along the promoter region, exemplified by the TATA box , and binding sites for NRF-1, Sp1, CREB, ATF, and E2F . These studies evoke the idea that localized distribution is a signature of a functional element of the promoter. Recently, this feature was successfully utilized for extraction of functional sequences from human promoters . Large-scale deletion analysis of human promoters suggested that there is some relationship between presence of functional elements and distance from TSS .
In this report, we have detected hundreds of short sequences showing localized distribution in plant promoters by comprehensive analyses of short sequences. The extracted sequences are referred to as "LDSS (Local Distribution of Short Sequences)-positive" in this work. These sequences include TATA boxes, various regulatory sequences identified in previous studies, a novel sequence group that appears to be a general component of the core promoter, and many novel sequences that share many characteristics with regulatory sequences. Our analyses have also revealed conservation of the promoter architecture between monocot and dicot plants.
Patterns of distribution of peaks
Typically, a DNA element recognized by a protein (complex) is 5 to 15 bp long . Within this range, we decided to analyze localization patterns of hexamer and octamer sequences. Our results suggest that sequences longer than 9 bp would not occur often enough to allow statistical analysis.
One example of an LDSS-positive sequence (Fig. 1, CTCTTC) has a peak of appearance at the TSS. Its complementary sequence (Fig. 1, GAAGAG) has a distinct distribution profile, showing that its appearance is sensitive to the direction of transcription. Although hexamers with this type of distribution profile tend to contain only C and T (see later), the sequence preference seems to be weak, and not all sequences composed of C and T show a peak-positive distribution (Fig. 1, CCTTTT is a peak-negative example).
A second example (Fig. 1, CTATAA) is a TATA box-related sequence. This has a peak around -35 bp, and the peak is very sharp. The complementary sequence showed a different pattern with no peak (Fig. 1, TTATAG).
A third example (Fig. 1, TGGGCC) has a relatively wide and low peak. The complementary sequence shows the same peak (Fig. 1, GGCCCA). Peak position and direction-insensitivity suggest that sequences with this type of distribution profile are so-called cis-regulatory sequences involved in transcriptional regulation . In fact, TGGGCC in Figure 1 is reported to be necessary for meristematic expression in Arabidopsis, and mutation to TGAACC abolished the expression (Element II of Arabidopsis PCNA-2, ). Interestingly, the distribution of the mutated sequence does not have any peaks (Fig. 1, TGAACC), demonstrating a good correlation between functionality and peak distribution. In addition, a single base substitution, TGAGCC, also caused the loss of the peak (Fig. 1). It is common for a single base substitution to change the distribution profile drastically (data not shown).
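The sequence pairs compared in the three examples above are reverse complements of each other. As a minimal sketch (the function name is ours and is not taken from the programs used in this study), the pairing can be checked as follows:

```python
# Translation table mapping each DNA base to its complement.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the sequence read 5' to 3' on the opposite strand."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


# The three example hexamers discussed in the text and their partners.
for hexamer in ("CTCTTC", "CTATAA", "TGGGCC"):
    print(hexamer, reverse_complement(hexamer))
```

A direction-sensitive element is expected to show a peak for only one member of such a pair, whereas a direction-insensitive element is expected to show the same peak for both.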
As controls, a set of random genomic sequences of 1 kb length was used for the distribution analysis instead of the promoter database. When sequences with distribution patterns of peak-positive sequences were applied to this analysis, they were found to have no peaks in the random genome fragments (Fig. 1, CTCTTC/random genome, CTATAA/random genome, TGGGCC/random genome).
Besides LDSS-positive elements, there are many LDSS-negative sequences. Among them, sequences observed more frequently than the theoretical occurrence rate (0.24 per 1 kb region) are rich in AT and might contribute to promoter context, whereas rare sequences are rich in GC and might disturb promoter function when located within the promoter region. Therefore, it might be possible to utilize these LDSS-negative sequences as well for evaluation of promoter context.
Parameters for peak evaluation
Figure 2B shows the relationship between peak position and a parameter of peak strength. As shown, all the strong peaks are located downstream of -200 bp, while weak peaks are scattered throughout the promoters. One important point of the figure is the continuous distribution of hexamers along the vertical axis. The same continuity was observed when RPH or RPA was plotted on the vertical axis (data not shown). These results mean that there is no clear way to separate peaky and flat groups. In this study, we adopted the strategy of listing sequences with strong peaks, leaving out the flat group and the group with ambiguous peaks.
Peak-positive hexamers can be classified according to their peak position
Y Patch and TATA Box identified from Arabidopsis hexamer analysis
Columns: Peak position (bp); Peak width (bp); Relative Peak Height (RPH); Relative Peak Area (RPA)
The second group contains TATA box-related sequences. An example is shown as CTATAA in Figure 1. The characteristics of this group are high peak height, narrow peak width, and stringent peak position (Table 1, TATA Box). Similar to the Y Patch, the TATA box group sequences are also found in the majority of Arabidopsis promoters, although the number of promoters with the TATA box within the peak is about 1,000 or less for each sequence.
REGs identified from Arabidopsis hexamer analysis
Columns: Peak position (bp); Peak width (bp); Relative Peak Height (RPH); Relative Peak Area (RPA)
In addition to these three groups, there is also a small number of exceptional hexamers with peak positions in the core promoter (-13 to -60). They might constitute one or more minor types within the core promoter (Table S1, "others" [see Additional file 1]. See also Table S2 and S3 for these elements). The complete list of the extracted sequences is shown in Table S1. The table shows 103 Y Patch, 39 TATA-related, 38 REG, and 22 unclassified hexamer sequences.
Directional preference relative to transcription
Comparison of Arabidopsis and rice promoters
Subsequently, we analyzed the distribution of octamer sequences. The average appearance rate of octamers is 15.7-fold lower than that of hexamers, consistent with the mathematical expectation of a 16-fold difference (data not shown). Because rare sequences tend to show more fluctuation by chance, statistical evaluation was more critical for the octamer analysis. We prepared random distribution populations and used them for statistical evaluation of each octamer (Figure S1 [see Additional file 2]). In this study, we set a p value of 1 × 10^-5 as the threshold. In addition, data of the complementary sequences were merged, only for REG detection, to increase the total count of an octamer in the database. Through the octamer analyses, we identified 350 and 418 LDSS-positive core elements (Table S2 [see Additional file 3] and S3 [see Additional file 4]), and 308 and 242 REG sequences from Arabidopsis and rice, respectively (Table S4 [see Additional file 5] and S5 [see Additional file 6]). The sum of the p values for all the extracted octamers of each species was around 1 × 10^-3, so false positives arising from purely random distribution are unlikely to be included in the lists.
Classification of Arabidopsis LDSS-positive octamers by distribution profiles
Clustering of Arabidopsis REGs based on presence and absence in promoters
Subsequently, we classified the 308 Arabidopsis REGs with the aid of the promoter database. For each promoter, the number of appearances of each REG was scored, and two-dimensional REG-promoter clustering was performed. This REG-promoter association revealed that 10,334 out of 12,951 Arabidopsis promoters have at least one REG in the region from -400 to -40 bp. This high coverage (80%) is due to the long list of REG sequences.
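The coverage figure above can be illustrated with a small sketch. It is not the program used in this study: the function name, the toy promoters, and the octamer in the demonstration are our own illustrative choices, and we assume that each promoter string ends at position -1 and that a REG is counted when its first base lies between -400 and -40.

```python
from typing import Dict, Iterable, Set


def promoters_with_reg(promoters: Dict[str, str], regs: Iterable[str],
                       upstream: int = -400, downstream: int = -40) -> Set[str]:
    """Return IDs of promoters with at least one REG starting in [upstream, downstream].

    Each promoter string is assumed to end at position -1, so the base at
    string index i has the coordinate i - len(sequence).
    """
    reg_list = list(regs)
    hits = set()
    for promoter_id, seq in promoters.items():
        n = len(seq)
        start = max(0, n + upstream)
        for reg in reg_list:
            # Exclusive slice end, so that a match starting exactly at
            # `downstream` is still fully contained in the slice.
            stop = n + downstream + len(reg)
            if reg in seq[start:stop]:
                hits.add(promoter_id)
                break
    return hits


# Toy 1 kb promoters; the octamer below is for illustration only.
demo = {
    "p1": "A" * 600 + "TGGGCCAA" + "A" * 392,  # starts at -400: counted
    "p2": "A" * 1000,                          # no REG
    "p3": "A" * 990 + "TGGGCCAA" + "A" * 2,    # starts at -10: outside the region
}
covered = promoters_with_reg(demo, ["TGGGCCAA"])
print(sorted(covered), len(covered) / len(demo))

# Coverage reported in the text: 10,334 of 12,951 promoters.
print(round(100 * 10334 / 12951))  # 80 (%)
```

Because REGs are orientation-insensitive, a caller would normally pass both each REG and its reverse complement in `regs`.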
Classification of octamer REGs
At & Rice2
Element II of Arabidopsis PCNA-2, Site IIa of rice PCNA
PCF1, PCF2, TCP20
cell cycle/meristematic expression
"ACGT Core", G-box, ABRE,
bZIP family (GBF, TGA1, etc.), PIF3
environmental response (light, UV, drought, ABA)
overlapping with GT1 box (TTAACC)
Several REG groups were identified from Arabidopsis and rice octamer analysis
Another group shown in the table has the bZIP protein-binding motif containing the ACGT core sequence. This group mediates various environmental signals . Both species have this group in common, but Arabidopsis has wider variations than rice (Table 4).
The classification of Arabidopsis and rice REGs is shown in Table 3. The largest group is Group 1, which includes Element II of the Arabidopsis PCNA-2 involved in cell cycle-related expression, as mentioned above. As shown in the table, this group is well conserved between Arabidopsis and rice and has many members in both species. There are several other REG groups, some of which are abundant only in Arabidopsis and some of which are found in both species (several examples in Table 4, summarized in Table 3). Comparison between Arabidopsis and rice suggests both conserved and differentiated types of REGs.
PLACE cis-elements found and not found in Arabidopsis REGs
ACGTATERD1 ACGT sequence required for etiolation-induced expression of erd1 (early responsive to dehydration) in Arabidopsis;
ABRELATERD1 ABRE-like sequence (from -199 to -195) required for etiolation-induced expression of erd1 (early responsive to dehydration) in Arabidopsis;
LTRECOREATCOR15 Core of low temperature responsive element (LTRE) of cor15a gene in Arabidopsis;
SORLIP1AT one of "Sequences Over-Represented in Light-Induced Promoters" (SORLIPs) in Arabidopsis; Computationally identified phyA-induced motifs;
SORLIP2AT one of "Sequences Over-Represented in Light-Induced Promoters" (SORLIPs) in Arabidopsis; Computationally identified phyA-induced motifs;
WBOXATNPR1 "W-box" found in promoter of Arabidopsis NPR1 gene; They were recognized specifically by salicylic acid (SA)-induced WRKY DNA binding proteins;
CACGTGMOTIF "CACGTG motif"; "G-box"; Binding site of Arabidopsis GBF4;
MYB2CONSENSUSAT MYB recognition site found in the promoters of the dehydration-responsive gene rd22 and many other genes in Arabidopsis; Y = C/T; K = G/T;
MYBCORE Binding site for all animal MYB and at least two plant MYB proteins ATMYB1 and ATMYB2, both isolated from Arabidopsis; ATMYB2 is involved in regulation of genes that are responsive to water stress in Arabidopsis;
SITEIIATCYTC "Site II element" found in the promoter regions of cytochrome genes (Cytc-1, Cytc-2) in Arabidopsis; Y = C/T;
ACGTABREMOTIFA2OSEM Experimentally determined sequence requirement of ACGT-core of motif A in ABRE of the rice gene, OSEM; DRE and ABRE are interdependent in the ABA-responsive expression of the rd29A in Arabidopsis; K = G/T;
DPBFCOREDCDC3 A novel class of bZIP transcription factors, DPBF-1 and 2 (Dc3 promoter-binding factor-1 and 2) binding core sequence; Found in the carrot Dc3 gene promoter; Dc3 expression is normally embryo-specific, and also can be induced by ABA; The Arabidopsis abscisic acid response gene ABI5 encodes a bZIP transcription factor; abi5 mutants have pleiotropic defects in ABA response; ABI5 regulates a subset of late embryogenesis-abundant genes; GIA1 (growth-insensitivity to ABA) is identical to ABI5;
GADOWNAT Sequence present in 24 genes in the GA-down regulated d1 cluster found in Arabidopsis seed germination;
WUSATAg Target sequence of WUS in the intron of AGAMOUS gene in Arabidopsis;
CDA1ATCAB2 CDA-1 (CAB2 DET1-associated factor 1) binding site in DtRE (dark response element) of chlorophyll a/b-binding protein2 (CAB2) gene in Arabidopsis;
EMBP1TAEM Binding site of trans-acting factor EMBP-1; wheat Em gene; Binding site of ABFs; ABFs (ABRE binding factors) were isolated from Arabidopsis by a yeast one-hybrid screening system; Involved in ABA-mediated stress-signaling pathway;
HEXAT "Hex motif"; Binding site of Arabidopsis bZIP protein TGA1 and G box binding factor GBF1; G-Box-like element;
UPRMOTIFIAT "Motif I" in the conserved UPR (unfolded protein response) cis-acting element in Arabidopsis genes coding for SAR1B, HSP-90, SBR-like, Ca-ATPase 4, CNX1, PDI, etc.;
RAV1AAT Binding consensus sequence of Arabidopsis transcription factor, RAV1; The expression level of RAV1 was relatively high in rosette leaves and roots;
DRECRTCOREAT Core motif of DRE/CRT (dehydration-responsive element/C-repeat) cis-acting element found in many genes in Arabidopsis and in rice; R = G/A;
ELRECOREPCRP1 ElRE (Elicitor Responsive Element) core of parsley (P.c.) PR1 genes; consensus sequence of elements W1 and W2 of parsley PR1-1 and PR1-2 promoters; Box W1 and W2 are the binding site of WRKY1 and WRKY2, respectively; W-box found in thioredoxin h5 gene in Arabidopsis (Laloi et al.);
ARR1AT "ARR1-binding element" found in Arabidopsis; ARR1 is a response regulator; N = G/A/C/T;
ARFAT ARF (auxin response factor) binding site found in the promoters of primary/early auxin response genes of Arabidopsis; AuxRE; Binding site of Arabidopsis ARF1 (Auxin response factor1);
HEXAMERATH4 hexamer motif of Arabidopsis histone H4 promoter;
IBOX "I box"; "I-box"; Conserved sequence upstream of light-regulated genes; Sequence found in the promoter region of rbcS of tomato and Arabidopsis;
MYB1AT MYB recognition site found in the promoters of the dehydration-responsive gene rd22 and many other genes in Arabidopsis; W = A/T;
MYB2AT Binding site for ATMYB2, an Arabidopsis MYB homolog; ATMYB2 is involved in regulation of genes that are responsive to water stress in Arabidopsis;
MYCATERD1 MYC recognition sequence necessary for expression of erd1 (early responsive to dehydration) in dehydrated Arabidopsis; NAC protein bound specifically to the CATGTG motif (Tran et al., 2004);
MYCATRD22 Binding site for MYC (rd22BP1) in Arabidopsis dehydration-responsive gene, rd22; MYC binding site in rd22 gene of Arabidopsis; ABA-induction;
PREATPRODH "PRE (Pro- or hypoosmolarity-responsive element)" found in the promoter region of proline dehydrogenase (ProDH) gene in Arabidopsis;
RAV1BAT Binding consensus sequence of an Arabidopsis transcription factor, RAV1; The expression level of RAV1 was relatively high in rosette leaves and roots;
SREATMSD "sugar-repressive element (SRE)" found in 272 of the 1592 down-regulated genes after main stem decapitation in Arabidopsis;
TBOXATGAPB "Tbox" found in the Arabidopsis GAPB gene promoter; Mutations in the "Tbox" resulted in reductions of light-activated gene transcription;
AGCBOXNPGLB "AGC box" repeated twice in a 61 bp enhancer element in tobacco (N.p.) class I beta-1,3-glucanase (GLB) gene; "GCC-box"; Binding sequence of Arabidopsis AtERFs;
GAREAT GARE (GA-responsive element); Occurrence of GARE in GA-inducible, GA-responsible, and GA-nonresponsive genes found in Arabidopsis seed germination was 20, 18, and 12%, respectively;
LEAFYATAG Target sequence of LEAFY in the intron of AGAMOUS gene in Arabidopsis;
LTREATLTI78 Putative low temperature responsive element (LTRE); Found in Arabidopsis low-temperature-induced (lti) genes, lti78/cor78/rd29A and lti65;
MYBATRD22 Binding site for MYB (ATMYB2) in dehydration-responsive gene, rd22; MYB binding site in rd22 gene of Arabidopsis thaliana; ABA-induction;
SORLIP5AT one of "Sequences Over-Represented in Light-Induced Promoters" (SORLIPs) in Arabidopsis; Computationally identified phyA-induced motifs;
ABREZMRAB28 ABRE; ABA and water-stress responses; Found in maize (Z.m.) rab28; maize rab28 is ABA-inducible in embryos and vegetative tissues; Found in the Arabidopsis alcohol dehydrogenase (Adh) gene promoter;
CCA1ATLHCB1 CCA1 binding site; CCA1 protein (myb-related transcription factor) interacts with two imperfect repeats of AAMAATCT in Lhcb1*3 gene of Arabidopsis; Related to regulation by phytochrome;
E2FANTRNR "E2Fa element" found in the tobacco RNR (Ribonucleotide reductase) gene promoter and in the Arabidopsis CDC6 gene promoter; Binding site of tobacco and Arabidopsis E2F; Involved in upregulation of the promoter at G1/S transition;
L1BOXATPDF1 "L1 box" found in promoter of Arabidopsis PROTODERMAL FACTOR1 (PDF1) gene; Y = C/T;
OCTAMERMOTIFTAH3H4 "Octamer motif" found in promoter of wheat histone genes H3 and H4, and corn histone genes H3 and H4; Arabidopsis histone H4; "histone-specific octamer";
PIATGAPB "PI" found in the Arabidopsis GAPB gene promoter; Mutations in the "PI" resulted in reductions of light-activated gene transcription;
RYREPEATVFLEB4 "RY repeat motif"; quantitative seed expression; Gene: Vicia faba LeB4; Soybean glycinin (Gy2); other dicot and monocot seed protein genes; Binding site of Arabidopsis B3-domain-containing transcription factor FUS3;
UP2ATMSD "Up2" motif found in 193 of the 1184 up-regulated genes after main stem decapitation in Arabidopsis; W = A/T;
ZDNAFORMINGATCAB1 "Z-DNA-forming sequence" found in the Arabidopsis chlorophyll a/b binding protein gene (cab1) promoter; Involved in light-dependent developmental expression of the gene; "Z-box";
Characterization of transcription start site
An example of Arabidopsis promoter
Tight positioning of the TATA boxes relative to the TSS fits with the general idea that the TATA boxes determine the position of the TSS. In addition, the YR Rule of Arabidopsis would be another important determinant. The Y Patches are located between the TATA boxes and the TSS, but they can also be upstream of the TATA boxes, considering their wide distribution profiles (Figure 5). The role of the Y Patch is not known. The above three elements are orientation-sensitive and are constituents of the core promoter. REGs appear upstream of the TATA box, and they exist in an orientation-insensitive manner. Rice promoters share the above characteristics, showing architectural conservation between dicots and monocots.
An example of an Arabidopsis promoter that has the Y Patch and TATA box is shown in Figure 9B. Octamer analysis of the promoter revealed one cluster of Group 2 REGs (Table 3), one cluster of Y Patches, one cluster of TATA boxes, and the YR Rule. An interesting feature of the figure is that multiple octamers hit the same locus, thereby detecting a longer element. This demonstrates that octamer analysis can detect long functional units as clusters of octamers.
Characteristics of LDSS analysis
In this study, we have identified hundreds of novel sequences solely on the basis of local distribution in the promoter regions of Arabidopsis and rice. Biological information, such as microarray data, was not used at all for sequence extraction, and it becomes useful only during interpretation of the extracted sequences. This method is equally sensitive in detecting major and minor motifs in a promoter population, as demonstrated by the simultaneous detection of major TATA elements and minor REG elements. This feature is an advantage of the LDSS method over other methods for detecting consensus sequences among promoter populations, such as the Gibbs sampling method. We successfully applied the LDSS method to Arabidopsis and rice promoters, and it should in principle be applicable to bacterial and mammalian research as well.
The observed localized distribution is a direct result of the selection pressure. While the localization is an indication of a beneficial role for the organism, the relationship between local distribution of a sequence and its functionality is indirect. Therefore, the question arises if all regulatory elements can be picked up by the LDSS strategy.
When we compared REG sequences with established cis-elements in the PLACE database, we found that 27 out of 48 Arabidopsis PLACE entries are absent from the extracted REGs (Table 5). These results indicate that not all functional elements are LDSS-positive, and thus some would not be detected by this method. There are two possible explanations for the presence of cis-elements that do not show local distribution. One possibility is that these elements are relatively "new", so that selection pressure has not acted for a long enough period. Another possibility is that there has not been any selection pressure on position because of functional differences from the LDSS-positive elements. The latter idea suggests localization-insensitive classes of regulatory elements that are distinct from REGs. So-called long-range regulators [43, 44] might be one such class.
Generally, any functional sequences in the genome are recognized by trans-acting factors that are DNA-binding proteins. Promoter elements and their trans- factors have a relationship of co-evolution. Therefore, differentiation of REGs in the two species would reflect a different status of the corresponding trans-factors. Functional comparison of DNA-binding proteins of Arabidopsis and rice is expected to give some answers as to why these two species have differentiated REG sequences. As for the conserved REGs, it is reasonable that cell cycle-related elements (Group 1, Table 3) comprise the most conserved group, because the cell cycle is one of the most conserved activities in organisms.
REG sequences can be extracted from mammalian promoters as well. However, our preliminary analyses suggest that the LDSS method detects far fewer REGs from mammalian promoters than from plant promoters (YYY and JO, unpublished results). This may reflect differences in promoter architecture between plants and animals.
The discovery that the Y Patch is conserved in monocots and dicots is one of the major achievements of this study. A related motif is reported by Molina and Grotewold from Arabidopsis core promoter analysis using the Gibbs-sampling method (Motif 1 with a typical sequence, TTCTTCTTC, ). The biochemical role of Y Patch is not known, but its position, direction sensitivity, and its abundant nature strongly suggest that it is a general component of the core promoter. Our LDSS analyses suggest that human and mouse do not share this element with plants and thus this is a plant-specific core element (YYY and JO, unpublished results).
At the TSS, the Initiator (Inr) motif (Y Y A N T/A Y Y, TSS is underlined) is known as a recognition site for TFIID . Following their rules, the YR Rule can be considered a less stringent form of Inr. From this point of view, the YR Rule might be recognized by TFIID. The high coverage of the YR Rule is a useful feature for prediction of the TSS. Recently, Carninci et al. have reported that the same rule is applicable to mouse and human promoters as well , revealing conservation of the YR Rule between plants and mammals.
This rule is not an artifact of the Cap-Trapper method, which is the basis of TSS mapping in this study and in the mammalian studies mentioned above , because it is also applicable to human TSSs determined by another method (Oligo-Cap method, ) (YYY and JO, unpublished results).
A plant consensus around the TSS (A/T n T/a C/t A/c a/t, TSS is underlined) is reported by Shahmuradov et al. based on 217 dicot promoters (the actual consensus is expressed by a matrix, ). This consensus also largely overlaps with the YR Rule.
The TFIIB-Recognition Element (BRE) is another core promoter element of animal genes. It is located just upstream of the TATA box and has a GC-rich sequence, (G/C)(G/C)(G/C)CGCC [1, 48]. Our analysis did not detect the BRE as an LDSS-positive element, although CC is preferred in the sequence immediately upstream of the TATA box in both Arabidopsis and rice promoters (Table S2 [see Additional file 3] and S3 [see Additional file 4]).
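The YR Rule itself is simple enough to express as a short test on the two bases flanking the start of transcription. The following is a minimal sketch with a hypothetical function name; the sequences in the demonstration are invented for illustration, and the indexing convention (0-based index of the +1 base) is our assumption.

```python
def fits_yr_rule(sequence: str, tss_index: int) -> bool:
    """True if position -1 is a pyrimidine (C/T) and +1, the TSS, is a purine (A/G).

    `tss_index` is the 0-based index of the +1 base in `sequence`.
    """
    if tss_index < 1 or tss_index >= len(sequence):
        raise ValueError("tss_index needs one base on its upstream side")
    minus1 = sequence[tss_index - 1].upper()
    plus1 = sequence[tss_index].upper()
    return minus1 in "CT" and plus1 in "AG"


print(fits_yr_rule("TTCATC", 3))  # C at -1, A at +1: fits the rule
print(fits_yr_rule("TTGATC", 3))  # G at -1: does not fit
```

Since four of the sixteen possible dinucleotides satisfy the rule, a match at a single candidate position carries little information on its own; it is the coverage across mapped TSSs that makes the rule useful.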
LDSS analysis provides useful information toward precise promoter prediction
The hundreds of octamer sequences identified by the LDSS analysis can be used for promoter prediction. The presence of the TATA box is an important feature of a promoter, but there are many false positives in the genome. For example, the TATA octamer sequence with the most specific localization is found within the peak area in only 30% of its occurrences in the promoter region, meaning that 70% are found outside the peak area. This is essentially consistent with a previous study, where more than 200,000 putative TBP-binding sites were detected in the Arabidopsis genome . Utilization of the preferred sequences around the TATA box, and of coexistence with the Y Patch and REGs, is expected to improve the accuracy of prediction. Although such a combinatorial approach is incorporated into several promoter prediction programs , the motifs to be detected have been limited so far. Our long list of LDSS-positive octamers is expected to serve as a thick dictionary for precise interpretation of plant genomes.
In this report, we showed that LDSS can be applied to plant genomes. We have successfully extracted hundreds of promoter elements as LDSS-positive octamers. All the observed behaviors of the isolated elements suggest that they are functional. The promoter architectures of monocots and dicots revealed in this study are well conserved, but there are moderate variations in the sequences utilized.
Preparation of promoter databases
Cap-Trapper  is one of the most reliable methods for identification of the 5' end of mRNA and is thus suitable for determination of the TSS. So-called full-length (fl) cDNAs of Arabidopsis and rice were made by the Cap-Trapper method, and around ten to twenty thousand non-redundant fl-cDNA clones for each species have been completely sequenced [50, 51]. Therefore, we decided to use the information from the fl-cDNAs for positioning of promoters. Genome sequences of promoter regions from -1,000 to -1 bp were prepared with the aid of information on the 5' ends of fl-cDNAs of Arabidopsis [50, 52] and rice . The established Arabidopsis promoter database [50, 53] and a rice database with 11,370 promoters, prepared in this study, were utilized for our analysis.
Positions of rice fl-cDNA clones  were mapped onto the corresponding BAC clones according to the description in "MappingData.txt" obtained from the KOME web site , and promoter regions from -1 kb to +200 bp relative to the TSS (1.2 kbp long) were collected. BAC and fl-cDNA sequences were obtained from DDBJ. Special care was taken with the 5' ends of fl-cDNA sequences, and only those with less than 2 bp of mismatch with the corresponding genomic sequences were used for promoter mapping. In this way, sequences of 11,370 non-redundant rice promoters were prepared. For analyses of the TSS region, as shown in Figure 6, rice fl-cDNA sequences with no mismatch at the 5' end (6,209 promoters) were used. Establishment of the Arabidopsis promoter database is described elsewhere [50, 53]. Earlier analyses with Arabidopsis hexamers were done using the distributed database containing 15,607 promoters. This database is based on distinct TSSs and allows multiple promoters to belong to a single gene. A smaller set of 12,951 promoters was re-selected from the 15,607-promoter version so as to pick one promoter per gene, and was used for octamer analyses. For preparation of random genomic fragments, non-overlapping Arabidopsis BAC clones were selected by consulting the TAIR web site . They were successively cut into 1 kb pieces, and serial numbers were given to the fragments. Sequences corresponding to 3,000 randomly chosen numbers, generated by the Mersenne Twister method , were used as random genomic fragments of 1 kb length.
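The preparation of random genomic fragments described above can be sketched as follows. This is an illustration rather than the original program: the function names and the seed are ours, the toy input stands in for BAC sequences, and discarding trailing pieces shorter than 1 kb is our assumption. The standard `random` module of CPython uses the Mersenne Twister generator, which matches the method named in the text.

```python
import random
from typing import List


def cut_into_fragments(sequences: List[str], size: int = 1000) -> List[str]:
    """Cut each sequence successively into non-overlapping pieces of `size` bp.

    Trailing pieces shorter than `size` are discarded (an assumption).
    The list index of each piece serves as its serial number.
    """
    fragments = []
    for seq in sequences:
        for start in range(0, len(seq) - size + 1, size):
            fragments.append(seq[start:start + size])
    return fragments


def sample_fragments(fragments: List[str], n: int, seed: int = 0) -> List[str]:
    """Pick `n` distinct fragments by randomly chosen serial numbers."""
    rng = random.Random(seed)  # Mersenne Twister generator in CPython
    serial_numbers = rng.sample(range(len(fragments)), n)
    return [fragments[i] for i in serial_numbers]


# Toy input standing in for BAC clone sequences.
toy_clones = ["ACGT" * 600, "TTGCA" * 700]
pieces = cut_into_fragments(toy_clones)
chosen = sample_fragments(pieces, 3)
print(len(pieces), len(chosen), {len(p) for p in chosen})
```

In the study, 3,000 fragments were chosen in this manner; the fixed seed here only serves to make the sketch reproducible.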
The programs used in this study will be freely provided upon request for non-profit purposes. A searchable web site for the results of this work will be released.
Generation of random distribution
Random distribution samples were generated for given values of Total Area, which is an indicator of the total count in a promoter database. For each Total Area, 1,000 samples were prepared, and their RPA values were subjected to statistical analysis. The average and standard deviation are functions of Total Area (Figure S1 [Additional File 2]) and are affected by the smoothing window. Model RPA populations of random distribution were calculated with the following equations:
REG detection (smoothing window of 21 bins, Total Area < 2,000): log10(average) = -0.1861 ln(Total Area) - 0.5329, SD = 0.17
CORE detection (smoothing window of 3 bins, Total Area < 10,000): log10(average) = -0.1784 ln(Total Area) - 0.8026, SD = 0.13
These models were used to estimate the p value for each octamer distribution.
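The text does not spell out how a p value is derived from these models. One plausible reading, sketched below as an assumption rather than as the authors' procedure, treats log10(RPA) of random distributions as normally distributed around the model mean with the stated SD, and takes the upper-tail probability of the observed value. The function names are hypothetical; only the coefficients come from the equations above.

```python
import math

# Coefficients copied from the two model equations in the text.
MODELS = {
    "REG": {"slope": -0.1861, "intercept": -0.5329, "sd": 0.17},
    "CORE": {"slope": -0.1784, "intercept": -0.8026, "sd": 0.13},
}

def expected_log10_rpa(total_area, mode):
    """Model mean of log10(RPA) for a random distribution (natural log of Total Area)."""
    m = MODELS[mode]
    return m["slope"] * math.log(total_area) + m["intercept"]

def p_value(observed_rpa, total_area, mode):
    """Upper-tail probability under a normal model of log10(RPA).

    Assumption: SD is on the log10 scale and the population is normal.
    The models are stated for Total Area < 2,000 (REG) and < 10,000 (CORE).
    """
    m = MODELS[mode]
    z = (math.log10(observed_rpa) - expected_log10_rpa(total_area, mode)) / m["sd"]
    return 0.5 * math.erfc(z / math.sqrt(2))

mean_reg = expected_log10_rpa(500, "REG")
p_at_mean = p_value(10 ** mean_reg, 500, "REG")        # observation equal to the model mean
p_far_above = p_value(10 ** (mean_reg + 1), 500, "REG")  # ten-fold above the model mean
```

Under this reading, an observed RPA equal to the model mean gives a p value of about 0.5, and larger RPA values give smaller p values.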
Sequence analysis was carried out with a combination of in-house Perl and C++ programs and Excel (Microsoft Japan, Tokyo). The first step was the preparation of index files for each promoter covering all 4,096 possible hexamer and 65,536 possible octamer sequences. The information in the index files was then rearranged by hexamer and octamer sequence, and the occurrence of each short sequence was summarized according to promoter position. The summarized distribution data of each hexamer were then smoothed with a bin of 15 bp. In general, smoothing with a wide bin lowers the height of a sharp peak, whereas a narrow bin does not always capture a wide, low peak. Considering these tendencies, a bin of 21 bp was used for identification of octamer REGs, and a bin of 3 bp was used for octamer core elements. Octamer REGs were extracted after merging the distribution data of the complementary sequence to increase the count of occurrences. Extraction of octamer core elements is orientation-sensitive, so merging was not applied there. Positions of octamers and hexamers were counted from the first base of the sequence. For example, the position of a hexamer that spans -6 to -1 is expressed as -6. Smoothed (averaged) values are assigned to the position at the centre of the bin. The position closest to the TSS therefore also varies with the bin length.
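The counting and smoothing steps described above can be sketched in Python as follows. This is an illustrative reimplementation, not the authors' Perl/C++ code: the function names are hypothetical, promoters are assumed to be equal-length strings ending at position -1, and the smoothing is assumed to be a simple centred moving average.

```python
def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def position_counts(promoters, kmer, merge_complement=False):
    """Count start positions of kmer over promoters that all end at -1.

    Index i of the result corresponds to position i - len(promoter), so a
    hexamer spanning -6..-1 is counted at position -6.
    """
    length = len(promoters[0])
    k = len(kmer)
    targets = {kmer}
    if merge_complement:
        # A set avoids double counting when the k-mer is its own complement.
        targets.add(reverse_complement(kmer))
    counts = [0] * (length - k + 1)
    for seq in promoters:
        for i in range(length - k + 1):
            if seq[i:i + k] in targets:
                counts[i] += 1
    return counts

def smooth(counts, bin_width):
    """Centred moving average; value j belongs to input index j + bin_width // 2."""
    half = bin_width // 2
    return [sum(counts[i - half:i + half + 1]) / bin_width
            for i in range(half, len(counts) - half)]

toy_promoters = ["AAAATATAAA", "CCCCTATAAA"]   # 10 bp toy promoters, -10..-1
tata_counts = position_counts(toy_promoters, "TATAAA")
tata_smoothed = smooth(tata_counts, 3)
```

In the toy example both promoters carry TATAAA at position -6, which is the last entry of `tata_counts`. Because the moving average only covers positions where the full bin fits, a wider bin moves the reported position closest to the TSS further upstream.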
Thresholds for peak detection were as follows:
Hexamer: (peak height/Base Line > 3) & (peak height/SD > 5) & (Peak Area/basal fluctuation > 5),
Octamer Core: (p value < 10^-4) & (peak height/Base Line > 5) & (peak height/SD > 10) & (Peak Area/basal fluctuation > 6) & (peak position > -51),
Octamer REG: (p value < 10^-4) & (peak height/Base Line > 3) & (Peak Area/total area > 0.1) & (peak height/SD > 5) & (Peak Area/basal fluctuation > 6) & (peak position < -50).
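The three threshold sets can be written as simple predicate functions. The sketch below is only an illustration: the `PeakStats` container and its field names are hypothetical, and it assumes the listed quantities have already been computed for each sequence and that the denominators are non-zero.

```python
from dataclasses import dataclass

@dataclass
class PeakStats:
    """Hypothetical container for the per-sequence quantities named in the text."""
    peak_height: float
    base_line: float
    sd: float
    peak_area: float
    total_area: float
    basal_fluctuation: float
    peak_position: int
    p_value: float = 1.0  # not used by the hexamer criteria

def is_hexamer_positive(s):
    return (s.peak_height / s.base_line > 3
            and s.peak_height / s.sd > 5
            and s.peak_area / s.basal_fluctuation > 5)

def is_octamer_core(s):
    return (s.p_value < 1e-4
            and s.peak_height / s.base_line > 5
            and s.peak_height / s.sd > 10
            and s.peak_area / s.basal_fluctuation > 6
            and s.peak_position > -51)

def is_octamer_reg(s):
    return (s.p_value < 1e-4
            and s.peak_height / s.base_line > 3
            and s.peak_area / s.total_area > 0.1
            and s.peak_height / s.sd > 5
            and s.peak_area / s.basal_fluctuation > 6
            and s.peak_position < -50)

# Invented example values, for illustration only.
near_tss = PeakStats(12.0, 2.0, 1.0, 80.0, 400.0, 10.0, -30, 1e-6)
upstream = PeakStats(12.0, 2.0, 1.0, 80.0, 400.0, 10.0, -120, 1e-6)
```

The two example records differ only in peak position, which is what separates the Core window (peak position > -51) from the REG window (peak position < -50).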
Distribution data were fitted with a Gaussian curve using Igor Pro (Hulinks, Tokyo). All the LDSS-positive octamers, together with the above parameters, can be viewed at our web site.
Clustering analyses were performed with Cluster and visualized with TreeView. For clustering of LDSS-positive elements based on distribution profiles, the peak value of each profile was adjusted to 5.0. For REG-promoter clustering, the number of occurrences of each REG in the region from -400 to -40 bp was scored for each promoter, and a REG-promoter table was prepared. Among the Cluster options, hierarchical clustering (centroid linkage) gave more natural results than the k-means and SOM methods.
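The two inputs to clustering, peak-adjusted profiles and the REG-promoter table, can be prepared along the following lines. This is a sketch under stated assumptions, not the authors' code: the function names are hypothetical, promoters are assumed to be strings ending at position -1, occurrences are assigned to the position of their first base, and only the given strand is scanned.

```python
def scale_profile(profile, target_peak=5.0):
    """Rescale a distribution profile so that its maximum equals target_peak."""
    peak = max(profile)
    return [value * target_peak / peak for value in profile]

def reg_promoter_table(promoters, regs, start=-400, end=-40):
    """Count each REG per promoter for start positions in [start, end].

    promoters: dict mapping a promoter name to its sequence (ending at -1).
    Returns a dict of dicts: table[promoter_name][reg] -> count.
    """
    table = {}
    for name, seq in promoters.items():
        n = len(seq)
        row = {}
        for reg in regs:
            k = len(reg)
            count = 0
            for pos in range(max(start, -n), end + 1):
                i = pos + n  # convert a TSS-relative position to a string index
                if seq[i:i + k] == reg:
                    count += 1
            row[reg] = count
        table[name] = row
    return table

# Toy promoter of 1,000 bp with one octamer at -100 and one at -20.
toy_reg = "GGGCCCAT"
toy_seq = "A" * 900 + toy_reg + "A" * 72 + toy_reg + "A" * 12
toy_table = reg_promoter_table({"p1": toy_seq}, [toy_reg])
scaled = scale_profile([1.0, 2.0, 4.0])
```

In the toy example only the occurrence at -100 falls inside the scored region, so the table holds a count of 1. Such a table could then be exported as a tab-delimited file for a clustering program.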
From the PLACE database, 48 entries whose defining sequences are 8 bases or shorter and whose descriptions contain "Arabidopsis" were subjected to the REG survey.
LDSS: Local Distribution of Short Sequences
TSS: transcription start site
This work was supported in part by KAKENHI (Grant-in-Aid for Scientific Research) on Priority Areas "Comparative Genomics" from the Ministry of Education, Culture, Sports, Science and Technology of Japan (to Y.Y.Y. and J.O.).
- Carey M, Smale ST: Concepts and strategies: I. promoter and the general transcription machinery. Transcriptional regulation in eukaryotes. 2001, New York, Cold Spring Harbor Laboratory Press.
- Butler JE, Kadonaga JT: The RNA polymerase II core promoter: a key component in the regulation of gene expression. Genes Dev. 2002, 16 (20): 2583-2592. 10.1101/gad.1026202.
- Smale ST, Kadonaga JT: The RNA polymerase II core promoter. Annu Rev Biochem. 2003, 72: 449-479. 10.1146/annurev.biochem.72.121801.161520.
- Antequera F, Bird A: Number of CpG islands and genes in human and mouse. Proc Natl Acad Sci USA. 1993, 90 (24): 11995-11999. 10.1073/pnas.90.24.11995.
- Ioshikhes IP, Zhang MQ: Large-scale human promoter mapping using CpG islands. Nat Genet. 2000, 26 (1): 61-63. 10.1038/79189.
- Kriwacki RW, Schultz SC, Steitz TA, Caradonna JP: Sequence-specific recognition of DNA by zinc-finger peptides derived from the transcription factor Sp1. Proc Natl Acad Sci USA. 1992, 89 (20): 9759-9763. 10.1073/pnas.89.20.9759.
- Bird A: DNA methylation patterns and epigenetic memory. Genes Dev. 2002, 16 (1): 6-21. 10.1101/gad.947102.
- Blanchette M, Bataille AR, Chen X, Poitras C, Laganiere J, Lefebvre C, Deblois G, Giguere V, Ferretti V, Bergeron D, Coulombe B, Robert F: Genome-wide computational prediction of transcriptional regulatory modules reveals new insights into human gene expression. Genome Res. 2006, 16 (5): 656-668. 10.1101/gr.4866006.
- Tatarinova T, Brover V, Troukhan M, Alexandrov N: Skew in CG content near the transcription start site in Arabidopsis thaliana. Bioinformatics. 2003, 19 Suppl 1: i313-4. 10.1093/bioinformatics/btg1043.
- Fujimori S, Washio T, Tomita M: GC-compositional strand bias around transcription start sites in plants and fungi. BMC Genomics. 2005, 6 (1): 26. 10.1186/1471-2164-6-26.
- Bajic VB, Brent MR, Brown RH, Frankish A, Harrow J, Ohler U, Solovyev VV, Tan SL: Performance assessment of promoter predictions on ENCODE regions in the EGASP experiment. Genome Biol. 2006, 7 Suppl 1: S3 1-13.
- Sonnenburg S, Zien A, Ratsch G: ARTS: accurate recognition of transcription starts in human. Bioinformatics. 2006, 22 (14): e472-80. 10.1093/bioinformatics/btl250.
- Bajic VB, Tan SL, Suzuki Y, Sugano S: Promoter prediction analysis on the whole human genome. Nat Biotechnol. 2004, 22: 1467-1473. 10.1038/nbt1032.
- Lawrence CE, Altschul SF, Boguski MS, Liu JS, Neuwald AF, Wootton JC: Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. Science. 1993, 262 (5131): 208-214. 10.1126/science.8211139.
- Roth FP, Hughes JD, Estep PW, Church GM: Finding DNA regulatory motifs within unaligned noncoding sequences clustered by whole-genome mRNA quantitation. Nat Biotechnol. 1998, 16 (10): 939-945. 10.1038/nbt1098-939.
- Bailey TL, Elkan C: The value of prior knowledge in discovering motifs with MEME. Proc Int Conf Intell Syst Mol Biol. 1995, 3: 21-29.
- van Helden J, Andre B, Collado-Vides J: Extracting regulatory sites from the upstream region of yeast genes by computational analysis of oligonucleotide frequencies. J Mol Biol. 1998, 281 (5): 827-842. 10.1006/jmbi.1998.1947.
- Hughes JD, Estep PW, Tavazoie S, Church GM: Computational identification of cis-regulatory elements associated with groups of functionally related genes in Saccharomyces cerevisiae. J Mol Biol. 2000, 296 (5): 1205-1214. 10.1006/jmbi.2000.3519.
- Ren B, Robert F, Wyrick JJ, Aparicio O, Jennings EG, Simon I, Zeitlinger J, Schreiber J, Hannett N, Kanin E, Volkert TL, Wilson CJ, Bell SP, Young RA: Genome-wide location and function of DNA binding proteins. Science. 2000, 290 (5500): 2306-2309. 10.1126/science.290.5500.2306.
- Lieb JD, Liu X, Botstein D, Brown PO: Promoter-specific binding of Rap1 revealed by genome-wide maps of protein-DNA association. Nat Genet. 2001, 28 (4): 327-334. 10.1038/ng569.
- Manson McGuire A, Church GM: Predicting regulons and their cis-regulatory motifs by comparative genomics. Nucleic Acids Res. 2000, 28 (22): 4523-4530. 10.1093/nar/28.22.4523.
- Kellis M, Patterson N, Endrizzi M, Birren B, Lander ES: Sequencing and comparison of yeast species to identify genes and regulatory elements. Nature. 2003, 423 (6937): 241-254. 10.1038/nature01644.
- Harbison CT, Gordon DB, Lee TI, Rinaldi NJ, Macisaac KD, Danford TW, Hannett NM, Tagne JB, Reynolds DB, Yoo J, Jennings EG, Zeitlinger J, Pokholok DK, Kellis M, Rolfe PA, Takusagawa KT, Lander ES, Gifford DK, Fraenkel E, Young RA: Transcriptional regulatory code of a eukaryotic genome. Nature. 2004, 431 (7004): 99-104. 10.1038/nature02800.
- Prakash A, Tompa M: Discovery of regulatory elements in vertebrates through comparative genomics. Nat Biotechnol. 2005, 23 (10): 1249-1256. 10.1038/nbt1140.
- Higo K, Ugawa Y, Iwamoto M, Korenaga T: Plant cis-acting regulatory DNA elements (PLACE) database: 1999. Nucleic Acids Res. 1999, 27 (1): 297-300. 10.1093/nar/27.1.297.
- Davuluri RV, Sun H, Palaniswamy SK, Matthews N, Molina C, Kurtz M, Grotewold E: AGRIS: Arabidopsis gene regulatory information server, an information resource of Arabidopsis cis-regulatory elements and transcription factors. BMC Bioinformatics. 2003, 4: 25. 10.1186/1471-2105-4-25.
- Steffens NO, Galuschka C, Schindler M, Bulow L, Hehl R: AthaMap: an online resource for in silico transcription factor binding sites in the Arabidopsis thaliana genome. Nucleic Acids Res. 2004, 32 (Database issue): D368-72. 10.1093/nar/gkh017.
- Bülow L, Steffens NO, Galuschka C, Shindler M, Hehl R: AthaMap: from in silico data to real transcription factor binding sites. In Silico Biol. 2006, 6: 23-
- Molina C, Grotewold E: Genome wide analysis of Arabidopsis core promoters. BMC Genomics. 2005, 6 (1): 25. 10.1186/1471-2164-6-25.
- Ohler U, Liao GC, Niemann H, Rubin GM: Computational analysis of core promoters in the Drosophila genome. Genome Biol. 2002, 3 (12): RESEARCH0087. 10.1186/gb-2002-3-12-research0087.
- Elkon R, Linhart C, Sharan R, Shamir R, Shiloh Y: Genome-wide in silico identification of transcriptional regulators controlling the cell cycle in human cells. Genome Res. 2003, 13 (5): 773-780. 10.1101/gr.947203.
- FitzGerald PC, Shlyakhtenko A, Mir AA, Vinson C: Clustering of DNA sequences in human promoters. Genome Res. 2004, 14 (8): 1562-1574. 10.1101/gr.1953904.
- Cooper SJ, Trinklein ND, Anton ED, Nguyen L, Myers RM: Comprehensive analysis of transcriptional promoter structure and function in 1% of the human genome. Genome Res. 2006, 16 (1): 1-10. 10.1101/gr.4222606.
- Fickett JW, Hatzigeorgiou AG: Eukaryotic promoter recognition. Genome Res. 1997, 7 (9): 861-878.
- Trémousaygue D, Garnier L, Bardet C, Dabos P, Hervé C, Lescure B: Internal telomeric repeats and 'TCP domain' protein-binding sites co-operate to regulate gene expression in Arabidopsis thaliana cycling cells. Plant J. 2003, 33 (6): 957-966. 10.1046/j.1365-313X.2003.01682.x.
- Foster R, Izawa T, Chua NH: Plant bZIP proteins gather at ACGT elements. FASEB J. 1994, 8 (2): 192-200.
- AtGenExpress. [http://www.arabidopsis.org/info/expression/ATGenExpress.jsp]
- Bevan M, Walsh S: The Arabidopsis genome: a foundation for plant research. Genome Res. 2005, 15 (12): 1632-1642. 10.1101/gr.3723405.
- Yamaguchi-Shinozaki K, Shinozaki K: Organization of cis-acting regulatory elements in osmotic- and cold-stress-responsive promoters. Trends Plant Sci. 2005, 10 (2): 88-94. 10.1016/j.tplants.2004.12.012.
- PLACE. [http://www.dna.affrc.go.jp/PLACE/]
- AGRIS. [http://arabidopsis.med.ohio-state.edu]
- Nakamura M, Tsunoda T, Obokata J: Photosynthesis nuclear genes generally lack TATA-boxes: a tobacco photosystem I gene responds to light through an initiator. Plant J. 2002, 29 (1): 1-10. 10.1046/j.0960-7412.2001.01188.x.
- Carter D, Chakalova L, Osborne CS, Dai YF, Fraser P: Long-range chromatin regulatory interactions in vivo. Nat Genet. 2002, 32 (4): 623-626. 10.1038/ng1051.
- Lettice LA, Horikoshi T, Heaney SJ, van Baren MJ, van der Linde HC, Breedveld GJ, Joosse M, Akarsu N, Oostra BA, Endo N, Shibata M, Suzuki M, Takahashi E, Shinka T, Nakahori Y, Ayusawa D, Nakabayashi K, Scherer SW, Heutink P, Hill RE, Noji S: Disruption of a long-range cis-acting regulator for Shh causes preaxial polydactyly. Proc Natl Acad Sci USA. 2002, 99 (11): 7548-7553. 10.1073/pnas.112212199.
- Carninci P, Sandelin A, Lenhard B, Katayama S, Shimokawa K, Ponjavic J, Semple CA, Taylor MS, Engstrom PG, Frith MC, Forrest AR, Alkema WB, Tan SL, Plessy C, Kodzius R, Ravasi T, Kasukawa T, Fukuda S, Kanamori-Katayama M, Kitazume Y, Kawaji H, Kai C, Nakamura M, Konno H, Nakano K, Mottagui-Tabar S, Arner P, Chesi A, Gustincich S, Persichetti F, Suzuki H, Grimmond SM, Wells CA, Orlando V, Wahlestedt C, Liu ET, Harbers M, Kawai J, Bajic VB, Hume DA, Hayashizaki Y: Genome-wide analysis of mammalian promoter architecture and evolution. Nat Genet. 2006, 38 (6): 626-635. 10.1038/ng1789.
- Yamashita R, Suzuki Y, Wakaguri H, Tsuritani K, Nakai K, Sugano S: DBTSS: DataBase of Human Transcription Start Sites, progress report 2006. Nucleic Acids Res. 2006, 34 (Database issue): D86-9. 10.1093/nar/gkj129.
- Shahmuradov IA, Gammerman AJ, Hancock JM, Bramley PM, Solovyev VV: PlantProm: a database of plant promoter sequences. Nucleic Acids Res. 2003, 31 (1): 114-117. 10.1093/nar/gkg041.
- Lagrange T, Kapanidis AN, Tang H, Reinberg D, Ebright RH: New core promoter element in RNA polymerase II-dependent transcription: sequence-specific DNA binding by transcription factor TFIIB. Genes Dev. 1998, 12: 34-44.
- Carninci P, Kvam C, Kitamura A, Ohsumi T, Okazaki Y, Itoh M, Kamiya M, Shibata K, Sasaki N, Izawa M, Muramatsu M, Hayashizaki Y, Schneider C: High-efficiency full-length cDNA cloning by biotinylated CAP trapper. Genomics. 1996, 37 (3): 327-336. 10.1006/geno.1996.0567.
- Seki M, Narusaka M, Kamiya A, Ishida J, Satou M, Sakurai T, Nakajima M, Enju A, Akiyama K, Oono Y, Muramatsu M, Hayashizaki Y, Kawai J, Carninci P, Itoh M, Ishii Y, Arakawa T, Shibata K, Shinagawa A, Shinozaki K: Functional annotation of a full-length Arabidopsis cDNA collection. Science. 2002, 296 (5565): 141-145. 10.1126/science.1071006.
- Kikuchi S, Satoh K, Nagata T, Kawagashira N, Doi K, Kishimoto N, Yazaki J, Ishikawa M, Yamada H, Ooka H, Hotta I, Kojima K, Namiki T, Ohneda E, Yahagi W, Suzuki K, Li CJ, Ohtsuki K, Shishiki T, Otomo Y, Murakami K, Iida Y, Sugano S, Fujimura T, Suzuki Y, Tsunoda Y, Kurosaki T, Kodama T, Masuda H, Kobayashi M, Xie Q, Lu M, Narikawa R, Sugiyama A, Mizuno K, Yokomizo S, Niikura J, Ikeda R, Ishibiki J, Kawamata M, Yoshimura A, Miura J, Kusumegi T, Oka M, Ryu R, Ueda M, Matsubara K, Kawai J, Carninci P, Adachi J, Aizawa K, Arakawa T, Fukuda S, Hara A, Hashizume W, Hayatsu N, Imotani K, Ishii Y, Itoh M, Kagawa I, Kondo S, Konno H, Miyazaki A, Osato N, Ota Y, Saito R, Sasaki D, Sato K, Shibata K, Shinagawa A, Shiraki T, Yoshino M, Hayashizaki Y, Yasunishi A: Collection, mapping, and annotation of over 28,000 cDNA clones from japonica rice. Science. 2003, 301 (5631): 376-379. 10.1126/science.1081288.
- Yamada K, Lim J, Dale JM, Chen H, Shinn P, Palm CJ, Southwick AM, Wu HC, Kim C, Nguyen M, Pham P, Cheuk R, Karlin-Newmann G, Liu SX, Lam B, Sakano H, Wu T, Yu G, Miranda M, Quach HL, Tripp M, Chang CH, Lee JM, Toriumi M, Chan MM, Tang CC, Onodera CS, Deng JM, Akiyama K, Ansari Y, Arakawa T, Banh J, Banno F, Bowser L, Brooks S, Carninci P, Chao Q, Choy N, Enju A, Goldsmith AD, Gurjal M, Hansen NF, Hayashizaki Y, Johnson-Hopson C, Hsuan VW, Iida K, Karnes M, Khan S, Koesema E, Ishida J, Jiang PX, Jones T, Kawai J, Kamiya A, Meyers C, Nakajima M, Narusaka M, Seki M, Sakurai T, Satou M, Tamse R, Vaysberg M, Wallender EK, Wong C, Yamamura Y, Yuan S, Shinozaki K, Davis RW, Theologis A, Ecker JR: Empirical analysis of transcriptional activity in the Arabidopsis genome. Science. 2003, 302 (5646): 842-846. 10.1126/science.1088305.
- Sakurai T, Satou M, Akiyama K, Iida K, Seki M, Kuromori T, Ito T, Konagaya A, Toyoda T, Shinozaki K: RARGE: a large-scale database of RIKEN Arabidopsis resources ranging from transcriptome to phenome. Nucleic Acids Res. 2005, 33 (Database issue): D647-50. 10.1093/nar/gki014.
- KOME. [http://cdna01.dna.affrc.go.jp/cDNA/]
- TAIR. [http://www.arabidopsis.org/]
- Matsumoto M, Nishimura T: Mersenne twister: a 623-dimensionally equidistributed uniform pseudo-random number generator. ACM Transactions on Modeling and Computer Simulation. 1998, 8: 3-30. 10.1145/272991.272995.
- yamHP. [http://www.gene.nagoya-u.ac.jp/~obokata-g/yyy/yamHP.html]
- Eisen MB, Spellman PT, Brown PO, Botstein D: Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci USA. 1998, 95 (25): 14863-14868. 10.1073/pnas.95.25.14863.
- EisenLab. [http://rana.lbl.gov/EisenSoftware.htm]
- Kosugi S, Ohashi Y: PCF1 and PCF2 specifically bind to cis elements in the rice proliferating cell nuclear antigen gene. Plant Cell. 1997, 9 (9): 1607-1619. 10.1105/tpc.9.9.1607.
- Martinez-Garcia JF, Huq E, Quail PH: Direct targeting of light signals to a promoter element-bound transcription factor [see comments]. Science. 2000, 288 (5467): 859-863. 10.1126/science.288.5467.859.
- Yang T, Poovaiah BW: Calcium/calmodulin-mediated signal network in plants. Trends Plant Sci. 2003, 8 (10): 505-512. 10.1016/j.tplants.2003.09.004.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.