Genome-scale analysis and comparison of gene expression profiles in developing and germinated pollen in Oryza sativa

Background Pollen development from the microspore involves a series of coordinated cellular events, and the resulting mature pollen has a specialized function to quickly germinate, produce a polar-growth pollen tube derived from the vegetative cell, and deliver two sperm cells into the embryo sac for double fertilization. The gene expression profiles of developing and germinated pollen have been characterised by use of the eudicot model plant Arabidopsis. Rice, one of the most important cereal crops, has been used as an excellent monocot model. A comprehensive analysis of transcriptome profiles of developing and germinated pollen in rice is important to understand the conserved and diverse mechanisms underlying pollen development and germination in eudicots and monocots.
Results We used the Affymetrix GeneChip® Rice Genome Array to comprehensively analyze the dynamic changes in the transcriptomes of rice pollen at five sequential developmental stages from microspores to germinated pollen. Among the 51,279 transcripts on the array, we found 25,062 pollen-preferential transcripts, among which 2,203 were development stage-enriched. The diversity of transcripts decreased greatly from microspores to mature and germinated pollen, whereas the number of stage-enriched transcripts displayed a "U-type" change, with the lowest number at the bicellular pollen stage. The overrepresented stage-enriched transcript groups also shifted between functional categories, which indicates a shift in the gene expression program at the bicellular pollen stage. About 54% of the now-annotated rice F-box protein genes were expressed preferentially in pollen. The transcriptome profile of germinated pollen was significantly and positively correlated with that of mature pollen. Analysis of expression profiles and coexpressed features of the pollen-preferential transcripts related to cell cycle, transcription, the ubiquitin/26S proteasome system, phytohormone signalling, the kinase system and defense/stress response revealed five expression patterns, which are compatible with changes in major cellular events during pollen development and germination. A comparison of pollen transcriptomes between rice and Arabidopsis revealed that 56.6% of the rice pollen-preferential genes had homologs in the Arabidopsis genome, but 63.4% of these homologs were expressed, with a small proportion being expressed preferentially, in Arabidopsis pollen. Rice and Arabidopsis pollen each expressed transcription factors that were not conserved in the other species.
Conclusions Our results demonstrated that rice pollen expressed a reduced but specific set of transcripts in comparison with vegetative tissues, and the number of stage-enriched transcripts displayed a "U-type" change during pollen development, with the lowest number at the bicellular pollen stage. These features are conserved in rice and Arabidopsis. The shift in gene expression program at the bicellular pollen stage may be important to the transition from earlier cell division to later pollen maturity. Mature pollen had pre-synthesized the transcripts needed for germination and early pollen tube growth. The transcriptional regulation associated with pollen development appears to have diverged between the two species. Our results also provide novel insights into the molecular program and key components of the regulatory network regulating pollen development and germination.


Background
The formation of highly specialized haploid male gametophytes (pollen) from microspores involves a series of cellular events. The microspore released from tetrads quickly increases in size and then undergoes asymmetric mitosis (pollen mitosis I [PMI]) to generate a large vegetative cell and a small generative cell. Thereafter, the generative cell undergoes second mitosis (PMII), giving rise to two sperm cells, and the vegetative cell exits the cell cycle [1]. As a result, the pollen is specialized in function, and during pollination, it can quickly germinate, produce a vegetative cell-derived polarly growing pollen tube, and deliver the two sperm cells into the embryo sac to initiate double fertilization. In addition to its intrinsic function for sexual reproduction, pollen represents an excellent model system for uncovering the molecular mechanisms of fundamental cellular processes such as cell division, differentiation, fate determination, polar establishment, cell-to-cell recognition and communication [1,2].
Molecular genetic and biochemical studies have identified several genes involved in the regulation of pollen development, specifically those in germ division and sperm specification, mainly in Arabidopsis [1,3-5]. For example, TIO and kinesin-12A/kinesin-12B are required for PMI, and DUO1, DUO3, CDKA;1 and FBL17 for PMII [1,6]. A recent study showed that DUO1, a germline-specific R2R3 Myb gene, is a key regulator integrating cell cycle progression and sperm specification [7]. However, compared with knowledge of the complicated and ordered cellular events leading to pollen maturation, which are accompanied by serial changes in gene expression profiles [1,4,5,8], our knowledge about the mechanism underlying pollen development is still limited.
Several studies have analyzed transcriptome features of mature pollen from Arabidopsis by 8 K Affymetrix AG microarrays (representing about 30% of the genes in the genome) [9,10] or GeneChip ATH1 arrays (representing about 80% of the genes in the genome) [11]. These studies revealed that mature pollen has a smaller and more unique transcriptome, with a higher proportion of selectively expressed genes, than do vegetative tissues. The genes expressed selectively in pollen are functionally skewed towards cell wall metabolism, signalling and cytoskeleton dynamics [9-11], which suggests that the transcriptional characteristics reflect pollen functional specialization. Lee and Lee [12] compared gene expression profiles of Arabidopsis pollen under normal and cold-stress conditions using serial analysis of gene expression technology and found that most of the genes expressed in pollen were not affected by cold stress. Haerizadeh et al. [13] identified transcripts expressed selectively in mature pollen of soybean (Glycine max) and revealed that the transcriptome had a high proportion of transcripts involved in signalling, transcription, heat shock response, transport and the ubiquitin/proteasome pathway. Importantly, Honys and Twell [14] systematically analyzed dynamic transcriptome profiles of pollen development from microspore to mature stages in Arabidopsis using GeneChip ATH1 arrays and revealed a decrease in the proportion of transcript species and an increase in the proportion of pollen-specific transcripts during the development process. By using the same Arabidopsis ATH1 array, Borges et al. [15] revealed that most sperm cell-expressed genes were also expressed in mature pollen and that 11% of them showed enriched expression in the sperm cell. These sperm cell-enriched transcripts were preferentially involved in DNA repair, ubiquitin-mediated proteolysis and cell cycle progression.
Additionally, a comparison of the pollen transcriptome of mutants deficient in different MIKC* protein complexes and wild type plants in Arabidopsis showed the absence of the protein complexes affected the expression of more than 1300 genes during pollen maturation [16]. These studies gave a comprehensive picture of temporal dynamics of gene expression profiles in pollen development.
Pollen germination and polar tube growth need a coordinated action of multiple cellular and biochemical events, and the tip-focused intracellular Ca2+ gradient and tip plasma membrane-localized Rop1 GTPase have been well documented as important factors in the regulation of polar establishment and growth of pollen tubes [17]. In contrast to the growing body of molecular information about the development and maturation of pollen, information about the genome-wide events underlying pollen germination and tube growth, which is essential for understanding the molecular mechanisms of polar tube growth and invasion into pistils, is limited. Inhibition experiments with numerous plant species, including Arabidopsis, demonstrate that pollen germination and polar tube growth strictly depend on protein synthesis and are relatively independent of transcription [14,18-20]. In Arabidopsis, relative to mature pollen, hydrated pollen and pollen tubes have a large number of newly transcribed genes [21]. Thus, de novo synthesis of transcripts in germinating pollen may not be crucial to germination and early tube growth, although the issue needs to be addressed by sequential comparison of transcriptomes from developing and germinated pollen.
Rice, one of the most important cereal crops, is the staple food for half of the world's population and has been used as an excellent model after Arabidopsis, because of its relatively small genome and the completion of the genome sequence. Rice pollen has several features different from Arabidopsis pollen. Rice pollen is representative of wind-pollinated tricellular pollen and has a thinner wall (0.8~1.2 μm) and fewer lipids in the coat layer than do other plants in Gramineae [22,23]. In addition, rice pollen appears to have less longevity under in vitro conditions than does dicot pollen [23] and loses germination activity in a short period of time [23-25]. Russell et al. [26] identified putative pollen allergens and the gene organization of these allergens in rice on a genome-wide scale. To help understand the molecular regulation of pollen development and germination, we analyzed transcriptome features of rice pollen during five sequential developmental stages, from microspores to germinated pollen, using the Affymetrix GeneChip® Rice Genome Array. These results shed light on the overall characteristics of expression profiles associated with pollen development and germination and on the molecular program and key components of the regulatory network regulating the processes.

Isolation and characterization of developing and in vitro-germinated pollen
To analyze changes in expression profiles during pollen development and germination, we isolated and purified uninucleate microspores (UNMs), bicellular pollen (BCP) and immature tricellular pollen (TCP) (see Materials and Methods) with purity of about 95%, 80% and 92%, respectively (Figure 1A~F and 1J). Mature pollen grains (MPGs) were collected from blossoming flowers, and all were tricellular (Figure 1G~H). On fluorescein 3',6'-diacetate staining, more than 90% of the isolated UNMs, BCP, TCP and MPGs were viable (data not shown). The condition used for pollen germination resulted in 80% germinated pollen grains (GPGs) (Figure 1I and 1J).

Transcriptome characteristics in developing and germinated pollen
We analyzed genome-wide gene expression profiles of all five samples (UNMs, BCP, TCP, MPGs and GPGs), along with the sporophytic tissues callus cells, roots and leaves as controls, using the Affymetrix GeneChip® Rice Genome Array. Three independent biological replicates were performed for each sample. The correlation coefficient value for each experiment was larger than 0.93. Real-time quantitative RT-PCR (qRT-PCR) analysis was used to confirm the array data. In total, 54 pollen-preferential/stage-enriched transcripts were analyzed (Additional file 1). The signal intensity values of these examined transcripts ranged from 1.1 to 189,609, and the ratio ranged from 4.71 to 22,992.57 (Additional file 1). The profiles produced by qRT-PCR and the GeneChip® Rice Genome Array showed a significant positive correlation for 45 of the 54 genes (r > 0.707, P < 0.05) (Figure 2), which demonstrated that about 83% of the microarray expression data could be confirmed by qRT-PCR.
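The validation criterion above (a per-gene Pearson correlation between array and qRT-PCR profiles, compared against r > 0.707) can be illustrated with a minimal sketch. The gene names and expression values below are invented for illustration, and the use of eight paired samples per gene is an assumption (five pollen stages plus three sporophytic tissues), chosen because the quoted cutoff appears consistent with the two-tailed 5% critical value for eight pairs. This is not the authors' analysis script.

```python
from math import sqrt

R_CUTOFF = 0.707  # cutoff quoted in the text (r > 0.707, P < 0.05)


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def fraction_confirmed(array_profiles, qpcr_profiles, r_cutoff=R_CUTOFF):
    """Fraction of genes whose array and qRT-PCR profiles correlate above the cutoff."""
    confirmed = [
        gene
        for gene in array_profiles
        if pearson_r(array_profiles[gene], qpcr_profiles[gene]) > r_cutoff
    ]
    return len(confirmed) / len(array_profiles)


# Hypothetical profiles over eight samples (values are made up).
array_profiles = {
    "geneA": [1, 2, 3, 4, 5, 6, 7, 8],
    "geneB": [1, 2, 3, 4, 5, 6, 7, 8],
}
qpcr_profiles = {
    "geneA": [2, 4, 6, 8, 10, 12, 14, 16],  # tracks the array profile
    "geneB": [5, 1, 4, 2, 5, 1, 4, 2],      # does not track the array profile
}

toy_fraction = fraction_confirmed(array_profiles, qpcr_profiles)
reported_fraction = 45 / 54  # counts reported in the text
```

With the counts reported in the text, 45 of 54 genes corresponds to roughly 83%, which matches the stated proportion.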
The GeneChip® Rice Genome Array contains probes to query 51,279 transcripts (http://www.affymetrix.com). Analyses involving the microarray suite (MAS) 5.0 detection algorithm revealed 14,590 genes expressed in UNMs, 12,967 in BCP, 12,514 in TCP, 5,939 in MPGs and 5,945 in GPGs, in comparison with 16,000 in callus cells, 17,383 in roots and 17,424 in leaves (Additional file 2). To be cautious, we re-analyzed the raw data using DNA-Chip Analyzer (dChip) and obtained the same results. Thus, the developing and germinated pollen have a smaller transcriptome than do sporophytic tissues. These data also show greatly decreased transcript diversity from UNMs to MPGs, similar to the observation in Arabidopsis pollen [14], whereas the transcript diversity of GPGs was similar to that of MPGs.
Furthermore, we analysed the correlation among the transcriptome profiles of pollen at different stages (Additional file 3). Because callus cells are a mass of undifferentiated, actively dividing cells, the transcriptome profile of the callus cells was used as a reference. As pollen development advanced, the similarity between the pollen and callus cell transcriptome profiles decreased greatly, from the highest for UNMs (r = 0.72) to the lowest for MPGs (r = 0.11). Accordingly, the expression profile for UNMs was more similar to that for BCP (r = 0.82) than to that for other stages, with the lowest similarity to the transcriptome profiles for MPGs and GPGs (r = 0.13). A similar result was also observed in Arabidopsis [14]. The findings suggested that UNMs appeared to use an expression program similar to that of callus cells for cell proliferation, and thereafter, the program in developing pollen was specified for pollen functional specialization. Interestingly, in line with the observation of MPGs and GPGs having almost the same number of diverse transcripts, the gene expression profiles of MPGs and GPGs were significantly and positively correlated (r = 0.99), which suggests that MPGs have stored a set of transcripts that will be used for germination and early tube growth. Combined with the results of several previous studies that inhibited transcription and translation and showed that pollen germination and early tube growth strictly depend on protein synthesis but not transcription [14,18-20], our results confirmed transcript storage through pollen germination in rice.

Identification of development stage-enriched and -downregulated genes
We identified development stage-enriched genes by comparing the expression levels of genes in developing and germinated pollen and in sporophytic tissues (callus cells, roots and leaves) using the Z-score transformation normalization method [27], with cutoffs of ratio ≥ 2.0 and Z-score ≥ 3.75. The analysis revealed 2,203 development stage-enriched probe sets: 660 (corresponding to 568 unigenes) expressed preferentially in UNMs, 174 (146 unigenes) in BCP, 246 (198 unigenes) in TCP, 537 (296 unigenes) in MPGs and 586 (358 unigenes) in GPGs (Additional file 4b1, c1, d1, e1 and 4f1). This finding indicated a "U-type" change tendency in the number of stage-enriched transcripts during the process, and GPGs appeared to have a number of enriched transcripts similar to that of MPGs. Furthermore, we used the same method to screen stage-enriched genes in Arabidopsis developing pollen from an available database [14] and found 190 genes expressed preferentially in UNMs, 94 in BCP, 161 in TCP, and 313 in MPGs. These data indicated that in both rice and Arabidopsis, the number of stage-enriched transcripts decreased sharply from UNMs to BCP and thereafter increased to a maximum level in MPGs. Relative to stage-enriched genes, genes downregulated in a development stage showed a distinct feature. Using cut-offs of the same stringency as for screening stage-enriched genes (ratio ≤ 0.5 and Z-score ≤ -3.75), we identified almost no transcripts downregulated in each stage (only 1 in UNMs, 1 in BCP, none in TCP, 1 in MPGs and 3 in GPGs). Therefore, we used less stringent parameters (ratio ≤ 0.5 and Z-score < -1.7245 (p-value < 0.05)) to screen genes downregulated at each stage.
The analysis revealed 70 transcripts (corresponding to 65 unigenes) downregulated in UNMs, 52 (49 unigenes) in BCP, 143 (138 unigenes) in TCP, 429 (411 unigenes) in MPGs, and 530 (507 unigenes) in GPGs (Additional file 4b2, c2, d2, e2 and 4f2), which suggested that stage-downregulated genes were substantially increased in number after the bicellular pollen stage.
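The two-threshold screen described above (a fold-ratio cutoff combined with a Z-score cutoff) can be sketched as follows. This is a deliberately simplified illustration, not the procedure of reference [27]: here the Z-score is computed directly on log2 ratios of one stage versus one control value per gene, whereas the cited method normalizes within arrays first. The function name, gene identifiers and intensities are all hypothetical.

```python
from math import log2
from statistics import mean, pstdev


def stage_enriched(stage, control, ratio_cut=2.0, z_cut=3.75):
    """Return genes passing both the fold-ratio and the Z-score cutoff.

    stage, control: dicts mapping gene id -> positive signal intensity.
    Simplification: Z-scores are taken over the log2(stage/control) values.
    """
    genes = sorted(stage)
    log_ratios = [log2(stage[g] / control[g]) for g in genes]
    mu = mean(log_ratios)
    sd = pstdev(log_ratios)  # population SD; must be non-zero
    selected = []
    for gene, lr in zip(genes, log_ratios):
        ratio = stage[gene] / control[gene]
        z = (lr - mu) / sd
        if ratio >= ratio_cut and z >= z_cut:
            selected.append(gene)
    return selected


# Toy data: 50 hypothetical genes, one of which is 100-fold higher in the stage.
control = {f"g{i}": 100.0 for i in range(50)}
stage = dict(control)
stage["g0"] = 10000.0

enriched = stage_enriched(stage, control)
```

A downregulation screen would mirror this with the inequalities reversed (for example ratio ≤ 0.5 and a negative Z-score cutoff), as in the text.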
Furthermore, we analyzed the functional features of these transcripts according to a part and/or an instance of the parent of gene ontology (GO) terms and BLAST search results of transcripts without GO terms. Not taking into account the unknown transcripts (annotated as "expressed" or "hypothetical" or with no annotated coding sequence information in the TIGR rice genome annotation database, http://rice.plantbiology.msu.edu/) and the transposable element-related transcripts, the remaining development stage-enriched and -downregulated genes could be classified into 13 functional groups and one "unclassified" group in which the genes had putative functional information but could not be classified clearly into the above 13 groups (Figure 3). The distribution of stage-enriched and -downregulated genes changed markedly between stages (Figure 3 and Additional file 5). Both stage-enriched and -downregulated transcripts showed a functional skew toward metabolism, transcription/RNA process, and protein degradation; the latter two groups are important in the regulation of transcription and protein turnover. This finding suggested that a network of transcription and protein turnover regulation is required for pollen development and function specification. However, the functional features of the two datasets were strikingly different in the following aspects. First, relative to stage-enriched genes associated with the cell cycle, cytoskeleton, cell wall and phytohormones, stage-downregulated genes implicated in these functional terms were much fewer in number and were obviously stage-dependent. For example, stage-downregulated cell cycle-related transcripts preferentially occurred in BCP, and fewer phytohormone-related transcripts were identified in the stage-downregulated dataset. Second, among the stage-downregulated genes, signalling-related transcripts were overrepresented in UNMs, but among stage-enriched genes, such transcripts were overrepresented in BCP.
Third, the number of defense/stress response-related transcripts increased greatly from UNMs and peaked in MPGs in the stage-enriched dataset, whereas it decreased from BCP in the stage-downregulated dataset. Unexpectedly, stage-downregulated transcripts implicated in vesicle trafficking were identified in pollen of all stages, but stage-enriched transcripts implicated in this functional term were found only in pollen from the UNM to TCP stages. This suggested that development stage-dependent enrichment or downregulation of different genes is required for pollen development and function specification. Analysis of development stage-enriched transcripts in the molecular functional and cellular component terms "enrichment status" and "hierarchy" showed that cation transmembrane transporter and antiporter activity were statistically significant in TCP (Figure 4A and 4C). Transcripts involved in membrane and membrane-bounded organelle seemed to be more important in TCP-enriched data (Figure 4B). Vesicle trafficking activity was statistically significant in MPGs and GPGs in the development stage-downregulated dataset (Additional file 6).

Identification of genes expressed preferentially in developing and germinated pollen
To extensively analyze the molecular regulation of pollen development and germination, we further screened transcripts expressed preferentially in pollen (one or more stages) using sporophytic tissues as controls (ratio ≥ 2.0, Z-score ≥ 3.19) and those shared across all pollen stages and sporophytic tissues (see Materials and Methods). The screening identified 25,062 pollen-preferential transcripts (Additional file 4a) and 10,777 transcripts expressed constitutively both in pollen and sporophytic tissues (Additional file 4h). Relative to the constitutively expressed transcripts that may have a housekeeping function in pollen and sporophytic development, the pollen-preferential transcripts are likely to have an important function in pollen development. Therefore, we focused on the expression characteristics of 6 functional groups (2,858 of 25,062 preferentially expressed transcripts) related to cell cycle regulation, phytohormone signalling, the kinase system, the ubiquitin/26S proteasome system (UPS), transcription, and defense/stress response that have diversely important roles in different tissues, including pollen, as revealed by numerous studies (see below sections). Cluster analysis of the 2,858 transcripts revealed 5 expression patterns (Figure 5 and Table 1). The largest cluster was cluster 1 (c1), with 1,043 transcripts whose expression was at the lowest level in UNMs and thereafter increased to the highest level in MPGs and GPGs. The second and third largest clusters were clusters 0 (c0; 567) and 2 (c2; 516), respectively. Transcripts in c0 were upregulated from UNMs to GPGs. The expression level of transcripts in c2 increased from UNMs to MPGs but decreased from MPGs to GPGs. Cluster 3 (c3), of 333 transcripts, began to be up-regulated in UNMs, peaked at BCP and TCP, and decreased thereafter.
Cluster 4 (c4) consisted of 399 transcripts, whose expression was highest in UNMs, was reduced to a relatively low level in TCP and remained at a low level in the following stages. Transcripts involved in distinct functional groups showed heterogeneous distribution in the 5 clusters. For example, most transcripts of calcium signal-related kinases were in c1 (19/33), which is in line with the known important roles of calcium signalling in germination and polar tube growth [28]. Therefore, the change in expression patterns of distinct transcripts suggests the requirement of different development events from UNMs to GPGs.
(Table 1 and Additional file 7a). Overall, the number of TF transcripts showing late accumulation patterns (c0, c1 and c2) was greater than that showing middle (c3) and early (c4) accumulation patterns, which indicates that late development and maturation phases may need more TFs than the early phase. Transcripts of distinct families showed heterogeneous distribution in the five clusters. For example, most AP2-EREBP, bHLH, MYB/MYB-related and orphan transcripts were in c0, c1 and c2 clusters. C3H transcripts were mainly distributed in c1 and c4 clusters, SET and HMG transcripts were mainly in c4, and CSD and TAZ transcripts were only in c4. Multiple TF genes with distinct expression features were also identified in developing Arabidopsis pollen [13,14] and developing maize anthers [29]. In addition, sperm cells of Arabidopsis pollen had their own unique TFs [15]. The transcriptome of mature soybean pollen showed an enrichment of TFs [13,14]. This finding suggested that different development events may require different TFs and/or a combination of TFs from distinct families.
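To make the five expression patterns (c0 to c4) concrete, the sketch below assigns a five-stage profile to whichever idealized pattern it correlates with best. This is only an illustration: the clusters in the text were produced by dedicated clustering software, whereas the template shapes here are hypothetical and were hand-drawn from the verbal descriptions of c0 to c4, and nearest-template assignment is a stand-in technique, not the method used in the study.

```python
from math import sqrt

STAGES = ["UNM", "BCP", "TCP", "MPG", "GPG"]

# Hypothetical idealized shapes, loosely following the descriptions of c0-c4.
TEMPLATES = {
    "c0": [0, 1, 2, 3, 4],  # rises steadily from UNM to GPG
    "c1": [0, 1, 2, 3, 3],  # lowest in UNM, highest in MPG and GPG
    "c2": [0, 1, 2, 3, 2],  # rises to MPG, then falls in GPG
    "c3": [0, 2, 2, 1, 0],  # peaks at BCP and TCP
    "c4": [3, 2, 0, 0, 0],  # highest in UNM, low from TCP onward
}


def pearson_r(x, y):
    """Pearson correlation; returns 0.0 for a flat profile to avoid dividing by zero."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def assign_pattern(profile):
    """Name of the template whose shape correlates best with the profile."""
    return max(TEMPLATES, key=lambda name: pearson_r(profile, TEMPLATES[name]))


# Made-up expression profiles across the five stages.
early = assign_pattern([10, 9, 2, 1, 1])
middle = assign_pattern([1, 5, 6, 2, 1])
late_then_drop = assign_pattern([1, 2, 4, 8, 3])
```

Because correlation ignores scale, this compares profile shape only, which is similar in spirit to clustering on normalized expression levels. Templates c0 and c1 are very close in shape, so real data would need a more careful method to separate them.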

Plant hormones
146 hormone-related transcripts showed pollen-preferential expression (Table 1 and Additional file 7b), and most (107) were associated with auxin (80) and ethylene (27) signalling, which indicates the importance of auxin and ethylene signalling in pollen development and germination/tube growth. The ethylene-related transcripts were mainly distributed in c0 and c1 clusters (19/27), whereas the auxin-related transcripts showed a relatively even distribution in all five clusters with larger numbers in c1 and c2. This difference in expression patterns suggests that ethylene may be implicated mainly in pollen maturity and germination, whereas auxin signalling may have more diverse functions in addition to possible roles in maturation and germination (see Discussion). As well, some transcripts related to gibberellin, brassinosteroid and jasmonic acid signalling showed distinct expression patterns (Table 1). Expression profile analyses of developing rice anthers by 10 K [30] and 44 K microarray analysis [31] implied the important role of multiple phytohormones, such as gibberellins, auxin and ethylene, in pollen development.
Figure 4 Gene ontology (GO) term "enrichment status" for the development stage-enriched transcripts in TCP. Transcripts with GO term "enrichment status" and "hierarchy" in A), "molecular function"; B), "cellular component"; and C), "biological process" branches. The classification terms and their serial numbers are represented as rectangles. The numbers in brackets represent the total number of genes that may be involved in the corresponding biological processes. The graph displays the classification terms "enrichment status" and "hierarchy". The color scale shows the p-value cutoff levels for each biological process. The deeper colors represent the more significant biological processes in the putative pollen pathway.

Ubiquitin/26S proteasome system
Our analysis revealed 603 UPS transcripts involving distinct UPS components with pollen-preferential expression, and 406 of them encode F-box proteins (Additional file 7c), which have a crucial role in conferring specificity on the UPS for appropriate targets. These F-box transcripts had distinct expression patterns. The highest number of the F-box transcripts (108) showed an early accumulation pattern (c4), with relatively high distribution in c1, whereas the remaining transcripts were distributed in c0, c2 and c3 clusters. Other transcripts of UPS components showed a distribution in the 5 clusters similar to that of F-box transcripts (Table 1).
The six functional groups involve transcription factors, phytohormones, cell cycle regulators, defence/stress responders, kinases and ubiquitin/26S proteasome factors in different expression patterns (shown in Figure 5). Raw data for the clusters are in Additional file 7.

Defense/stress response-related transcripts
474 defense/stress response-related transcripts were preferentially expressed in pollen (Table 1 and Additional file 7f). Of these, 82% (388/474) accumulated from a low level in UNMs to the highest level in MPGs and/or GPGs (c0, c1 and c2) (Table 1). These late-accumulating transcripts largely encoded abiotic/biotic stress-response proteins such as disease-resistant NB-NRC, NBS-LRR and pathogenesis-related proteins and other stress-response proteins, such as cold acclimation protein COR413-PM1, harpin-induced protein and wound-induced protein WI12 (Additional file 7f). This finding indicates that a global ability to deal with abiotic and biotic stress, formed during pollen maturity, may be essential to successful fertilization.

A comparison of rice and Arabidopsis pollen transcriptomes
Based on published data of transcriptomes of mature or developing pollen of Arabidopsis [14], we compared possibly conserved and diverse features of rice and Arabidopsis pollen transcriptomes using pollen-preferential and development stage-enriched transcript datasets. By a BLAST search with a cutoff of E-value < 1.0E-05, we revealed that 62.4% (1195 genes) of 1916 stage-enriched genes in rice pollen have conserved counterparts in the Arabidopsis genome; 677 (including 22 stage-enriched genes) of these homologous genes were expressed in developing Arabidopsis pollen (Figure 6 and Additional file 8b). Consistent with this result, further analysis of rice pollen-preferential genes showed that 56.6% of rice pollen-preferential genes (10,539 of 18,630) had homologous partners in the Arabidopsis genome, corresponding to 6434 genes in the Arabidopsis genome (Figure 6 and Additional file 8a). However, among the 6434 genes, 63.4% (4079, including 705 pollen-preferential genes) were expressed in developing pollen of Arabidopsis (Figure 6 and Additional file 8a Table 2). Interestingly, 13 TF families, such as CSD, DBP and DDT, appeared to be specific to rice pollen, and none were identified in developing Arabidopsis pollen (Table 2), whereas 7 families, such as AtRKD, CAMTA and REM, which were expressed in Arabidopsis pollen, were not identified in the rice pollen transcriptome dataset (Table 2).
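The homolog screen described above (keep a rice gene if it has at least one Arabidopsis BLAST hit with E-value < 1.0E-05, then report the proportion) can be sketched as below. The sketch assumes BLAST results in the standard 12-column tabular layout, where the query id is the first column and the E-value the eleventh; the study does not state which output format was used, and the gene identifiers and hit lines here are invented.

```python
def queries_with_homolog(blast_lines, evalue_cut=1e-5):
    """Set of query ids having at least one hit with E-value below the cutoff.

    Assumes tab-separated lines with the query id in column 1 and the
    E-value in column 11 (standard 12-column tabular BLAST output).
    """
    found = set()
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        query, evalue = fields[0], float(fields[10])
        if evalue < evalue_cut:
            found.add(query)
    return found


def make_hit(query, subject, evalue):
    """Build one hypothetical tabular hit line; only query, subject and E-value matter here."""
    cols = [query, subject, "80.0", "100", "20", "0", "1", "100", "1", "100", evalue, "150"]
    return "\t".join(cols)


# Three hypothetical rice queries; riceC only has a weak hit.
toy_hits = [
    make_hit("riceA", "athX", "1e-30"),
    make_hit("riceA", "athY", "0.5"),
    make_hit("riceB", "athZ", "2e-6"),
    make_hit("riceC", "athX", "0.01"),
]
all_queries = {"riceA", "riceB", "riceC"}

with_homolog = queries_with_homolog(toy_hits)
toy_percent = 100 * len(with_homolog) / len(all_queries)

# The proportion reported in the text, recomputed from its own counts.
reported_percent = round(100 * 10539 / 18630, 1)
```

Recomputing from the counts in the text (10,539 of 18,630) gives the reported 56.6%.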

Analysis of cis-acting regulatory elements from rice pollen stage-enriched genes
To provide insights into the transcriptional regulation of gene expression during rice pollen development, we identified cis-acting regulatory elements from 2-kb regions upstream of the start codon of the development stage-enriched genes using the PLACE database. Of 170 identified cis-elements, 107 (62.9%) had unknown functions (Additional file 10), which suggests that they may represent novel cis-elements regulating transcription during rice pollen development. Among the known cis-elements, GTGANTG10 and POLLEN1LELAT52 are known pollen-specific cis-elements, identified in BCP- and TCP-enriched genes, respectively, and also identified in rice male gamete- and tapetum-specific genes as revealed by LM-microarray [32]. Impressively, a high proportion of the respective UNM-, BCP-, TCP- and MPG-enriched genes shared common cis-elements, such as CAGATAA in UNMs, CACGTG in BCP, AAATAAG in TCP, and ATATAT in MPGs. However, relative to the "U" type distribution of stage-enriched genes, the diversity of identified cis-elements showed a nearly reversed distribution, and more cis-elements were found in BCP. Additionally, the most cis-elements were identified in GPGs. The cis-elements identified in the respective stage-enriched genes appeared different. The results imply that different regulators or a combination of regulators are involved in the regulation of pollen development at the respective stages, and BCP and GPGs may require more diverse regulators than do pollen at other stages.
Figure 5 Expression patterns analysis of 2,858 pollen-preferential transcripts of six functional groups. The 2,858 transcripts, involving transcription factors, phytohormones, kinases, cell cycle regulators, ubiquitin/26S proteasome system and defence/stress responders, were distributed in five clusters (c0 to c4). The clusters were created by GeneCluster 2.0; raw data for each cluster are listed in Additional file 7. X-axis denotes the developmental stages from UNM to GPG; y-axis denotes the normalized expression level of the transcripts.
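As an illustration of what searching upstream regions for cis-elements involves, the sketch below counts occurrences of the four short motifs named above on both DNA strands of a sequence. It is a plain string scan, not a reproduction of the PLACE database search used in the study; the toy sequence is invented and far shorter than a 2-kb upstream region, and scanning both strands is an assumption.

```python
import re

# Motifs quoted in the text for UNM-, BCP-, TCP- and MPG-enriched genes.
MOTIFS = ["CAGATAA", "CACGTG", "AAATAAG", "ATATAT"]

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_motif(seq, motif):
    """Count overlapping motif occurrences on both strands.

    Palindromic motifs (identical to their reverse complement) are counted
    once per site rather than once per strand.
    """
    seq = seq.upper()

    def hits(pattern):
        # A lookahead lets re.findall report overlapping matches.
        return len(re.findall(f"(?={pattern})", seq))

    total = hits(motif)
    rc = reverse_complement(motif)
    if rc != motif:
        total += hits(rc)
    return total


def scan_upstream(seq, motifs=MOTIFS):
    """Map each motif to its count in the given upstream sequence."""
    return {motif: count_motif(seq, motif) for motif in motifs}


# Hypothetical 20-bp fragment standing in for an upstream region.
toy_upstream = "TTCACGTGAAATATATATCC"
toy_counts = scan_upstream(toy_upstream)
```

Tallying such counts per stage-enriched gene set would give the kind of per-stage motif comparison described in the text, although a full analysis would also need a background model to judge enrichment.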

Discussion
Our analysis of the genome-wide gene expression profiles of developing and germinated rice pollen revealed dynamic characteristics of transcriptomes during pollen development and germination, and led to the identification of 25,062 transcripts expressed preferentially in rice pollen among the 51,279 transcripts on the array. Of these, 2,203 showed development stage-enriched expression. Furthermore, the pollen-preferential transcripts involved in 6 functional groups (cell cycle regulators, phytohormones, kinases, UPS, transcription factors, and defense/stress responders) could be classified into 5 expression patterns. The transcripts from distinct functional groups showed heterogeneous distribution in the 5 expression patterns, which suggests that the changes in expression profiles were associated with the requirements of different events in pollen development and germination. These data provide information on a large number of candidate genes for further elucidation of the molecular mechanism underlying pollen development and germination and for use in control of pollen fertility and crop yield in rice.

Conserved and divergent features of the transcriptome of developing pollen between rice and Arabidopsis
Rice and Arabidopsis are the best-characterized experimental models for eudicot and monocot plants, respectively, two major evolutionary lineages within the angiosperms. The developing pollen from microspore to mature stages in rice and Arabidopsis shares important cellular events, and their mature pollen grains are tricellular [33]. Consistent with the common cellular features, the diversity of genes expressed during pollen development from microspore to mature pollen stages was greatly decreased in both two species. The development stage-enriched genes showed a similar change tendency in developing rice and Arabidopsis pollen (for details, see below), and had a similar distribution in most functional terms (Additional file 11, detailed in Additional file 5a and b for rice and Additional file 12 for Arabidopsis). The rice pollen transcriptome had homologs of several genes such as DUO3 (At1g64570), FBL17 (At3g54650) and GEX1 (At2g35630) (Additional file 8) which are key regulators of germline development in Arabidopsis [1]. However, developing rice pollen appeared to express more development stage-enriched transcripts associated  with defence/stress response in MPGs than do Arabidopsis pollen. Rice pollen at bicellular stage showed enrichment of more signalling-related transcripts than that in other stages as compared with Arabidopsis pollen, which showed signalling-related transcripts enriched in MPGs (Additional file 11). As well, these functional features of the transcription/RNA process seemed to greatly differ in developing pollen of rice and Arabidopsis (Additional file 11). This finding suggested a possible difference between pollen development in rice and Arabidopsis in the mechanism to handle defence/stress response, signalling and gene expression regulation. 
Consistent with this notion, our analysis showed that 56.6% of rice pollen-preferential genes had homologs in the Arabidopsis genome, a finding similar to that from genome-wide comparison of rice and Arabidopsis or comparison of genes expressed in different organs [34]. However, besides the fact that 43.4% of rice pollen-preferential genes had no homologs in the Arabidopsis genome, a high proportion of the genes conserved in the Arabidopsis genome were not detected as expressed in Arabidopsis pollen (Additional file 8). These features, in combination with the finding that rice pollen preferentially expressed a set of unique TFs as compared with Arabidopsis pollen (Table 2), suggest a difference between rice and Arabidopsis in molecular regulation associated with pollen development.

A shift in gene expression profiles during pollen development is associated with the requirement of distinct cellular events
We showed a greatly decreased diversity of transcripts in developing rice pollen from UNMs to MPGs, which is consistent with the observations in developing pollen in Arabidopsis [14]. In contrast to this finding, the number of development stage-enriched transcripts showed a "U-type" change during the development process, with the lowest in BCP. A similar "U-type" change tendency can be observed for stage-enriched transcripts of developing Arabidopsis pollen by re-analyzing Arabidopsis microarray data now available. These results suggest that a shift in gene expression program may exist in pollen development and BCP may be a key point for the regulation of the shift. This notion was also supported by the following evidence. First, several early observations in different plant species showed that developing pollen expressed distinct early and late transcript populations at early and late stages, respectively [18]. Second, protein profiles were found to be different in early and late pollen [35]. Third, the diversity of transcripts from distinct functional groups, both stage-enriched and stage-downregulated, displayed stage-dependent changes. For example, relatively more stage-enriched cell cycle-related genes were in UNMs than in other stages, and BCP had more signalling-related transcripts than other pollen, whereas UNMs had more stage-downregulated transcripts implicated in signalling, and cell cycle-related transcripts were greatly downregulated in BCP (Figure 3). As development advanced, transportation-related transcripts displayed statistical importance in TCP (Figure 4), and diverse wall-related and defence/stress-related transcripts accumulated to a high level during pollen maturity (Figure 3). Finally, relative to the smallest set of development stage-enriched genes at BCP, more diverse cis-elements were identified in the set as compared with the UNM and TCP sets (Additional file 10).
Taken together, these data suggest that the shift may be essential to changes in development events from cell division and differentiation at early stages to maturation at late stages.

Germinated pollen has a gene expression pattern similar to that of mature pollen
Transcription and translation inhibition experiments have shown that germination and early polar tube growth strictly depend on protein synthesis and are relatively independent of transcription in numerous plant species [14,18,19,20]. Transcriptome research of Arabidopsis pollen also suggested that the transcriptome of mature pollen skews toward pollen germination and tube growth [14]. Analysis of expression profiles of anther-expressed genes in rice indicated that genes possibly implicated in germination and pollen tube elongation are accumulated in late stages [32]. However, direct molecular evidence was lacking. Our data showed GPGs had a transcriptome profile most similar to MPGs (99%) (Additional file 3). Together with the proteome observations that the protein expression profiles of GPGs and MPGs mainly show variation in expression levels in rice [36] and Gymnospermae pine [37], our data clearly indicate that pollen germination and early tube growth mainly depend on these pre-synthesized mRNAs in mature pollen, at least in rice. In addition, our results showed several genes were up- or downregulated during germination (c0 and c2), and GPGs had a set of enriched transcripts. Therefore, these transcriptionally changed and GPG-enriched genes could be involved mainly in late tube growth and interaction with stigma cells and possibly play roles in germination and early tube growth. However, a recent transcriptome study demonstrated that hydrated pollen and pollen tubes have a larger number of newly expressed genes than do mature pollen [21]. This distinction may be associated with the different characteristics of rice and Arabidopsis pollen, but detailed studies are needed.

Pollen development and germination are stringently regulated by 26S proteasomes
As an important molecular feature, rice pollen preferentially accumulated large numbers of UPS transcripts (Additional file 7c). Importantly, among these transcripts, 406 transcripts (369 unigenes) encode F-box proteins, which represent more than half of the now-predicted potential F-box proteins (687) in the rice genome [38]. F-box proteins are considered to function as a key component of E3 ligase complexes to specifically recognize proteins targeted for degradation by UPS. Two E3 complexes, APC and Skp1/Cullin/F-box (SCF), are known to control cell cycle progression [39] by regulating the periodically selective degradation of cyclins and other cell cycle regulators such as CDC6 [40], CDT1a [41], E2Fc [42], and the CDK inhibitor ICK2/KRP2 [43][44][45].
Most of the cell-cycle regulator transcripts mentioned above accumulated at the highest level in UNMs and were co-expressed with several F-box transcripts involved in APC, skp1 and cullin (Table 1 and Additional file 7e). This finding suggests that early-accumulated F-box transcripts are implicated in cell cycle regulation. However, our data showed 108 F-box transcripts accumulated at the highest level in UNMs. This preferential presence of multiple F-box transcripts in UNMs may imply other functions of F-box proteins besides cell cycle control.
In addition, 249 F-box transcripts showed preferential accumulation in pollen following completion of the cell cycle (Table 1). Accordingly, many F-box transcripts are enriched in sperm cells of Arabidopsis [15]. Although functions of F-box genes in pollen maturity, germination and tube growth still remain unidentified, several studies have revealed that in self-incompatible plants such as Antirrhinum hispanicum, the pollen-expressed F-box protein AhALF-S2 acts as a pollen determinant to control pollen function in the self-incompatible reaction [46], and UPS also has roles in regulating polarized cell morphogenesis [47]. In vitro experiments showed that inhibition of UPS activity strongly inhibited pollen germination in kiwifruit [48]. Together, these data indicate that multiple F-box transcripts, which are preferentially accumulated in pollen, are involved in pollen maturity, germination and tube growth after the completion of the cell cycle.

Cell cycle transcripts show distinct early and late accumulation patterns
The formation of tricellular pollen involves PMI and PMII, and the fate of daughter cells is determined after PMI [1,5]. Consistent with this cellular feature, transcripts encoding key cell cycle proteins, including cdc2, CYCK1, CYCB, ASF1, FtsZ, MinD, GlsA and Hsp70, were accumulated preferentially in UNMs (Additional file 7e). Generally, during the cell cycle, different CDK/cyclin complexes activate substrates functioning in the G1-to-S and G2-to-M transition and then trigger the onset of DNA replication and mitosis, respectively [49]. CDKs (cdc2) and cyclins are core regulators in the cell cycle, and the activity of CDK/cyclin complexes is controlled by several regulators [6,45,49]. The histone chaperone ASF1 is predicted to participate in DNA damage repair and histone acetylation in the mitotic S phase, and depletion of ASF1 results in the accumulation of S-phase cells [50][51][52]. FtsZ is required for formation of the division ring in bacterial cell division [53], and the position of the FtsZ-based division ring is mainly determined by MinD [54,55]. The ring is involved in the position determination of cell division. GlsA and Hsp70 interact as partner chaperones to regulate asymmetric division in Volvox carteri [56,57]. These results suggest that these proteins and/or their interactions would be implicated in regulation of the pollen cell cycle. However, we found most of these key cell-cycle protein transcripts were downregulated as pollen entered the BCP stage. Most of the preferentially expressed cell cycle-related transcripts in BCP and/or TCP were those encoding general enzyme proteins such as serine/threonine protein phosphatases PP2 and PP1, dihydrolipoamide dehydrogenase, and nucleoside diphosphate kinase (Additional file 7e). This result implies that these key cell-cycle protein transcripts expressed preferentially at the UNM phase may be essential for PMII, possibly through protein dephosphorylation/phosphorylation.
Impressively, large numbers of transcripts encoding cell-cycle core regulators CYCA, CYCB, CYCC and CDC2, mitosis checkpoint proteins, and mitotic entry/exit-related proteins cullin, skp1, Mob1, TDP and RAD23 were accumulated preferentially in MPGs and/or GPGs (Additional file 7e). A similar expression pattern was observed in Arabidopsis and soybean mature pollen [11,13]. In Arabidopsis, the sperm cells are in S-phase at anthesis, continue the cell cycle process during pollen tube growth and reach G2 just before fertilization, whereas the vegetative nucleus is arrested in G1 [1]. The above results suggest these pre-synthesized cell-cycle transcripts at pollen maturity have roles in mitotic progression after fertilization.

Kinases and phytohormones function in pollen germination and polar tube growth
Most of the transcripts encoding receptor/receptor-like kinases and those encoding calcium and phospholipid signalling-related kinases, MAPKs, WAKs and PTOs/PTIs displayed late accumulation patterns, with the highest levels in MPGs and/or GPGs (Table 1). Specifically, 87% of the identified pollen-preferential receptor kinase transcripts accumulated to the highest levels in MPGs and/or GPGs (Table 1). Calcium signal and calcium concentration gradients play important roles in pollen germination and polar tube growth [58,59]. Phosphoinositide kinases participate in the regulation of cytosolic Ca2+ concentration by promoting Ca2+ sequestration or mobilizing intracellular Ca2+ stores [60]. Re-organization of the cytoskeleton is one of the early events after hydration of mature pollen grains, and MAP affects the dynamics of microtubule cytoskeleton by phosphorylating microtubule-associated proteins [61]. Receptor kinases are required for pollen maturity, tube growth and pollen-stigma interaction [62][63][64]. Together, these data suggest that late accumulated kinase transcripts are essential for pollen germination, tube growth and pollen-stigma interaction.
The importance of auxin and ethylene in pollen germination and tube growth has been a focus for a long time. In Arabidopsis, free IAA is lacking in very young pollen but is accumulated to extremely high levels in mature and germinated pollen; high IAA level is involved in the control of pollen tube growth towards the egg cell in the ovule [65]. In rice, IAA is accumulated in anthers containing tricellular pollen [31]. Study of Torenia fournieri revealed the roles of IAA in the increase of secretory vesicles, in enhanced synthesis of pectin and in the decrease of cellulose density in pollen tubes [66]. In rice pollen, 75% of the pollen-preferential auxin synthesis/signalling-related transcripts, which involve auxin-conjugated hydrolysis, efflux-carrier, transport and synthesis proteins (enzymes) (Additional file 7b), showed late accumulation patterns (c0, c1 and c2). The expression profile seems compatible with the high level accumulation of free IAA observed in Arabidopsis pollen [65] and mature rice anthers [31]. These lines of evidence suggest that changes in free IAA levels involve the combined effects of auxin biosynthesis, conjugation and transport during pollen development and maturity.
Ethylene is also required for pollen germination and tube growth because inhibitors of ethylene biosynthesis strongly retard pollen germination and tube growth [67][68][69]. Although several observations reveal that auxin initiates ethylene production in pollen [67], current data show that pollination-mediated initial burst of ethylene in the pistil regulates early tube growth [68,69]. Our study revealed 81% (22/27) of pollen-preferential ethylene signalling-related transcripts accumulated at the highest level in MPGs and/or GPGs, and most of them encode the components of ethylene signalling rather than ethylene synthesis enzymes such as ACC synthase and ACC oxidase (Additional file 7b). These data appear to be compatible with the model of auxin action in pollination, which assumes that auxin from pollen diffuses into the pistil, where auxin stimulates the production of ethylene, which in turn triggers pollen germination/tube growth and ovary development. This finding suggests that ethylene signalling may be required for pollen tube growth and coordination between pollen tubes and pistil cells by interacting with auxin signalling. This notion seems to be supported by the concordant expression of auxin and ethylene signalling transcripts in MPGs and GPGs in rice and the observation that ovary development and pollen germination/tube growth are coordinately regulated by auxin and ethylene following pollination in orchid [67].

Conclusions
We analyzed the dynamic changes in genome-wide gene expression profiles of rice pollen during 5 sequential development stages, from microspore to germination. Overall, pollen development from microspores to mature pollen is associated with a great decrease in diversity of transcripts and the "U-type" change in the number of stage-enriched transcripts, with the lowest at the bicellular pollen stage. These features were conserved in the transcriptome of developing pollen of Arabidopsis. The gene expression profile of germinated pollen was similar to that of mature pollen. Our analysis also reveals that changes in functional groups of the stage-enriched and -downregulated transcripts and in expression patterns of pollen-preferential transcripts involving important regulatory proteins are compatible with the transition of distinct cellular events during pollen development and germination. A comparison showed that stage-enriched transcripts in both rice and Arabidopsis had similar distribution in most functional terms but great differences in defence/stress response, signalling and transcription/RNA process. A proportion of rice pollen-preferential genes had no homologs in the Arabidopsis genome or their homologs were not expressed in Arabidopsis pollen. Several transcription factors were identified as having diverged in the two species. These data supply the first comprehensive and comparative molecular information for further understanding the mechanism underlying pollen development and germination.

Plant materials
Rice cultivar Zhonghua 10 (Oryza sativa L. ssp. japonica) was used for this study. Roots and leaves were collected from 2-week-old seedlings grown in a climate chamber under a 12-hr light/12-hr dark cycle at 28°C. Callus cells were induced from rice seeds on N6 solid medium containing 2,4-D (2 mg/L) in the dark at 25°C for approximately 1 month. Other materials described below were harvested from rice plants grown under natural conditions in the growing season (from May to September) in Beijing, China.

Pollen isolation and purification
For pollen isolation, rice anther samples were first classified into uninucleate, bicellular, and immature tricellular stages by the distance between auricles of the last two leaves and the length of the panicle and the floret [70,71]. Then, uninucleate microspores (UNMs), bicellular pollen (BCP), and immature tricellular pollen (TCP) were isolated from the classified anthers at the corresponding stages.
UNMs and TCP were prepared as follows: the anthers at the uninucleate and tricellular stages were crushed gently in 0.4 M mannitol at 4°C; the resulting slurry was filtered through 150 μm and subsequently 100 μm nylon mesh to remove anther debris, through 60 μm nylon mesh to remove cell debris, and finally through 30 μm nylon mesh to collect pollen. Bicellular pollen (BCP) was collected as described [14] with several modifications. Briefly, the anthers containing BCP were collected and gently ground in 0.4 M mannitol at 4°C. After filtering sequentially through 150 μm and 60 μm nylon mesh, the pollen was harvested by centrifugation at 500 g for 5 min at 4°C, and then purified by 25%/30%/45%/80% Percoll (Pharmacia) step gradient under 500 g centrifugation for 5 min at 4°C. Resulting BCP cells were collected from the 30%/45% fraction, pelleted by centrifugation at 500 g for 5 min at 4°C, and washed with 0.4 M mannitol. Mature pollen grains (MPGs) were collected at anthesis stage [22]. Germinated pollen grains (GPGs) were obtained as described [22,36]. The purity of the isolated pollen was determined by examination on light microscopy (Carl Zeiss) after 4',6-diamidino-2-phenylindole (DAPI) (Molecular Probes) staining, and viability of mature pollen was assessed by fluorescein 3',6'-diacetate (FDA) staining [72]. Germination ratio of mature pollen grains was evaluated by microscopy (Carl Zeiss) after culture in germination medium in vitro.

RNA extraction
Total RNA was extracted from sporophytic tissues and isolated pollen at individual developmental stages by use of RNAplant reagents (Tiangen Biotech) and purified by use of the RNeasy Plant Kit (Qiagen) according to the manufacturer's instructions. The yield and purity of RNA were determined spectrophotometrically (Beckman Coulter DU640).

Affymetrix GeneChip hybridization and data analysis
For Affymetrix GeneChip analysis, 8 μg of total RNA was used for making biotin-labeled cRNA targets. All the procedures for cDNA and cRNA synthesis, cRNA fragmentation, hybridization, washing and staining, and scanning were conducted according to the GeneChip Standard Protocol (Eukaryotic Target Preparation, Affymetrix). The poly-A RNA Control and One-Cycle cDNA Synthesis kits were used in this experiment as described at the website http://www.affymetrix.com/products/arrays/specific/rice.affx. Information about the GeneChip® Rice Genome Array (MAS 5.0) can be accessed from the same Affymetrix website. GCOS software (Affymetrix GeneChip Operating Software) was used for data collection and normalization. Overall intensities of all probe sets of each array were scaled to 500 to guarantee that hybridization intensity of all arrays was equivalent, and each probe set was assigned a "P" (present), "A" (absent) or "M" (marginal) detection call and a p-value from algorithms in GCOS. Correlation coefficient values were calculated for replicate experiments of different tissues, and the correlation of means for different tissues from replicated experiments was calculated. The percentage of probe sets that were present ("P") in each array was listed and compared. To identify genes expressed preferentially in rice pollen, the Z-score transformation normalization method was used to compare expression levels of genes from pollen at individual stages and sporophytic tissues and to directly calculate significant changes in gene expression between different samples. Z-scores were calculated by dividing the difference between the pollen (X_i, median of triplicates) and the sporophytic tissue (μ, mean) by the standard deviation (SD) of all of the other tissues, according to the following equation:
$$Z_i = \frac{X_i - \mu}{SD}$$
During Z-score calculation, we made a slight change by not including the value of the specific stage of pollen for measuring μ and SD.
To identify probe sets expressed preferentially in pollen as compared with sporophytic tissues, we measured the relative ratio between pollen and other samples using the following equation: where P, S, R and L represent the expression level of a given gene in pollen, callus cells, roots and leaves, respectively.
For identification of probe sets expressed preferentially in each stage of pollen development, all three distinct sporophytic tissues were used as controls, and the relative ratio was measured by use of the following equation: where Px, S, R, L and P represent expression levels of genes in pollen at each stage, in callus cells, roots, leaves and a given pollen sample, respectively.
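The Z-score calculation described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `z_score`, the example intensities, and the use of the sample standard deviation are assumptions made for illustration. Only the overall form, (median of pollen triplicates minus mean of the other tissues) divided by the SD of the other tissues, is taken from the text.

```python
from statistics import mean, median, stdev


def z_score(pollen_replicates, reference_values):
    """Return Z_i = (X_i - mu) / SD for one probe set.

    X_i is the median of the pollen replicates at one stage; mu and SD are
    computed from the reference tissues only, so the pollen stage being
    tested does not contribute to them (as stated in the text).
    Using the sample SD (statistics.stdev) is an assumption.
    """
    x_i = median(pollen_replicates)
    mu = mean(reference_values)
    sd = stdev(reference_values)
    if sd == 0:
        raise ValueError("reference values have zero spread")
    return (x_i - mu) / sd


# Hypothetical scaled intensities for a single probe set.
pollen = [2100.0, 2000.0, 2300.0]    # triplicates of one pollen stage
reference = [100.0, 200.0, 300.0]    # e.g. callus, root, leaf means
z = z_score(pollen, reference)
```

A large positive Z-score under these assumptions indicates a signal in pollen far above the spread of the reference tissues; the threshold applied in the study is not stated in this section and is therefore not encoded here.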

Bioinformatics analyses
The molecular function and cellular component term "enrichment status" and "hierarchy" of pollen-enriched genes were analyzed by use of EasyGo software (a web server, http://bioinformatics.cau.edu.cn/easygo/) [73].
Expression pattern analysis was performed with the mean values of replicates with use of GeneCluster 2.0 (http://www.broad.mit.edu/cancer/software/genecluster2/gc2.html), which allows for visualizing the profile of each cluster. After being normalized to mean 0 and variance 1, clusters were created with default parameters, except for cluster range 3-7. Different cluster ranges were compared, and the range of 5 was selected because the distribution of functional categories between clusters possessed the most significant difference (χ² = 56.00, 20 DF, p ≤ 0.01).
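Two steps of this analysis lend themselves to a short sketch: the normalization of each expression profile to mean 0 and variance 1, and the χ² comparison of functional categories across clusters. The Python code below is a hedged illustration only; it does not reproduce GeneCluster 2.0, the function names and the contingency table are hypothetical, and the use of the population variance in the normalization is an assumption. With 6 functional groups and 5 clusters, the degrees of freedom are (6 − 1) × (5 − 1) = 20, which matches the value reported above; the critical value for 20 DF at p = 0.01 is about 37.57.

```python
from statistics import mean, pstdev


def standardize(profile):
    """Rescale one expression profile to mean 0 and variance 1.

    Population SD is used here; whether GeneCluster 2.0 uses the
    population or sample form is not stated in the text (assumption).
    """
    m = mean(profile)
    s = pstdev(profile)
    if s == 0:
        return [0.0] * len(profile)
    return [(v - m) / s for v in profile]


def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a
    contingency table (rows: functional groups, columns: clusters).
    Assumes every row total and column total is non-zero."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Hypothetical counts: 6 functional groups x 5 expression patterns.
counts = [
    [12, 3, 4, 2, 9],
    [5, 8, 10, 6, 1],
    [20, 15, 30, 4, 6],
    [9, 11, 25, 3, 7],
    [14, 6, 8, 5, 10],
    [4, 9, 12, 2, 3],
]
statistic, dof = chi_square(counts)
```

Comparing `statistic` for each candidate cluster number against the critical value is one way to rank cluster ranges, in the spirit of the selection criterion described above.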

Real-time quantitative RT-PCR
Total RNA was prepared as described above from each of three independent biological samples for each material. Real-time quantitative RT-PCR was performed as described [27]. Briefly, 2.5 μg of total RNA was used for first-strand cDNA synthesis with use of ReverTra Ace (TOYOBO). The cDNA samples were diluted to 2.5 ng/μL. Triplicate quantitative assays were performed with use of the Stratagene Mx3000P system (Applied Biosystems) with 4 μL of each cDNA dilution and the Power SYBR Green Master mix (Applied Biosystems) according to the manufacturer's protocol. Gene-specific primers were designed by use of Primer Express software (Applied Biosystems). The relative quantification method (delta-delta threshold cycle) was used to evaluate quantitative variation among the three independent replicates. Amplification of 18S rRNA was used as an internal control to normalize all data.
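The delta-delta threshold cycle calculation can be sketched as follows. This is an illustration under the usual assumption of the method that amplification efficiency is close to 100% (a doubling per cycle), not a record of the exact computation used here; the function name, the calibrator sample, and the Ct values are hypothetical. The 18S rRNA reference corresponds to the internal control named in the text.

```python
def relative_expression(ct_target, ct_reference,
                        ct_target_calibrator, ct_reference_calibrator):
    """Fold change of a target gene by the 2^(-ddCt) method.

    ct_target / ct_reference: threshold cycles of the gene of interest
    and of the internal control (18S rRNA) in the sample of interest.
    The *_calibrator arguments are the same measurements in the sample
    chosen as baseline (a hypothetical choice in this sketch).
    Assumes about 100% amplification efficiency for both amplicons.
    """
    delta_sample = ct_target - ct_reference
    delta_calibrator = ct_target_calibrator - ct_reference_calibrator
    delta_delta = delta_sample - delta_calibrator
    return 2.0 ** (-delta_delta)


# Hypothetical Ct values: the target crosses threshold two cycles earlier
# in the sample than in the calibrator, with equal 18S rRNA Ct values.
fold = relative_expression(22.0, 10.0, 24.0, 10.0)
```

In this hypothetical case the sample would be read as carrying about four times the target transcript level of the calibrator after normalization to 18S rRNA.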

Cis-elements analysis
The Element software (http://element.cgrb.oregonstate.edu/element_about.html) was used for cis-element search in the promoter regions of stage-enriched genes. In the database, the promoter sequences for rice genes were mainly from IRGSP Rice (Oryza sativa ssp. japonica). We selected a 2,000-bp promoter region in each gene to search for possible cis-elements using the TIGR locus ID. The background model statistics were derived from the frequencies of all possible 3- to 8-mer words in the upstream sequences of 34,967 non-transposable-element-related japonica rice genes (based on TIGR version 5) in Affymetrix rice microarrays.
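The word-counting idea behind such a background model can be illustrated with a short Python sketch. It is not the Element implementation and does not reproduce its statistics; it only shows how occurrences of all 3- to 8-mer words could be tallied over a set of upstream sequences. The function name and the example sequence are hypothetical, and skipping words that contain ambiguous bases is an assumption.

```python
from collections import Counter


def count_words(sequences, k_min=3, k_max=8):
    """Count every overlapping word of length k_min..k_max in the
    given DNA sequences. Words containing characters other than
    A, C, G, T (for example N) are skipped (assumption)."""
    valid = set("ACGT")
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for k in range(k_min, k_max + 1):
            for start in range(len(seq) - k + 1):
                word = seq[start:start + k]
                if set(word) <= valid:
                    counts[word] += 1
    return counts


# Hypothetical 8-bp "promoter"; real inputs would be 2,000-bp regions.
word_counts = count_words(["ACGTACGT"])
```

Word frequencies obtained this way for a stage-enriched gene set could then be compared with those from the genome-wide background to flag over-represented words; the specific test used by Element is not described in this section.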

Additional material
Authors' contributions
TW and YX designed the experiments and wrote the paper. LQW performed most of the experiments (pollen isolation and purification, RNA extraction, qRT-PCR, data analysis) and wrote the draft of the paper. WYX was responsible for the Affymetrix GeneChip hybridization and most of the data analysis. ZS analysed the data and designed the primers for qRT-PCR. ZYD revised and proofread the paper. All the authors read and approved the final manuscript.