Comparative transcript profiling of gene expression between seedless Ponkan mandarin and its seedy wild type during floral organ development by suppression subtractive hybridization and cDNA microarray

Background Seedlessness is an important agronomic trait for citrus, and male sterility (MS) is one main cause of seedless citrus fruit. However, the molecular mechanism of citrus seedlessness remained not well explored. Results An integrative strategy combining suppression subtractive hybridization (SSH) library with cDNA microarray was employed to study the underlying mechanism of seedlessness of a Ponkan mandarin seedless mutant (Citrus reticulata Blanco). Screening with custom microarray, a total of 279 differentially expressed clones were identified, and 133 unigenes (43 contigs and 90 singletons) were obtained after sequencing. Gene Ontology (GO) distribution based on biological process suggested that the majority of differential genes are involved in metabolic process and respond to stimulus and regulation of biology process; based on molecular function they function as DNA/RNA binding or have catalytic activity and oxidoreductase activity. A gene encoding male sterility-like protein was highly up-regulated in the seedless mutant compared with the wild type, while several transcription factors (TFs) such as AP2/EREBP, MYB, WRKY, NAC and C2C2-GATA zinc-finger domain TFs were down-regulated. Conclusion Our research highlighted some candidate pathways that participated in the citrus male gametophyte development and could be beneficial for seedless citrus breeding in the future.


Background
Seedlessness is a desired fruit trait for consumers, and a fruit is considered to be seedless if it produces no seeds, traces of abortion seeds, or significant reduced-number of seeds [1]. Some plants can set seeds asexually through apomixis. However, in most flowering plants, seed initiation requires signals activated by the double fertilization event that occurs in the embryo sac, and seed is produced sexually from the fertilized ovule [2,3]. Various phytohormones such as gibberellins (GAs), auxins and cytokinins are involved in this signaling process [4][5][6]. GAs and jasmonic acid/jasmonate derivatives (JAs) were found to play crucial roles in plant reproductive development [7,8].
Citrus is one of the most important fruit crops with great economic and health value around the world [9]. However, some citrus varieties are seedy, and seedy fruits have constrained the development of fresh citrus market. Therefore, breeding seedless citrus varieties is a long-term pursuit for citrus breeders worldwide [10,11]. Nowadays, Satsuma mandarin and navel orange are two of the most famous and widely grown citrus varieties, mainly due to their seedless trait. For decades, great progress on seedless citrus breeding was made by traditional approaches such as sexual hybridization, seedling and bud sport mutation. However, due to the peculiarities of citrus reproductive biology such as long juvenile period and nucellar polyembryony, traditional breeding is inefficient and costly [12]. Modern biotechnological approaches (e.g. somatic hybridization) have potential to effectively expedite breeding process of citrus [13][14][15]. As most citrus varieties can produce fruits parthenocarpically [16], male or female sterility, embryo sac abortion, self-incompatibility, polyploidy and even environmental stress can result in seedless citrus fruits [17,18]. Actually there were some successful reports about seedless fruit production by genetic transformation. Ectopic expression of iaaH gene with DefH9 as promoter to elevate auxin levels in placenta or ovules resulted in seedless fruits [19,20]. Another effective strategy was by specific expression of toxin proteins during early development of plant reproductive organs. Typical cases were the ectopic transformation of the Barnase gene from Bacillus amyloliquefaciens [21,22]. Potential cases were by specific expression of enzymes such as chloroplast Chaperonin 21 and ubiquitin extension protein S27a to induce cell disruption of seed tissues for parthenocarpic plants [11,23,24]. And in our laboratory, the Arabidopsis thaliana MAC12.2 gene had been introduced into precocious trifoliate orange (Poncirus trifoliata [L.] Raf) for production of potential seedless fruits [25].
Male sterility (MS) is one of the main causes for seedless fruit production in citrus. In recent years, great progress on MS was made with annual plants especially rice [26,27], Arabidopsis [28] and oil-rape [29], and a serial of genes regulated tapetum, anther and pollen development were identified. However, there remained very limited information on MS of perennial woody plants such as citrus. Ponkan mandarin (Citrus reticulata Blanco) is a widely grown citrus variety in China. Within this variety, many variants were derived through sexual hybridization and mutation such as bud sport mutation. 'Qianyang seedless' Ponkan mandarin (QS) is an elite seedless variant selected from bud sport mutation of a common seedy Ponkan mandarin, and it can set fruits with no seeds (even no seed rudiments) in open orchard [30,31]. In this article, QS and a common seedy Ponkan mandarin 'Egan NO.1' (EG) were used for comparative study. These two mandarins shared highly close genetic relationship based on molecular marker analysis and showed no distinctly morphological differences except that QS was completely male sterile while Egan No 1 has normal flower. In order to gain general understanding on genes involved in this MS mutation, suppression subtractive hybridization (SSH) [32] combining with cDNA microarray was performed to detect differentially expressed genes. Several candidate genes and related pathways were focused in particular. Our research identified some useful genes which could be beneficial to citrus seedless breeding. The results could help to reveal the molecular mechanism of male sterility of Ponkan mandarin and shed light on seedless trait formation of other perennial woody plant at the gene expression level.

Phenotype analysis of the floral organs of QS
Previous studies suggested that the floral organs (actually the whole plant) of QS had no morphological difference from the wild type. To further validate the phenotype of this seedless Ponkan mandarin, we measured the length of filament and pistil, and the average ratio of filament to pistil (filament length/pistil length) was 0.83 ± 0.01 for EG and 0.79 ± 0.01 for QS. And for EG, the pistil was 0.155 ± 0.01 cm longer than filament while for QS, the pistil was 0.166 ± 0.009 cm longer than filament. Above data further confirmed that the floral organs of both EG and QS had no morphological difference, and the seedless trait was not caused by malformation of reproductive organs. However, the number of pollen grains per anther of QS was 9.5% less than that of EG. The pollen dying viability of QS was 6.0% ± 1.0% (or 6.5% ± 1.0% for I 2 -KI 2 staining) in striking contrast to the high viability of 93.8% ± 0.9% (or 89.6% ± 2.5% for I 2 -KI 2 staining) for EG. Pollen germination test found that no pollen of QS could germinate. Furthermore, SEM assays showed abnormal structures of the pollen grains of QS (Figure 1), confirming that QS is male sterile.
Construction of SSH-cDNA libraries and overall feature of the differential transcript profiling To identify genes associated with the MS of QS, SSH cDNA libraries (both forward and reverse) were constructed from floral organs of QS and EG. A total of 6,048 cDNA clones derived from the SSH-cDNA libraries including 4,195 from the forward library and 1,853 from the reverse one were successfully amplified, and then used for a custom cDNA microarray. Each cDNA clone has triplicate spots on the array. The RNA samples of the four developmental stages (SF, MF, BF and OV) were used for array-hybridization. The fluorescent dyelabelled cDNA and hybridization strategy was employed for the microarray assay.
From the 6,048 clones printed on the glass slide, 279 cDNA clones (278 non-redundant) were differentially expressed (false discovery rate (FDR) <0.05 and a fold change ≥ 2) between QS and EG. Among these cDNA clones, 218 (78%) were down-regulated while only 61 (22%) showed up-regulated expression across the four developmental stages; and the differentially expressed clones peaked at full bloom stage (BF) (Figure 2). At this stage, many more clones showed down-regulated than upregulated expression. During the four developmental stages, one clone (GenBank accession no. JU497336) encoding a putative cysteine protease (tr[B9RRA4]) showed down-regulated expression at BF stage but up-regulated at OV stage (young ovaries of 2-3 days after flowering).

Sequencing of the differentially expressed clones and EST analysis
Among the 279 differentially expressed clones, 255 nonredundant clones were subjected to one single-pass sequencing. In all, 237 high-quality ESTs (average length was 496 bp) were yielded after eliminating vectors and unreliable sequences. These ESTs were assembled using CAP3 program, and 133 unigenes (43 contigs and 90 singletons) were obtained with sequence redundancy of 43.9%. The majority of the contigs (38) contained 2-5 ESTs, whereas only 5 contigs contained 6-11 ESTs, indicating an ideal normalization and subtraction. Of the 133 unigenes, 80 (60.1%) showed differential expression at BF stage. Subsequently, BLASTX search of the UniProt database showed that 20 unigens (15.0%) did not have significant hits (E-value ≤1.0 × e -5 ). However, when the 20 unigenes were used in BLASTN (E-value ≤1.0 × e -10 ) search of the Citrus clementina transcript database [33,34] with local Blast software (ftp://ftp.ncbi.nlm.nih.gov/blast/ executables/release/LATEST/), 17 genes had significant hits and high scoring pairs (HSP) showed high nucleotide identity. It suggested that these 20 unigenes were unique for citrus, and three of them were novel citrus genes.
Based on the microarray analysis, the relative expression profiles of all 255 ESTs were performed hierarchical clustering with cluster software (version 3.0). Four typical relative expression patterns were observed in QS versus EG at four developmental stages. Figure 3A and 3B showed a group of clones down-regulated mainly at squaring stage (SF) and full bloom stage (BF), respectively, while the other two groups of clones were down/ up-regulated constitutively during the developmental stages ( Figure 3C and 3D). In addition, candidate genes with putative function that could be important for the MS of QS were specifically collected (Table 1). It is noteworthy that 27.7% of the unigenes (not listed in the table) were only annotated as putative proteins or with no defined biological process besides 15% unigenes with no hits in the database.
GO annotations were conducted and three categories representing molecular functions, biological processes, and cellular components were assigned. Figure 4 showed the percentage distributions of GO terms (2nd level GO terms) based on biological process. It indicated that during the floral organ development, the majority of differentially expressed genes were involved in metabolic process (46%) or responded to stimulus (27%) and regulation of biological process (18%). In addition, the other two GO categories (molecular functions and cellular components) were also generated (data not shown). In the molecular function category, large proportion of unigenes may have binding activity (59%), catalytic activity (19%), or oxidoreductase activity (11%), while the cellular components consisted mainly of intracellular (57%) and membrane (23%).

Metabolic pathways involved in formation of seedless fruit
As large proportion of altered expressed genes were involved in varieties of metabolic processes. Based on the KEGG (Kyoto Encyclopedia of Genes and Genomes) analysis, 36 different metabolic pathways were altered during the four developmental stages. Among these pathways, nine (25%) were related to amino acid metabolic pathway (Table 2), and genes involved in carbohydrate and energy metabolism showed down-regulated expression during subsequent developmental stages of floral organs. Besides, genes related to specific secondary metabolism such as terpenoids and polyketides metabolism were also found to be altered. Interestingly, a gene (JU497309) encoding fatty acyl-CoA reductase, which may be involved in lipid metabolic process, was identified (Table 1). This gene was found highly homologous with putative male sterile protein (GI: 255576327) in castor bean, fatty acyl-CoA reductase 3 (GI: 359500474) in poplar and male sterile 2-like protein (MS2) (GI: 3549681) in Arabidopsis. Herein, this gene was named as male sterile-like protein. And qRT-PCR analysis showed its expression level increased from SF to BF stages and then declined at OV stage. The expression pattern was similar in both QS and EG; however, it showed obviously higher expression level in QS than in EG during the developmental process ( Figure 5).

Differential expression of transcription factor genes
It is noteworthy that among the 133 unigenes, 12 were assigned to the category of transcription factor (TF) based on plant TF database (http://planttfdb.cbi.edu.cn/). Figure 6 showed the specific expression pattern of six AP2-ERF family TFs, two zinc-finger TFs, one MYB TF and one NAC TF using qRT-PCR assay. These TFs (except of NAC TF) had similar expression profile during the four developmental stages between EG and QS. For instance, among six AP2-ERF TFs, four (AP2-EREBP TF1, AP2-EREBP TF3, AP2/ERF domain containing TF2 and CBF/DREB-like TF) showed co-expression pattern like "V" type. It showed that the gene expression level in QS was higher than that in EG from SF stage to MF stage; however, these genes were subsequently repressed more obviously in QS from MF stage to BF stage, and the gene expression level was down-regulated mostly at BF stage. Two zinc-finger TFs (GATA TF8 and Cys2-His2 type) and one R2R3-MYB TF likewise showed similar "V" typevariation tendency. The other two AP2-ERF TFs (AP2-EREBP TF2 and AP2/ERF domain containing TF1) showed "V"-like type expression pattern in QS. However, the expression pattern of AP2/ERF domain containing TF1 was somehow different from others, as it showed relatively stabilized expression level during the four stages in EG. As for NAC TF, its expression level was downregulated obviously at BF and OV stages in QS compare with EG. It was notable that no expression was observed at OV stage in QS. The results suggested that these TFs could play important roles in the seedless phenotype formation, and the relative expression level in QS versus EG seemed to be key factor in this process.

Verification of microarray data
Two approaches were used to examine the quality of the microarray data. First, as one contig was assembled by several ESTs that were arrayed at random location in the microarray, so these ESTs sharing similar sequence or encoding the same gene would share similar expression pattern. Additional file 1: Figure S1 showed that four ESTs (F2-13 G, F6-15I, F7-18O, F8-12A) were assembled into one unigene (JU497321) which encoded methionine synthase, and these four ESTs truly shared similar expression pattern. For the other approach, qRT-PCR was performed on 11 unigenes using gene-specific primer pairs. Expression patterns were compared at the four developmental stages between QS and EG. Additional file 2: Figure S2 showed the correlation analysis of the ratio values of differential expression level from microarray to that from qRT-PCR. Linear regression [(average microarray ratio value) = a (qRT-PCR value) + b] analysis showed a good coefficient of variation (R 2 = 0.847). These results confirmed the reliability of the microarray data.

Discussion
Here, we combined SSH and microarray techniques to investigate potential mechanism underlying seedlessness in Ponkan mandarin. SSH was proved to be an efficient and popular approach to enrich and identify differentially expressed genes between wild-type and its mutant or treatment [35,36]. However, because of high sensitivity of SSH, usually a large number of clones could be obtained but inevitably included some false-positive ones. Screening the SSH libraries to identify some candidate genes using microarray and to validate using qRT-PCR has proved to be a high-throughput and efficient way [37][38][39]. However, relatively few clones were isolated in this study. Of the 6,000 clones, only 279 cDNA clones (See figure on previous page.) Figure 3 Cluster analysis of expression profiles of altered expressed genes in the QS versus EG. A showed a cluster of ESTs that were down-regulated mainly at squaring stage (SF) when the tetrads were produced and the microsporocyte underwent meiosis. B showed ESTs that were down-regulated especially at full bloom stage (BF). C and D suggested a cluster of ESTs that were down-regulated and up-regulated constitutively during the four developmental stages respectively. The ratio value was log2 transformed for each gene and used for the hierarchical clustering analysis. were identified as differentially expressed. Such results may suggest that there were little variations between QS and EG mandarins in gene expression. It was hypothesized that bud sport mutant was likely caused by single gene mutation, DNA methylation or retroelement activity [40,41]. In this research, various types of DNA markers including SCAR [42], and SSR (172 pairs of primers), MSAP (96 pairs of primers) and AFLP (13 Figure 4 Distribution of the unigenes according to the biological process (2nd level GO terms). A total of 86 unigenes were annotated. As one gene product could be assigned to more than one GO term, the percentages will add up to more than 100%. pairs of primers) were employed to analyze the polymorphism between these two mandarins, and no repeatable polymorphic bands were detected (data no shown). These results suggested that very few nuclear genes were altered during the developmental stages.
For the four developmental stages we chose, immense efforts were taken to determine which time-point was pivotal for stamen development, but there has no criteria for citrus gametophyte development. Though criteria for gametophyte development was available in model plant Arabidopsis [43], it can not be directly applied herein. Semi-thin and paraffin sections were performed in this study to survey the microsporogenesis of QS, and it was found that abnormal tetrads produced at the tetrad stage and subsequently the microsporocyte underwent abnormal meiosis. This process mainly occurred at SF stage (the diameter of floral organs is about 3 mm) (unpublished data). Additionally, large proportion (about 59.7%) of differentially expressed genes was found in BF when the anthers and pollen grains were almost mature, indicating that this time-point might be also important.

Amino acid metabolic process
Of the metabolic pathways with altered expressed genes, 25% were involved in amino acid metabolism. Amino acids were not only primary metabolic products for normal growth and development but also cell signaling molecules and regulators of gene expression and protein phosphorylation cascade [44]. Interestingly, among these amino acid metabolism pathways, two genes were downregulated across the developmental stages in QS versus EG, one (JU497356) encoding glutamate-ammonialigase (EC 6.3.1.2), the other (JU497338) encoding betaglucosidase (EC 3.2.1.21). In higher plants, glutamateammonialigase catalyzes ATP-dependent conversion of glutamate and ammonia into glutamine which occupies a central position of amino acid metabolic pathway [45], and this metabolic process is critical for coordinating metabolic balance in rice [46]. And beta-glucosidase could be used for the cellulosic ethanol industry [47] and has diversity of functions in plants. In maize, Zm-p60.1 encoding a beta-glucosidase could release active cytokinin, and might function in vivo to supply the developing maize embryo [48]. Additionally, some betaglucosidases affect the properties of cell wall [49] and are associated with freezing tolerance, such as the SFR2 in Arabidopsis [50]. Some beta-glucosidases are related to the efficiency of microspore embryogenesis [51]. It is noteworthy that a gene (JU497374) encoding asparagine synthase (EC 6.3.5.4) was down-regulated exclusively at SF (early stage of stamen development). And asparagine is one central intermediate in nitrogen assimilation and transportation in plant [52,53]. Recent studies showed that this gene played important role in defense against pathogens and salt stress [54,55]. Additionally, genes related to carbohydrate metabolism and energy metabolism also showed down-regulated expression in QS mainly at BF and OV (late stage of stamen development). These results suggested that the vital activities of QS weakened during early development stages of stamen, and the metabolic process of nutrition and energy was also impaired at subsequent stages of stamen development especially when the stamen was mature. Two genes involved in cysteine/methionine metabolism and participated in the biosynthesis of ethylene were also identified in this study. One (JU497321) encodes 5-methyltetrahydropteroyltriglutamate-homocysteine Smethyltransferase (EC 2.1.1.14) is likely involved in the biosynthesis of L-methionine. And the methionine can be transformed into S-adenosylmethionine (SAM) (the precursor of ethylene) [56]. The other one (JU497373) encodes aminocyclopropane carboxylate oxidase (EC 1.14.17.4) and is a pivotal enzyme during the biosynthesis of ethylene. In addition, genes involved in the synthesis of IAA (indole-3-acetic acid) were also identified such as a gene (JU497377) encoding Indole-3-acetatebeta-glucosyltransferase (EC 2.4.1.121). These results implied that the endogenous phytohormones might be involved in the male gametophyte development of citrus.

Transcription factors
It was known that floral organ formation and function were influenced by TFs regulation. In our research, twelve unigenes were assigned to the category of transcription factor, and six of them were identified as AP2-ERF family members. AP2-ERF TF containing highly conserved AP2/ERF DNA-binding domain, is a large family unique in plant. In our research, four AP2-ERF members showed similar expression pattern. AP2-EREBP TF1 was closely homologous with atERF107 (AT1G19210). This gene was likely involved in the regulation of gene expression by stress factors and by components of stress signal transduction pathways. However, until now, no experimental evidence was available. AP2-EREBP TF3 showed high similarity with ERF5 (AT5G47230.1). ERF5 might play an important role in plant innate immunity likely through coordinating chitin and other defense pathways [57]. Other research suggested that ERF5 and ERF6 might potentially overlap in their function and acted as positive regulators of JA/ethylene-mediated defense [58]. In tomato, this gene was mainly involved in responses to drought and salt stresses [59]. As for AP2/ERF domain containing TF2, its closest relative was ERF104 (AT5G61600.1). Recent studies showed that ERF104 was in vivo substrate of MPK6, and ethylene could release ERF104 and allow liberated ERF104 to access target genes related to plant defense [60]. CBF/DREB-like TF was of high similarity with CBF4 (AT5G51990.1) which was critical regulator involved in cold acclimation and drought adaptation [61,62].
In addition, AP2-EREBP TF2 was highly homologous with RAP2.4 (AT1G78080.1). RAP2.4 acted at or downstream of a converging point of light and ethylene signaling pathways, and it coordinately regulated multiple developmental processes and stress responses [63]. As for AP2-ERF domain containing TF1, its expression pattern was different from other five members. It showed high similarity with DREB26 (AT1G21910.1). In plant, RAP2.6, RAP2.6 L, DREB26 and DREB19 exhibited tissue specific expression and participated developmental processes as well as biotic and/or abiotic stress signaling [64]. Though previous researches emphasized the functions of these AP2-ERF TFs on resistance against biotic and abiotic stresses, AP2-ERF TFs were also participated in plant development such as embryo patterning [65], and stamen emergence [66].
Additionally, two MYB (R2R3-MYB) transcription factors also showed differential expression between QS and EG. In plant, MYB TF family was categorized into 3 subfamilies according to the number of adjacent repeats of MYB-domain. Of them, R2R3-MYB subfamily contains the largest number of members. Like the AP2-ERF TF family proteins, MYB family proteins also function in various plant-specific processes. In Arabidopsis, MYB TFs were found as key regulators involved in development, metabolism and biotic and abiotic stress responses. Among these MYB TFs of Arabidopsis, AtMYB26 is involved in determining endothecial cell development within the anther and is essential for anther dehiscence [67]. AtMYB33 and AtMYB65 redundantly facilitate anther and pollen development [68]. AtMYB80 regulates exine formation and acts downstream of AtMYB35; and AtMYB103 is required for tapetal development and microsporogenesis, especially for callose dissolution and exine formation [69,70]. AtMYB125 positively control male germ cell division and commit progenitor germ cells to sperm cell differentiation [71,72]. In rice, CSA gene encoding MYB TF functions as a key transcriptional regulator for sugar partitioning during male reproductive development, and the CSA mutant showed reduced levels of sugars and starch in floral organs which lead to MS.
Interestingly, in our results, one MYB TF showed similar expression pattern with AP2-ERF TFs that downregulated at BF stage when the anther and pollen grains are mature. This MYB TF termed as R2R3-MYB TF was closely related to ATMYBR1/ATMYB44 (AT5G67300.1), and AtMYB44 was likely to enhance drought and salt stress tolerance by suppressing the expression of genes encoding PP2Cs, which was described as negative regulators of ABA signaling [73]. Previous report showed that AtMYB44 was with changed expression during late embryogenesis and seed maturation [74]. And notably there was a NAC domain protein (JU497421) highly homologous with ANAC102 (AT5G63790.1). ANAC102 was an important regulator of seed germination and activated a seed-specific subset of genes under low-oxygen stress; it was also necessary for the viability of Arabidopsis seeds following low-oxygen treatment [75].
In summary, these results suggested that these AP2-ERF TFs and the MYB TF functioned redundantly and coordinated with other TFs which involved in the complex network regulating floral organ development. Further research should emphasize on the isolation of proteins interacted with these TFs.

Conclusion
An integrative approach combining SSH and microarray was employed to explore the transcriptional changes of a seedless bud sport mutant of Ponkan mandarin. A number of differentially expressed genes were identified. And the majority of genes were down-regulated in the mutant, especially those related to basic metabolic process. Metabolism of nutrition and energy might be impaired during male gametophyte development of the mutant, and TFs and phytohormones might play important regulatory roles during this process. Our research gained general information of citrus MS at transcription level and could provide some clues for further exploration of MS in citrus species.

Accession numbers of sequences and microarray data
All the sequences generated in the study were deposited in GenBank with accession numbers from JU497308 to JU497435. Five sequences which are shorter than 200 bp longer than 100 bp are attached in Additional file 3.
(See figure on previous page.) Figure 6 Relative expression pattern of six AP2-ERF family TFs, two zinc-finger TFs, one MYB family TF and one NAC domain TF. The accession number of each TF was given inside the parenthesis. Relative expression was defined as the expression level in QS versus EG. Columns and bars represent the means and standard errors (n = 3) respectively.
Microarray data and experimental information from this study were deposited in the Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo/) under accession number GSE38094.

Plant materials and phenotype analyses
Two Ponkan mandarin (Citrus reticulata Blanco) cultivars, Qianyang seedless (QS, mutant type) and Egan NO.1 (EG, a common seedy Ponkan mandarin, wild type) were grown in the same orchard of 'Fenghuangshan' citrus production area in the city of Dangyang, Hubei province, China. These two scion cultivars were seven years old when sampling in 2010, with trifoliate orange (Poncirus trifoliata L. Raf.) as the rootstock. Flower samples were collected from both cultivars in parallel including 4 continuous phonologically developmental stages ( Figure 7C): squaring stage (SF, about 20 DBF), medium bud stage (MF, about 10 DBF), flowers at full bloom stage (BF) and young ovaries of 2-3 days after flowering (OV). All the flowers were bagged to prevent cross-pollination, and when sampled in the field, all the samples were frozen in liquid nitrogen as quickly as possible and then stored at −80°C until needed.
The morphology of mature anthers were investigated with fluorescence stereo-microscope ( Figure 7A; 7B) (Leica MZ FLIII, German) and image was captured with a digital camera (Nikon Coolpix, Japan). The pollen grain number per anther was counted. In brief, anthers from mature flowers were collected and mixed randomly, each time 40 anthers were dissected and pollen grains were suspended in 25 mL sterile water with 4-5 drops of surfactant (Tween-20, Amresco solon, OH). The viability of mature pollen grains were evaluated by dying with 1% acetic acid magenta as well as 1% iodine potassium iodide (I 2 -KI 2 ) solution. After staining for 5 min, pollen grains were observed using BX-61 fluorescence microscope (Olympus, Japan) and Images were captured with DP70 CCD digital camera system. At least 1,000 pollen grains were counted. These experiments were repeated three times. The morphology of pollen grains was examined by scanning electron microscope (SEM) (NTC JSM-6390LV, Japan). For SEM, anthers at various developmental stages were pre-fixed with 2.5% glutaraldehyde in 0.1 M sodium phosphate buffer (pH 7.2) for 24 h, dehydrated twice using a gradient ethanol serial (30%-50%-70%-85%-95%-100%), then replaced ethanol with isopentyl acetate for 20 min. After that, samples were dried with critical-point drying method then sputtered coating with gold. Representative images were captured.

RNA extraction and mRNA isolation
The materials (floral organs) for RNA extraction were sampled from at least six independent plants, and mixed randomly. Total RNA from flower samples at four stages (SF, MF, BF and OV) were extracted with modified Trizol method according to [76]. The RNA pellets were washed with 75% (V/V) ethanol twice, dissolved in RNase-free water and stored at −80°C until use. By mixing equal amount of RNA of the four stages, RNA pools from both QS and EG were established in parallel. Then mRNA was isolated from each of the RNA pools using the Oligotex mRNA mini kit (Qiagen, Germany). The quality of RNA was determined by Nanodrop 1000 spectrophotometer (Thermo Scientific, Wilmington, DE, USA) and 1.2% agarose gel electrophoresis.
Suppression subtractive hybridization (SSH) cDNA libraries construction and cDNA inserts amplification Two micrograms of mRNA was used to synthesize cDNA for suppression subtractive hybridization (SSH). The SSH was performed with the PCR-select TM cDNA subtraction kit (Clontech, Palo Alto, CA, USA) according to the user manual. And both forward (the seedless cultivar QS as tester and the seedy cultivar EG as driver) and reverse (EG as tester while QS as driver) SSH were conducted. For cDNA libraries construction, two hybridizations were performed followed by two rounds of PCR amplifications to enrich the desired differentially expressed sequences. Then the second PCR-amplified cDNAs were purified and ligated into the T/A cloning vector pMD18-T (Takara, Japan) overnight at 4°C. Then the ligated products were transformed into Electro MAX TM DH5α-E TM cells (Invitrogen, USA) and incubated at 37°C, 160 r/m for 1 h, then cultured on SOB-MgCl 2 solid media with ampicillin (60 μg ml -1 ) to generate the primary cDNA libraries. The transformed white bacteria were randomly picked and grown on 384-well plates containing Luria Broth (LB) liquid media with ampicillin (100 μg ml -1 ) at 37°C overnight (about 16 h). Glycerol (Amresco, USA) (4.4% final) was added for storage at −80°C.
A total of 8,000 cDNA clones were randomly picked from forward and reverse SSH libraries and used as for subsequent PCR templates. Each PCR was performed in a 100 μl reaction mixture using nested primers of SSH according to [77]. The PCR products were precipitated with equal amount of isopropyl alcohol and washed with 75% (V/V) ethanol, then re-suspended in 40 μl sterile water. The yield and quality of the PCR products were determined by Nanodrop 1000 spectrophotometer (Thermo Scientific, Wilmington, DE, USA), and then run on 1.2% agarose gel and examined by Bio-Rad UV spectroscopy (Bio-Rad Laboratories, Washington, DC, USA) to confirm single clone (Additional file 4: Figure S3). Finally the validated PCR products were stored at −80°C for custom microarray.
Microarray slides fabrication and preparation of fluorescent dye-labelled cDNA About 40 microlitre of PCR products were re-precipitated by adding 100 μl of anhydrous ethanol and were dissolved in EasyArray TM spotting solution (CapitalBio Corp, China) at a final concentration of 0.1-0.5 μg μl -1 and then printed on amino-silaned glass slides with a SmartArrayer TM microarrayer (CapitalBio Corp). Each clone was printed triplicate. The particular procedures for microarray fabrication were conducted according to [37].
The relative gene expression profiles of QS at four developmental stages (SF, MF, BF and OV) compared with the corresponding four stages of EG were investigated by microarray analysis. For each stage, three sets of total RNA samples were extracted independently, and then RNA pool was constructed by mixing aliquot of RNA from the three sets of RNA samples. An aliquot of 5 μg total RNA from the RNA pool was used to produce Cy5/Cy3-labelled cDNA employing an RNA amplification combined with Klenow enzyme labeling strategy according to the protocol by [78]. Cy5/Cy3-labelled cDNA was hybridized with the microarray at 42°C overnight. Hybridization was performed in duplicate by dye swap (Cy5-labelled cDNA of QS versus Cy3-labelled cDNA of EG, and Cy5-labelled cDNA of EG versus Cy3labelled cDNA of QS). And then the arrays were washed with 0.2% SDS, 2 × SSC at 42°C for 5 min, and 0.2% SSC for 5 min at room temperature.

Microarray data analysis and EST sequence analysis
Arrays were scanned with a confocal laser scanner, LuxScan TM -scanner (CapitalBio Corp.) and the resulting images were analyzed with LuxScan TM 3.0 software (CapitalBio Corp.). cDNA spots were screened and identified with the methods described by [77]. A spatial and intensity-dependent (LOWESS) normalization method was employed and normalized ratio data were then log transformed [79]. Differentially expressed genes were identified using a t-test, and multiple test corrections were performed using FDR. Genes with FDR <0.05 and a fold change ≥2 were identified as differentially expressed genes.
All the clones differentially expressed in at least one of the four stages were subjected to single-pass sequence using standard high throughput sequencing by BGI-Wuhan, China. All sequences were edited to omit vectors and low quality segments at 5' and 3' ends, then removal of sequences shorter than 100 bp with SeqClean software. Sequence reads were assembled by CAP3 program [80] with default parameters. Then all the unigenes were annotated using BLASTx with a cut-off value of 1.0 × e -5 by searching the UniProt database (http://www. ebi.ac.uk/uniprot/). GO-KEGG-EC annotation was performed based on Annot8r platform [81]. Hierarchical clustering of transcript accumulation was performed with Cluster software (version 3.0) [82].

Quantitative real-time PCR verification and candidate TFs analysis
Total RNA was extracted from QS and EG collected at four different developmental stages with the Trizol methods mentioned above. Primer pairs were designed with the Primer Express software (Applied Biosystems, Foster City, CA, USA). Primer sequences of 11 candidate genes for verification were provided in Additional file 5: Table S1, and primer sequences of 10 TFs were provided in Additional file 6: Table S2. Single strand cDNA was synthesized with the prescription of the Revert Aid TM first strand cDNA synthesis Kit (Fermentas, Life Science, EU). Then each cDNA sample was pre-amplified using the citrus house-keeping gene β-actin and normalized for subsequent real-time quantitative PCR (qRT-PCR). The PCR program differed in terms of the annealing temperature of each primer pair and the length of the predicted PCR products. The qRT-PCR was performed using the ABI 7500 Real Time System (PE Applied Biosystems, Foster City, CA, USA) with the method as described by [83]. And relative transcript change was analyzed by 2 -ΔΔc(t) .