Isolation and expression analysis of cDNAs that are associated with alternate bearing in Olea europaea L. cv. Ayvalık

Background Olive cDNA libraries were constructed and analyzed to isolate candidate genes that can help elucidate the molecular mechanism of periodicity and / or fruit production. For this purpose, cDNA libraries were constructed from the leaves of trees in “on year” and in “off year” in July (when fruits start to appear) and in November (harvest time). One hundred randomly selected positive clones from each library were analyzed with respect to sequence and size. A fruit-flesh cDNA library was also constructed and characterized to confirm the reliability of each library’s temporal and spatial properties.
Results Quantitative real-time RT-PCR (qRT-PCR) analyses of the cDNA libraries confirmed cDNA molecules that are associated with different developmental stages (e.g. “on year” leaves in July, “off year” leaves in July, leaves in November) and with fruits. Hence, a number of candidate cDNAs associated with “on year” and “off year” were isolated. Comparison of the detected cDNAs with the current EST database of GenBank and with other non-redundant databases of NCBI revealed homologs of previously described genes along with several unknown cDNAs. Of around 500 screened cDNAs, 48 cDNA elements were obtained after eliminating ribosomal RNA sequences. These independent transcripts were analyzed using BLAST searches (cutoff E-value of 1.0E-5) against the KEGG and GenBank nucleotide databases, and 37 putative transcripts corresponding to known gene functions were annotated with gene names and Gene Ontology (GO) terms. Transcripts in the biological process category were related to metabolic process (27%), cellular process (23%), response to stimulus (17%), localization process (8.5%), multicellular organismal process (6.25%), developmental process (6.25%) and reproduction (4.2%).
Conclusions A putative P450 monooxygenase was expressed fivefold more in “on year” leaves than in “off year” leaves in July. Two putative dehydrins were expressed significantly more in “on year” leaves than in “off year” leaves in November. Homologs of UDP-glucose epimerase, acyl-CoA binding protein, triose phosphate isomerase and a putative nuclear core anchor protein were significant in fruits only, while a homolog of an embryo binding protein / small GTPase regulator was detected in “on year” leaves only. One of the two unknown cDNAs was specific to leaves in July while the other was detected in all of the libraries except fruits. KEGG pathway analyses correlated the obtained sequences with essential metabolic pathways such as galactose metabolism, amino sugar and nucleotide sugar metabolism and photosynthesis. Detailed analysis of the results presents candidate cDNAs that can be used to dissect further the genetic basis of fruit production and / or alternate bearing, which causes significant economic loss for olive growers.


Background
Olive (Olea europaea L.) has long been an important topic of agricultural research due to its well-known nutritional and health value. Therefore, numerous studies on physiological [1][2][3][4], phytochemical [5][6][7][8][9], molecular systematic [10][11][12][13] and molecular genetic / genomic [14][15][16][17] aspects of olive have been reported. Genetic studies of the molecular mechanisms of fruit set, fruit development, fruit detachment and alternate bearing in olive have not been widely reported, though there are reports on various aspects of alternate bearing such as endogenous and environmental factors [4,18,19]. While the idea of generating a genetically modified olive tree has not been welcomed, it is possible to explore the molecular mechanisms behind common problems of olive through molecular genetics approaches. Identifying transcription factors specific for important genes of fruit senescence, for instance, could lead to subsequent steps of controlling these molecules to obtain a more uniform harvest. One of the first steps toward such long-term aims is constructing and characterizing cDNA libraries to identify genes that are specific to certain tissues and / or developmental stages. Furthermore, since the olive genome has not yet been sequenced, olive molecular genetics studies mostly depend on cDNA libraries to identify novel genes, or genes associated with certain processes such as fruit development and senescence.
Although numerous cDNA libraries of various plant tissues and organs under very specific conditions, such as phosphorus-stressed roots [20], root hairs [21], glucose-stressed root tips [22], nodules [23], ripe fruit detachment tissues [24] and leaves [25,26], are available, olive cDNA libraries have largely been restricted to fruit libraries [10,16,17,27,28]. Hence, reports comparing olive genes expressed in "on year" leaves with those expressed in "off year" leaves, or genes expressed in fruited leaves with those expressed in non-fruited leaves, are rare.
We have recently reported microRNAs [29] and global transcripts [30] associated with alternate bearing in olive. In this study, leaves of olive trees in "on year" and in "off year" were harvested into liquid nitrogen in July (when fruits first appear) and in November (harvest time) and used to construct cDNA libraries to identify cDNAs specific for each time and condition. A fruit cDNA library was also constructed to further confirm the specificity of the cDNAs obtained from each library. Analyses of the results revealed cDNAs specific for each library, and hence a number of candidate cDNAs associated with alternate bearing were identified. In addition, bioinformatics tools such as detailed BLAST searches and Gene Ontology and KEGG analyses were applied to the obtained sequences to extract further information about these cDNA molecules.

Selection of the reference gene
Using an appropriate reference gene in quantitative real-time PCR (qRT-PCR) to normalize the initial total RNA template amounts is one of the most important factors affecting the reliability of qRT-PCR results. Commonly used reference genes should therefore first be tested in the organism under study to pick the one whose expression does not vary in response to diverse factors [31]. Since reference genes for qRT-PCR in olive have not been widely reported, the expression levels of seven reference genes commonly used in various plants (see materials and methods for gene names) were determined via qRT-PCR (Figure 1), and GAPDH was judged to be an appropriate reference gene to use with olive.
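The idea of picking the reference gene with the least variation across tissues can be illustrated with a minimal sketch. It assumes that stability is judged by the standard deviation of Ct values across the tissues; the text does not state which criterion was actually applied. The Ct values and the gene name `candidate_X` are invented for illustration and are not data from this study.

```python
from statistics import mean, stdev

# Hypothetical Ct values, one per tissue (JF, JNF, NF, NNF, F).
# These numbers are invented for illustration only.
ct_values = {
    "GAPDH": [21.0, 21.2, 20.9, 21.1, 21.0],
    "beta-actin": [22.1, 22.4, 22.0, 22.3, 22.2],
    "candidate_X": [18.5, 22.0, 19.7, 24.1, 20.3],  # hypothetical unstable gene
}


def rank_reference_genes(ct_by_gene):
    """Return (gene, mean Ct, SD of Ct) tuples sorted by increasing SD."""
    stats = [(gene, mean(cts), stdev(cts)) for gene, cts in ct_by_gene.items()]
    return sorted(stats, key=lambda item: item[2])


ranking = rank_reference_genes(ct_values)
for gene, avg, sd in ranking:
    print(f"{gene:12s} mean Ct = {avg:5.2f}  SD = {sd:4.2f}")
```

Under this assumed criterion, the gene listed first has the smallest spread of Ct values across tissues and would be the preferred normalizer.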
Brief overview of the cDNAs obtained from the libraries
BLASTn searches [32] at NCBI for GenBank records homologous to each insert sequence revealed 11% to 16% protein coding gene homologs and 3% unknown cDNAs, while the remaining 84% to 89% constituted non-coding RNA molecules including rRNAs and tRNAs (Figure 2). All the cDNA sequences except JNF1, JNF32 and JNF87 had similarity to cDNA records of other plants previously registered in the GenBank databases (Tables 1, 2); the three exceptions had no similarity to any record in any available database. In each library, more than half of the cDNAs were new for the olive genome (Figure 2).
Figure 1 qRT-PCR amplification of the reference genes evaluated. The upper panel shows the Ct values of all seven reference genes together. The lower panel is simplified from the upper panel to display how well GAPDH and beta-actin confirm each other and hence are proper reference genes for the olive tissues studied. The tissues used were the same ones used to construct the cDNA libraries, except that pedicels were collected from the Uslu cultivar (UP) and the Kiraz cultivar (KP).

cDNA contents and qRT-PCR validation of the individual libraries
In the July "on year" leaf library (JF), P450 monooxygenase homologs were detected at 4% (three clones for gi85068677 and one for gi15425796), while homologs of menthone:neomenthol reductase 1, a transcription factor (gi113367217) and a protease (gbAF043539) were each represented at 2% (Table 1). The remaining five cDNAs were each represented at 1% (Table 1). qRT-PCR confirmed the abundance of the P450 monooxygenases found in the JF library and revealed the embryo defective binding / small GTPase regulator homolog (gi78498847) to be specific to "on year" leaves in July (Figure 3).
In the July "off year" leaf library (JNF), the predominant cDNAs (JNF31, JNF48, JNF84) were again homologs of P450 monooxygenase (gi85068677), which was also confirmed by qRT-PCR (Figure 3). The second most abundant cDNAs were homologs of ethylene responsive protein (gi5669653), wound stress protein (gi51457947), a gene complex of multiple constitutive proteins and intergenic spacers (gi17085617, similar to gi170785601 and also detected once in each of NNF and F) and ribosomal protein L10 (gi21386914). cDNAs that were completely novel to any available nucleotide database (JNF1, JNF32 and JNF87) were detected only in this library. qRT-PCR revealed the wound stress protein homolog (gi51457947), one of the abundant cDNAs in the JNF library, to be specific to "off year" leaves in July.
Although the cDNAs detected in the November "on year" leaf library (NF) were the least abundant (no more than 2%) among all libraries generated (Table 3), NF2 (a putative dehydrin), NF9 (a putative metallothionein) and a cDNA from the NNF library (NNF31, a putative cold stress induced protein / dehydrin) appeared to be expressed at very high levels (5, 9 and 11 fold GAPDH, respectively). Homologs of cryptochrome 1 (cry1) mRNA, a DnaJ heat shock family protein (gi42569238) and a calmodulin binding protein (gi145359627) were each detected at 2% (Table 3), but their abundance was not confirmed by qRT-PCR.
A cDNA that is homologous to both ARIADNE (gi145360514, a ubiquitin-protein ligase) and a zinc finger family protein (gi91806300) was the most abundant cDNA of the remaining two libraries (NNF and F). It was detected at a rate of 3% in each library (Table 4, Table 5). qRT-PCR analysis showed that this cDNA was expressed 6 to 11 fold more in the F library than in any other library (Figure 3). The remaining cDNAs in the NNF library were detected only once (Table 4). The cold stress-induced protein / dehydrin homolog (NNF31), along with the other putative dehydrin (NF2) and the putative metallothionein (NF9), were the most abundant cDNAs in both of the November leaf libraries (NF and NNF) (Figure 3). After the ARIADNE-like protein homologs, the second most abundant cDNAs (2% each) in the fruit flesh library (Table 5) were homologs of UDP-glucose 4-epimerase (gi37781555), acyl-CoA binding protein (gi6002103) and triosephosphate isomerase (gi602589). qRT-PCR revealed that all the cDNAs isolated from the fruit flesh library, except F51, which is similar to a PSII binding protein, were specific to fruits. A cDNA (F10) with weak similarity to a predicted nuclear-pore anchor protein (gi844268) was also fruit specific (Figure 3).
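Expression levels in this section are given as fold values relative to GAPDH (for example 5, 9 and 11 fold). The source does not state the formula used, so the following is only a sketch under two assumptions: a 2^(Ct_reference - Ct_target) calculation with perfect amplification efficiency, and a replicate check that reads the "more than 1 Ct" deviation rule from the methods as the spread between the highest and lowest replicate. All Ct values are invented.

```python
from statistics import mean


def mean_ct(replicates, max_spread=1.0):
    """Average replicate Ct values; reject runs that should be repeated.

    Assumption: "deviation" is taken as max(Ct) - min(Ct) of the replicates.
    """
    if len(replicates) < 3:
        raise ValueError("at least three replicate reactions are expected")
    if max(replicates) - min(replicates) > max_spread:
        raise ValueError("replicates deviate by more than 1 Ct; repeat the run")
    return mean(replicates)


def fold_vs_reference(ct_target, ct_reference, efficiency=2.0):
    """Expression of the target relative to the reference gene (reference = 1).

    Assumption: the amount of product doubles each cycle (efficiency = 2.0).
    """
    return efficiency ** (ct_reference - ct_target)


# Hypothetical triplicates; not measurements from this study.
gapdh_ct = mean_ct([21.0, 21.1, 20.9])
target_ct = mean_ct([18.0, 18.1, 17.9])
fold = fold_vs_reference(target_ct, gapdh_ct)
print(f"target is expressed at about {fold:.1f} fold GAPDH")
```

A target that crosses the threshold three cycles earlier than GAPDH comes out at roughly 8 fold under these assumptions, and a value below 1 means expression lower than that of GAPDH.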

Bioinformatics
Among the olive ESTs analyzed, 35 sequences displayed significant BLASTx matches to genes registered in the NCBI database. To assess the reliability of the ESTs and the alignment quality, sequence similarity (Figure 4a) and E-value distribution (Figure 4b) graphs were generated from the BLASTx results. Based on the BLASTx hits, the olive ESTs had the highest sequence homology to Vitis vinifera (~96%), followed by Populus trichocarpa (~68%), Arabidopsis thaliana (~48%) and Oryza sativa (~47%) (Figures 4c, 4d; see Additional file 1).
The functional annotation and categorization of each olive EST based on Gene Ontology (GO) terms were analyzed using the Blast2GO suite. For each transcript, a set of GO term information including accession, annotation term and basic definition is shown in a table (see Additional file 2). In addition, the transcripts representing genes with known function were categorized by biological process, cellular component and molecular function according to the ontological definitions of the GO terms. The transcripts in the biological process category (Figure 5a) were related to metabolic process (27%), cellular process (23%), response to stimulus (17%), cellular localization (8.5%), multicellular organismal process (6.25%), developmental process (6.25%), reproduction (4.2%), multi-organism process (2.1%), biological regulation (2.1%), cell wall organization or biogenesis (2.1%) and cellular component biogenesis (2.1%). In the cellular component category (Figure 5b), most of the GO terms were related to cellular (48.7%) and organelle (35.9%) components such as cell periphery and intracellular organelle parts, followed by macromolecular complex (10.25%), extracellular region (2.57%) and membrane enclosed lumen (2.57%). In the molecular function category (Figure 5c), the most abundant GO terms were involved in binding (52.8%), such as nucleic acid and transition metal ion binding, followed by catalytic activity (25%), electron carrier activity (8.34%), transporter activity (5.54%), structural molecule activity (5.54%) and enzyme regulator activity (2.78%). For all functional categories, the pie charts and sequence distribution tables pertaining to the olive ESTs are presented in Additional file 3.
Using KAAS, each olive EST was assigned a KEGG orthology (KO) number with the SBH (single-directional best hit) assignment method, and the KO numbers were subsequently mapped to KEGG's reference metabolic pathways. Consequently, a total of 13 main metabolic pathways were generated through KAAS pathway mapping. The sequences largely correlated with essential metabolic pathways: galactose metabolism (1), amino sugar and nucleotide sugar metabolism (1), photosynthesis (1), other glycan degradation (1) and monoterpenoid biosynthesis (1), followed by spliceosome (2), ribosome (2) and circadian rhythm - plant (1) (Table 6; detailed information and images of the pathways are in Additional file 4).
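The kind of E-value filtering behind "significant" BLAST matches can be sketched as follows. The sketch assumes that hits are available in the standard 12-column tabular output of BLAST+ (`-outfmt 6`) and applies the 1.0E-5 cutoff mentioned in the abstract; whether the authors worked from tabular output is not stated. The query and subject names and all values in the example rows are placeholders.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(handle, max_evalue=1.0e-5):
    """Return the lowest-E-value hit per query among hits passing the cutoff."""
    best = {}
    for row in csv.DictReader(handle, fieldnames=FIELDS, delimiter="\t"):
        evalue = float(row["evalue"])
        if evalue > max_evalue:
            continue  # hit is not significant at the chosen cutoff
        current = best.get(row["qseqid"])
        if current is None or evalue < float(current["evalue"]):
            best[row["qseqid"]] = row
    return best


# Made-up example rows; identifiers are placeholders, not real records.
example = io.StringIO(
    "cdna_A\tsubj_1\t88.5\t200\t23\t0\t1\t200\t10\t209\t3e-40\t160\n"
    "cdna_A\tsubj_2\t75.0\t120\t30\t0\t1\t120\t5\t124\t2e-08\t60.2\n"
    "cdna_B\tsubj_3\t60.0\t50\t20\t0\t1\t50\t1\t50\t0.5\t25.0\n"
)
hits = best_hits(example)
for query, row in hits.items():
    print(query, row["sseqid"], row["evalue"])
```

In this toy input, `cdna_B` has no hit below the cutoff and would be counted among the unknown sequences, while `cdna_A` keeps only its strongest match.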

Discussion
The approach to isolate differentially expressed genes
To isolate differentially expressed cDNAs, it is essential to start with total RNA extracted from tissues under identical conditions. In this respect, the trees we selected were genetically identical and grew in virtually the same micro-environment. The same constitutive cDNA (gi170785601 / gi170785617, a gene complex of multiple tRNA genes and photosystem II binding proteins) was detected in four of the five libraries (all except JF; Tables 1, 2, 3, 4 and 5) and was confirmed with qRT-PCR, while homologs of cDNAs reported to be expressed under certain stress conditions, such as cold stress induced protein / dehydrin (NNF31) and temperature induced lipocalin (NNF23), or in certain metabolic processes, such as acyl-CoA binding protein (F13 / F17) and triosephosphate isomerase (F4 / F22), were detected in the expected libraries (Table 4, Table 5). At least two cDNAs from each library (except NNF) were confirmed by qRT-PCR to be specific to the tissues (leaves of JF, JNF, NF, NNF and fruits of F) in which they were initially detected through sequencing of the plasmid inserts from arbitrarily selected colonies. Hence, the approach we used to detect differentially expressed cDNAs (preparing cDNA libraries from total RNA instead of a purified mRNA pool) has proven to be reasonable. Furthermore, more than half of the cDNAs of each library did not match olive records in the nucleotide (2718 sequences) and EST (9845 sequences) databases of NCBI, which contain cDNAs derived from leaves, fruits and flowers (Database version: 31.11.2012).
Overall cDNA profile of the libraries
In July, in both "on year" leaves (JF) and "off year" leaves (JNF), homologs of cytochrome P450 monooxygenases (JF45 and JF146/JF151/JF187) appeared to be the dominant cDNAs (Table 1, Table 2, Figure 3). The embryo defective binding / small GTPase protein homolog (JF153) appeared to be strictly specific to JF. Interestingly, JF45 was detected more in "on year" leaves and fruits but less in "off year" leaves (Figure 3), suggesting that it might have a role in the "on year", more so in July than in November. In contrast, the putative metallothionein (NF9) appeared to be associated with "on year" leaves but more in November. One (JNF1) of the two unknown cDNAs isolated from July "off year" leaves was found to be expressed in all the tissues studied except fruits, while the other (JNF32 / JNF87) was specific to July leaves only. In November "on year" leaves (NF), a homolog of EMB lyase (gi30687496) was the most abundant cDNA (4%), yet it was detected in this library only. Based on the NCBI record (EMB2734), this putative lyase is predicted to function in breaking C-C, C-O and C-N bonds during embryo development. Detecting this cDNA in leaves of trees bearing maturing fruit is plausible, since the developing embryo (a sink tissue) must be supplied with nutrients from the leaves (source tissues) [33]. Most other cDNAs of November "on year" leaves (NF) also appeared to be associated with cold stress and embryo development, which were the specific conditions for the NF library: NF2 is a homolog of dehydrin (gi157497150), which has been reported to function at low temperatures and in seed development [34]. NF9 is similar to metallothionein (gi12963447), which has been reported to function in senescence [35,36]. Likewise, the NF8 homolog JERF1 (gi22074045) has been reported to be involved in gene expression in the cold [37], and the NF58 homolog glycosyl hydrolase (gi79598780) is associated with biotic / abiotic stress, lignification and cell wall reconstruction [38].
Figure 3 qRT-PCR analysis of the cDNAs obtained from all libraries in this study. The cDNAs were separately amplified from each tissue from which the cDNA libraries were constructed (five separate plots are horizontally aligned). Dark shaded boxes highlight the cDNAs that had the highest expression in the tissues where they were detected through cDNA library screening. Light shaded columns and unshaded columns highlight cDNAs obtained from a specific library. Highly expressed cDNAs from each library are labeled. Expression levels are average values of at least 3 reactions. Error bars are indicated for each cDNA. JF: July fruited leaves, JNF: July non-fruited leaves, NF: November fruited leaves, NNF: November non-fruited leaves, F: Fruits. NF22 was expressed at least 37 fold GAPDH in each library and is therefore marked with an interruption sign. NNF22 was not included in the qRT-PCR analyses. Bars lower than 1 correspond to expression levels below that of GAPDH. Absent bars indicate expression levels too low to show on the graph.
The cDNA that is a homolog of both ARIADNE (gi145360514, a ubiquitin-protein ligase) and a zinc finger family protein (gi91806300) was the most abundant in both the November "off year" leaf (NNF) and fruit flesh (F) libraries. qRT-PCR results revealed 6 to 11 fold higher expression of this cDNA in fruits than in other libraries but did not confirm it as one of the most abundant cDNAs in either fruits or "off year" leaves. Combined with its ubiquitin association, these results suggest that the ARIADNE homolog in olive is most probably a constitutively expressed cDNA. NNF91 / NNF97 and NNF24 are homologs of a splicing factor subunit (gi91806300) and a translation initiation factor (gi51599168), respectively, and both were detected at very low levels (less than 0.3 fold GAPDH) in all libraries (Figure 3).
Given that these two trees are genetically identical and grow in virtually the same micro-environment, the overall results present cDNAs differentially expressed in leaves, in "on year" leaves and in "off year" leaves. Constitutively expressed genes, most of which have not been detected in olive before, and several unknown cDNAs and / or genes are also reported. It should be kept in mind that alternate bearing results from complicated biotic and abiotic processes, including environmental factors and physiological responses of the trees in the form of activation and repression of endogenous metabolic pathways [18,19,39,40,41], which in turn also depend on the genetic background of the tree. Large phenotypic variation has been observed, including year-to-year variation of a single genotype, as well as variation among and within (multiclonal) cultivars under the same environment. Hence it is not possible to clearly identify the genetic players of alternate bearing in a single cDNA screening or even a complete transcriptome analysis. Multiple approaches involving follow-up of the selected trees / cultivars over several years are needed to identify key genetic players of alternate bearing in olive. There are, however, no comprehensive reports on the genetic basis of alternate bearing in olive, and hence these results constitute important information for one of the first steps of a genetic dissection of olive periodicity, which causes significant economic loss for olive growers. By exploring these cDNAs further, it may be possible to isolate genes that are key regulators of fruit formation and / or periodicity in olive.

Bioinformatic analyses
Through the bioinformatic analyses it was possible to extract additional information about the cDNAs as well as about olive in general. BLASTx analysis revealed that olive has a surprisingly high (96%) similarity to grapevine (Vitis vinifera), although these two plants do not even belong to the same order (Vitis in Vitales, Olea in Scrophulariales). The second most similar plant to olive is Populus (a tree with no fleshy fruits), with a much lower (68%) similarity. This suggests that the cDNAs captured are directly or indirectly associated with the pathways of fruit formation and / or production, which are in turn related to periodicity. GO term categorization grouped the cDNAs into common processes, localizations and functions such as metabolic process, cellular localization and binding, respectively, which reflect the general profile of a typical cell, while differentially expressed cDNAs were also significantly represented, such as the 17% of the cDNAs in the "response to stimulus" category and the 25% of the cDNAs in the "catalytic activity" category. The metabolic pathways generated through KAAS pathway mapping were largely correlated with essential metabolic pathways such as galactose, amino sugar and nucleotide sugar metabolism and photosynthesis, consistent with the constitutive status of the majority of the cDNAs obtained.

Conclusions
In summary, we have isolated and analyzed cDNAs that are associated with alternate bearing in olive. A P450 monooxygenase homolog was expressed more in "on year" leaves than in "off year" leaves in July. Two putative dehydrins were expressed significantly more in "on year" leaves than in "off year" leaves in November. Homologs of triose phosphate isomerase, UDP-glucose epimerase, acyl-CoA binding protein and a putative nuclear core anchor protein appeared fruit specific, while a homolog of an embryo binding protein / small GTPase regulator was detected in "on year" leaves only. An unknown cDNA was specific to leaves in July. KEGG pathway analyses correlated the sequences with essential metabolic pathways such as galactose metabolism, amino sugar and nucleotide sugar metabolism and photosynthesis. Detailed analysis of the results presents candidate cDNAs that can be used to dissect further the genetic basis of fruit production and / or alternate bearing.
Table footnote: See Tables 1-5 for the accession numbers of these sequences. According to the table, there is a correlation between biosynthesis (galactose, amino sugar and nucleotide sugar, monoterpenoid and photosynthesis) and ribosomal activity in fruited and non-fruited leaves.

Experimental design and the confirmation of the genetic identity of the individual trees
Two olive (Olea europaea L. cv. Ayvalık) trees standing side by side (approximately 4 m apart), one in "on year" (high fruit yield) and one in "off year" (almost no fruits on the tree), were selected in the Gömeç Orchard of the Edremit Olive Seedling Growing Station. The trees (about 5 m high) were grown from scions taken from the same tree and were transferred into soil around 15 years ago. The scions were first dipped into indole butyric acid and then rooted in sandy soil before they were transferred into soil. Leaves from the "on year" tree and from the "off year" tree were randomly collected, deposited separately (for each tree) in liquid nitrogen and used for total RNA extraction either directly or after storage in a -80°C freezer. Total RNA extraction from fruits and pedicels was conducted as described above. To make sure that the two selected trees have the same genetic identity, total genomic DNA (gDNA) was isolated using the Plant DNeasy Kit (Qiagen, Germany) and used as template in PCR reactions to amplify JNF96, NF2 and NNF31 separately from the two trees. PCR products were then sequenced at RefGen (Gen Araştırmaları ve Biyoteknoloji, Ankara) using an ABI 3130XL Genetic Analyzer (Applied Biosystems, Foster City, CA) with a BigDye Cycle Sequencing kit (Applied Biosystems, Foster City, CA). JNF96, NF2 and NNF31 were shown to have unique DNA sequences in 29 olive cultivars tested (unpublished data) and hence were used as markers to determine the genetic identity of the two trees used in this study. Comparison of the sequences revealed no nucleotide differences (100% identity) for any of the three markers between the two trees, and hence their genetic identity was confirmed.
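The marker comparison described above amounts to checking that the reads from the two trees are identical at every position. A minimal sketch follows, assuming the two reads for each marker have already been trimmed and aligned to equal length; the short sequences are placeholders, not the actual JNF96, NF2 or NNF31 sequences.

```python
def count_mismatches(seq_a, seq_b):
    """Count differing positions between two equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length; align them first")
    return sum(1 for a, b in zip(seq_a.upper(), seq_b.upper()) if a != b)


def percent_identity(seq_a, seq_b):
    """Percentage of positions at which the two sequences agree."""
    if not seq_a:
        raise ValueError("empty sequence")
    matches = len(seq_a) - count_mismatches(seq_a, seq_b)
    return 100.0 * matches / len(seq_a)


# Placeholder reads for the three markers: (on-year tree, off-year tree).
marker_reads = {
    "JNF96": ("ATGGCTTCAGGA", "ATGGCTTCAGGA"),
    "NF2": ("GGTACCATTGCA", "GGTACCATTGCA"),
    "NNF31": ("TTGACCGGATCA", "TTGACCGGATCA"),
}

same_genotype = all(
    count_mismatches(on_tree, off_tree) == 0
    for on_tree, off_tree in marker_reads.values()
)
print("markers identical in both trees:", same_genotype)
```

Identity at a handful of markers supports, but does not by itself prove, that two trees are clonal, which is why the text also relies on the markers being unique across the 29 cultivars tested.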

Construction of cDNA libraries
Total RNA extraction was performed using the RNeasy Kit (Qiagen, Germany) following the manufacturer's instructions. The RevertAid H minus 1st Strand cDNA Synthesis Kit (Fermentas, Lithuania) was used to synthesize the first strand cDNA molecules, which were then incubated with RNase H (Fermentas, Lithuania) to remove the RNA strand of DNA-RNA hybrids. The second strands were synthesized with DNA Polymerase I (Fermentas, Lithuania). Fifteen units of T4 DNA Polymerase (Fermentas, Lithuania) were used for blunting the double strand cDNA molecules, which were then column-purified with a PCR Purification Kit (Qiagen, Germany) and cloned into pJET1.2 (Fermentas, Lithuania) using the CloneJET™ PCR Cloning Kit (Fermentas, Lithuania). The manufacturers' protocols for the kits were followed in each reaction. Glycerol stocks were prepared for each colony that was confirmed to harbor an insert-bearing plasmid (pJET1.2) through restriction digestion (of 100 randomly picked colonies from each library) with BglII (Fermentas, Lithuania). Plasmids from insert-positive clones were isolated using the GeneJET™ Plasmid Miniprep Kit (Fermentas, Lithuania) and sequenced at RefGen (Gen Araştırmaları ve Biyoteknoloji, Ankara) using an ABI 3130XL Genetic Analyzer (Applied Biosystems, Foster City, CA) with a BigDye Cycle Sequencing kit (Applied Biosystems, Foster City, CA). Since the aim of the study was to detect the most abundant genes of each specific condition (such as "on year" leaves or "off year" leaves), it was reasoned that the non-coding RNAs (rRNAs and tRNAs) should not be removed when preparing the first strand cDNA templates. Therefore total RNA (instead of isolated mRNA) was intentionally used for library construction to detect significantly abundant genes in each specific condition. Oligo dT primers were used instead of random oligos, however, to increase the number of protein coding cDNAs detected. Obtaining on average around 15% protein coding gene homologs versus around 85% non-coding RNA (rRNA and tRNA) from each library (Figure 2) suggested that the approach was reasonable.
The five cDNA libraries constructed were named as JF (July, "on year" leaves), JNF (July, "off year" leaves), NF (November, "on year" leaves), NNF (November, "off year" leaves) and F (Fruit flesh).

Quantitative real-time PCR analysis of cDNAs
To confirm the spatial and temporal expression status of the cDNAs, qRT-PCR was conducted for all the cDNAs obtained on a Rotor-Gene 6000 W (Qiagen AG Hilden, Germany) using FastStart Universal SYBR Green Master (Roche Mannheim, Germany). The qRT-PCR reaction for each gene was run at least in triplicate and repeated when a deviation of more than 1 Ct (cycle threshold) was observed. Hence the Ct values were obtained by averaging at least three different reactions. Cycling conditions were set as one cycle of 95°C for 5 minutes followed by 35 cycles of 94°C for 20 seconds, 50°C for 20 seconds and 72°C