A genome-wide 20 K citrus microarray for gene expression analysis
© Martinez-Godoy et al. 2008
Received: 27 February 2008
Accepted: 03 July 2008
Published: 03 July 2008
Understanding of genetic elements that contribute to key aspects of citrus biology will impact future improvements in this economically important crop. Global gene expression analysis demands microarray platforms with high genome coverage. In recent years, genome-wide EST collections have been generated in citrus, opening the possibility of creating new tools for functional genomics in this crop plant.
We have designed and constructed a publicly available genome-wide cDNA microarray that includes 21,081 putative unigenes of citrus. As a functional companion to the microarray, a web-browsable database was created and populated with information about the unigenes represented in the microarray, including cDNA libraries, isolated clones, raw and processed nucleotide and protein sequences, and the results of all structural and functional annotation of the unigenes, such as general description, BLAST hits, putative Arabidopsis orthologs, microsatellites, putative SNPs, GO classification and PFAM domains. We have performed a Gene Ontology comparison with the full set of Arabidopsis proteins to estimate the genome coverage of the microarray. We have also performed microarray hybridizations to check its usability.
This new cDNA microarray replaces the first 7K microarray generated two years ago and allows gene expression analysis at a more global scale. We have followed a rational design to minimize cross-hybridization while maintaining its utility for different citrus species. We also provide access to a website with full structural and functional annotation of the unigenes represented in the microarray, along with the ability to use this site to directly perform gene expression analysis using standard tools at different publicly available servers. Finally, we show that this microarray offers a good representation of the citrus genome, and we demonstrate its usefulness for global studies in citrus by using it to catalogue genes expressed in citrus globular embryos.
In recent years, microarray technology has demonstrated the power of high-throughput gene expression studies in unravelling key processes of plant biology [2–4]. Microarrays have become especially relevant for crop species where little genome information is available, and where intensive laboratory work is necessary to gain insight into a particular biological process, as well as to identify candidate target genes for future breeding.
Citrus is the most economically important fruit crop in the world, with a total production of 105 million metric tons. There is a plethora of important commercial species and varieties, including sweet oranges, mandarins, lemons and grapefruits. Variety improvement efforts have been hampered by general characteristics of citrus biology, such as apomixis, sexual incompatibility or prolonged juvenility, that limit classical molecular biology approaches. Functional genomics is then viewed as a relatively easy way to move forward into the identification of candidate genes of agronomical relevance, and to the understanding of biological processes important for citriculture.
Two years ago, aiming to develop genomic tools to assist future citrus research, we generated an EST collection covering a wide range of tissues and developmental stages, as well as biotic and abiotic stress situations, and constructed a first-generation cDNA microarray containing 6875 putative unigenes to initiate the characterization of the citrus transcriptome. This first microarray has been used so far to monitor the transcriptional response of citrus in ovaries and young fruit during development and ripening of citrus flesh, during Citrus tristeza virus (CTV) infection, or under water stress conditions, as well as to predict citrus varieties using expression profiles.
However, to perform expression analysis in citrus at a more global scale, new microarray platforms with increased genome representation are mandatory. cDNA microarrays are still a valuable tool for transcriptomic analysis in many species [11–14]. In plants, a cDNA array containing more than 10,000 unigenes has recently been generated for canola. Although cDNA microarrays are gradually being replaced by oligo arrays, which require fewer manipulation steps during fabrication and can distinguish similar members of some gene families, the ability of both platforms to produce reproducible and biologically consistent results has been clearly demonstrated, and the reported lack of concordance between microarray platforms has proven to be a failure of the metrics used to evaluate such concordance. Moreover, cDNA microarrays seem to be the best option for comparative, evolutionary and ecological studies of closely related species, taking advantage of the fact that cross-hybridization is expected to occur in cDNA arrays when sequence homology between targets and probes is higher than 70%. This is especially relevant for citrus, a tree grown as a combination of the fruit-producing scion variety bud-grafted onto a rootstock variety adapted to the soil and environment, as many studies combine both parts of the tree. Here we describe the design and creation of a publicly available cDNA microarray that includes 21,081 putative unigenes of citrus. Our microarray complements the recently released Citrus Affymetrix GeneChip and provides an alternative tool to perform global transcriptomic assays in these species. Although the majority of gene fragments spotted on the array were isolated from Citrus clementina, the cDNA nature of our microarray extends its use to any citrus species [8, 10], also allowing comparison of scion/rootstock expression.
To illustrate its utility, we use this microarray to catalogue genes expressed in citrus globular embryos, and show that embryogenesis in citrus proceeds by expressing a set of genes similar to that expressed in Arabidopsis.
To further reduce the sequence redundancy in the 27,551 citrus unigenes, a number of unigene clusters (or "superunigenes"), grouping different unigenes with extensive sequence overlap, were obtained (see Materials and Methods). Members of a superunigene could represent highly similar family members, alternative splicing or polymorphisms. Since their sequences are very similar, they are expected to identify the same mRNA species under standard hybridization conditions if used in cDNA microarrays. To reduce such spot cross-hybridization, only one representative cDNA clone per superunigene was selected to be printed in the microarray, and only clones producing a single PCR product were accepted (see Materials and Methods), which produced a total of 21,081 reasonably specific cDNA probes. Additional file 1 shows the functional annotation of the genes represented in the microarray, including the ID, description and E value of the first BLAST hit from the databases used for annotation (UniRef90 and the Arabidopsis TAIR full set of proteins), as well as their Gene Ontology classification and PFAM domains.
In order to estimate the genomic representation of the microarray, Arabidopsis sequences similar to the citrus unigenes present in the microarray were identified and used for Gene Ontology functional classification (see Materials and Methods). Similar Arabidopsis sequences (BLASTX E value lower than 10⁻²⁰) were found for 13,266 citrus unigenes (63% of the total unigenes in the microarray). The remaining 37% did not have any match in the Arabidopsis genome with a BLASTX E value lower than 10⁻²⁰. As discussed in a former paper, a proportion of these could be citrus-specific or tree-specific genes, which demonstrates the importance of molecular studies in crop species, as these can reveal interesting proteins and new biosynthetic pathways not yet discovered in other systems.
Genome-wide feature of the microarray. Comparison of numbers and percentages of genes in Biological Process Gene Ontology categories between citrus and Arabidopsis. Categories compared: anatomical structure morphogenesis; amino acid and derivative metabolic process; DNA metabolic process; protein modification process; carbohydrate metabolic process; lipid metabolic process; protein metabolic process; secondary metabolic process; regulation of gene expression, epigenetic; response to abiotic stimulus; response to biotic stimulus; response to endogenous stimulus; response to external stimulus; response to stress.
To demonstrate the potential of our microarray as an alternative to the existing Citrus GeneChip, a comparison between unigenes present in both platforms was performed. First, to evaluate the number of genes represented in each chip on an equal basis, we assembled the consensus sequences of the unigenes in the Affymetrix chip according to our assembly parameters (see Materials and Methods). The 33,879 transcripts were reduced to 24,400 unigene clusters (or "superunigenes"), compared with the 21,000 present in our cDNA array. In addition, we have estimated how many genes are represented in our microarray and not in the Affymetrix one. A BLAST search of the sequences represented in our cDNA array against the consensus sequences of those included in the Affymetrix chip revealed that 6248 genes did not find a positive match with an E value lower than 10⁻²⁰ (7064 with an E value lower than 10⁻⁵⁰) [see Additional file 3]. This implies that they can be analyzed only with our cDNA array. These results demonstrate that the microarray platform presented in this paper constitutes a complementary tool to the Affymetrix GeneChip for genome-wide transcriptomic analysis in citrus plants.
Embryogenesis is a critical stage of the plant life cycle. The egg cell develops into a multicellular organism via a precise sequence of events. During the first phase of embryogenesis, the body plan is established, consisting of a shoot meristem, cotyledons, hypocotyl and root meristem along the apical-basal axis, and a concentric arrangement of epidermis, ground tissue and vascular cylinder along the radial axis. Understanding the molecular mechanisms underlying embryogenesis can provide insight into the developmental and metabolic regulation of this important stage of plant biology, and a major effort has been made in that direction over the last two decades. A number of important genes and pathways have been identified, and recently, global analysis in Arabidopsis has been performed to identify a set of genes expressed during different stages of early embryogenesis.
We have performed a pilot experiment to catalogue the set of genes expressed in the citrus late globular embryo. Citrus exhibits polyembryonic seed development. In many species, non-zygotic embryos develop from the maternal nucellar tissue of the ovule surrounding the sexual embryo sac and develop together with the zygotic one. We crossed Citrus clementina (cv. Clemenules) with Fortune (C. clementina × C. tangerina) (see Materials and Methods) to obtain monoembryonic seeds and ensure that expression was analysed only in zygotic embryos.
First, this experiment constitutes a proof of use of our microarray, demonstrating the utility of a multispecies citrus cDNA microarray for expression studies. Second, Arabidopsis orthologs of many genes present in the microarray are already known to be expressed in the embryo, and it would be interesting to confirm whether these genes are also expressed in citrus embryos. Moreover, the study could reveal novel genes expressed during embryogenesis and prompt future work aimed at deciphering their role in this process.
Five biological replicates were performed. Correlation between replicates ranged between 75% and 90%. A total of 13,341 genes were considered present in the late globular embryo, according to the criteria explained in Materials and Methods [see Additional file 4]. That constitutes 63% of the 21,081 citrus unigenes examined in our microarray. A recent study found 77% of the 22,800 genes of the ATH1 Arabidopsis GeneChip to be expressed in the torpedo stage of embryogenesis. Although the number of present genes should be taken as an estimate that depends on the threshold values applied in each case, it reveals that virtually the whole cellular machinery is activated during embryogenesis, reflecting the high metabolic activity of meristematic and differentiating cells.
Although mainly studied in Arabidopsis, overall processes during plant embryogenesis are thought to be similar in other species. Of the 293 EMB genes from Arabidopsis catalogued by the SeedGenes project, which aims to identify genes that give a seed phenotype when disrupted by mutation, 210 had a citrus ortholog present in the microarray, and 71% of these were found to be expressed in the globular embryo of citrus [see Additional file 5]. The remaining ones could be expressed at a different embryo stage, or may not have been detected because of their low expression, although the possibility that they are not expressed in the citrus embryo cannot be ruled out. Citrus orthologs of the Arabidopsis genes involved in embryo pattern formation could also be detected by our microarray: orthologs of GNOM, a gene involved in the establishment of the apical-basal axis, MONOPTEROS, whose mutation alters the normal division of embryonic cells, ZWILLE, involved in establishing the primary shoot meristem in the embryo, and KEULE, a gene responsible for correct cytokinesis, were also expressed in the citrus embryo.
Other genes or gene families recently shown to have a role in plant embryogenesis are also expressed in citrus embryos. The involvement of cell wall and cell architecture remodelling, the regulation of mRNA stability and translation through poly-A binding proteins [36, 37], the regulation of development through pentatricopeptide repeat proteins, the involvement of vesicle trafficking in organ development, and the role of cell cycle genes in early stages of embryogenesis were confirmed in citrus embryos by the expression of sets of genes belonging to these functional categories. Similarly, the well-described role of auxins in the establishment of embryo polarity and the recent implication of brassinosteroids in the acquisition of embryonic competence were confirmed in citrus embryos by the expression of citrus orthologs of genes related to the signalling and biosynthesis of these hormones [see Additional files 6, 7, and 8].
Much less is known about how early embryos prepare themselves for pathogen attack. It has been suggested that developing barley embryos activate a developmental defence programme in which the expression of defence genes is controlled by developmental signals rather than induced by pathogens. Lipoxygenase enzymes (LOX1 and LOX2), which catalyse the first committed step in JA biosynthesis, have been described as being expressed in developing embryos. We also found expression in globular citrus embryos of LOX1 and LOX2 homologues and of an ortholog of AT1g67460, a 13-lipoxygenase enzyme considered so far to have minimal activity in embryos. Moreover, functional classification of present genes reveals around 9% of genes belonging to the category "response to stress", 8% to the category "response to abiotic stress", 3% to the category "response to abiotic stress", and 3% to the category "Defense". These data point towards a deployment of protection mechanisms in the citrus seed, already activated at the globular stage.
We have constructed a citrus 20 K cDNA microarray which can be used for gene expression analysis in different species of citrus. We also provide access to a web-browsable database as a companion tool for this microarray. The database contains all the structural and functional annotation related to the unigenes represented in the microarray. From a series of experiments on embryo development in citrus, we conclude that our microarray allows reproducible global expression analysis in citrus, and that citrus embryogenesis shares with the model plant Arabidopsis thaliana many aspects of the developmental programme aimed at establishing the basic body plan of the adult plant. We would like to offer this microarray and the companion database to the citrus research community in the hope that future use of these genomic tools will uncover clues to the transcriptional regulation of genes in different citrus species, and in different aspects of productivity, such as plant resistance, plant development, or fruit quality.
EST processing and assembly were performed using EST2uni, an open, parallel software package which uses different standard EST analysis tools for automated EST preprocessing, assembly and unigene annotation. For the present work, EST2uni was used with the following tools. Raw sequences and base confidence scores were obtained from raw chromatogram files using the program phred [45, 46]. Low-quality and cloning vector regions were removed from the sequences with Lucy, and ESTs that were left with fewer than 100 non-vector good-quality bases after trimming were discarded from further analyses.
Repetitive elements and low-complexity regions were masked with RepeatMasker and SeqClean, respectively. For repeat masking, the eucotyledons-specific repeats database was used. Vector sequence contaminants were also removed with SeqClean, using NCBI's UniVec database. Clean, vector-free EST sequences were submitted to the dbEST division of GenBank (accession numbers CX286781 to CX309414, and FC868488 to FC932655). Reads were assembled into contigs and singletons with tgicl, in order to estimate the redundancy of the ESTs, obtain the consensus sequences of the redundant ones, and derive the unigene set. The following default parameters were used: 30 bases minimum overlap length, 94% minimum percent identity for overlaps, and 30 bases maximum length of unmatched overhangs. Poly(A/T) tails and open reading frames (ORFs) were predicted for the unigenes using ESTScan. ESTScan was also used to obtain reverse complementary sequences of the unigenes when necessary.
A number of unigene clusters (or "superunigenes"), grouping different unigenes with extensive sequence overlap (more than 300 bp with more than 90% identity, and covering more than 50% of the length of one of the unigenes), were obtained from the initial unigene set using BLAST. In order to avoid spot cross-hybridization, only one representative per superunigene was selected to be printed on the slides. These representatives were selected according to the following criteria: single PCR product, EST sequence length greater than 300 bp and covering at least 90% of the unigene consensus sequence, and GC content not greater than 80% in a 70 base-long sliding window. Where more than one clone in a superunigene satisfied all the criteria, the longest one was selected to ensure that full-length clones were used for printing when possible. Where no clone in a superunigene satisfied all the criteria, the requirements were progressively relaxed until a representative clone was selected. Only the single PCR product requirement was never relaxed, so unigenes without a clone satisfying this criterion are not represented in the microarray. The microarray was submitted to the ArrayExpress database (accession number A-MEXP-1017).
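The overlap rule and the GC window check described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the input layout (pairwise alignments already parsed into tuples) and the choice to group unigenes transitively with a union-find structure are all assumptions made for illustration, since the text does not say how pairwise overlaps were merged into clusters.

```python
from collections import defaultdict


def overlaps_extensively(aln_len, pct_identity, len_a, len_b):
    """Overlap rule from the text: >300 bp, >90% identity, and the
    alignment covers >50% of the length of at least one unigene."""
    covers_half_of_one = aln_len > 0.5 * min(len_a, len_b)
    return aln_len > 300 and pct_identity > 90.0 and covers_half_of_one


def cluster_superunigenes(lengths, hits):
    """Group unigenes into clusters (hypothetical helper).

    lengths: {unigene_id: sequence_length}
    hits:    iterable of (id_a, id_b, alignment_length, percent_identity)
    Assumption: grouping is transitive (single linkage).
    """
    parent = {u: u for u in lengths}

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    for a, b, aln_len, pct in hits:
        if a != b and overlaps_extensively(aln_len, pct, lengths[a], lengths[b]):
            parent[find(a)] = find(b)

    clusters = defaultdict(list)
    for u in lengths:
        clusters[find(u)].append(u)
    return sorted(sorted(members) for members in clusters.values())


def max_gc_fraction(seq, window=70):
    """Highest GC fraction over all windows of the given size.
    Sequences shorter than the window are scored as a single window."""
    seq = seq.upper()
    if not seq:
        return 0.0
    if len(seq) <= window:
        return sum(c in "GC" for c in seq) / len(seq)
    gc = sum(c in "GC" for c in seq[:window])
    best = gc
    for i in range(window, len(seq)):
        # Slide by one base: add the incoming base, drop the outgoing one.
        gc += (seq[i] in "GC") - (seq[i - window] in "GC")
        best = max(best, gc)
    return best / window
```

Under these assumptions, a clone would pass the GC criterion when `max_gc_fraction(clone_sequence) <= 0.80`, and each list returned by `cluster_superunigenes` would correspond to one superunigene from which a single representative is chosen.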
The cDNA clones selected as the best representative of each superunigene were PCR-amplified in a final volume of 100 μL using 4 ng of plasmid template, 400 nM of each primer, and 200 μM dNTPs. The reaction products were analyzed on agarose gels, and purified using the Multiscreen-PCR 96-well Filtration System (Millipore). Only PCR reactions yielding single bands were transferred to printing plates, at a final concentration of 150 ng/μL in PRONTO Universal Spotting Buffer (Corning Life Sciences). PCR products were printed onto UltraGAPS aminosilane Corning slides, using a MicroGrid II arrayer (Genomic Solutions). Printed slides were UV-crosslinked at 150 mJ and stored in a desiccator until use. Lucidea Universal ScoreCard (GE Healthcare) spike controls were diluted in 100 ng/μL spotting buffer and printed on the array for quality evaluation. Each calibration and negative control from the Lucidea kit was spotted several times across the whole area of the array. Every selected clone was spotted once.
Using EST2uni , structural and functional annotation of unigenes obtained in the assembly step was performed as follows: Di-, tri- and tetra-nucleotide simple sequence repeats (SSR) were detected with Sputnik . Putative single nucleotide polymorphisms (SNPs) were found by EST2uni using a locally developed algorithm. As ESTs have frequent sequencing errors, only positions with a quality score above 39 were considered, and sequence discrepancies between ESTs in the same contig were marked as putative SNPs only if the polymorphism was confirmed by more than one EST in the contig. Lastly, because cDNA libraries were constructed using oligo-dT primer for the reverse transcriptase reaction, unigenes were aligned with the Arabidopsis complete proteins database to predict if there were full-length clones for each unigene.
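The SNP filter described above can be sketched for a single contig position as follows. The locally developed EST2uni algorithm is not detailed in the text, so this is only one possible reading: the function name, the input layout, and the interpretation that every reported allele must be supported by more than one high-quality EST are assumptions.

```python
from collections import Counter

VALID_BASES = set("ACGT")


def putative_snp_alleles(column, min_quality=39, min_support=2):
    """Return the alleles of a putative SNP at one contig position.

    column: list of (base, phred_quality) pairs, one per aligned EST.
    Only bases with quality strictly above min_quality are counted.
    Assumption: a position is reported only when at least two different
    alleles are each seen in min_support or more ESTs.
    """
    counts = Counter(
        base.upper()
        for base, quality in column
        if base.upper() in VALID_BASES and quality > min_quality
    )
    supported = sorted(b for b, n in counts.items() if n >= min_support)
    return supported if len(supported) >= 2 else []
```

With this reading, a discrepancy seen in a single EST is treated as a likely sequencing error and the position is not reported.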
For the functional annotation of unigenes, BLASTX was carried out in EST2uni against: 1) the UniRef90 non-redundant protein clusters database (downloaded October 2006: UniProtKB release 8.9 of October 2006), and 2) the predicted full set of Arabidopsis thaliana proteins provided by TAIR (downloaded September 2006: TAIR6 of November 2005). BLASTN searches were also made in EST2uni against all the public citrus sequences at GenBank, including ESTs (downloaded October 2006). All these analyses were performed using BLAST default parameters and an arbitrary non-stringent E value threshold of 10⁻⁵. Unigenes were annotated with the description of the most similar UniRef90 cluster of proteins. When no significantly similar UniRef90 cluster was found, unigenes were annotated with the first informative description (i.e., not containing words such as "unknown", "anonymous", or "hypothetical") of the BLAST hits, if any, against the databases of Arabidopsis proteins and GenBank citrus DNA sequences, in this order. Unigenes were annotated as highly similar to the first BLAST hit when the E value was lower than 10⁻¹⁵. BLASTX hits with an E value higher than 10⁻¹⁰ were not considered for annotation. The Gene Ontology annotation of the most similar Arabidopsis proteins was used for GO annotation of the citrus unigenes. A BLASTX E value lower than 10⁻²⁰ was required to transfer the GO annotation of an Arabidopsis protein to the corresponding citrus gene. A HMMER search was also done to identify putative PFAM domains in the unigenes. Finally, a bi-directional BLAST comparison was also performed with the Arabidopsis protein database to obtain a set of putative orthologs. In these analyses, two sequences were considered orthologs when each one was the first hit in a BLAST search with the other. All these unigene annotations are automatically stored by EST2uni in a MySQL relational database which can be accessed over the Internet using a web browser.
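The bi-directional criterion for putative orthologs (each sequence is the first hit of the other) can be sketched as below. This is an illustrative example, not EST2uni code: the function names and the tuple layout are hypothetical, and ranking hits by lowest E value, with ties resolved in favour of the first hit listed, is an assumption about what "first hit" means.

```python
def best_hits(hits):
    """Map each query to its best subject.

    hits: iterable of (query_id, subject_id, e_value).
    Assumption: the best hit is the one with the lowest E value;
    on ties, the hit listed first is kept.
    """
    best = {}
    for query, subject, evalue in hits:
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(citrus_vs_arabidopsis, arabidopsis_vs_citrus):
    """Return sorted (citrus_id, arabidopsis_id) pairs in which each
    sequence is the best hit of the other."""
    forward = best_hits(citrus_vs_arabidopsis)
    reverse = best_hits(arabidopsis_vs_citrus)
    return sorted(
        (citrus, ath)
        for citrus, ath in forward.items()
        if reverse.get(ath) == citrus
    )
```

A unigene whose best Arabidopsis hit points back to a different citrus unigene is left without a putative ortholog under this rule, which keeps the ortholog set conservative.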
Late globular zygotic embryos were manually extracted from citrus seeds obtained after pollination of Citrus clementina (cv. Clemenules) pistils with Fortune (C. clementina × C. tangerina) pollen, and stored at -80 °C prior to use. Five embryos were pooled together and total RNA was extracted using the RNeasy Micro Kit (Qiagen), and quantified using a NanoDrop spectrophotometer.
RNA samples were amplified using the MessageAmp II amplification kit (Ambion), using 1.5 μg as starting material. 7.5 μg of UTP-aminoallyl-amplified RNA (aRNA) were labeled using Cy3 or Cy5 dye (GE Healthcare), purified using Megaclear columns (Ambion), and quantified using a NanoDrop spectrophotometer. 200 pmol of labeled aRNA were dried and resuspended in hybridization buffer containing 3×SSC, 0.1% SDS, 0.1% salmon sperm DNA and 50% formamide. On each slide, the embryo sample was labeled with Cy5. A reference sample was labeled with Cy3 for proper normalization. Microarray hybridization was performed manually using Telechem Hybridization Chambers, following Corning instructions. Briefly, slides were prehybridized for 30 min in 3×SSC, 0.1% SDS, 0.1 mg/mL BSA, and rinsed twice with water before drying. Slides were hybridized overnight at 42 °C and washed in 2×SSC, 0.1% SDS for 5 min at 42 °C, 0.1×SSC, 0.1% SDS for 10 min at room temperature, and 0.1×SSC for 5 min at room temperature. Slides were dried in a table centrifuge and scanned using a GenePix 4000B scanner (Molecular Devices), at 10 μm resolution, 100% laser power and PMT values adjusted so that the total intensity in both channels was equal. Microarray images were analysed using GenePix 6.0 software (Molecular Devices).
Fruits were randomly collected from different field plants. Five biological replicates were done, each one containing five embryos coming from different fruits. Slides were global-median normalized so that the median, across all valid spots, of the per-spot median of ratios was equal to 1. After normalization, the signal in negative controls was checked to be undetectable, and the average signal of internal controls known to be expressed during Arabidopsis embryogenesis was checked to be similar in all replicates. A gene was considered "present" in a microarray if its Cy5 median intensity was above two times the median intensity of its local background. A gene was considered "present" in the embryo if it was considered "present" in at least four of the replicates. Functional interpretation was done with FatiGO+, using the corresponding Arabidopsis ortholog lists.
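The normalization and presence rules above reduce to a few lines of arithmetic. The following Python sketch shows one way to express them; the function names and the input layout are hypothetical, and in practice the intensities would come from the GenePix result files rather than from hand-written lists.

```python
from statistics import median


def normalize_ratios(ratios):
    """Global median normalization: rescale the per-spot ratios of the
    valid spots on one slide so that their median equals 1."""
    slide_median = median(ratios)
    return [r / slide_median for r in ratios]


def present_on_slide(cy5_median, background_median, fold=2.0):
    """A spot is 'present' when its Cy5 median intensity is above
    two times the median intensity of its local background."""
    return cy5_median > fold * background_median


def present_in_embryo(replicates, min_slides=4):
    """replicates: list of (cy5_median, local_background_median),
    one pair per replicate slide. The gene is 'present' in the embryo
    when it is present on at least min_slides slides."""
    calls = sum(present_on_slide(cy5, bg) for cy5, bg in replicates)
    return calls >= min_slides
```

For instance, a gene with Cy5/background pairs of (500, 100), (400, 100), (250, 100), (190, 100) and (900, 100) across the five replicates passes on four slides and would be called present in the embryo.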
The 20 K citrus microarray is the result of a coordinated effort between the 'Instituto de Biologia Molecular y Celular de Plantas (Universidad Politecnica de Valencia - Consejo Superior de Investigaciones Cientificas)', the 'Instituto Valenciano de Investigaciones Agrarias (Conselleria de la Comunitat Valenciana)', and the 'Instituto de Agroquimica y Tecnologia de Alimentos'. We would like to acknowledge the people who participated in the generation of the EST collection that allowed the construction of this microarray. We would especially like to thank Dr. Luis Navarro, Dr. Manuel Talon, Dr. Lorenzo Zacarias, Dr. Ramon Serrano, Dr. Vicente Pallas, Dr. Miguel Angel Perez-Amador, and Dr. Vicente Conejero for the time dedicated to management and other issues concerning the generation of this microarray.
This project was jointly sponsored by "Agroalimed" and "Conselleria de Agricultura, Pesca y Alimentacion de la Comunidad Valenciana".
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.