Evolutionary insights into scleractinian corals using comparative genomic hybridizations

Background: Coral reefs are among the most ecologically and economically important ecosystems on our planet. Yet, they are under steady decline worldwide due to rising sea surface temperatures, disease, and pollution. Understanding the molecular impact of these stressors on different coral species is imperative in order to predict how coral populations will respond to this continued disturbance. The use of molecular tools such as microarrays has provided deep insight into the molecular stress response of corals. Here, we have performed comparative genomic hybridizations (CGH) with different coral species to an Acropora palmata microarray platform containing 13,546 cDNA clones in order to identify potentially rapidly evolving genes and to determine the suitability of existing microarray platforms for use in gene expression studies (via heterologous hybridization).
Results: Our results showed that the current microarray platform for A. palmata is able to provide biologically relevant information for a wide variety of coral species covering both the complex clade as well as the robust clade. Analysis of the fraction of highly diverged genes showed a significantly higher number of genes without annotation, corroborating previous findings that point towards a higher rate of divergence for taxonomically restricted genes. Among the genes with annotation, we found many mitochondrial genes to be highly diverged in M. faveolata when compared to A. palmata, while the majority of nuclear-encoded genes maintained an average divergence rate.
Conclusions: The use of present microarray platforms for transcriptional analyses in different coral species will greatly enhance the understanding of the molecular basis of stress and health and highlight evolutionary differences between scleractinian coral species. On a genomic basis, we show that cDNA arrays can be used to identify patterns of divergence. Mitochondrion-encoded genes seem to have diverged faster than nuclear-encoded genes in robust corals. Accordingly, this needs to be taken into account when using mitochondrial markers for scleractinian phylogenies.


Background
Coral reefs are one of the most productive and diverse ecosystems on our planet. As such, they are of immense ecological and economic importance. Yet, these tropical marine ecosystems are currently threatened by a multitude of factors including climate change-induced mass bleaching events [1], disease [2,3], pollution [4,5], overfishing, and eutrophication [6][7][8]. Understanding the effects of multiple threats to corals is necessary in order to predict how coral populations will respond to continued disturbance. Genetic and genomic tools now exist that allow us to understand the molecular underpinnings of coral health and stress [9][10][11][12][13][14].
In particular, cDNA microarrays have accelerated the discovery of stress-responsive genes and mechanisms in recent years in a wide range of non-model organisms [15][16][17]. cDNA microarrays can assay the expression of thousands of genes simultaneously from control and experimental specimens. Large-scale microarray studies on marine organisms such as porcelain crabs [18], damselfish [19], and gobies [20,21] have provided transcriptomic information in relation to environmental physiology. Small-scale [22,23] and large-scale cDNA microarray studies have been carried out on different scleractinian coral species including Montastraea faveolata, Acropora palmata, and Acropora millepora exposed to environmental stress [9][10][11][12][13][24][25][26][27]. However, comparative studies in other coral species are imperative to provide insight into the molecular differences between coral species and to determine the extent to which previous findings can be generalized. Yet, the establishment of new microarray platforms is highly time and resource intensive. Nevertheless, microarray studies are not necessarily restricted to the species from which the cDNAs were generated (i.e. cDNAs from A. palmata). Heterologous hybridization is the methodology by which cDNAs from non-reference species are used for hybridization to microarrays (e.g. cDNAs from Acropora millepora hybridizing to an A. palmata microarray). This process has been described extensively for different non-model organisms including birds, primates, pigs, and bony fish [28][29][30][31][32]. Renn et al. [28] systematically showed that a microarray composed of cDNAs from the African cichlid Astatotilapia burtoni yielded biologically meaningful gene expression patterns from heterologous hybridizations spanning evolutionary divergence times from < 10 to > 200 million years (Ma). 
As expected, the number of spots giving a reliable signal decreased with increasing phylogenetic distance; nevertheless, 3,000-4,000 spots out of 4,500 gave a signal at the largest phylogenetic divergence, which corresponds to 66%-88% of unique spots on the array. Although the ability to detect small fold changes decreases with increasing evolutionary distance, a study on the heat shock response of a damselfish (Pomacentrus moluccensis) utilizing an oligonucleotide microarray designed for zebrafish (Danio rerio; divergence time from 11-300 Ma) reported statistically significant gene expression changes at less than two-fold in magnitude [19].
Prior to hybridizing non-reference cDNAs to a microarray, it is important to use genomic DNA (gDNA) to estimate the projected efficiency of a microarray for heterologous hybridization experiments. The hybridization of gDNA to a cDNA microarray is an example of a comparative genomic hybridization (CGH). In this case, gDNA from a non-reference species can be competitively hybridized to the array with gDNA from the reference species, or gDNA from the non-reference species can be hybridized alone. The signal intensity of each spot on the microarray depends on the sequence similarity and gene copy number between both species (i.e. high sequence divergence = low signal intensity). For example, Renn et al. [28] showed that when labeling gDNA from the reference species Astatotilapia burtoni, 93% of spots showed intensity levels two standard deviations over background. In a separate study, gDNA from Drosophila melanogaster showed an average of 4.2% greater hybridization than Drosophila simulans gDNA to a microarray designed for D. melanogaster [33], suggesting that about 95% of the spots yield biologically reliable information.
In addition to determining the number of reliable spots, CGH can also provide valuable information on gene evolution. Numerous studies on Drosophila [34], yeast [35,36], Salmonella [37], and Yersinia [38] have used microarrays to study gene evolution. A particularly relevant study of the ectomycorrhizal fungus Paxillus involutus and related strains used a cDNA microarray to screen for rapidly evolving genes [39]. CGH can therefore also be used to identify potentially fast-evolving genes and species-specific adaptations when comparing related species [40].
We have employed CGH against A. palmata microarrays containing 13,546 cDNAs using gDNA from Acropora cervicornis, Siderastrea radians, and Montastraea faveolata. This allowed us to: (1) establish the number of "good spots" that can be expected when performing heterologous hybridizations with a range of species at different evolutionary distances; (2) analyze a genome-wide rate of gene evolution; and (3) identify candidates for rapidly diverging genes. Our results show that more than 84% of the spots are likely to provide biologically relevant information across large evolutionary distances (>240 Ma), i.e. the results obtained from these spots can be expected to be scientifically valid. Analyses of the highly divergent gene fractions further provided insights into molecular differences of the two coral clades present today, namely the robust and complex corals, which separated approx. 240 Ma. Our results suggest that mitochondrial-encoded genes might have played an important role during the evolution of the robust coral clade.

Results and discussion
Sequence identity and hybridization signal
A strong correlation between sequence identity and hybridization signal/ratio is a prerequisite for the use of heterologous hybridizations in interspecies microarray experiments. We used the A. palmata-M. faveolata comparison to analyze whether hybridization ratios were significantly correlated with the underlying sequence divergence for two reasons: (1) this comparison reflects the largest evolutionary distance in our experiments; and (2) transcriptome sequence data for M. faveolata were readily available. Briefly, orthologs between A. palmata and M. faveolata were identified by reciprocal tBLASTx of the A. palmata spotted cDNAs to a M. faveolata transcriptome data set [41]. We compared the 13,546 cDNA clones spotted on the A. palmata microarray to 17,703 cDNA sequences from M. faveolata. A total of 193 unique spots representing orthologs with alignment lengths above 200 bp were identified and used for subsequent analysis. Linear regression of percent sequence identity (%ID) against log2 hybridization ratios of these spots showed a significant correlation (R² = 0.39, p < 0.0001, Figure 1) despite the large evolutionary distance (>240 Ma). Although the correlation observed is not strong, it is similar to what has been observed in previous studies conducted using complete genome sequences in bacteria [42] and Drosophila species [34]. These results show that sequence identity and signal intensity are significantly correlated despite a considerable amount of variation and underline the suitability of the A. palmata microarray platform for heterologous hybridizations with coral species across large evolutionary distances, as has been previously shown for other species [19,28,43]. The variation observed is likely to stem in part from using genomic DNA for the hybridization on cDNA microarray chips. Despite the high identity throughout the coding regions, the spots on the array do not contain any intronic sequences, which might influence the hybridization signal and add to the variation.
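The regression described above can be illustrated with a minimal ordinary least-squares sketch. It is not the analysis pipeline used in this study (the analysis was performed in R); the function name `linear_fit` and the six data points are hypothetical and serve only to show how a slope and R² are obtained from paired %ID and log2 ratio values.

```python
from statistics import mean

def linear_fit(x, y):
    """Ordinary least-squares fit y = slope * x + intercept.

    Returns (slope, intercept, r_squared).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each contain at least two distinct values")
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = sxy ** 2 / (sxx * syy)  # squared Pearson correlation
    return slope, intercept, r_squared

# Hypothetical example values (NOT data from this study): percent identity of
# orthologs and the log2 ratio of non-reference to reference signal.
percent_identity = [70.0, 75.0, 80.0, 85.0, 90.0, 95.0]
log2_ratio = [-2.1, -1.2, -1.5, -0.6, -0.7, -0.1]

slope, intercept, r2 = linear_fit(percent_identity, log2_ratio)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}, R^2 = {r2:.2f}")
```

A positive slope here means that spots with higher sequence identity give log2 ratios closer to zero, which is the relationship a heterologous hybridization relies on.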

Detection of sequence divergence
In order to determine the number of suitable spots for heterologous hybridizations with different species, we conducted an Estimated Probability of Presence (EPP) analysis using the software GACK [44]. The EPP analysis assigns a probability for each spotted cDNA sequence of being present (i.e. conserved), slightly divergent, or highly divergent in the non-reference species and therefore allows the statistical identification of conserved and divergent genes based on their hybridization signal intensity ratios [44].
As expected, analysis of the number of divergent genes across species showed an increase in divergent genes and a decrease in conserved genes with increasing evolutionary distance (Figure 2). Specifically, we found that the percentage of conserved genes ranged from 94.83% in the evolutionarily closest comparison between A. palmata and A. cervicornis to 84.51% in the comparison between A. palmata and M. faveolata. Accordingly, we observed an increase in divergent genes from 0.96% to 4.16%. Interestingly, the number of genes that could not be classified as being either conserved or highly divergent also increased with phylogenetic distance (Figure 2).
We used MrBayes [45] to examine the phylogenetic relationships of the coral species using the two mitochondrial genes cytochrome c oxidase subunit I (cox1) and cytochrome b (cytb). Sequence data were compared to presence/absence hybridization data as provided by GACK. Both datasets provided trees with identical topology but slight differences in branch lengths, indicating that hybridization data recapture sequence-based data and can therefore be used to assess sequence divergence (Figure 3a and b). More specifically, CGH experiments provide a shortcut to assessing sequence divergence in a comparative framework in many different genes and species for a fraction of the cost of sequencing [46]. Interestingly, a comparison of evolutionary trees based on the fraction of annotated and non-annotated genes showed a marked increase in branch length separating both acroporids from S. radians, which implies fast divergence of non-annotated genes within the complex corals. A similar increase in branch length is also observed for the complex/robust clade distance; yet, the difference is not as pronounced as with the acroporids and S. radians (Figure 3c and d).
We determined the number of unique spots suitable for heterologous hybridization for the different species by defining "good" spots according to their classification in the GACK analysis as being 'conserved'. Our results showed that more than 94% of the spots are likely to provide biologically relevant information for species within the Acroporidae family, while we found that >89% of the spots can be used for species of the complex clade and >84% of the spots when using species of the robust clade (Table 1). These percentages represent 12,733, 12,056, and 11,379 spots with respect to the total number of unique spots on the A. palmata cDNA platform. The 'conserved' gene criterion proved to be a much more conservative approach to determine spot fidelity and resulted in the lowest number of conserved genes when compared to other methods, such as the use of two standard deviations above background as standard cut-off [28] or methods relying on M values [19,47]. Hence, our approach is likely to underestimate the total number of suitable spots, especially for more closely related species like A. cervicornis. However, we favor a more conservative approach since the correlation of sequence identity and hybridization signal ratios is known to become weaker with increasing sequence divergence [42,47], resulting in impaired biological relevance of data from spots with low hybridization signals. Taken together, our data indicate that the A. palmata array can be used for heterologous hybridizations with scleractinian coral species from both clades.
Table 1. Number of annotated and non-annotated genes in the highly divergent and conserved gene fractions.
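The spot counts quoted for the three groups can be reproduced with simple arithmetic. The sketch below assumes that the counts were derived from the whole-number percentages (94%, 89%, and 84%) of the 13,546 spots on the array and rounded to the nearest spot; this derivation is an inference from the numbers in the text, and the names used are illustrative.

```python
TOTAL_SPOTS = 13546  # cDNA clones on the A. palmata array (from the text)

# Whole-number lower bounds quoted in the text for each group of species.
conserved_percent = {"Acroporidae": 94, "complex clade": 89, "robust clade": 84}

def usable_spots(percent, total=TOTAL_SPOTS):
    """Spots corresponding to a percentage of the array, rounded to the nearest spot."""
    return round(percent * total / 100)

spot_counts = {group: usable_spots(p) for group, p in conserved_percent.items()}
for group, count in spot_counts.items():
    print(f"{group}: {count} of {TOTAL_SPOTS} spots")
```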

Analysis of divergent and conserved genes
Analysis of the fractions of divergent genes revealed a large number of non-annotated genes across all comparisons. Statistical analysis (Chi square) confirmed a significantly higher number of genes without annotation in the divergent gene fraction across all four species comparisons (p < 0.0001, Table 1). Conversely, annotated genes were significantly overrepresented in the conserved genes fraction (p < 0.0001, Table 1). Comparison of trees generated from either annotated or non-annotated genes showed the same topology; however, the branch lengths were considerably larger for the non-annotated gene fractions (Figure 3), which further shows that non-annotated genes are diverging at a higher rate. Previous studies in Drosophila, corals, and Symbiodinium [48][49][50] suggested that non-annotated genes appear to evolve at a higher rate than annotated genes. In general, genes without homologues in other taxa are considered to be lineage- or species-specific and are therefore termed taxonomically restricted genes (TRGs) [51]. TRGs are thought to play an important role in lineage- and species-specific adaptations and have been hypothesized to be a source of phenotypic diversity [52][53][54]. In scleractinian corals, many genes involved in biomineralization such as some galaxin orthologs appear to be unique to corals and are therefore considered to be coral-specific TRGs [55]. Other TRGs of corals include SCRiPs, a novel family of putatively secreted, small, cysteine-rich proteins that appear to function during development [56]. We analyzed the overlap of highly divergent genes across all comparisons to identify genes that appear to evolve faster across families and/or clades (Additional file 1). Our analysis showed a successive increase of highly divergent genes with increasing evolutionary distance. We identified a total of 120 unique spots to be highly divergent in A. cervicornis and 294 unique spots in S. radians when compared to A. palmata.
Both complex corals shared only 5 unique spots whereas 19 unique spots were shared between all species (Figure 4, Additional file 2). However, it is likely that the 19 unique spots shared between A. cervicornis, S. radians and M. faveolata also contain genes that are actually rapidly diverging in A. palmata and hence appear as highly diverged across all species comparisons. The largest overlap of highly divergent genes was found between S. radians and M. faveolata, which shared 190 unique spots. However, of these 190 unique spots we found 116 to be without annotation and a further 37 annotated as predicted, putative, or otherwise uncharacterized protein. A similar result was found for all other comparisons. Of the 60 unique spots found to be highly divergent in the A. cervicornis-A. palmata comparison only 18 had a functional annotation, while only 127 out of the 203 spots unique to the M. faveolata-A. palmata comparison were annotated (Additional file 1, Additional file 3, Additional file 4). The large number of non-annotated genes in the divergent gene fraction did not allow the identification of specific pathways and/or gene groups that might potentially be rapidly diverging, with the exception of mitochondrial genes, which are discussed below.

Evolution of the robust clade
The comparison between the robust clade (also referred to as the short clade because of their shorter 16S and 12S mitochondrial sequences [57,58]) coral M. faveolata and the complex coral A. palmata revealed 452 putatively divergent genes, of which 203 were exclusively divergent in the robust-complex clade comparison, i.e. they did not appear to be divergent in the comparisons within the complex clade corals. Interestingly, these included most of the mitochondrial-encoded genes such as NADH-ubiquinone oxidoreductase subunits 1, 4, 5 and 6 as well as cytochrome c oxidase subunits 1, 2 and 3 and cytb. This suggests that the mitochondrial genome of robust corals underwent a phase of rapid divergence while the majority of nuclear-encoded genes diverged considerably more slowly.
Previous studies found that anthozoan mitochondrial genomes display a lower mutation rate than nuclear-encoded genes [59][60][61][62]. Hellberg et al. [60] for instance reported that the mitochondrial-encoded gene cox1 of the two complex corals Balanophyllia elegans and Tubastrea coccinea showed significantly lower synonymous substitution rates than nuclear-encoded genes. In line with that, Kitahara and colleagues [63] showed that the average nucleotide difference of the mitochondrial cox1 within the clades was less than 8%. However, the same study showed that the average difference of the cox1 gene between the complex and the robust clade was 19.1%. Interestingly, phylogenetic comparison between the complex clade and the more basal sister group Corallimorpharia showed that the average nucleotide difference of cox1 was only 13.6%, which is considerably lower than the 21.3% average difference found between robust corals and Corallimorpharia. This further suggests that the mitochondrial genome of robust corals must have undergone a phase of rapid divergence during or since the evolutionary split from the complex coral clade.
Indeed, more detailed analysis of the mitochondrial genomes of Acropora tenuis and species from the Montastraea annularis complex (M. franksi, M. faveolata and M. annularis) showed strong indications for non-neutral and unequal rates of evolution, i.e. the mitochondrial genome of robust corals has been under strong positive selection during or after the evolutionary split of the complex and robust clades [64]. Consequently, Fukami et al. [64] proposed that robust corals might have passed through a general phase of faster evolution. Our results corroborate these findings and additionally suggest that this phase of faster evolution might have been predominantly restricted to the mitochondrial genome, while the average divergence rate of nuclear-encoded genes remained largely unchanged. This is an interesting finding which points towards an important role of the coral mitochondrion or mitochondrial-encoded genes during the evolution of the robust clade. For instance, mitochondrial bioenergetics has been discussed as a potential major force in speciation through co-evolution of mitochondrion-encoded and nuclear-encoded mitochondrial genes. This can result in specific co-adaptations that can lead to incompatibilities and consequently to reduced fitness and reproductive barriers for certain haplotype combinations [65,66]. Rawson and Burton observed reduced performance for various fitness traits in interpopulation hybrids of the copepod Tigriopus californicus, which appeared to be associated with co-adaptation between cytochrome c (nuclear encoded) and cytochrome c oxidase (mitochondrial encoded) [66]. Subsequent analyses suggested a single amino acid substitution in the cox1 subunit as the cause of a lower activity and consequently of the observed interpopulation hybrid breakdown [67].
The evolutionary forces that can lead to co-evolution of nuclear- and mitochondrial-encoded genes are diverse and include climatic adaptation as well as specific adaptations to an ecological niche or changes in the environment [65]. To date it is unclear whether the complex and robust coral clades diverged before or after the Permian-Triassic extinction event [68][69][70][71]; yet, both scenarios are in line with strong environmental changes and the sudden availability of new ecological niches. Such strong changes might have favored a rapid adaptation of mitochondrial bioenergetics and thus a phase of rapid divergence of the mitochondrial genome of robust corals.
Corroboration that the mitochondrial genome underwent a phase of rapid divergence and strong positive selection has interesting implications for current coral molecular phylogenies, since many are mainly based on mitochondrial genes [57,58,63,68,70,72]. One of these implications is that the uneven evolutionary rates of coral mitochondrial sequences do not reflect evolutionary divergence time and are therefore suboptimal to resolve phylogenetic relationships within the order Scleractinia. With the complex clade coral genome of Acropora digitifera at hand [73] and the robust coral genome of Stylophora pistillata being currently sequenced (Voolstra lab at KAUST), we will soon be able to perform phylogenetic analyses using a variety of nuclear-encoded genes that will further shed light on the evolution of the scleractinian coral clades.

Conclusions
In this study we have demonstrated that the microarray platform available for A. palmata can be successfully used to study evolution of scleractinian coral species of both the complex and robust clade. Our results suggest that the platforms currently available might be sufficient to study a wide range of scleractinian coral species, thereby obviating the time- and resource-consuming development of further platforms for scleractinian coral species. The use of CGH and heterologous hybridizations as tools to (1) study genome-wide gene divergence, (2) identify candidates for rapidly diverging genes, and (3) compare transcriptomic responses to stress among different coral species will greatly enhance our understanding of coral evolution and genomics. While RNAseq might provide higher resolution, microarrays outperform sequencing-based approaches in terms of cost, comparability, and targeted approaches, e.g., comparing selected subsets of genes or lowly expressed genes. Here, we found indications for a potentially important role of the coral mitochondrion/mitochondrial-encoded genes in the evolution of the robust coral clade by analyzing differences in divergence of mitochondrial- and nuclear-encoded genes. This also has important implications for the use of mitochondrial sequences for scleractinian coral phylogenies.

Coral sampling
Samples of M. faveolata and S. radians were collected in Puerto Morelos, Mexico during November 2008 under permit registration MX-HR-010-MEX folio 016. Three colonies of M. faveolata were sampled using a hammer and chisel, and three unattached colonies of S. radians were taken from a sea grass bed. Three samples of A. cervicornis were collected in Bocas del Toro, Panama during March 2008 under permit SEX/A-26-07; branch tips of three separate colonies were broken off using a hammer and chisel.

DNA extraction, amplification, and microarray hybridization
Between 50 and 100 mg of frozen coral tissue were scraped off the samples using a metal corer, and DNA was extracted using the PowerPlant DNA extraction kit (MoBio Laboratories, Carlsbad, CA, USA) with the following modifications: following tissue homogenization, samples were spun twice to pellet skeletal debris; and during incubation with Buffer PB1, 1 mg/mL RNase A was added.
Extracted DNA was quantified using a NanoDrop ND-1000 spectrophotometer. Fragmentation of the DNA for whole genome amplification was assessed using the Agilent Bioanalyzer DNA7500 Kit and subsequent fragmentation steps were omitted since the DNA already fulfilled the required fragment size. A total of 25 ng of DNA from each sample was amplified using the GenomePlex Complete Whole Genome Amplification Kit (Sigma Aldrich, Saint Louis, MO, USA) according to the manufacturer's instructions but using 16 cycles of amplification.
In order to account for intraspecific sequence variation, equal amounts of amplified gDNA from three colonies per species were pooled and subjected to Cy3 and Cy5 labeling using the BioPrime Plus Array CGH Indirect Genomic Labeling System (Invitrogen, Carlsbad, CA, USA). Labeling efficiency was analyzed using a NanoDrop ND-1000 spectrophotometer.
The microarrays used in this study were generated as described in [9] and experiments were performed as follows. Appropriate Cy3 and Cy5 labeled DNAs were mixed together in a hybridization buffer containing 0.25% SDS, 25 mM HEPES and 3 × SSC, resulting in a final volume of 70 μl. The hybridization mixtures were boiled for 2 min at 99°C and allowed to cool at room temperature for 5 min. The cooled hybridization mixtures were pipetted under an mSeries Lifterslip (Erie Scientific), and hybridization took place in Corning hybridization chambers overnight at 55°C. Microarrays were washed once in 2 × SSC, 0.03% SDS heated to 55°C for 5 min, followed by one wash in 1 × SSC and another wash in 0.2 × SSC for 5 min each. The slides were kept in 0.2 × SSC until analysis. Slides were dried via centrifugation and scanned using an Axon 4000B scanner. The experimental setup followed a reference design, i.e., all samples were hybridized against the same pool of labeled A. palmata DNA. For each species, a total of four hybridizations were performed, including two dye swap hybridizations in order to account for potential dye bias, i.e. two hybridizations with Cy3 labeled M. faveolata DNA against a Cy5 labeled A. palmata reference and two hybridizations with Cy5 labeled M. faveolata DNA against a Cy3 labeled A. palmata reference were performed. The same hybridization scheme was used for A. cervicornis and S. radians.
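To illustrate why the dye-swap design compensates for dye bias, the following sketch averages sample-versus-reference log2 ratios over the four hybridizations of one species. It is a simplified illustration and not the processing performed by the software used in this study; the function names, the intensity values, and the assumed 1.2-fold Cy5 bias are all hypothetical.

```python
from math import log2

def sample_vs_reference_log2(cy3, cy5, sample_dye):
    """log2(sample / reference) for one spot on one array.

    sample_dye names the channel carrying the non-reference species.
    """
    if sample_dye == "Cy5":
        return log2(cy5 / cy3)
    if sample_dye == "Cy3":
        return log2(cy3 / cy5)
    raise ValueError("sample_dye must be 'Cy3' or 'Cy5'")

def dye_swap_mean(hybridizations):
    """Mean log2(sample / reference) over (cy3, cy5, sample_dye) tuples."""
    ratios = [sample_vs_reference_log2(*h) for h in hybridizations]
    return sum(ratios) / len(ratios)

# Hypothetical spot: the sample binds half as well as the reference
# (500 vs 1000 intensity units) and the Cy5 channel reads 1.2-fold too high.
hybridizations = [
    (500.0, 1200.0, "Cy3"),   # sample in Cy3, reference in Cy5 (biased up)
    (500.0, 1200.0, "Cy3"),
    (1000.0, 600.0, "Cy5"),   # dye swap: sample in Cy5 (biased up)
    (1000.0, 600.0, "Cy5"),
]

mean_ratio = dye_swap_mean(hybridizations)
print(f"mean log2(sample/reference) = {mean_ratio:.3f}")
```

Because a multiplicative channel bias enters the two orientations with opposite sign on the log scale, it cancels in the mean when both orientations are represented equally, as in the two-plus-two scheme described above.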

Data extraction and analysis
Microarray slides were scanned as described in [10]. Spot intensities were extracted and background subtracted using TIGR Spotfinder 2.2.3 [74]. The data were quality filtered and normalized using TIGR MIDAS 2.21 print-tip-specific LOWESS [74]. Data have been deposited in NCBI's GEO [75] and are accessible through GEO Series accession number GSE37279. All clone sequences and annotations are available via the EST database: http://sequoia.ucmerced.edu/SymBioSys/index.php.
For all analyses, we only considered spots that were present in at least 3 out of 4 replicates. The log2 ratios were averaged per species and the means were used as input for the GACK software [44]. The analysis was performed using the "Trinary Output" option, which classifies genes as either being present (1), slightly divergent (0) or highly divergent (−1). Cut-offs of 10% and 90% probability for present and highly divergent genes were used for subsequent analysis [44].
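The replicate filter and averaging step can be sketched as follows. This is an illustrative reimplementation rather than the code used in this study: the data layout (a dictionary of spot identifiers mapping to four log2 ratios, with `None` marking a spot that failed quality filtering on that array), the function name, and the example values are assumptions.

```python
def mean_log2_per_spot(replicates, min_present=3):
    """Average log2 ratios per spot, keeping spots seen in enough replicates.

    replicates maps spot id -> list of log2 ratios (None = missing).
    """
    means = {}
    for spot, values in replicates.items():
        present = [v for v in values if v is not None]
        if len(present) >= min_present:
            means[spot] = sum(present) / len(present)
    return means

# Hypothetical log2 ratios for three spots across four hybridizations.
replicates = {
    "spot_A": [-0.2, -0.4, -0.3, -0.3],   # present in 4 of 4: kept
    "spot_B": [-2.0, None, -1.0, -1.5],   # present in 3 of 4: kept
    "spot_C": [0.5, None, None, 0.7],     # present in 2 of 4: dropped
}

species_means = mean_log2_per_spot(replicates)
print(species_means)
```

The resulting per-spot means correspond to the values that would then be passed to a classifier such as GACK.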
For the correlation analysis of sequence identity and hybridization signal ratio, the sequences of the probes spotted on the A. palmata array were blasted against a M. faveolata transcriptome data set and orthologs were determined by using reciprocal tBLASTx [76]. A total of 330 orthologs were identified, of which 193 had alignment lengths >200 bp, and were thus used for subsequent analysis. Plots and statistical analysis were performed using R [77]. Statistical analysis of the distribution of highly divergent and conserved genes across annotated and non-annotated genes was performed with GraphPad Prism 5 using a Chi square test (df = 1, p < 0.05).
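For readers unfamiliar with the test, the Chi square statistic for a 2 × 2 table (annotated versus non-annotated genes in the highly divergent versus conserved fractions) can be computed as in the sketch below. It uses the standard Pearson formula without continuity correction, which may differ from the settings used in GraphPad Prism, and the counts are invented for illustration; they are not the values from Table 1.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson Chi square statistic (df = 1) for the table [[a, b], [c, d]].

    No continuity correction is applied.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column must have a non-zero total")
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)

# Critical value of the Chi square distribution for df = 1 at p = 0.05.
CRITICAL_DF1_P05 = 3.841

# Hypothetical counts (NOT from Table 1):
#                  annotated  non-annotated
# highly divergent     30          90
# conserved           700         500
statistic = chi_square_2x2(30, 90, 700, 500)
significant = statistic > CRITICAL_DF1_P05
print(f"chi2 = {statistic:.2f}, significant at p < 0.05: {significant}")
```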