Collection of Material
Gametes from the two coral species were collected during mass spawning events in the Florida Keys at Horseshoe and Key Largo Dry Rocks (A. palmata; Ap) and Little Grecian (M. faveolata; Mf) in August 2003, 2004 and 2005, as described in Szmant et al. Briefly, conical nets were suspended over spawning colonies to collect the positively buoyant gamete bundles. Gamete bundles from multiple colonies (and in the case of A. palmata, from multiple reefs) were combined within an hour to obtain cross-fertilization among different genets (these hermaphroditic species do not self-fertilize). Sperm concentrations were not measured, but the gametes were kept concentrated at a ratio of 20% gamete bundles to 80% seawater to maintain high sperm concentrations. After one hour, sperm were washed out with several rinses of clean filtered seawater. Batches of fertilized eggs were put into 4 L plastic bins for culture at concentrations of about 2–3 thousand embryos per liter. Water was changed 2–3 times per day, or more frequently if water quality conditions declined. M. faveolata larvae reached a swimming planula stage by 48 hours after fertilization, while those of A. palmata took about 60 hours.
At 5 days post-fertilization for M. faveolata and 9 days post-fertilization for A. palmata, larvae were randomly assigned to one of two parallel treatments: one to be infected with Symbiodinium and one to remain non-symbiotic. Larvae assigned to the infection treatment were inoculated with 1000 cells/ml of Symbiodinium. A. palmata larvae received strain Cass KB8 (Clade A, isolated from the jellyfish Cassiopeia xamachana); M. faveolata larvae received strains SSPe, Mf10.14b, and 704 (Clade B, from the gorgonian Pseudopterogorgia elisabethae, M. faveolata, and the gorgonian Plexaura kuna, respectively). Both species were maintained in the presence of Symbiodinium until sampled 6 days (A. palmata) or 8 days (M. faveolata) later. Larvae in both treatments were maintained at the same densities, were subjected to the same schedule of water changes, and were sampled at the same time. Upon sampling, larvae were washed to remove exogenous Symbiodinium, removed from the water, and snap frozen.
cDNA library construction
We sampled tissue for construction of cDNA libraries from 5 different developmental/symbiotic life history stages for each species. For Acropora palmata we collected 1) freshly spawned eggs, 2) planula larvae at 96 hours post-fertilization, 3) 15-day-old larvae infected with Symbiodinium ribosomal clade A strain Cass KB8, 4) 15-day-old larvae that remained uninfected, and 5) tissue from adult corals. For Montastraea faveolata we collected 1) freshly spawned eggs, 2) planula larvae at 60 hours post-fertilization, 3) 13-day-old larvae infected with Symbiodinium ribosomal clade B strains SSPe, Mf10.14b, and 704, 4) 13-day-old larvae that remained uninfected, and 5) tissue from an adult colony.
Total RNA was isolated from tissue samples using QIAzol reagent (Qiagen), according to the manufacturer's instructions. Larvae were passed through a 21G syringe to lyse the cells, and adult colony fragments were bombarded with glass beads to blast tissue off of the skeleton. To remove residual phenol or other contaminants, the RNA was purified using an RNeasy clean-up kit (Qiagen). Total RNA was quantified using a NanoDrop spectrophotometer, and RNA quality was assessed using an Agilent Bioanalyzer.
To construct the cDNA libraries, we used the Clontech SMART cDNA Library Construction Kit with the pDNR-lib vector. The cDNA was PCR-amplified with the Advantage 2 PCR kit and the SMART 5' PCR III primer and CDS III/3' PCR primer, for between 18 and 26 cycles depending on the starting amount of RNA. To minimize cloning of incomplete or degraded transcripts, we preferentially selected cDNA > 500 bp by first passing the SfiI-digested cDNA over CHROMA SPIN-400 columns and then, in some cases, cutting out a > 500 bp smear from a 1.1% agarose gel. The size-selected cDNA was ligated to the pDNR-lib vector. Electrocompetent cells were transformed with the vector, grown overnight in liquid suspension and then plated onto Teknova LB agar plates with 30 μg/ml chloramphenicol. Colonies were picked into 384-well plates using a QBot robot (Genetix), and were sequenced from both the 5' and 3' ends on ABI 3730 sequencers at the Joint Genome Institute (JGI).
EST Assembly and Generation of a Unigene dataset for each species
We generated a non-redundant set of genes for each of the two coral species by grouping ESTs from all of the life history stages for each coral. For each species, EST clusters and consensus sequences were generated using the Joint Genome Institute's EST Analysis Pipeline, briefly described as follows. Base calls and quality scores were assigned using the Phred software [49, 50]. Vector, linker, adapter, poly-A/T, and other artifact sequences were removed using the Cross_match software (available with the Phrap package) and an internally developed short pattern finder. ESTs shorter than 100 bp were removed from the data set, as were contaminant sequences, such as E. coli, common vectors, and sequencing standards. To enrich for nuclear protein-coding genes, we removed rRNA and mitochondrial DNA sequences from the EST dataset prior to alignment and clustering.
Pair-wise EST alignments were generated using the Malign software (Chapman et al., unpublished), a modified version of the Smith-Waterman algorithm which was developed at the JGI for use in whole genome shotgun assembly. ESTs sharing an alignment of at least 96% identity and at least 100 bp overlap were assigned to the same cluster. ESTs from the same cDNA were assigned to the same cluster even if they did not overlap. Consensus sequences for the EST clusters (unigenes) were generated using Cap3. The resulting EST and unigene datasets are generally free from vector, E. coli, coral rRNA and mtDNA sequences, and therefore represent a predicted set of nuclear protein-coding genes and non-classified RNAs.
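As an illustration only (this is not the Malign or JGI pipeline code), the clustering rule described above can be sketched in Python as single-linkage grouping with a union-find structure. The function name, the input layout (a mapping from EST to source clone plus a list of pairwise alignments), and the example identifiers are all hypothetical.

```python
from collections import defaultdict


class DisjointSet:
    """Minimal union-find used to merge ESTs into clusters."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            # Path halving keeps the trees shallow.
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def cluster_ests(est_to_clone, alignments, min_identity=96.0, min_overlap=100):
    """Group ESTs into clusters.

    est_to_clone: dict mapping EST id -> cDNA clone id (hypothetical layout).
    alignments: iterable of (est_a, est_b, percent_identity, overlap_bp).
    Thresholds default to the values stated in the text.
    """
    ds = DisjointSet()
    for est in est_to_clone:
        ds.find(est)  # register every EST, including singletons

    # Rule 1: join ESTs whose alignment passes both thresholds.
    for est_a, est_b, identity, overlap in alignments:
        if identity >= min_identity and overlap >= min_overlap:
            ds.union(est_a, est_b)

    # Rule 2: join reads from the same cDNA clone even without overlap.
    first_seen = {}
    for est, clone in est_to_clone.items():
        if clone in first_seen:
            ds.union(first_seen[clone], est)
        else:
            first_seen[clone] = est

    clusters = defaultdict(set)
    for est in est_to_clone:
        clusters[ds.find(est)].add(est)
    return sorted(sorted(members) for members in clusters.values())


if __name__ == "__main__":
    # Made-up example data, for illustration only.
    ests = {"c1_5p": "c1", "c1_3p": "c1", "c2_5p": "c2", "c3_5p": "c3"}
    hits = [
        ("c1_5p", "c2_5p", 98.0, 150),  # passes both thresholds
        ("c2_5p", "c3_5p", 95.0, 300),  # identity too low
        ("c1_3p", "c3_5p", 99.0, 80),   # overlap too short
    ]
    print(cluster_ests(ests, hits))
```

Because merging is transitive, two ESTs can end up in one cluster without aligning to each other directly; whether the actual pipeline behaves this way in every case is an assumption of the sketch.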
ESTs described in this paper are available through NCBI: GenBank accession DR982333-DR988505, EY021828-EY031784 and FE038910-FE040597.
Identification of Symbiont ESTs in the host unigene sets
To estimate the extent of contamination of host libraries by symbiont RNA, we performed a tblastx search of the coral unigenes against a Symbiodinium sequence database consisting of 2002 nucleotide sequences available in GenBank for the genus Symbiodinium. Genes that were identified from this process were then checked by tblastx against nr to confirm that Symbiodinium or another dinoflagellate or plant was the best hit. Such genes were concluded to be contaminants from the symbiont. We identified only 1 cDNA in M. faveolata that was clearly from Symbiodinium (top tblastx hit to a Symbiodinium cob). In A. palmata, we detected 3 protein-coding genes from Symbiodinium: 4 cDNAs encoding peridinin chlorophyll protein (the major accessory pigment protein in Symbiodinium), 4 cDNAs encoding cob from Symbiodinium, and 1 cDNA encoding cox1 (top hit was to another dinoflagellate, Pfiesteria piscicida).
Homology searching and prediction of protein-coding genes and secreted proteins
To functionally characterize our unigene sets, we performed a blastx analysis (e-value cutoff 1e-5) against three databases: the GenBank non-redundant DNA and protein databases (nr), the Swiss-Prot database of manually curated protein sequences (swissprot), and the Gene Ontology database of controlled vocabulary terms that describe gene and protein attributes.
As only a subset of the unigenes returned significant hits to any of these databases, we took other approaches to identify unigenes that might function in the establishment and maintenance of the symbiosis. (1) We identified protein-coding ESTs from the set of unigenes that had no blast hits, using the GETORF algorithm. Using this algorithm, we identified unigenes that contained an open reading frame of at least 300 nt beginning with the ATG start codon. (2) We used SignalP to identify the canonical N-terminal amino acid motif that targets nascent proteins into the classical ER-secretory pathway. Secreted proteins and membrane-associated proteins are known from other systems to play a significant role in host-pathogen interactions, particularly in the initial recognition process, and are therefore of particular interest for studying the coral-Symbiodinium interaction.
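The ORF criterion in approach (1) can be illustrated with a minimal Python sketch. This is not the GETORF program itself; it is a simplified six-frame scan written for this example. The function names are hypothetical, and two choices are assumptions rather than details from the text: the ORF length excludes the stop codon, and an ORF that runs off the end of the sequence without a stop is still counted.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def longest_atg_orf(seq):
    """Length (nt) of the longest ORF starting with ATG in any of six frames.

    An ORF runs from an ATG to just before the first in-frame stop codon,
    or to the last complete codon if no stop is found (assumption).
    """
    seq = seq.upper()
    best = 0
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            start = None
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if start is None:
                    if codon == "ATG":
                        start = i  # first ATG after the previous stop
                elif codon in STOP_CODONS:
                    best = max(best, i - start)
                    start = None
            if start is not None:
                # Open-ended ORF: count up to the last complete codon.
                end = frame + 3 * ((len(strand) - frame) // 3)
                best = max(best, end - start)
    return best


def has_candidate_orf(seq, min_len=300):
    """True if the sequence holds an ATG-initiated ORF of at least min_len nt."""
    return longest_atg_orf(seq) >= min_len


if __name__ == "__main__":
    # Synthetic sequence: ATG + 99 alanine codons + stop = 300 nt ORF.
    toy = "ATG" + "GCT" * 99 + "TAA"
    print(longest_atg_orf(toy), has_candidate_orf(toy))
```

In practice the threshold and the treatment of partial ORFs would follow the options actually passed to GETORF, which the text does not specify.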
A. palmata and M. faveolata transcriptomes
To compare the transcriptomes of the two species, we compared the A. palmata unigene dataset to the M. faveolata dataset using KEGG-assigned categories, through submission of the unigene datasets to the KAAS web-based annotation tool. This method generates BLAST comparisons against the KEGG GENES database to assign KEGG Orthology identities. Genes were summed to represent larger-order biological processes. To examine large-scale differences in gene expression between life history stages, we used the more extensive dataset from A. palmata and compared the four major life history stages that we sampled (egg, early planula stage, late planula stage, and adult). We used the Gene Ontology assignments to classify the libraries by GO-defined Biological Process, and the Gene Ontology terms were mapped to larger-order biological processes.
We then identified developmental stages that contained a significantly higher or lower number of genes functioning in each biological process, using GeneMerge, a statistical tool for generating rank scores for over-representation in each study set of genes (each developmental stage), by comparing it to the whole population of unigenes (all developmental stages).
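The logic of testing a study set against a population can be illustrated with a short Python sketch. It assumes a hypergeometric over-representation test with a Bonferroni correction, which is a common approach for this kind of comparison; it is not GeneMerge's own code, and the function names and the counts in the example are hypothetical.

```python
from math import comb


def hypergeom_overrep_p(pop_size, pop_hits, study_size, study_hits):
    """P(X >= study_hits) under sampling without replacement.

    pop_size:   unigenes in the whole population (all stages)
    pop_hits:   population unigenes annotated with the term
    study_size: unigenes in the study set (one stage)
    study_hits: study-set unigenes annotated with the term
    """
    upper = min(pop_hits, study_size)
    total = comb(pop_size, study_size)
    tail = sum(
        comb(pop_hits, k) * comb(pop_size - pop_hits, study_size - k)
        for k in range(study_hits, upper + 1)
    )
    return tail / total


def bonferroni(p_value, n_tests):
    """Bonferroni-corrected p-value, capped at 1."""
    return min(1.0, p_value * n_tests)


if __name__ == "__main__":
    # Made-up counts: 10 unigenes overall, 3 carry the term;
    # a stage with 4 unigenes contains 2 of them.
    p = hypergeom_overrep_p(10, 3, 4, 2)
    print(p, bonferroni(p, n_tests=5))
```

Under-representation can be scored the same way by summing the lower tail instead of the upper tail.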
Prediction of candidate symbiosis genes
We searched for genes that may be functioning in regulating symbiotic interactions between the host and symbiont by examining EST datasets for those unigenes that were expressed in the two symbiotic stages that we sampled (LPI, AC), and were not expressed in any of the stages that lack symbionts (E, P, LPU) (See Table 1 for abbreviations). We cannot exclude the possibility that some of the putative symbiosis-related ESTs are actually from the symbiont, although there was no evidence from the blast results to support this, as the top BLAST hits were to animal genes.
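The selection rule described above amounts to a set filter over the libraries in which each unigene was observed. The Python sketch below is illustrative only: the function name, the input layout, and the unigene identifiers are hypothetical. Because the text can be read as requiring ESTs from either symbiotic library or from both, the sketch exposes that choice as a parameter rather than assuming one reading.

```python
SYMBIOTIC = {"LPI", "AC"}          # stages with symbionts
APOSYMBIOTIC = {"E", "P", "LPU"}   # stages lacking symbionts


def candidate_symbiosis_unigenes(unigene_libraries, require_all_symbiotic=False):
    """Return unigenes seen only in symbiotic-stage libraries.

    unigene_libraries: dict mapping unigene id -> iterable of library
        abbreviations in which at least one of its ESTs was found.
    require_all_symbiotic: if True, the unigene must occur in both LPI
        and AC; if False, either one suffices (assumption: the text does
        not state which reading was used).
    """
    candidates = []
    for unigene, libraries in unigene_libraries.items():
        libraries = set(libraries)
        if libraries & APOSYMBIOTIC:
            continue  # expressed in a stage without symbionts
        in_symbiotic = libraries & SYMBIOTIC
        if require_all_symbiotic:
            keep = in_symbiotic == SYMBIOTIC
        else:
            keep = bool(in_symbiotic)
        if keep:
            candidates.append(unigene)
    return sorted(candidates)


if __name__ == "__main__":
    # Made-up unigene ids, for illustration only.
    observed = {
        "ug1": ["LPI", "AC"],
        "ug2": ["LPI"],
        "ug3": ["AC", "E"],
        "ug4": ["P", "LPU"],
    }
    print(candidate_symbiosis_unigenes(observed))
    print(candidate_symbiosis_unigenes(observed, require_all_symbiotic=True))
```

Absence of an EST from a library is weak evidence of absence of expression, so a filter of this kind produces candidates rather than confirmed symbiosis-specific genes.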
Comparisons of the cellular and molecular basis for pathogenic vs. mutualistic animal-bacterial associations are revealing that host-bacterial interactions are structured similarly regardless of whether the interaction is mutualistic or pathogenic. As a result, a useful starting point is to identify components of the innate immune and host response systems that have been identified from pathogenic or parasitic associations. The coral-dinoflagellate symbiosis adds a new dimension to understanding host-microbial associations, as the coral microbial partner is a eukaryote (related to the apicomplexan parasites such as Plasmodium and Toxoplasma), rather than a prokaryote. To identify candidate genes that may function in the host-microbe interaction processes, we examined our EST datasets to identify protein domains (search against the Pfam database of protein domains) and genes (blastx against nr and Swissprot) that are known to play roles in 1) cell-cell or pattern recognition interactions, and 2) signaling pathways that are known to be involved in host response to microbial infection.
Comparisons between cnidarian datasets
To identify potentially significant genes involved in the evolution of corals, we performed analyses to identify genes in common to scleractinians that appear to be under positive selection. Ferritin type I corresponds to DR984234 in A. palmata and DY579151 in A. millepora. Ferritin type II corresponds to EST accession DR985990 in A. palmata and DY577778 in A. millepora. A tblastx of A. palmata against Nematostella ESTs identified homologues of ferritin type I and type II in N. vectensis. The best tblastx matches that are in the NCBI nucleotide database were used (N. vectensis ferritin type I: gi|82866539|, E-value 4e-57; N. vectensis ferritin type II: gi|82875723|, E-value 9e-77).
We tested for evidence of positive selection by comparing the nonsynonymous substitution rate (dN) to the synonymous substitution rate (dS). We used site-specific maximum likelihood (ML) models to detect positive selection on specific amino acids. We implemented models M1a (neutral) and M2a (selection), and M7 (beta) and M8 (beta&ω) [55–57], with the codeml program in PAML. Alignment gaps and ambiguity characters are removed in PAML prior to dN/dS calculations. Data analyses and computer simulations have shown that these pairs of site models are well suited to detect positively selected sites [56, 59–61]. A likelihood-ratio test (LRT) was used to compare each neutral model with the corresponding selection model. The test statistic -2ΔlnL follows a χ2 distribution with critical values of 5.99 and 9.21 at the 5% and 1% significance levels (df = 2). When the LRT was significant, a Bayes Empirical Bayes (BEB) procedure was used to identify amino acids under positive selection. Overall dN/dS values ('one-ratio' model) were calculated with model M0. The F3x4 model of codon frequencies was used.
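The likelihood-ratio test described above can be worked through in a few lines of Python. This is a minimal sketch, not part of PAML: the function name and the log-likelihood values in the example are hypothetical. It relies on the fact that, for 2 degrees of freedom, the χ2 survival function has the closed form exp(-x/2), which reproduces the critical values quoted in the text.

```python
from math import exp, log


def lrt_df2(lnl_null, lnl_alt):
    """Likelihood-ratio test for nested models differing by 2 parameters.

    lnl_null: log-likelihood of the neutral model (e.g. M1a or M7)
    lnl_alt:  log-likelihood of the selection model (e.g. M2a or M8)
    Returns (test statistic, p-value), using the chi-square
    distribution with df = 2, whose upper tail is exp(-x / 2).
    """
    statistic = 2.0 * (lnl_alt - lnl_null)
    statistic = max(statistic, 0.0)  # guard against small negative values
    p_value = exp(-statistic / 2.0)
    return statistic, p_value


def critical_value_df2(alpha):
    """Chi-square critical value for df = 2 at significance level alpha."""
    return -2.0 * log(alpha)


if __name__ == "__main__":
    # Hypothetical log-likelihoods, for illustration only.
    stat, p = lrt_df2(lnl_null=-1000.0, lnl_alt=-995.0)
    print(round(stat, 2), p)
    print(round(critical_value_df2(0.05), 2), round(critical_value_df2(0.01), 2))
```

The second print statement recovers the 5.99 and 9.21 thresholds, so a selection model is favored at the 5% level only when its log-likelihood exceeds that of the neutral model by about 3 units.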
To identify genes that may be restricted to scleractinian corals, we identified unigenes that were highly similar between A. palmata and M. faveolata but that had no similarity to any sequences in nr, nor to the assembled genome of the sea anemone Nematostella vectensis or the assembled EST dataset from the hydrozoan Hydra magnipapillata.