Local and systemic gene expression responses to a white syndrome-like disease in a reef building coral , Acropora hyacinthus

Background Corals are capable of launching diverse immune defenses at the site of direct contact with pathogens, but the molecular mechanisms of this activity and the colony-wide effects of such stressors remain poorly understood. Here we compared gene expression profiles in eight healthy Acropora hyacinthus colonies against eight colonies exhibiting white syndrome-like symptoms, all collected from a natural reef environment near Palau. Two types of tissues were sampled from diseased corals: visibly affected and apparently healthy tissues. Results Tag-based RNA-Seq followed by weighted gene co-expression network analysis identified groups of co-regulated differentially expressed genes between all disease states (diseased, ahead of the lesion, and healthy). Most of the differentially expressed genes were found between tissues at the lesions and asymptomatic (healthy and ahead of the lesion) tissues. These genes were related to innate immunity, oxidative stress responses, lipid metabolism, and calcification. Network analysis also revealed groups of genes regulated specifically in the tissues from diseased colonies that were not yet showing obvious symptoms of disease, indicating a systemic response to infection. Conclusions These observations suggest that tissues ahead of the lesion of disease progression exist in a transitional state between health and lesion appearance. Alternatively, these gene expression profiles capture physiological differences between colonies with varying disease susceptibilities.


Background
Corals are capable of launching diverse immune defenses at the site of direct contact with pathogens, but the molecular mechanisms of this activity and the colony-wide effects of such stressors remain poorly understood.Here we compared gene expression profiles in eight healthy Acropora hyacinthus colonies against eight colonies exhibiting white syndrome-like symptoms, all collected from a natural reef environment near Palau.Two types of tissues were sampled from diseased corals: visibly affected and apparently healthy tissues.

Results
Tag-based RNA-Seq followed by weighted gene co-expression network analysis identified groups of co-regulated differentially expressed genes between all disease states (diseased, ahead of the lesion, and healthy).Most of the differentially expressed genes were found between tissues at the lesions and asymptomatic (healthy and ahead of the lesion) tissues.These genes were related to innate immunity, oxidative stress responses, lipid metabolism, and calcification.Network analysis also revealed groups of genes regulated specifically in the tissues from diseased colonies that were not yet showing obvious symptoms of disease, indicating a systemic response to infection.

Conclusions
These observations suggest that tissues ahead of the lesion of disease progression exist in a transitional state between health and lesion appearance.Alternatively, these gene expression profiles capture physiological differences between colonies with varying disease susceptibilities.

Background
Increasing rates of disease have contributed greatly to global coral population declines over the last few decades [1,2].The broadly defined "white syndrome" in Indo-Pacific regions, characterized in the field by tissue loss resulting in exposure of the coral skeleton, has been attributed to Vibrio spp.[3], a genus of bacteria involved in several coral diseases [4][5][6][7][8].Other reports find no evidence of pathogenic bacteria in diseased corals [9] and instead blame stress-triggered programmed cell death for the manifestation of symptoms [10].These conflicting conclusions, drawn mostly from culturing assays and histological observations, are further confounded by insufficient knowledge of the cnidarian immune response.
Corals, like all invertebrates, rely entirely on innate immunity for protection from invading pathogens.Features of innate immunity in corals include physical barriers [11], molecular pattern recognition [12], secretion of antimicrobial macromolecules [13], and cellular responses (e.g., phagocytosis) [14][15][16].Recent efforts to characterize those features of immunity using various cnidarian genome and transcriptome sequence databases have identified putative components of coral stress management and immune response pathways by homology with better-studied organisms.The Acropora digitifera genome project revealed striking differences in innate immunity complexity in corals compared to a closely related cnidarian, Nematostella vectensis [17].Whereas the N. vectensis genome encodes only a single Toll/Toll-like receptor (TLR), the A. digitifera genome included at least four TLRs, along with other related immune signaling molecules.Miller et al. reported the presence of TLR signaling components, including adaptor proteins that link that cascade with other signaling events, in the expressed peer-reviewed) is the author/funder.All rights reserved.No reuse allowed without permission.
The copyright holder for this preprint (which was not .http://dx.doi.org/10.1101/012211doi: bioRxiv preprint first posted online Dec. 5, 2014; sequence tag library of another acroporid coral, A. millepora [12].Together these elements suggest an ability of corals to respond to pathogen-associated molecular patterns (PAMPs) via TLR recognition and integrate that signal to cellular responses such as inflammation and apoptosis.Toll/TLR signaling can activate NF-κB transcription factor that, upon nuclear localization, up-regulates transcription of immune response genes.In corals, the identities of those response genes and the roles they play remain unclear.Some suggested immune response genes include lectins, complement c3 and apextrin, proteins involved in non-self recognition, aggregation and cell lysis respectively [18].Metabolism and calcification genes demonstrated differential expression in addition to immunity genes in A. millepora challenged with bacterial and viral immunogens, providing a more comprehensive picture of cellular events during an acute infection [19].
Global RNA-sequencing of A. cervicornis displaying signs of White Band Disease (WBD) revealed that disease significantly affected the expression of genes involved in immune processes and apoptosis [20].The up-regulation of phagocytic cell surface receptors and reactive oxygen species (ROS) producing enzymes suggested that the phagocytosis and degradation of damaged cells drives the WBD response in corals.
These coral sequencing projects and experimental immune challenges have provided conclusive evidence that corals are capable of launching defensive responses upon direct contact with pathogens.A coral's ability to communicate the recognition of that pathogen along the colony, however, is less understood.Coral polyps utilize a gastrovascular system lined with flagellated gastrodermal cells to transport organic products and zooxanthellae within the colony [21].These channels are used to allocate energetic resources to areas that need them most, such as fast-growing branch tips [22-peer-reviewed) is the author/funder.All rights reserved.No reuse allowed without permission.
The copyright holder for this preprint (which was not 24] and wounded regions [25].Radiolabeled carbon accumulation experiments have shown that corals preferentially direct energetic resources towards physically damaged regions [25] but away from disease-induced lesions [26].These findings suggest that healthy coral tissues might possess means to detect and respond to an advancing disease lesion, but it is still unclear what the physiological consequences of this action might be. Here we examine the gene expression profiles of A. hyacinthus affected with white syndrome-like symptoms (Figure 1A) to determine the molecular consequences of the disease.White syndrome advances along a colony in a way such that a distinct lesion forms between affected and unaffected tissues (Figure 1B).Tissues ahead of the lesion are presumably healthy, while tissues at the lesion boundary are actively sloughing cells in response to infection.We compared gene expression profiles among three disease states: (1) affected and (2) apparently healthy tissues from diseased colonies and (3) tissues from completely unaffected colonies.Comparing healthy regions of diseased colonies to completely disease-free individuals provided an opportunity to look for expression patterns that might indicate disease susceptibility and/or a colony-wide systemic effect of infection.We used tag-based RNA-Seq, which allows systems-level investigations, to probe all cellular responses to infection.Gene co-expression network analysis revealed groups of co-regulated genes with shared cellular functions correlating to different disease states.We compiled this information to provide a system-wide assessment of the systemic molecular response to chronic disease in corals.
peer-reviewed) is the author/funder.All rights reserved.No reuse allowed without permission.

Differential Gene Expression between Disease States
Sequencing yielded an average of 6,367,219 reads per sample.An average of 19.5% of these remained after filtering and of these, an average 31.45%mapped to the transcriptome.A total of 44,701 isogroups (clusters of contigs representing the same gene, from here on referred to as "genes") were detected.Reads were converted to unique transcript counts by removing PCR duplicates, yielding an average of 78,325 counts per sample.Technical replicates were summed for a final average of 156,650 unique transcript counts per sample.A generalized linear model with contrasts between all three tissues detected differentially expressed genes between disease states (Additional Between all contrasts, a total of 757 genes passed the FDR cutoff of 10%. peer-reviewed) is the author/funder.All rights reserved.No reuse allowed without permission.
The copyright holder for this preprint (which was not .http://dx.doi.org/10.1101/012211doi: bioRxiv preprint first posted online Dec. 5, 2014; Principal coordinate analysis of the variance-stabilized data for all genes revealed expression differences mainly between disease (D) and the other two health states (Figure 2A).PCoA using the 3827 isogroups with an unadjusted p-value of less than 0.05 for any contrast revealed more differences between disease states, but a significant overlap in expression of healthy and AL tissues remains (Figure 2B).peer-reviewed) is the author/funder.All rights reserved.No reuse allowed without permission.

Gene Ontology (GO) Enrichment
Functional enrichments between all three contrasts allow a general examination of the molecular functions and biological processes being differentially regulated between disease states.The enriched groups of both the disease-healthy and disease-AL contrasts were largely identical (Figures 3 and 4).peer-reviewed) is the author/funder.All rights reserved.No reuse allowed without permission.
The copyright holder for this preprint (which was not .http://dx.doi.org/10.1101/012211doi: bioRxiv preprint first posted online Dec. 5, 2014; Ribosomal subunits, oxidative stress responses, and translation factor activity were up regulated in diseased tissues compared to both AL and healthy tissues.Likewise, receptor activity, regulation of biological quality, and extracellular matrix components (collagens) were down regulated in diseased tissues compared to both healthier states.
Dissimilarities in enriched functions between the diseased-healthy and diseased-AL contrasts were few, reflecting the fact that healthy and AL states were quite similar.However, there were some differences in functional enrichment.Unfolded protein binding was enriched for the comparison between healthy and diseased tissues (Figure 3), while tRNA metabolism was enriched only between diseased and AL tissues (Figure 4).
To further investigate these differences between health and AL tissues we performed GO enrichment analysis on genes based on the p-value generated for the healthy-AL contrast.
However, no enriched GO terms were identified with an adjusted p-value of less than 0.1.

Gene Expression Analysis by Contrast
Hierarchically clustered gene expression heatmaps were constructed to show the relative expression patterns of the top most significant annotated DEGs for each contrast (Figures 5-7).Complete lists of all annotated DEGs with log fold changes for each contrast can be found in Additional Files 2-4.

Diseased vs. Healthy
Genes found to be up regulated (Benjamini-Hochberg FDR < 0.01) in diseased tissues compared to healthy corals include key members of the oxidative stress response in corals (e.g., catalases and peroxidases) and pentose phosphate metabolism (transketolase, transaldolase, and 6-phosphogluconate dehydrogenase).Both proteinases (astacin and cathepsin L) and protease inhibitors (alpha-macroglobulin and serine proteinase inhibitor Ku-type) were up regulated in diseased tissues.C-type lectin, a carbohydrate-binding protein, and malate synthase, a key enzyme of the glyoxylate cycle, were also up regulated in symptomatic tissues.Down regulated genes (Benjamini-Hochberg FDR < 0.01) include those encoding extracellular matrix constituents (collagens, heparin sulfate proteoglycans) and carbonic anhydrase, a key enzyme in coral skeletal deposition.Red fluorescent protein was also down regulated in diseased tissues, a hallmark of the coral stress response [27][28][29][30][31]. peer-reviewed) is the author/funder.All rights reserved.No reuse allowed without permission.

Diseased vs. AL
The expression differences between diseased and AL tissues within a colony paralleled the expression differences between diseased tissues and healthy corals.At the same significance threshold (Benjamini-Hochberg FDR < 0.01), almost the exact same genes were identified (Figure 6).Catalases, peroxidase, alpha-macroglobulin, cathepsin L, and malate synthase were up regulated in diseased tissues compared to tissues ahead of the lesion.Likewise, fluorescent proteins (red and green/cyan) and extracellular matrix components were down regulated in diseased compared to AL tissues.peer-reviewed) is the author/funder.All rights reserved.No reuse allowed without permission.

AL vs. Healthy
Although no differentially expressed genes passed the Benjamini-Hochberg 10% FDR when comparing the AL tissues and healthy corals, at a more lenient unadjusted p-value threshold of 0.01 we see that AL tissues appear to down-regulate hypoxia-inducible factor prolyl 4-hydroxylase and up-regulate a caspase, metalloendopeptidase and transcription factor AP-1.The absence of DEGs passing the FDR cutoff and paucity of DEGs at a more lenient threshold reaffirm the finding that gene expression is broadly similar between healthy corals and corals ahead of progressing disease lesions.However, slight signatures of differential gene expression suggest that there might still be some systemic physiological effect of nearby disease symptoms along a coral colony, but DESeq2 analysis alone does not provide enough power to detect it.

Correlation Between Gene Network Modules and Disease States
A total of 6737 DEGs with unadjusted p-value < 0.1 were input into WGCNA for network analysis.One sample (diseased individual "4") was identified as an outlier and removed from subsequent analysis.Twelve unique modules, denoted by arbitrarily assigned colors, remained after merging highly correlated modules (Additional File 5).
Of these twelve modules, eight were highly correlated to a single coral individual and one (grey) is reserved to contain genes that do not fall into any co-expression module.The remaining three modules were highly correlated with the disease states.Since we assembled these modules using a signed network, the sign of the correlation is equivalent to the direction of expression change with respect to the trait.For example, a module that is significantly negatively correlated to diseased corals contains genes that are down regulated in that state.
The dark green module (1155 genes) strongly correlated with diseased and healthy coral.
The genes within this module are up regulated in diseased tissues and down regulated in healthy tissues.Similarly, the turquoise module (669 genes) also correlated with diseased and healthy corals but with the reverse sign: these genes are up regulated in healthy tissues and down regulated in diseased tissues.Notably, one module was identified (green, 661 genes) that was significantly up-regulated in AL and (to a lesser extent) down regulated in diseased tissues, while remaining unchanged in healthy tissues.

Within Module Gene Expression Analysis
Hierarchically clustered gene expression heatmaps were constructed to show the relative expression patterns of the genes within each module that best represent the module and show significant correlations to the disease state, based on module membership and gene significance values (Additional File 6 and Figure 8).peer-reviewed) is the author/funder.All rights reserved.No reuse allowed without permission.

Discussion
The major pattern of variation in gene expression was between asymptomatic (healthy and AL) and diseased (D) corals (Figure 2).Contrasts between diseased corals with both healthy and ahead-of-the-lesion tissues produced many differentially regulated genes.
The differentially regulated genes between diseased and asymptomatic tissues represent elements of an immune response and a general stress response.Contrasts between healthy and AL gene expression generated far fewer differences, suggesting that these tissues are physiologically similar.However, gene co-expression network analysis and subsequent correlation to disease states revealed expression patterns distinct to each of the three states.

Diseased tissues up regulate immune response elements
Innate immunity provides immediate protection against non-self and responds to physical injury.Three general steps are involved in an innate immune response: detection, defense activation, and effector responses to neutralize the threat.Tissues sampled from the lesion of disease progression in corals experiencing white syndrome-like symptoms have enhanced expression of genes involved in each of these three immune response phases.C-type lectins act as pattern recognition receptors to activate pathogen elimination through phagocytosis in invertebrates [32].Cnidarian genomes encode ctype lectin genes with highly variable substrate regions, leading to hypotheses that these proteins recognize a large variety of pathogens [33].In A. millepora, mannose-binding C-type lectins have been shown to respond immediately following an immune challenge (only 45 minutes after lipopolysaccharide injection in [34]), but show no significant response at later time points [35].The up regulation of C-type lectins in tissues at the peer-reviewed) is the author/funder.All rights reserved.No reuse allowed without permission.lesion may suggest that these tissues have very recently become infected.The second phase of an immune response prepares targets for elimination via antimicrobial peptide synthesis and immune cell activation.While this experiment did not discover any differentially regulated antimicrobial peptides, we do detect the activation of immune activating proteins C4, alpha-macroglobulin and CD109 in diseased tissues.The lectin pathway of immune activation is triggered by lectins binding a pathogen-associated molecule and results in the activation complement component factor C4 and C3 [36].
These complement factors, along with alpha-macroglobulin and CD109, tag pathogens and secreted proteases for elimination.In the final phase of an innate immune response, foreign organisms are engulfed and destroyed by phagocytic immune cells.Lysosomes within these phagocytic cells contain proteins capable of degrading engulfed material via the production of reactive oxygen species (ROS) or proteolytic enzymes, such as cathepsins.The up regulation of cathepsin L in diseased tissue may be a consequence of such phagocytic activity.

Glyoxylate cycle and gluconeogenesis transcripts are up regulated in diseased tissues
Transcripts involving lipid and carbohydrate metabolism (triacylglycerol lipase, phosphoenolpyruvate carboxykinase), including glyoxylate cycle metabolism (malate synthase), were up regulated in diseased tissues compared to asymptomatic tissues.The differential regulation of these metabolic genes suggests that diseased corals may be utilizing stored energy reserves more than healthy corals.Fatty acids derived from stored lipids are oxidized by the beta-oxidation pathway and release acetyl-CoA to enter the citric acid cycle.Both carbons in one molecule of acetyl-CoA are consumed during the decarboxylation steps of the citric acid cycle and energy is released.The glyoxylate cycle is an alternative route through the citric acid cycle that allows organisms to thrive on two-carbon sources by catalyzing the conversion of acetyl-CoA to malate and succinate via a glyoxylate intermediate, bypassing the decarboxylation steps in the citric acid cycle [37].These four carbon compounds contribute to the energetic requirements of the cell and serve as building blocks for cellular components, fulfilling all the same necessities of the citric acid cycle without the need to replenish oxaloacetate from the diet.One of the most significantly up regulated transcripts in diseased coral tissues was malate synthase, one of the key enzymes of the glyoxylate cycle along with isocitrate lyase.Glyoxylate cycle enzymes are rare throughout the animal kingdom, but bioinformatic analyses suggest they exist in cnidarians [38].As additional support for a potential role of glyoxylate cycle metabolism in coral stress responses, glyoxylate cycle transcripts were up regulated in A. palmata larvae subjected to thermal stress [39].In higher plants, glyoxylate enzymes are active when the cell is switching from photosynthetic production of sugars to scavenging pathways from stored and structural lipids, as in starvation and/or senescence [40].In corals, this metabolic shift might indicate a decline in shared energy reserves with zooxanthellae, presumably due to stressinduced symbiont loss.

Oxidative stress response genes are up regulated in diseased tissues
Reactive oxygen species (ROS) are produced as a consequence of fatty acid oxidation.
The up regulation of antioxidants that protect the cell from these harmful byproducts corals could be a consequence of increased fatty acid metabolism.This explanation coincides well with the observed up regulation of lipid metabolism and antioxidant (catalase, peroxidase) transcripts in diseased corals.The production of ROS is also a fundamental element of the innate immune response.While ROS are capable of neutralizing phagocytized pathogens, the harm they cause to the host must be countered if an organism is to withstand its own immune response.Catalases and peroxidases capable of hydrolyzing harmful peroxides provide a mechanism of such self-protection.The upregulation of oxidative stress response genes is well characterized in corals experiencing thermal stress [41], physical stress [42], and infectious disease [20].Thus the identification of differentially expressed ROS response genes was anticipated, but it is interesting to note that up regulation of these genes was confined to diseased tissues and was not seen elsewhere in symptomatic colonies.A possible explanation for this finding is that asymptomatic tissues near the lesion are not directly encountering a bacterial pathogen.

Matrix metalloproteinases are up regulated in diseased tissues
Stony corals are subject to many potential sources of physical injury such as predators [43], burrowing bivalves [44] and storms [45,46].The tissue regeneration mechanisms employed by corals that have sustained a physical injury are common to wound-healing processes across metazoans [47].One of these steps involves restructuring of the extracellular matrix to encourage tissue regeneration.Matrix metalloproteinases (MMPs) are a group of enzymes capable of such activities and have been shown to play a direct role in wound repair in Hydra [48].MMPs also act on pro-inflammatory cytokines to direct inflammation due to wounding and innate immune responses to pathogens (reviewed in [49]).The up regulation of MMPs in response parasitic protists in a gorgonian coral suggests that these proteins are active in the immune response of cnidarians [50].Astacin and gelatinase have matrix metalloproteinase activities and were up regulated in affected coral tissues in this study.These activities suggest that these tissues at the disease lesion are launching an active immune defense and/or regenerating lost tissue.Inhibitors of protease activity were also up regulated in diseased coral tissues.
One of these proteins, the Kunitz-type serine protease, exhibited antimicrobial activity in Hydra [51].Another up regulated protease inhibitor, alpha-macroglobulin, is a vital component of the innate immune response that inactivates bacterial secreted proteases, thus compromising their virulence [52].The up regulation of alpha-macroglobulin in diseased (but not healthy or AL) tissues supports the notion that only diseased tissues are in direct contact with bacterial pathogens.The activation of these protease-inhibitors implies immune function in visibly infected tissues, but not necessarily in regions ahead of the disease lesion.

Calcification genes are down regulated in diseased tissues
Calcification rates in reef-building corals are sensitive to several environmental variables such as light, pH, and temperature [1,[53][54][55].While this experiment did not directly measure coral calcification, the identification of DEGs with functions in biomineralization suggests that disease negatively impacts coral skeletal deposition.
Both general extracellular matrix structural components and coral-specific calcification functions were differentially regulated in diseased tissues compared to asymptomatic tissues.Coral biomineralization is directed by an extracellular skeletal organic matrix comprised of secreted glycosylated proteins [56].These proteins include collagens and negatively charged macromolecules (like chondroitin sulfate proteoglycans) that bind calcium ions to aid in crystal formation [57].Several collagens and a protein with high similarity to nematogalectin, a collagen family protein that forms a major structural component in Hydra nematocyst tubules [58], were down regulated in diseased tissues.
The down regulation of these genes in diseased tissues suggests a weakening of the coral skeletal organic matrix and thus a diminished capacity for biomineral deposition.The potential impact of disease on coral skeletal growth is most clearly revealed by the down regulation of carbonic anhydrase, an enzyme that plays a fundamental role in mediating bicarbonate supplies for calcification in scleractinian corals [59][60][61].

AL gene expression: a systemic response to disease or factors contributing to disease susceptibility?
Recent studies investigating the colony-wide transcriptomic effects of yellow band disease (YBD) in Orcibella faveolata revealed differences between healthy tissues, diseased tissues, and apparently healthy tissues ahead of the yellow band lesion [62].

YBD-infected corals displayed intermediate expression patterns where expression ahead
of the lesion responded more similarly to healthy corals in some genes and more similarly to diseased corals in others.Only one DEG was identified between ahead of the lesion and diseased tissues.Conversely, in our study gene expression between asymptomatic (healthy and AL) and diseased corals dominates the observed transcriptomic response.However, principal coordinate analysis and gene co-expression network analysis both demonstrate differences between healthy and AL tissues.Coral tissues ahead of the lesion of disease progression tend to show more variability in gene expression than both entirely healthy and diseased corals (Figure 1).Expression variation between individuals in the intermediate disease state could signify that these tissues are at different stages in a transition to lesion appearance.This variation may explain the lack of power to detect significantly differentially expressed genes between AL and healthy tissues in DESeq2 analysis.However, gene co-expression network analysis was sensitive enough to identify modules correlated (post hoc) with all three coral conditions.The module that correlated specifically with the AL state confirms that these tissues are distinct from both diseased and healthy tissues and may be responding to the adjacent threat.One of the genes that best represent the AL-correlated module encodes hypoxia inducible factor prolyl 4hydroylase (HIF-P4H).HIF-P4H was up regulated in AL tissues compared to both diseased tissues from the same colonies and healthy individuals.These proteins regulate gene expression in response to cellular oxygen levels and inflammation [63].HIFs rapidly accumulate in low oxygen environments and induce hypoxia-responsive genes that function in oxidative stress responses, glucose metabolism, extracellular matrix homeostasis, and innate immunity (reviewed in [64]).HIF-P4Hs are responsible for maintaining appropriate levels of HIFs in response to oxygen availability in the cell [65].
In normoxic conditions HIF-P4H modifies the alpha subunit of HIF-1 in an oxygendependent reaction, leading to its eventual degradation in the proteasome.A study performed in rat cell culture observed that the expression of prolyl hydroxylases is up regulated in hypoxic environments, possibly as a mechanism to halt hypoxic signaling in freshly re-oxygenated cells [66].The up regulation of HIF-P4H in apparently healthy tissues ahead of the lesion therefore suggests that these tissues are responding to variable peer-reviewed) is the author/funder.All rights reserved.No reuse allowed without permission.
The copyright holder for this preprint (which was not oxygen availability.HIF-P4H has also been shown to negatively regulate immune responses by modifying the kinase responsible for releasing NF-κB from its inhibitor in an oxygen-dependent reaction [67].
White syndrome is a general term that refers to both active and recovering disease lesions [68].Since we did not monitor the colonies after sampling it is unknown whether the gene expression response in tissues ahead of the lesion describes tissues that are preparing to defend against or successfully overcoming an infection.The transitional gene expression profile of the AL state could signify either a return to homeostasis or an oncoming immune challenge.Yet one more possible explanation is that the differences in gene expression profiles between AL tissues and healthy corals reflect baseline differences in susceptibility that allowed the healthy corals to resist infection.In other words, the differential regulation of these genes could be either a consequence of disease progression, recovery from the disease, or an indicator of susceptibility.

Conclusions
Our gene expression analysis identified molecular processes specific to corals affected with white syndrome-like symptoms.Information provided by this analysis complements other investigations of coral disease responses to better understand the physiological consequences of infection in both visibly affected tissues as well as apparently healthy tissues adjacent to the lesion of disease progression.An increased knowledge of the systemic molecular effects of coral disease will contribute towards the development of diagnostic tools to predict and manage coral disease outbreaks.

Sampling
Coral fragments from 16 colonies of A. hyacinthus were sampled in the spring of 2011 along the eastern coast of Palau (7° 18.738' N, 134° 30.423'E) and immediately stored in RNAlater (Ambion).Eight of these colonies were visibly affected with white syndrome-like symptoms.The remaining eight colonies were completely symptom-free (designated healthy, H).From the eight affected colonies, coral fragments were sampled from both the lesion interface between diseased and healthy tissues (diseased, D) and areas well ahead of the lesion (AL, Figure 1B).AL coral fragments were sampled from approximately midway between the lesion boundary and the edge of the colony.AL tissues are unlikely to be in direct contact with pathogen since previous studies have demonstrated declines in pathogens only ~1 cm in advance of a white syndrome disease lesion [69].

Transcriptome Assembly and Annotation
The A. hyacinthus transcriptome has been generated from 5-day aposymbiotic larvae as described previously [70].It was annotated based on two resources: the proteome of the starlet sea anemone Nematostella vectensis [71] and in-depth annotations of the Acropora digitifera proteome [72].Based on manual verifications of a subset of A. digitifera annotations, they were pre-filtered to include only protein sequences longer than 60 amino acids with the annotation assigned based on the listed e-value = 1e-20 or better.
The GO, KEGG ("Kyoto Encyclopedia of Genes and Genomes"), KOG ("euKaryotic orthologous groups"), and gene name annotations were transferred to an A. hyacinthus contig if the contig matched one or both of these two resources with e-value = 1e-4 or better in blastx [73].The GO and KOG annotations assigned to genes that were denoted FOG ("fuzzy orthologous group", [74]) in the N. vectensis data were removed, since such genes encode proteins with common domains and cannot be functionally annotated based on homology alone.The annotated A. hyacinthus transcriptome has been released for unrestricted use prior to this publication, http://www.bio.utexas.edu/research/matz_lab/matzlab/Data.html.

Tag-based RNA-Seq
Duplicate libraries were prepared following [75] and sequenced using Applied Biosystems SOLiD v.3 platform.Read trimming, quality filtering, mapping, and conversion to per-gene counts was performed as described previously [75] with one modification: reads mapping to the same starting coordinate in the reference and aligning with 100% identity along the full length of the shorter read were discarded as potential PCR duplicates.The current step-by-step library preparation protocol as well as bioinformatic pipeline are available at https://sourceforge.net/projects/tag-based-rnaseq/ ; note however that the current version of the tag-based RNA-seq method assumes sequencing on the Illumina HiSeq instrument.

Identification of Differentially Expressed Genes (DEGs)
All statistical analyses were performed using R3.1.1 [76].DEGs were identified using a generalized linear model implemented by the R package DESeq2 [77].No outlying samples were detected by the arrayQualityMetrics package [78].DESeq2 performed peer-reviewed) is the author/funder.All rights reserved.No reuse allowed without permission.
The copyright holder for this preprint (which was not .http://dx.doi.org/10.1101/012211doi: bioRxiv preprint first posted online Dec. 5, 2014; automatic independent filtering to remove lowly abundant transcripts and maximize the rate of DEG discovery post multiple testing correction at an alpha of 0.1.P-values for significance of contrasts between all three disease states were generated based on Wald statistics and were adjusted for multiple testing using the false discovery rate method [79].The contrasts resulted in tables including adjusted and unadjusted p-values and log 2 fold changes that were used in downstream analyses.

Gene Coexpression Network Analysis
A weighted gene coexpression network analysis (WGCNA, [80]) was used to identify groups of co-regulated genes in an unsupervised way.Genes with an unadjusted p-value < 0.1 for any of the three contrasts as determined by the generalized linear model testing for the effect of disease state were input into WGCNA.A sample network was constructed to identify outlying samples with a standardized connectivity score of less than -2.5 [81].A signed gene co-expression network was constructed with a soft threshold power of 24.Groups of co-regulated genes (modules) correlated with each other with the Pearson correlation coefficient 0.42 or better were merged.The eigengenes of the resulting modules (the first principal component of the expression matrix corresponding to the genes included in the module) were correlated with disease states (H, AL, or D).

Assessing the Robustness of the Analysis
Low quality of the SOLiD sequencing data resulted in low number of reads mapped, raising concerns about the reliability of the data.In tag-based RNA-seq, unlike standard RNA-seq, every count represents an observation of an independent transcript.Thus low counts could still provide sufficient quantitative information about transcript abundances.
In addition, high level of biological replication (n=8 per group) in our experiment should have compensated somewhat for the low counts within each replicate.To confirm that low counts did not result in inflated false discovery rate, we have simulated a series of count datasets based on the empirical per-gene total counts and coefficients of variation across samples as well as empirical sample size factors, which included no effect of experimental conditions (Additional File 7).Analysis of these simulated datasets recovered nearly identical sample size factors and highly similar dispersion estimates as in real data (Additional File 8).When these simulated datasets were analyzed with DESeq2 using the same models as real data, at most four genes passed the 10% Benjamini-Hochberg false discovery rate (FDR) cutoff for each contrast.Compared to 646 genes passing the FDR 10% cutoff for the disease-healthy comparison and 333 genes passing the same cutoff for the disease-AL comparison in the real dataset this is much less than 10%, indicating that the real data analysis was conservative.The simulationbased p-value cutoff achieving the empirical 10% FDR (Additional File 9A-C) would have yielded 1.05-1.5Xmore DEGs for these comparisons than the Benjamini-Hochberg procedure.Notably, the Benjamini-Hochberg correction did not yield any DEGs for the healthy-AL comparison, and accordingly, the DEG discovery rate in this comparison was even slightly lower than the simulation-based false discovery rate (Additional File 9C), indicating that for this comparison DESeq2 analysis did not provide sufficient power.To keep the analysis conservative, we chose to report DESeq2-based DEGs discovered using the Benjamini-Hochberg procedure.
WGCNA analysis provides one additional confirmation that the observed gene expression differences are driven by biological factors rather than stochasticity.
WGCNA constructs gene co-expression modules from their correlation pattern across samples without using the information about how the samples are distributed among experimental conditions.The fact that post hoc the module eigengenes correlate strongly with coral condition (Additional File 5) indicates that the gene expression patterns in the data truly reflect the biological processes related to disease.To verify this, we shuffled the condition designations among samples and indeed observed that the correlations with co-expression modules disappeared (Additional File 10).

Principal Coordinate Analysis
Principal coordinate analysis to visualize clustering of gene expression between disease states was performed using the "adegenet" package [82] using variance stabilized data for all genes and subsequently with only candidate differentially expressed genes (unadjusted p-value < 0.05), based on Manhattan distances which correspond to the sum of absolute log-fold changes across all genes.Effects of the three disease states ("D", "H", and "AL") were calculated using the multivariate analysis of variance function "adonis" of the R package "vegan" [83].

Functional Summaries
Gene ontology enrichment analysis was performed using adaptive clustering of GO categories and Mann-Whitney U tests [84] based on ranking of signed log p-values (GO-MWU, https://sourceforge.net/projects/go-mwu/).Gene expression heatmaps with hierarchical clustering of expression profiles were created with the pheatmap package in R [85].

Authors Contributions
GVA prepared the libraries for sequencing, EM assembled the transcriptome, RMW conducted the tag-based RNA-seq data analysis.MVM supervised the study, annotated the transcriptome, and performed simulations.MVM and RMW wrote the manuscript.

Figure 1 :
Figure 1: Directional lesion progress along A. hyacinthus colonies affected with white syndrome-like symptoms (A).Coral fragments were sampled at the lesion boundary (diseased, "D") or ahead of the lesion ("AL") (B).Tissues were also sampled from completely healthy individuals ("H", not pictured).Photo credit: Carly Kenkel.

File 1 )
. The disease-healthy contrast yielded 646 DEGs passing a Benjamini-Hochberg FDR cutoff of 10% and 2459 DEGs with an unadjusted p-value of less than 0.05.The disease-AL contrast yielded 333 DEGs passing an FDR cutoff of 10% and 2054 DEGs with an unadjusted p-value of less than 0.05.No DEGs passed the 10% FDR cutoff for the healthy-AL contrast, but 777 DEGs had an unadjusted p-value of less than 0.05.

Figure 2 :
Figure 2: Principal coordinate analysis of variance stabilized data clusters samples by disease state.Samples by the presence of disease symptoms (D vs. AL and H) when all isogroups are included in the PCoA (A).Differences between disease states become more evident when PCoA is performed on DEGs (unadjusted p-value < 0.05) only (B).

Figure 3 :
Figure 3: GO enrichment by molecular function (MF), biological process (BP), and cellular component (CC) of genes by p-value generated for the disease-healthy contrast.The color of the text indicates the direction of expression change between disease and healthy tissues (red = up-regulated in disease, blue = down regulated in disease).The size of the text indicates the significance of the term as indicated by the inset key.The fraction preceding the GO term indicates the number of genes annotated with the term that pass an unadjusted p-value threshold of 0.05.The trees indicate sharing of genes among GO categories (the categories with no branch length between them are subsets of each other).

Figure 4 :
Figure 4: GO enrichment by molecular function (MF), biological process (BP), and cellular component (CC) of genes by p-value generated for the disease-AL contrast.The color and font representations are the same as described in Figure 2.

Figure 5 :
Figure 5: Gene expression heatmaps for annotated DEGs (adjusted p-value < 0.01) for the disease-healthy contrast.Rows are genes, columns are samples ordered as in the bottom panel: ahead-of-lesion (AL), healthy (H), and diseased (D).The color scale is in log 2 (fold change relative to the gene's mean).The trees are hierarchical clustering of genes based on Pearson's correlation of their expression across samples.

Figure 6 :
Figure 6: Gene expression heatmaps for annotated DEGs (adjusted p-value < 0.01) for the disease-AL contrast.Rows are genes, columns are samples ordered as in the bottom panel: ahead-of-lesion (AL), healthy (H), and diseased (D).The color scale is in log 2 (fold change relative to the gene's mean).The trees are hierarchical clustering of genes based on Pearson's correlation of their expression across samples.

Figure 7 :
Figure 7: Gene expression heatmaps for annotated DEGs (unadjusted p-value < 0.01) for the AL-healthy contrast.Rows are genes, columns are samples ordered as in the bottom panel: ahead-of-lesion (AL), healthy (H), and diseased (D).The color scale is in log 2 (fold change relative to the gene's mean).The trees are hierarchical clustering of genes based on Pearson's correlation of their expression across samples.

Figure 8 :
Figure 8: Gene expression heatmaps of annotated DEGs with a module membership and gene significance score greater than 0.6 within their respective modules.Rows are genes, columns are samples ordered as in the bottom panel: ahead-of-lesion (AL), healthy (H), and diseased (D).The color scale is in log 2 (fold change relative to the gene's mean).The trees are hierarchical clustering of genes based on Pearson's correlation of their expression across samples.The color block of the trees indicates the module to which these genes belong (dark green, turquoise, and green).