The venom-gland transcriptome of the eastern coral snake (Micrurus fulvius) reveals high venom complexity in the intragenomic evolution of venoms

Background Snake venom is shaped by the ecology and evolution of venomous species, and signals of positive selection in toxins have been consistently documented, reflecting the role of venoms as an ecologically critical phenotype. New World coral snakes (Elapidae) are represented by three genera and over 120 species and subspecies that are capable of causing significant human morbidity and mortality, yet coral-snake venom composition is poorly understood in comparison to that of Old World elapids. High-throughput sequencing is capable of identifying thousands of loci, while providing characterizations of expression patterns and the molecular evolutionary forces acting within the venom gland. Results We describe the de novo assembly and analysis of the venom-gland transcriptome of the eastern coral snake (Micrurus fulvius). We identified 1,950 nontoxin transcripts and 116 toxin transcripts. These transcripts accounted for 57.1% of the total reads, with toxins accounting for 45.8% of the total reads. Phospholipases A2 and three-finger toxins dominated expression, accounting for 86.0% of the toxin reads. A total of 15 toxin families were identified, revealing venom complexity previously unknown from New World coral snakes. Toxins exhibited high levels of heterozygosity relative to nontoxins, and overdominance may favor gene duplication leading to the fixation of advantageous alleles. Phospholipase A2 expression was uniformly distributed throughout the class while three-finger toxin expression was dominated by a handful of transcripts, and phylogenetic analyses indicate that toxin divergence may have occurred following speciation. Positive selection was detected in three of the four most diverse toxin classes, suggesting that venom diversification is driven by recurrent directional selection. Conclusions We describe the most complete characterization of an elapid venom gland to date. Toxin gene duplication may be driven by heterozygote advantage, as the frequency of polymorphic toxin loci was significantly higher than that of nontoxins. Diversification among toxins appeared to follow speciation reflecting species-specific adaptation, and this divergence may be directly related to dietary shifts and is suggestive of a coevolutionary arms race.

New World coral snakes (Elapidae) consist of three genera (Leptomicrurus, Micrurus, and Micruroides) and over 120 species and subspecies, inhabiting a diverse array of habitats from the southern United States to central Argentina [8]. The eastern coral snake (Micrurus fulvius) is native to the forests of the southeastern United States and is mainly ophiophagous, preying upon smaller snakes and other squamates [9]. Bites from Micrurus species can be lethal [10] due to the pre-and postsynaptic effects of the neurotoxic venom components that dominate coral snake venoms (LD 50 = 7-76 μg venom/18-22 g mouse [11]), and polyvalent antivenom has been shown to be ineffective at treating bites from all Micrurus species [11,12] and is currently unavailable in the United States [10]. A full characterization of all venom components may allow the design of more effective polyvalent antivenom [13,14], but coral-snake venom composition is poorly understood in comparison to that of Old World elapids [15], mainly due to the difficulty of procuring sufficient venom quantities during milking for standard proteomic techniques [13]. High-throughput sequencing approaches are capable of identifying thousands of loci, enabling a detailed examination of the evolutionary forces shaping venom composition at the molecular level. We describe the first high-throughput transcriptomic characterization of an elapid venom gland to date. We sequenced the venom-gland transcriptome of M. fulvius with Illumina technology using the paired-end approach of Rokyta et al. [16], and used the generated sequence data to examine the relationship between toxin heterozygosity and gene duplication events and uncover distinct expression patterns in highly expressed and extremely diverse toxin gene families.

High venom complexity revealed by means of sequencing
Our high-throughput transcriptomic analysis revealed high venom complexity in M. fulvius, comparable to the diversity of toxin components recently identified in the venom-gland transcriptome of the eastern diamondback rattlesnake (Crotalus adamanteus: Viperidae) [16]. We generated 79,573,048 pairs of 100 nucleotide raw reads and merged 61,609,456 pairs on the basis of overlap at their 3' ends, following the approach of Rokyta et al. [16] and Rodrigue et al. [17]. These merged reads had an average length of 134 nucleotides with average phred scores of 46. The iterative assembly process described by Rokyta et al. [16] coupled with a reference-based assembly using nontoxin transcripts previously described in the venom-gland transcriptome of C. adamanteus resulted in the identification of 1,950 unique, full-length nontoxin coding sequences and 116 unique, full-length toxin coding transcripts ( Figure 1A). Transcript abundances were estimated by mapping 10 million reads to unique, full-length sequences with a minimum match percentage of 95% as described in Rokyta et al. [16] for C. adamanteus, against whose results we will be making comparisons. Toxin transcripts were grouped into clusters based on <1% nucleotide divergence for abundance estimates. This enabled us to account for allelic variation, recent duplication events, and possible sequencing errors while also allowing reads to be mapped to a unique sequence rather than to multiple sequences, resulting in a more accurate estimation of abundance levels. Our estimates of abundance reflect mRNA levels and not necessarily protein abundances; the relationship between mRNA and protein levels is not always straightforward [18], and the construction of a complete genotype-phenotype map requires proteomic analyses (e.g., mass spectrometry).
The 1,950 nontoxin transcripts accounted for 11.3% of the reads. The 116 full-length toxin transcripts were grouped into 75 clusters ( Figure 1B, Table 1) and accounted for 45.8% of the reads. In total, we accounted for 57.1% of the reads (Figure 2), comparable to the amount identified for C. adamanteus using a similar approach [16]. While the overall percentage of reads mapping to identified transcripts was similar for M. fulvius and C. adamanteus, toxin expression levels were quite different ( Figure 2). The toxin transcripts identified in M. fulvius accounted for nearly half of the total sequencing reads (45.8%) while the toxin transcripts in C. adamanteus accounted for approximately onethird (35.4%) of the total reads ( Figure 2) [16]. The numbers and abundances of nontoxin coding sequences were much lower in M. fulvius than in C. adamanteus, despite an increase in assembly effort (e.g., the addition of the reference-based assembly), as 1,950 nontoxin transcripts accounted for 11.3% of the total reads in M. fulvius while 2,879 nontoxin transcripts accounted for 27.5% of the total reads in C. adamanteus ( Figure 2). The venom-gland transcriptome of C. adamanteus was characterized by large, hemorrhage-inducing toxins such as snake venom metalloproteinases (SVMPs), proteins that presumably require extensive downstream processing by nontoxin machinery prior to becoming mature, active toxins [16]. The vast majority of highly expressed nontoxin sequences identified in the transcriptome of C. adamanteus were involved with proteostasis (e.g., protein folding, degradation, and transport) [16], and the reduction in the expression levels of nontoxin transcripts in M. fulvius could potentially reflect a difference in the maintenance, production, and folding requirements of the venom components of each species. The venom of M. fulvius was dominated by three-finger toxins (3FTxs) and phospholipases A 2 (PLA 2 s), relatively short toxins that may not require the degree of downstream http://www.    processing needed by larger toxins to become functional. This suggests that venoms dominated by smaller proteins differ in the transcriptional effort expended on toxins relative to nontoxins in comparison to venoms characterized by high-molecular weight components, with small-component venoms expressing toxins at much higher levels relative to nontoxin production. The largely proteinaceous composition of venom makes it metabolically costly to produce [19], and a reduction in the machinery necessary for the production of functional toxic proteins may confer an energetic benefit to species expressing smaller peptides and enzymes. Simple, smaller toxins have a reduced mutational target relative to larger proteins as a function of sequence length, potentially reducing the ability to evolve effective counterdefenses to resistance development in frequently envenomed prey [6,7] and predators [1] where more complex venoms would be advantageous. However, as our hypotheses are based on a comparison between a single representative of each family, sequencing additional members of Viperidae and Elapidae are needed to test whether these putative differences in transcriptional effort are fixed or unique to M. fulvius and/or C.adamanteus.

Toxin class expression patterns
We identified 116 unique, full-length toxin transcripts representing 15 toxin classes or families which clustered into 75 groups with <1% nucleotide divergence. Clusters could include alleles, recent duplicates, or assembly/sequencing errors. These clusters accounted for 45.8% of the total reads ( Figure 2), and nearly all of the highabundance transcripts coded for toxins ( Figure 1A). Of the 75 toxin clusters identified, 71 were among the top 200 most highly expressed genes ( Figure 1A), while only 63 of the 78 toxin clusters identified in C. adamanteus were among the top 200 most highly expressed genes [16].
The expression patterns of the venom gland appear to be biased towards toxin production in C. adamanteus, but reach a more extreme level of specialization in M. fulvius. Although sequencing a single specimen may not accurately reflect gene expression for all individuals of M. fulvius, our analyses provide a reference transcriptome http://www.  Figure 2 The venom-gland transcriptome of Micrurus fulvius was dominated by phospholipases A 2 and three-finger toxins. Toxin gene expression was dominated by phospholipases A 2 (PLA 2 s) and three-finger toxins (3FTxs). Full-length transcripts accounted for 57.1% of the total reads; toxin sequences accounted for 45.8% of the total reads. PLA 2 s and 3FTxs represent both the most abundant and most diverse toxin classes identified; 31 PLA 2 clusters accounted for 64.9% of the toxin reads, and 15 3FTx clusters accounted for 21.1% of the toxin reads. Toxin sequences accounted for 10.4% more of the total reads in M. fulvius than in C. adamanteus, while nontoxins in C. adamanteus accounted for more than twice the total read percentage than in M. fulvius. The venom of M. fulvius was dominated by small neurotoxic components while the venom of C. adamanteus was dominated by larger hemorrhage-inducing proteins, suggesting that the transcriptional effort expended on toxins versus nontoxins may differ between venoms dominated by high-molecular weight components and venoms dominated by smaller proteins, with small-component venoms expressing toxins at much higher levels relative to nontoxin production.

Crotalus adamanteus
for future work with the species. We used the number of reads mapping to a specific transcript as an estimate of its abundance as per Rokyta et al. [16]. Toxin transcripts were named by the toxin-class abbreviation followed by the cluster number and a letter if multiple transcripts were present within the cluster.

Phospholipases A 2
We identified 54 unique sequences of PLA 2 s that grouped into 31 clusters, which accounted for 64.9% of toxin reads and 29.7% of the total reads ( Figure 2). PLA 2 s were the most abundant and diverse toxin class in the M. fulvius venom-gland transcriptome. PLA 2 s are esterolytic enzymes and are among the most toxic components of snake venoms, causing pre-and postsynaptic neurotoxicity, myotoxicity, cardiotoxicity, hemolytic activity, anticoagulant activity, and hypotensive activity among other effects [20]. Conversely, SVMPs were the most abundant toxin gene family in C. adamanteus, with 16 clusters representing 39 sequences accounting for 24.4% of the toxin reads and 8.6% of the total reads [16]. PLA 2 s accounted for more than twice the percentage of toxin reads and over three times the percentage of total reads than the most abundant class in C. adamanteus, demonstrating the extreme dominance of gene expression by the PLA 2 gene family in M. fulvius.

Three-finger toxins
We identified 26 full-length transcripts and 15 clusters of 3FTxs, which accounted for 21.1% of toxin reads and 9.7% of the total reads ( Figure 2). 3FTxs were the second most abundant and diverse toxin class, and with http://www.biomedcentral.com/1471-2164/14/531 PLA 2 s accounted for 86.0% of the total toxin reads. 3FTxs are short (60-71 amino-acid residues), nonenzymatic proteins characterized by three β-stranded loops extending from a center protein core [21]. These proteins are common components of elapid snake venoms and exhibit postsynaptic neurotoxicity by inhibiting the binding of acetylcholine to the muscle nicotinic acetylcholine receptors, causing a blockage of neuromuscular transmission [21]. The venom-gland transcriptome of M. altirostris was dominated by 3FTxs, as this toxin family accounted for approximately half of the identified expressed sequence tags [22]. In M. fulvius, 3FTxs only accounted for 9.7% of the total reads, reflecting vast differences in expression patterns within the genus.

Long-chain neurotoxins
Long-chain neurotoxins (LCNs) were the third most abundant and fourth most diverse toxin class. We identified nine sequences, which grouped into seven clusters and accounted for 4.0% of toxin reads and 1.8% of the total reads ( Figure 2). LCNs are similar to the short-chain 3FTxs described above but contain an additional 6-12 amino acid residues and an additional disulfide bond [21]. While both antagonize muscle nicotinic acetylcholine receptors, differences in the functional sites of these toxins correlate to differences in the specificity of their targets [21]. LCNs were much more similar to one another than 3FTxs, being 5.3% divergent at the nucleotide level and 12.8% divergent at the amino acid level whereas divergence among members of the 3FTx class was extremely high, with a paucity of conserved nucleotides and amino acids across cluster members.

Snake venom metalloproteinases
SVMPs were the fourth most abundant toxin class, represented by a single transcript which accounted for 2.9% of the toxin reads and 1.3% of the total reads ( Figure 2). SVMPs display extensive local and systemic hemorrhagic activity associated with envenomation in viperids, but the role of these enzymes following Micrurus envenomation is uncertain. These enzymes can be subdivided into classes based on differences in domain structure, and the transcript identified in the venom-gland transcriptome of M. fulvius is a member of the class SVMP-III, characterized by disintegrin-like, cysteine-rich, and metalloproteinase domains [23]. Larger molecular weight toxins, such as SVMPs, are often absent from elapid venoms and are more common components in the venoms of viperids [16,23]. High metalloproteinase expression has been shown to be incompatible with high toxicity in rattlesnakes, being described by Mackessy as a tradeoff between neurotoxicity and hemorrhagic activity [24]. High-molecular weight toxins such as SVMPs are the major causes of hemorrhage and necrosis commonly associated with viperid envenomations, and presumably aid in digestion and are most effective when localized at the site of the bite. Conversely, toxins with systemic effects (e.g., toxicity) such as 3FTxs are more effective when spread through the envenomated organism is rapid, resulting in prey immobilization. This dichotomy in pharmocological effects appears to hold in elapids as well, demonstrating the antagonistic effects of these components. Yet in comparison to other transcriptomic work with Micrurus species, SVMP expression levels are nearly five-times higher in M. fulvius than in congeneric individuals [22], suggesting a more active functional role for this toxin in M. fulvius. This hypothesis awaits proteomic verification and testing via comparative functional assays to quantify the activity of SVMPs in M. fulvius relative to congeners.

Kunitz-type inhibitors
Kunitz-type inhibitors (KUNs) were the fifth most abundant and third most diverse toxin class present in the M. fulvius transcriptome, with nine transcripts and eight clusters accounting for 2.2% of the toxin reads and 1.0% of the total reads ( Figure 2). KUNs are characterized by possessing a compact tertiary fold and three disulfide bonds, functioning as both inhibitors of serine proteases and as neurotoxins by inhibiting calcium and potassium ion channels [25]. KUN-4 was a very divergent transcript, being 504 nucleotides longer than the next longest transcript. It shared 92.0% identity with a putative KUN known from the Australian elapid Austrelaps labialis [26] and 85.0% identity with a KUN identified in the venom-gland transcriptome of C. adamanteus [16]. This transcript is considered putative until a comprehensive functional characterization is completed. The removal of this extremely divergent transcript resulted in a maximum pair-wise nucleotide divergence of 31.0% within KUNs and a maximum pairwise amino acid divergence of 68.0%, reflecting the diversity of this toxin class, especially in comparison to LCNs.

Low-abundance toxins
PLA 2 s, 3FTxs, LCNs, SVMPs, and KUNs accounted for 95.1% of the reads mapping to toxin sequences ( Figure 1B), 82.7% of the toxin clusters, and 85.3% of the unique toxin transcripts. The remaining low-abundance toxins fell into ten different classes, are listed under "Other" in Figure 1B and Figure 2, and are described in Table 2.

Heterozygosity and gene duplication
Single nucleotide polymorphisms (SNPs) were identified in the coding regions of toxin clusters and nontoxin http://www.biomedcentral.com/1471-2164/14/531    of heterozygous toxin loci was significantly greater than the frequency of heterozygous nontoxin sequences (χ 2 = 6.383, df = 1, p = 0.0115). Toxin transcripts also possessed a much higher SNP density than nontoxins. SNP density was calculated as the number of SNPs per 1,000 bases (kb) in the coding regions of toxin and nontoxin transcripts containing SNPs [35]. Nontoxin transcripts had a SNP density of 0.96 SNPs/kb while toxins had a SNP density of 1.93 SNPs/kb. Sequences in multitranscript clusters appear to represent different allelic states of heterozygous loci, potentially reflecting how heterozygote advantage can favor gene duplication. If duplicate genes confer an advantage by providing more of a specific gene product, duplication events involving genes that are already highly expressed would be most beneficial [36]. While duplication events of highly expressed toxin transcripts within the snake venom gland have been rampant throughout the evolutionary history of venomous species (e.g., the formation of large gene families), this does not explain the increased heterozygosity detected in our analyses. Gene duplication may also produce a selective advantage when overdominance occurs. In this scenario, carrying different alleles at a single locus is beneficial and gene duplication can ultimately result in the fixation of multiple advantageous alleles. Heterozygosity occurred with a significantly higher frequency in toxin than in nontoxin sequences with nearly half of the polymorphic sites in toxins resulting in a nonsynonymous mutation, suggesting that heterozygote advantage may play a key role in driving gene duplication and allelic variation within toxin genes [37] (but see [38]).

Expression biases and sequence divergence following speciation
The maximum-likelihood phylogeny for 15 unique M. fulvius 3FTx cluster members was estimated under the HKY+G model ( Figure 3A). Expression levels were extremely biased toward a handful of sequences; four transcripts accounted for more than 67.0% of the toxin reads mapping to 3FTxs. These high-abundance transcripts (3FTx-1a, 2a, 3a, and 4a) are all members of multitranscript clusters, and duplication events and the number of alleles appear to be positively correlated with expression within the 3FTx toxin gene family. The maximum-likelihood phylogeny for 15 M. fulvius 3FTx clusters and 23 3FTx transcripts from M. altirostris and M. corallinus was estimated under the HKY+G model ( Figure 3B). Two of the four most highly expressed M. fulvius 3FTx paralogs are in well-supported clades containing orthologs from both M. altirostris and M. corallinus. 3FTx-13, while accounting for approximately 2.0% of the 3FTx reads in M. fulvius, is in a well supported clade containing ten M. altirostris sequences indicating that orthologs to 3FTx-13 have undergone duplication events in M. altirostris. M. altirostris sequences were identified in a venom-gland transcriptome analysis where the toxin family accounted for 52.8% of the 867 expressed sequence tags generated [22], demonstrating that these transcripts are highly expressed. The apparent dominance of the 3FTx toxin family in M. altirostris by these transcripts, and the extremely low expression of 3FTx-13 in M. fulvius could reflect dietary differences between the species, as shifts in prey consumption have been known to be coupled with expression modifications that augment venom efficacy [4]. Genes that are no longer effective are lost or silenced while effective genes are highly expressed and diversify, suggesting functional divergence among toxins occurs following speciation events [4].
A maximum-likelihood phylogeny for 31 unique M. fulvius PLA 2 clusters was estimated under the SYM+G model ( Figure 4A). Expression levels for PLA 2 transcripts were much more evenly distributed across the entire gene family in comparison to 3FTx sequences. The most highly expressed 3FTx transcript, 3FTx-2a, accounted for 22.7% of the total 3FTx reads while four transcripts accounted for approximately 67.0% of the 3FTx reads. In the PLA 2 gene family, PLA 2 -2a was the most abundant toxin family member, accounting for 8.3% of the total PLA 2 reads and the four most highly expressed transcripts accounted for only 30.8% of the reads mapping to PLA 2 s. Although these two families dominated toxin transcript expression levels in the M. fulvius venom gland, PLA 2 s did so through uniform expression while 3FTx expression patterns were tremendously biased. PLA 2 transcripts also demonstrated less sequence divergence among clusters relative to 3FTx sequences. PLA 2 s are esterolytic enzymes that share a conserved three-dimensional structure [4,20], and the greater similarity between PLA 2 transcripts may be a result of more stringent conformational constraints. The conservation of crucial structures in PLA 2 enzymes ensures a functioning active site whereas the relatively short 3FTx peptides may be free of this limitation. Lynch [4] found that functionally critical sites were under strong purifying selection in PLA 2 s, with strong directional selection being restricted to surface residues due to their interactions with specific targets in prey, enabling preyspecific adaptation while ensuring the functionality of the enzyme. The maximum-likelihood phylogeny for 31 M. fulvius PLA 2 clusters, three M. altirostris sequences, and a single M. corallinus sequence was estimated under the SYM+G model ( Figure 4B). Functional divergence among PLA 2 s may also occur following speciation events [4], as all three M. altirostris sequences constitute a monophyletic clade and are sister to PLA 2 -21, a transcript that accounts for <1% of PLA 2 reads in M. fulvius (although the pattern is not as strong as in 3FTxs as the PLA 2 -21/ M. altirostris clade is not well-supported). http://www.biomedcentral.com/1471-2164/14/531 Rampant gene duplication following the initial recruitment of toxin genes into the venom gland has led to the production of multigene toxin families [36], including the highly expressed 3FTx and PLA 2 gene families present in the M. fulvius transcriptome. Gene duplication can be advantageous by increasing the production of a beneficial gene product or permanently fixing multiple advantageous alleles. Gene duplication can be followed by elevated rates of selection [39], enabling genes to acquire new functions or divide single functions among several genes [36]. Phylogenetic analyses of the 3FTx and PLA 2 toxin classes (Figures 3 and 4) demonstrate the pervasiveness of gene duplication throughout the evolutionary history of these toxin families, and suggest that the divergence of toxin genes occurs following speciation. Zhang [36] stated that species-specific duplication events can result in species-specific gene function and, subsequently, adaptation. This adaptation can result in divergence and the development of unique phenotypes [36], which may be reflected in our phylogenetic analyses of multiple Micrurus species.

Intragenomic evolution of venom genes
Positive selection in toxin genes has been repeatedly documented, reflecting the significance of venom to the fitness of venomous species [2][3][4]  dN/dS ratios are represented by ω; λ represents our test statistic which is negative twice the difference in log likelihoods compared across models to a χ 2 distribution with two degrees of freedom; and p corresponds to the proportion of codon sites falling into one of three rate classes which are purifying selection, neutral selection, and positive selection, respectively. The model selected by MrModelTest2.3 [50] for the estimation of the maximum-likelihood phylogeny is given and n corresponds to the number of sequences used in the analysis. Significant p-values are indicated by an asterisk. http://www.biomedcentral.com/1471-2164/14/531 sequences available with little regard for the evolutionary histories of the taxa involved [4,40]. Here, we examine the intragenomic evolution of toxin classes within the M. fulvius genome. The codeml results for detecting the presence of positive selection are summarized in Table 4. The results from the M7 and M8 models are nearly identical to the results from the M1 and M2 models and are therefore not shown. We examined 3FTxs, KUNs, LCNs, and PLA 2 s for the presence of positive selection as these represent the four most diverse and four of the five most abundant toxin classes identified. While the fourth most highly expressed toxin class was SVMPs, this class was represented by a single transcript and thus could not be included in our analyses. The M0 model estimates a single ω value over all branches in the phylogeny and therefore is only capable of detecting evidence of selection when the majority of sites have experienced positive selection throughout their evolutionary histories [4]. Under the M0 model, we detected strong evidence of positive selection (1.48 ≤ ω ≤ 999.00) for the 3FTx, LCN, and PLA 2 toxin families. Evidence of positive selection was not detected in the KUN toxin family (ω = 0.58), and the extraordinarily high ω calculated for the LCN family (ω = 999.00) was a result of all polymorphic sites resulting in nonsynonymous substitutions. We also used site specific models (M1, M2, M7, and M8) to detect rate variation among sites [2,4]. The 3FTx, LCN, and PLA 2 toxin classes rejected the null or nearly neutral model (M1) in favor of the selection model (M2) (0.00 ≤ p ≤ 4.49 × 10 −5 ), demonstrating the pervasiveness of positive selection experienced by highly expressed toxin transcripts within the M. fulvius genome. The percentage of sites experiencing selection ranged from 16-56% with 3.79 ≤ ω ≤ 999.00. The KUN toxin class did not reject the M1 model in favor of the selection model (p = 0.0539). The high ω value under the site specific models for the LCN class again reflects the absence of synonymous substitutions within the class. Elevated rates of evolution have previously been documented in toxin genes [2,4,41,42], and this accelerated evolution is most likely due to their direct involvement in fitness [2,4] and may be reflective of a coevolutionary arms race with specific prey taxa [6,7].

Sequence accession numbers
The original, unmerged reads were submitted to the National Center for Biotechnology Information (NCBI) Sequence Read Archive under accession number SRA062772. The annotated sequences were submitted to the GenBank Transcriptome Shotgun Assembly database under accession numbers GAEP01000001-GAEP01001950 (nontoxins) and GAEP01001951-GAEP01002066 (toxins).

Conclusions
We have described the most comprehensive transcriptomic characterization of an elapid venom gland to date, revealing venom complexity previously unknown from any New World coral snake [11,13,15,22,[43][44][45]. Transcriptional effort expended on toxins relative to nontoxins may differ between venoms dominated by highmolecular weight components and venoms dominated by smaller proteins. This reduction in the machinery required for the production of functional toxins may confer a metabolic advantage to species expressing smaller peptides and enzymes, but may also reduce the capacity of species to evolve effective counterdefenses to resistant prey or predators. Toxin expression was dominated by PLA 2 s and 3FTxs, yet these two gene families greatly differed in expression patterns. Expression within the 3FTx family was extremely biased, being dominated by a handful of transcripts while PLA 2 expression was much more uniform. SNP analysis revealed the frequency of heterozygous loci was significantly higher in toxins than in nontoxins with nearly half of the polymorphic sites in toxins resulting in a nonsynonymous substitution, suggesting overdominance may ultimately favor gene duplication and permanent fixation of advantageous alleles within the venom gland. We detected evidence of positive selection in three of the four most diverse and highly expressed toxin classes; sequence evolution or modifications of toxin expression patterns could increase the specificity of venoms for frequently envenomed prey items [5]. Diet has been proposed to be an important selective regime in determining venom composition within Micrurus species [46], and elevated rates of selection suggest a coevolutionary arms race [1].

Venom-gland transcriptome sequencing
We followed the approach described in Rokyta et al. [16] for the preparation and sequencing of the venom gland. We sequenced the venom-gland transcriptome of an adult female M. fulvius from Wakulla County, FL, with a snout-to-vent length of 620 mm and a total length of 685 mm. The snake was anesthetized by propofol injection (10 mg/kg), and venom was extracted by electrostimulation [47]. The animal was allowed to recover for four days for transcription to be maximized [48], at which point the snake was euthanized by sodium pentobarbitol injection and its venom glands removed [16]. This procedure was approved by the Florida State University Institutional Animal Care and Use Committee under protocol #0924. Sequencing and nonnormalized cDNA library preparation was performed by the HudsonAlpha Institute for Biotechnology Genomic Services Laboratory (http://www.hudsonalpha.org/gsl/). Total RNA was reduced to poly-A+ RNA with oligo-dT beads. Two http://www.biomedcentral.com/1471-2164/14/531 rounds of poly-A+ selection were performed. The mRNA then underwent a mild heat fragmentation followed by random priming for first-strand synthesis. Standard second-strand synthesis was followed by library preparation with the double-stranded cDNA as input material. Sequencing was performed in a single lane on the Illumina HiSeq 2000 with 100-base-pair, paired-end reads.

Transcriptome assembly and analysis
We followed the general approach described in Rokyta et al. [16] for the de novo assembly of the M. fulvius venom-gland transcriptome. Most of our read pairs overlapped at the 3' end and these reads were subsequently merged to produce longer, composite reads [16,17]. Quality scores were updated accordingly and only these merged reads were used in the assembly process. The Extender program [16] was used as a de novo assembler of 1,000 random reads to eliminate extremely high-abundance transcripts. Full-length transcripts were identified with blastx searches and annotated. These sequences were then used as a template to filter a set of the unassembled, original merged reads using NGen 3.1 and a minimum match percentage of 98%. This de novo approach using the Extender program was repeated a second time as described above, using 1,000 of the filtered reads as the program continued to be productive at assembling full-length, unique transcripts. We next identified nontoxin transcripts by aligning 10 million of the unassembled reads to nontoxin transcripts previously identified in the venom-gland transcriptome of C. adamanteus [16] using the NGen assembler. We performed an initial alignment with a minimum match percentage of 93%. We then performed a second alignment with a reduced minimum match percentage of 90% to assemble more divergent transcripts. All other parameter values were consistent with our de novo assemblies in NGen. Coding regions with minimum ten-fold coverage were annotated, and all regions below this threshold were removed. These transcripts were combined with the Extender results and this unique set of sequences was then used as a template to filter a set of the merged reads using NGen 3.1 and a minimum match percentage of 98%. Next, 10 million filtered reads were used in a de novo assembly with NGen 3.1 with a minimum match percentage of 93%. Contigs with a minimum of 200 reads were identified with blastx searches, annotated, and duplicates removed, producing a unique set of identified transcripts. This process of filtering, NGen assembly, and annotating was performed an additional two times. Abundances were estimated by mapping 10 million merged reads to full-length sequences with a minimum match percentage of 95% using a reference-based assembly in NGen [16]. The percentage of reads mapping to an individual transcript was used to estimate abundance. Toxin transcripts were clustered based on <1% nucleotide divergence for abundance estimates.

Detecting heterozygosity
Ten million merged reads were aligned to the 2,025 annotated transcripts in a reference-based assembly in NGen with a minimum match percentage of 95%. SNPs were identified by using the SeqMan module of the DNAStar Lasergene software suite. Toxin transcripts were clustered based on <1% nucleotide divergence as SNP detection provided an approach to identify allelic variation within toxin clusters. SNPs were only considered if they occurred in the coding sequence of full-length, annotated transcripts, had a SNP% ranging from 40-60%, and at least 20-fold coverage with maximum coverage bounded at 20,000-fold. These parameters are more stringent than previous SNP identification approaches [35] and provide a conservative estimation of variable sites in the coding regions of annotated transcripts. The frequency of toxin versus nontoxin heterozygous loci, relative to the number of transcripts belonging to each class identified in the transcriptome, was compared to a χ 2 distribution with one degree of freedom.

Phylogenetic analyses
Sequences of 3FTxs and PLA 2 s were independently aligned on the basis of the amino-acid sequence in the MegAlign module of the DNAStar Lasergene software suite with ClustalW [49]. Model selection was performed using the Akaike Information Criterion values with MrModelTest2.3 [50], and maximum-likelihood phylogenies were estimated using PAUP* version 4.0a126 [51] and the iterative search strategy described by Rokyta et al. [52]. Nodal support was determined in MrBayes v3.1.2 by the estimation of Bayesian posterior probabilities [53,54]. Markov Chain Monte Carlo searches were run for 10 million generations with four chains and a temperature parameter of 0.20. Samples were taken every 1,000 generations and the first million generations were discarded as burn-in. A related transcript from the mangrove snake (Boiga dendrophila: Colubridae) served as the outgroup to root each 3FTx phylogeny while a transcript from the eastern diamondback rattlesnake (C. adamanteus) served as the outgroup to root each PLA 2 phylogeny. Orthologous transcripts of both toxin families from congeners were downloaded from the NCBI database and included in a second analysis following the method described above.

Detecting selection
Transcripts from the four major toxin families (3FTx, KUN, LCN, and PLA 2 ) were used in selection analyses. Sequences were independently aligned according to class on the basis of the amino-acid sequence in the MegAlign module of the DNAStar Lasergene software http://www.biomedcentral.com/1471-2164/14/531 suite with ClustalW [49]. Gaps, stop codons, and signal peptides were excluded from all analyses, and only transcripts possessing signal peptides were included. Signal peptides mediate the targeting and transporting of the pre-protein and are cleaved prior to expression [55]. Their exclusion from our analyses ensured that only the mature amino-acid sequences of expressed toxins that are targets of selection were examined. Model selection was performed using the Akaike Information Criterion values with MrModelTest2.3 [50]. A maximum likelihood phylogeny was constructed using PAUP* version 4.0a126 [51] and the iterative search strategy previously described by Rokyta et al. [52].
A likelihood-ratio test for positive selection was conducted with codeml from the PAML version 4.4 package [56,57] with the maximum-likelihood phylogeny estimated as described above [2]. The null model, or the nearly neutral model (M1), allows for a class of sites to be evolving under neutral selection (dN/dS=1) while constraining the dN/dS for a second class to be <1. The alternative model, or the positive selection model (M2), incorporates an additional class that allows for a proportion of codon sites to be experiencing positive selection (dN/dS>1). To test for the presence of positive selection, negative twice the difference in log likelihoods were compared between models to a χ 2 distribution with two degrees of freedom. A similar test was also performed to verify the results of the initial analysis, comparing models M7 (Beta) and M8 (Beta with positive selection) to a χ 2 distribution with two degrees of freedom [58]. To estimate an overall dN/dS, the M0 model was used. This model averages the dN/dS across the entire phylogeny, producing a single ratio for all sites. While this model has been shown to have limited power at detecting positive selection [59], it provides a broader perspective and a more conservative estimate of selection within the M. fulvius genome than the site-specific models described above.