BMP signaling components in embryonic transcriptomes of the hover fly Episyrphus balteatus (Syrphidae)

Background In animals, signaling of Bone Morphogenetic Proteins (BMPs) is essential for dorsoventral (DV) patterning of the embryo, but how BMP signaling evolved with changes in embryonic DV differentiation is largely unclear. Based on the extensive knowledge of BMP signaling in Drosophila melanogaster, the morphological diversity of extraembryonic tissues in different fly species provides a comparative system to address this question. The closest relatives of D. melanogaster with clearly distinct DV differentiation are hover flies (Diptera: Syrphidae). The syrphid Episyrphus balteatus is a commercial bio-agent against aphids and has been established as a model organism for developmental studies and chemical ecology. The dorsal blastoderm of E. balteatus gives rise to two extraembryonic tissues (serosa and amnion), whereas in D. melanogaster, the dorsal blastoderm differentiates into a single extraembryonic epithelium (amnioserosa). Recent studies indicate that several BMP signaling components of D. melanogaster, including the BMP ligand Screw (Scw) and other extracellular regulators, evolved in the dipteran lineage through gene duplication and functional divergence. These findings raise the question of whether the complement of BMP signaling components changed with the origin of the amnioserosa. Results To search for BMP signaling components in E. balteatus, we generated and analyzed transcriptomes of freshly laid eggs (0-30 minutes) and late blastoderm to early germband extension stages (3-6 hours) using Roche/454 sequencing. We identified putative E. balteatus orthologues of 43% of all annotated D. melanogaster genes, including the genes of all BMP ligands and other BMP signaling components. Conclusion The diversification of several BMP signaling components in the dipteran linage of D. melanogaster preceded the origin of the amnioserosa. [Transcriptome sequence data from this study have been deposited at the NCBI Sequence Read Archive (SRP005289); individually assembled sequences have been deposited at GenBank (JN006969-JN006986).]


Conclusion:
The diversification of several BMP signaling components in the dipteran linage of D. melanogaster preceded the origin of the amnioserosa.
[Transcriptome sequence data from this study have been deposited at the NCBI Sequence Read Archive (SRP005289); individually assembled sequences have been deposited at GenBank (JN006969-JN006986).]

Background
Across animals, the Bone Morphogenetic Protein (BMP) signaling pathway plays a major role in specifying the dorsoventral (DV) axis [1,2]. However, the components of the BMP pathway have been repeatedly modified through lineage specific gene duplications and gene losses [3,4]. Whether some of these genetic changes correlate with the origin of species-specific morphological traits that develop under the control of the BMP pathway is unknown. Flies (Diptera) provide an excellent opportunity to address this question firstly because the BMP signaling pathway of Drosophila melanogaster has been studied in great detail [5,6], and secondly because tissue specification presumably under the control of BMP signaling along the DV axis of dipterans has undergone significant change [7]. In D. melanogaster, dorsal blastoderm differentiates into a single extraembryonic epithelium, called amnioserosa, which closes the developing embryo dorsally [8]. This tissue is found in higher cyclorrhaphan flies (Schizophora), but in other dipterans, dorsal blastoderm gives rise to distinct serosal and amniotic epithelia [9][10][11]. Serosa and amnion develop from an amnioserosal fold at the margins of the gastrulating embryo. The outer cell layer of this fold becomes the serosa, which closes about the embryo. Its inner cell layer detaches from the serosa but retains continuity with the embryo while closing dorsally (lower cyclorrhaphan flies) or ventrally (non-cyclorrhaphan dipterans). The lower cyclorrhaphan syrphids represent the closest relatives of D. melanogaster that have been shown to develop distinct serosa and amnion tissues [9]. Therefore, they are of particular interest in efforts to understand how the origin of the amnioserosa as a new morphology is linked to changes in the underlying developmental gene network.
In previous studies we have characterized the role that the homeobox gene zerknüllt (zen) may have played in the origin of amnioserosa development [reviewed in 7]. The transcription factor Zen is regulated by BMP signaling and essential in serosa specification in non-schizophoran insects and amnioserosa specification in D. melanogaster [9,[12][13][14]. In lower Cyclorrhapha and more distant relatives of D. melanogaster, zen expression in the serosa is maintained after gastrulation, i. e., when the serosa begins to spread over the embryo [9], whereas in D. melanogaster zen expression in the amnioserosa is down-regulated immediately after gastrulation [13]. In lower cyclorrhaphan flies, postgastrular down-regulation of zen abrogates serosa development and results in the formation of a single extraembryonic tissue with amniotic gene expression [15]. Thus, the repression of this single transcription factor may account for the morphological tissue reorganization that accompanied the origin of the amnioserosa. However, loss of postgastrular zen expression does not explain, why in lower cyclorrhaphan and non-cyclorrhaphan dipterans the patterning of the dorsal blastoderm results in the specification of two distinct extraembryonic tissues types as opposed to one in schizophoran (i. e. higher cyclorrhaphan) flies such as D. melanogaster.
In D. melanogaster, amnioserosa specification occurs at the dorsal midline and requires peak-levels of BMP activity [14,16], which are provided through the interaction of two extracellular ligands, Dpp and Scw [17][18][19]. Both ligands are secreted into the perivitelline space and transported towards the dorsal midline [17,[20][21][22][23][24], where BMP-ligand dimers are released from antagonists to activate a receptor complex and initiate intracellular signaling [17,20,[25][26][27]. A cell autonomous autoregulatory loop further increases the BMP signal at the dorsal midline and generates a narrow and sharply delineated domain of BMP peak activity [20,28]. Dpp is essential for BMP activity and controls the specification of all tissues that develop under the control of this pathway in the early embryo, including amnioserosa and dorsal ectoderm [16,29]. Scw boosts BMP activity along the dorsal midline and is in particular required for amnioserosa specification [19].
In other dipterans, the molecular mechanisms that specify the amnion and serosa are not known. Expression studies in a mosquito suggest that a tighter expression of the Dpp antagonist short gastrulation (Sog) leads to broader BMP signaling, which in turn may allow for the specification of two versus one extraembryonic tissue type [11]. Additionally, Scw is absent from the genomes of mosquitoes and other insects, and it has been suggested that its origin may correlate with the origin of the amnioserosa [3]. Several other BMP signaling components of D. melanogaster resulted from gene duplications that have been mapped to the dipteran lineage, while others -known from the BMP pathways of vertebrates -were lost in the lineage leading to D. melanogaster [4]. Here we use embryonic transcriptome data of the hover fly Episyrphus balteatus (Syrphidae) to address the question of whether evolutionary changes in the complement of BMP signaling components occurred in correlation with the origin of the amnioserosa. Specifically, we found that with the possible exception of one gene-duplication (crossveinless/shrew) and one gene loss (DAN), genes encoding known BMP signaling components, including scw, are conserved across the schizophoran boundary of the dipteran tree. Thus, most or all of the gains and losses of BMP signaling genes in the dipteran lineage do not correlate with the origin of the amnioserosa, suggesting that the origin of amnioserosa specification was probably achieved by rearranging the interaction of established factors.

Results and Discussion
Putative Orthologues of 6013 E. balteatus Genes We sequenced the transcriptome of E. balteatus embryos at two successive time points during early embryogenesis: 0-0.5 hrs old embryos to sample pre-blastoderm stages prior to the onset of zygotic transcription ("maternal library"), and 3-6 hrs old embryos to sample blastoderm and gastrulation stages after the onset of zygotic transcription ("zygotic library"). The cDNA libraries that we prepared from these developmental stages were normalized (Additional file 1A,B) and sequenced using the 454 GS FLX Titanium platform. Following removal of contaminants (see Material and Methods, Additional file 1C, D) reads from both libraries were pooled and assembled using the Newbler Assembler from Roche. Above our chosen cutoff of 100 nt, this assembly yielded a total of 16,950 contigs with an average length of 798 nt (13.5 MB) and 26,862 singletons with an average length of 264 nt (7.1 MB). This data set (20.6 MB total sequence data) was used in subsequent analyses ( Figure 1A). To identify E. balteatus genes, we pooled sequence data from both libraries and performed reciprocal BLAST against annotated genes of D. melanogaster. Based on reciprocal hits, we identified putative D. melanogaster orthologues of 6013 E. balteatus genes. In total, about 8.1 MB (39%) of the assembled E. balteatus sequence data could be annotated (red line in Figure 1A), corresponding to 43% of annotated D. melanogaster genes. Specifically, we recovered 85% of genes annotated for translational control, over 60% of the genes known to encode gene-specific transcription factors (transcription factor -strict [30]), 50% of the genes associated with known and putative gene-specific transcription factors (transcription factor -putative [30]), about 50% of genes associated with structural functions (structure) and enzymes, and 30-40% of genes associated with receptor binding, molecular transporters, and signal transduction (transducer) ( Figure 1B).
All fourteen genes were represented with at least one read from the zygotic library (blue lines in Figure 2A-N), which is consistent with our previous finding that all these genes are expressed in the 3-6 hours time window of embryonic development [9,[31][32][33]. For six of these genes (Eba-bcd, Eba-cad, Eba-nos, Eba-tor, Eba-otd, Eba-hb) we also obtained reads from the maternal library (red lines in Figure 2A-F). Previous and new (Additional file 3A-C) in situ hybridization data indicated maternal expression of Eba-bcd, Eba-cad, Eba-nos, Eba-tor and Eba-otd, but not of Eba-hb. Quantitative PCR (qPCR) on non-normalized cDNA suggested an 18-fold increase of Eba-hb expression levels following the onset of zygotic transcription, whereas expression levels both of Eba-otd and Eba-cad increased by about 2-fold (orange bars in Additional file 3D). Coverage of these genes in the maternal and zygotic transcriptomes closely reflected our qPCR data (light grey bars in Additional file 3D), suggesting that, despite cDNA normalization, the coverage of these genes remained roughly proportional to their expression levels.

BMP Signaling Components in the E. balteatus Transcriptome Database
In D. melanogaster, BMP signaling at the dorsal side of the blastoderm is required to specify the amnioserosa as a single extraembryonic tissue (see Background). Based on mosquito data, it has been suggested that in lower dipterans restricted expression of the BMP antagonist Short gastrulation (Sog) may account for an expanded BMP signaling domain in the dorsal blastoderm, which resolves into serosa and amnion territories [11]. Furthermore, it has been suggested that the complement of BMP signaling components changed in the dipteran lineage in parallel with the origin of the amnioserosa [3]. We used our transcriptome database as a tool to test the latter idea by searching for E. balteatus homologues of specific BMP signaling components of D. melanogaster.
In D. melanogaster, activity of BMP ligands is modulated by Sog [22,23,26], which in turn is regulated by the related metalloproteases Tolloid (Tld) and Tolkin (Tok) [25,49,50]. We identified E. balteatus homologues of sog as well as of tld. While we were not able to identify an orthologue of tok, the presence of a distinct tld orthologue in E. balteatus suggests that the dipteran gene duplication giving rise to tld and tok occurred before the origin of the amnioserosa ( Figure 3D). BMP ligand activity in D. melanogaster is additionally modulated by Twisted gastrulation (Tsg) [51,52], Crossveinless (Cv) [53,54], Shrew (Srw) [26,55], as well as the membrane associated factors Crossveinless-2 (Cv-2) [56,57], Kekkon 5 (Kek5) [58], Pentagone (Pent) [59] and Larval Translucida (Ltl) [60]. We identified E. balteatus homologues of cv-2, kek5 ( Figure 3E) as well as tsg, cv and an additional cv paralogue, Eba-cv-like ( Figure 3F). Previous studies have suggested that tsg, cv, and srw originated by two successive duplications of a cv-like ancestor in the dipteran lineage [3]. Our gene tree analysis is consistent with this idea, but does not resolve whether Eba-cv-like is orthologous to srw, or whether it is the product of an independent gene duplication in E. balteatus. We did not identify orthologues of pent and ltl in E. balteatus, but putative orthologues of both genes are present in the genome of T. castaneum (data not shown). Thus, all currently known modulators of BMP ligand activity in D. melanogaster may have existed prior to the origin of the amnioserosa.
Putative orthologues of the vertebrate BMP ligands BMP10 [61] and Anti-Dorsalizing Morphogenetic Protein (ADMP) [62][63][64], as well as the vertebrate BMP inhibitors BAMBI [65], DAN and Gremlin [66] have been found in beetles and/or wasps but not in D. melanogaster [4]. Among these, we were able to identify a putative orthologue of DAN in E. balteatus ( Figure 3G). The loss of this gene may correlate with the origin of the amnioserosa. However, the function of DAN in insects remains unknown and its potential role in BMP signaling is therefore speculative.

Differences in maternal expression of BMP signaling components in D. melanogaster and E. balteatus
Based on our finding that coverage levels of segmentation genes in the two sequenced transcriptomes were roughly proportional to their expected expression levels (see above; Additional file 3D), we decided to globally compare maternal gene expression between E. balteatus and D. melanogaster. For this purpose, we approximated the maternal expression profiles of all annotated E. balteatus genes based on their coverage in the maternal transcriptome and compared them to maternal expression profiles of their D. melanogaster orthologues. Maternal coverage levels of all annotated E. balteatus genes were corrected for sequence lengths and sequencing depth (see Methods) and plotted against coverage levels of their D. melanogaster orthologues, which were estimated from available SOLiD total RNA sequencing data of 0-2 hour old embryos (i.e. stages when the zygotic transcriptome is still essentially silent) [67] (Figure 4). Coverage levels derived from the non-normalized D. melanogaster transcriptome spread by 5.5 orders of magnitude, while those of the normalized E. balteatus transcriptome spread by 3.6 orders of magnitude. A reduced breadth in coverage levels of E. balteatus genes was expected due to the normalization protocol. The coverage levels of the maternal genes nanos, bicoid, torso, and caudal were higher than 1 in the transcriptome of D. melanogaster and E. balteatus. In contrast, the zygotic genes huckebein, even skipped, giant, hairy, knirps, tailless, Krüppel, and zerknüllt showed coverage levels lower than 1 in both species ( Figure 4A). Notably, the scatter plot correctly revealed the expression differences of hunchback (maternally expressed in D. melanogaster but not in E. balteatus) and orthodenticle (maternally expressed in E. balteatus but not in D. melanogaster). When restricting the data set to BMP components ( Figure 4B) or transcription factors ( Figure 4C), we readily identified additional candidate maternal expression differences in both gene groups. For example, the data suggest maternal expression of crossveinless-2 or kekkon-5 in E. balteatus but not in D. melanogaster, which might reflect different interactions of BMP signaling molecules and regulators in both species during early embryonic development.

Conclusions
Comprising orthologous sequences of nearly half (43%) of all annotated D. melanogaster genes, the newly generated transcriptome data of E. balteatus provide a convenient tool to identify putative orthologues of conserved insect genes. Here we used the transcriptome data of E. balteatus to show that the novel dipteran BMP ligand Scw and other BMP signaling components of D. melanogaster existed prior to the origin of the amnioserosa ( Figure 5). These findings suggest that the origin of amnioserosa development was accompanied by subtle changes in the expression of conserved BMP signaling components, rather than on the origin or loss of individual genes. Modification of the BMP pathway is expected to be constrained due to its multiple functions in development. However, the duplicated genes (gbb and scw; tld and tok; cv, tsg and srw) may have relaxed these constraints, because BMP activity could now be provided by non-identical sets of genes in early (blastoderm) and later developmental stages. We suspect that increasing the role of scw, tld, tsg and srw in early (blastoderm) development at the expense of gbb, tok and cv facilitated genetic accommodation of early DV patterning following the origin of the amnioserosa. Conversely, the entire complement of the duplicated BMP signaling genes might still be required for early DV patterning in lower cyclorrhaphan flies such as E. balteatus.

Preparation of Transcriptome Library
Total RNA was prepared by homogenizing embryos in Trizol (Inivtrogen), treated with DNaseI, and enriched for polyA containing transcripts using the Oligotex kit (Qiagen). First-strand cDNA was synthesized from approximately 1 μg of mRNA. Annealing of random hexamer primers (15 mM) was at 25°C for 10 minutes, cDNA was synthesized at 50°C for 1 hour and followed by inactivation of the reverse transcriptase (Superscript, Invitrogen) at 85°C for 5 minutes. Second-strand cDNA was synthesized using the first strand reaction with Klenow DNA Polymerase at 15°C for 1.5 hours, and terminated by the addition of 0.5 M EDTA, pH 8. cDNA was purified using the QIAquick MinElute Reaction Cleanup Kit (Qiagen). cDNA ends were filled in ("polished") using a mix of Klenow DNA polymerase and T4 polynucleotide kinase with dNTPs at 20°C for 30 minutes, after which the reactions were purified again using the QIAquick MinElute Reaction Clean-up Kit. "A" overhangs were created by incubating polished cDNA with 0.2 mM dATP 0.3 U/μl Klenow exo-at 37°C for 30 minutes. cDNA was purified using the QIAquick MinElute Reaction Clean-up Kit (Qiagen), ligated with a mix of Adap-torA and AdaptorB using T4 DNA ligase, and purified again. Adaptors were generated by annealing equimolar amounts of complementary oligos in 2x TNE buffer (20 mM

Library Normalization and Fragment Size Selection
Libraries were normalized using the TRIMMER DIRECT cDNA Normalization Kit (Evrogen) and were carried out essentially as described in the user manual. Briefly, 400 ng of each library were suspended in hybridization buffer and split into four tubes. Following 5 hr incubation at 68°C, the aliquots were treated either with 4 units, 2 units, 1 unit, or no duplex-specific nuclease (DSN). After DSN digestion, the normalized cDNA libraries were amplified by PCR. Optimal amplification within the exponential phase of the PCR was determined visually after electrophoresing different amplification runs of the non-DSN treated sample, after which all aliquots were amplified by a total of 20 cycles. The normalization efficiency was assessed by quantitative PCR (qPCR) of hunchback (Eba-hb, 5'-CTCAGCCC-GAATCCAAAT/5'-GGTTGTGGGAGTTGATGTTG, amplicon 137 bp), caudal (Eba-cad, 5'-GAAAGAAT ACTGCACCTCCC/5'-GTCGTTCCGATAGTTGAAGC, amplicon 79 bp), and alpha-tubulin as reference (Ebatub, 5'-TGAGGCTCGTGAGGATTT/5'-TCACCATCT CCAGAATCCA, amplicon 71 bp). Primer efficiency was estimated from a standard curve using five different template concentrations (12.5-200 ng); all analyses were run simultaneously in triplicates. For both libraries, the normalization with 4 units DSN were chosen for sequencing. Based on optimal cycle numbers for amplification and degree of normalization, these libraries were size fractionated using agarose gel electrophoresis, excising fragments at roughly 500 bp, and then purifying them using the MinElute Gel Extraction Kit (Qiagen) prior to sequencing.

Sequencing
Transcriptome library sequencing was performed on the Roche/454 Life Sciences GS-FLX platform at the Institute for Genomics and Systems Biology's (IGSB) High-Throughput Genome Analysis Core (HGAC) at Argonne National Laboratory according to the Roche GS-FLX XLR70 Titanium emPCR and amplicon sequencing protocols. Each transcriptome library was sequenced on one region of a two region GS-FLX gasket using Roche GS-FLX XLR70 titanium sequencing reagents. An emPCR titration was initially performed on each library to determine the proper bead:library copy ratio that yielded optimal clonal bead percent enrichment to be used in the final bulk XLR70 emPCR reaction.

Sequence Assembly
Sequencing raw data was processed with gsRunProcessor (software release 2.0.00.20) using standard quality filtering and trimming as defined by the default settings. We obtained 417,735 reads with a mean length of 243 nt for the maternal time point and 406,580 reads with a mean length of 278 nt for the zygotic time point, totaling 214.6 MB of sequence data. This raw data set was contaminated with sequences of the pea aphid Acyrthosiphon pisum, because E. balteatus requires aphids for egg deposition, and embryos were collected in batch from leaves heavily infested with aphids. For subsequent analyses, we removed all reads that matched published A. pisum sequences (mRNA + genomic, NCBI, 2009-07-13) [68] with 95% or higher identity. Removal of A. pisum sequences reduced the total sequence data by about 26% to 158.1 MB and the number of reads from each library by about 22%, resulting in 325,200 reads from the maternal library with a mean read length of 228 nt, and 311,906 reads from the zygotic library with a mean read length of 269 nt. The distribution of read lengths displayed two peaks, one at about 400 nt (360 nt for reads from the zygotic library) and one at less than 100 nt (Additional file 1C,D). The peak at 400 nt corresponded to the expected mean length using the Titanium chemistry. The peak slightly below 100 nt presumably resulted from our cDNA preparation protocol, which had not yet been optimized for the Titanium chemistry at the time of library preparation and lacked, for example, any additional size exclusion steps following gel electrophoreses. All reads have been deposited as the E. balteatus transcriptome at the NCBI Sequence Read Archive (SRA, SRP005289). Assembly was based on combined SFF sequence files of the maternal and zygotic libraries using Newbler Assembler (software release 2.3). Newbler parameters were default except: minimal identity of 90% in overlaps (-mi 90), overlaps to be at least 30 nt in length (-ml 30), minimal length of contigs to be 100 nt (-l 100). Newbler was run as cDNA assembly (-cdna), which includes processing of contigs that are found to be variants of the same transcript into distinct isotigs. From a total of 637,080 reads, 544,776 reads (85.5%) were assembled (fully assembled reads: 432,160 reads,

Sequence Annotation
Reported isotigs (12,296;12.9 MB) and singletons of at least 100 nt length (26,862; 7.1 MB; identified from 454ReadStatus.txt) were combined, and reciprocal BLAST searches of the E. balteatus transcriptome were carried out against the translated transcriptome of D. melanogaster (dmel-all-translation-r5.29.faa, flybase.org) using blastx (E. balteatus query against D. melanogaster database) and tblastn (D. melanogaster query against E. balteatus database). Annotation was performed with an e-value threshold of 10 to screen for all putative orthologues ("no cutoff") and with an e-value threshold of 1e-10 to obtain a conservative list of high confidence orthologues ("1e-10") (see Additional file 5 for comparison of annotation based on assemblies with Newbler 2.3 and Newbler 2.5.3). Reciprocal hits of E. balteatus with D. melanogaster were assigned with the same flybase D. melanogaster CG identifiers. Gene ontology terms were assigned based on the current D. melanogaster gene ontology, with the exception of 'transcription factorstrict' and 'transcription factor -putative', which were based on a curated list of genes of known and putative gene specific transcription factors [30].

Sequence Coverage
To determine coverage levels for individual published E. balteatus genes, associated reads of the maternal and zygotic library sequences were identified by BLAST search (blastn) and assembled with the published sequence using CAP3 with standard parameters [69]. Coverage levels were then calculated for each nucleotide position of the published gene sequences. Fold-coverage in the maternal library was used to approximate maternal expression levels of all annotated E. balteatus genes. The assembled E. balteatus transcriptome (isotigs and singletons) was blasted against all maternal reads (blastn). Coverage of annotated transcripts was then calculated from all reads that matched with at least 95% identity and over the length of at least 50 nt. To approximate levels of gene expression, coverage of each gene was divided by its sequence length and the total RNAseq data of the maternal library (74 MB). Fold-coverage in the 0-2 hr time point of the D. melanogaster development transcriptome was used to approximate maternal expression levels of D. melanogaster genes [67]. Expression data of D. melanogaster genes in 0-2 hrs old embryos was downloaded from modEncode as coverage data mapped onto the entire genome (BC1_plus.wig, BC1_minus.wig). Expression of gene transcripts was retrieved from this genomic map by extracting coverage information of all exons for each annotated gene (BDGP/ dm3). To account for potentially mis-annotated exonintron structures of computationally predicted genes, gene expression was approximated by the coverage of the most strongly expressed exon longer than 500 nt, or by the coverage of the most strongly expressed transcript variant, whichever was higher. To approximate levels of gene expression, coverage of each gene was divided by its sequence length and the total RNAseq data of the 0-2 hr time point (3.4 GB, SRP001696).

Gene Discovery of BMP Signaling Components
A list of D. melanogaster and Tribolium castaneum BMP signaling compounds was matched against assembled and unassembled E. balteatus transcriptome data (tblastn, e-value of 0.001). Identified E. balteatus sequences were assembled in Sequencher, assemblies were manually corrected for ORF frame shifts by alignment with D. melanogaster protein sequence, and sequence orthology was confirmed by reciprocal blast against a D. melanogaster database. To exhaustively search for putative duplicates specific to the E. balteatus lineage, blast searches were repeated using lower cut-offs (increasingly higher e-values) until all newly identified and assembled E. balteatus reads matched a clearly nonorthologous sequence in available insect protein databases. All T. castaneum sequences were retrieved from BeetleBase (ftp://bioinformatics.ksu.edu/pub/BeetleBase/ latest/Sequences/Tribolium_Official_Gene_Sequences/ mRNA.fa).

Phylogenetic Gene Trees
Protein alignments were created using the Clustal algorithm with standard parameters (MegAlign). When more than half of the aligned sequences carried a gap at a given position, these positions were removed from the alignment. The amino acid substitution model was estimated using AIC in ProtTest [70]; maximum likelihood trees were calculated using PhyML [71]. Bootstrap values were based on 1000 replicas. Trees were plotted with drawtree (Phylip package) [72] and the newick-utils package [73].
Custom scripts (Perl, R) were used to automate blast searches and evaluation, calculate E. balteatus and D. melanogaster gene coverage, and prune sequence alignments. Scripts are available on request. Plots were prepared with gnuplot and finished with Freehand. Assembly, blast searches, and bootstrap analysis were computed on the computer cluster of the Department of Ecology & Evolution at the University of Chicago (http://biocomputing.uchicago. edu).

Additional material
Additional file 1: Normalization and 454 sequencing of transcriptome libraries. (A) Normalization of library of 0-0.5 hrs old embryos. (B) Normalization of library of 3-6 hrs old embryos. Normalization was assessed by quantitative PCR. Shown are transcript levels of E. balteatus alpha tubulin (Eba-tub) relative to E. balteatus orthologues of hunchback (Eba-hb) and caudal (Eba-cad). Bar heights indicate transcript levels, colors indicate levels in the non-normalized cDNA library (blue), and after normalization with 1 unit DSN (red), 2 units DSN (orange), and 4 units DSN (yellow). (C) Distribution of read lengths of library from 0-0.5 hrs old embryos. (D) Distribution of read lengths of library from 3-6 hrs old embryos. Read length (x-axis) is shown as a function of number of reads (y-axis) before (grey graph) and after removal of pea aphid sequence contaminations (black graph).
Additional file 2: Coverage of previously identified E. balteatus embryonic patterning genes in the libraries of 0-0.5 hrs old embryos (maternal) and 3-6 hrs old embryos (zygotic).
Additional file 3: Expression of Eba-otd in ovarian follicles and early embryos. (A-C) Eba-otd in situ hybridizations of ovarian follicles (A,B) and an early embryo (C). Anterior is left, dorsal is up. (D) Increase of zygotic expression levels relative to maternal expression levels measured for Ebaotd, Eba-cad, and Eba-hb using qPCR (orange bars) and transcriptome coverage (grey bars). qPCR data are based on the average of two independent PCRs, each of which run in triplicate. Bars indicate variance between samples.
Additional file 4: Schematic overview of TGF-b signaling in D. melanogaster. TGF-β ligands bind as dimers to a transmembrane receptor complex, which activates signal transducing protein that translocates to the nucleus to regulate gene activity (modified after [74]). Components of the BMP signaling cascade are shown in red, components of the Activin-β signaling cascade in blue, shared components in purple. In the case of generalized BMP signaling, extraecellular ligands (Dpp, Scw, Gbb) are regulated by BMP antagonists Sog and Tsg/Cv/Srw(?), and metalloprotease activity of Tld/Tok, which cleaves Sog and frees BMP. Additional BMP specific regulators are transmembrane protein CV-2, which directly binds BMPs, and membrane associated protein Kek5. Active BMPs signal through the receptor complex composed of Tkv, Sax and Put, which phophorylates Mad. Phophorylated Mad recruits Medea and translocates to the nucleus. For abbreviations, see legend to Figure 3.
Additional file 5: Comparison of sequence annotation following de novo transcriptome assemblies produced by Newbler v2.3 and Newbler v2.5.3. During manuscript preparation, a new version of Newbler became available (v2.5). To address the concern of a potential underperformance of Newbler v2.3 in the case of our data set, we repeated the assembly and annotation with the latest available Newbler assembler (v2.5.3) and compared the results to the assembly with Newbler v2.3. We found the total number of assembled bases with Newbler v2.5.3 (10.5 Mb) decreased by almost 20% when compared with the number of assembled bases with Newbler v2.3 (12.9 Mb), suggesting that for our dataset Newbler v2.5.3 performed with more stringent assembly conditions. The observed differences did not affect our identification of BMP signaling components in E. balteatus as these genes were all identified by Blast and subsequent manual assembly.