Skip to main content

Differential expression of endogenous plant cell wall degrading enzyme genes in the stick insect (Phasmatodea) midgut



Stick and leaf insects (Phasmatodea) are an exclusively leaf-feeding order of insects with no record of omnivory, unlike other “herbivorous” Polyneoptera. They represent an ideal system for investigating the adaptations necessary for obligate folivory, including plant cell wall degrading enzymes (PCWDEs). However, their physiology and internal anatomy is poorly understood, with limited genomic resources available.


We de novo assembled transcriptomes for the anterior and posterior midguts of six diverse Phasmatodea species, with RNA-Seq on one exemplar species, Peruphasma schultei. The latter’s assembly yielded >100,000 transcripts, with over 4000 transcripts uniquely or more highly expressed in specific midgut sections. Two to three dozen PCWDE encoding gene families, including cellulases and pectinases, were differentially expressed in the anterior midgut. These genes were also found in genomic DNA from phasmid brain tissue, suggesting endogenous production. Sequence alignments revealed catalytic sites on most PCWDE transcripts. While most phasmid PCWDE genes showed homology with those of other insects, the pectinases were homologous to bacterial genes.


We identified a large and diverse PCWDE repertoire endogenous to the phasmids. If these expressed genes are translated into active enzymes, then phasmids can theoretically break plant cell walls into their monomer components independently of microbial symbionts. The differential gene expression between the two midgut sections provides the first molecular hints as to their function in living phasmids. Our work expands the resources available for industrial applications of animal-derived PCWDEs, and facilitates evolutionary analysis of lower Polyneopteran digestive enzymes, including the pectinases whose origin in Phasmatodea may have been a horizontal transfer event from bacteria.


Whole transcriptome shotgun sequencing, or RNA-Seq, is a high-throughput, next-generation sequencing tool that can efficiently identify tens of thousands of functional genes in an organism or specific tissue at a given time [1, 2]. This deep sequencing makes it a more attractive tool than microarrays for organisms lacking reference genomes, facilitating de novo transcriptome assembly [35]. Its high coverage is desirable when profiling transcripts in tissues of unknown function, enabling researchers to generate and/or test multiple hypotheses at once (eg: [6, 7]), and in organisms potentially harboring symbiotic organisms that may or may not produce the transcripts of interest (eg: [810]), as RNA-Seq can simultaneously identify genes from microbes and their vectors/hosts.

Such a combination of low genome resource availability and enigmatic physiology exists in the stick and leaf insects (order Phasmatodea), or phasmids. Though common in the pre-molecular biology era through the Laboratory Stick Insect, Carausius morosus[11], phasmid research today is relatively limited. Few phasmids are pests of agricultural crops [11, 12], though they reach plague-like abundances in temperate forests [13, 14] and C. morosus is an invasive pest in several countries [15, 16]. All life stages of all species within the order feed exclusively on leaves [17]. This obligate folivory is relatively rare: Among insects it is known only from leaf beetles (Coleoptera: Chrysomelidae), while more basal “herbivores” such as grasshoppers and crickets (Orthoptera) will quite readily scavenge vertebrate meat, engage in cannibalism, and even hunt and kill other insects [18, 19]. Thus phasmids are an ideal system for studying the evolution of herbivory in the lower Polyneoptera.

Folivorous organisms benefit greatly from plant cell wall degrading enzymes (PCWDEs), a group that includes cellulases, hemicellulases, lignases, pectinases, and xylanases [17]. Once thought to be limited to microbes, endogenous (symbiont-independent) PCWDE production has since been found throughout the Animalia. In particular, cellulase (beta-1,4-endoglucanase; Enzyme Commission: genes from the Glycoside Hydrolase family 9 (GH9) are now believed to have existed in the ancestor of all Metazoan life [20, 21] as opposed to having been repeatedly acquired from microbes via horizontal gene transfer, as is thought to be origin of GH45 and GH48 cellulases in beetles [22, 23]. Among insects, endogenous cellulases have been found in lower and higher termites, cockroaches, crickets, beetles [2124], a firebrat [25], a springtail [26], and, recently, the phasmids. High cellulase activity in the anterior midguts of two phasmid species, Eurycantha calcarata and Entoria okinawaensis, was detected, the responsible proteins isolated, and the genes encoding them sequenced. Sequence homology and antigency against an insect cellulase anti-serum supported an endogenous, Insectan origin for the enzymes [27]. Such a process is slow and predicated on the translation of PCWDE genes into enzymes active against laboratory substrates like carboxymethylcellulose or crystalline cellulose, whose specificity and selectivity are imperfect [28, 29].

Whether or not phasmids contain other PCWDEs, such as the cellobiases (a type of beta-glucosidase; EC: that convert the products of cellulase into glucose monomers [24] or the pectinases (polygalacturonases; EC: that hydrolyze pectin into galacturonic acid monomers [17], was unknown. Presence of such active enzymes could explain the obligate folivory of the Phasmatodea and be a key factor in the order’s evolution [30], which is itself a puzzle as the sister order to the Phasmatodea is highly debated [31, 32]. Microbiological assays of the phasmid gut suggest digestion in the order is symbiont-independent [33], so any phasmid PCWDE genes are likely endogenous, yet whether the genes show homology to insect or microbe genes depends on their own evolutionary origin [23, 30]. Complicating the issue is the relative lack of genetic resources for phasmids or their most closely-related orders: Orthoptera [34], Embioptera [35], and Notoptera/Xenonomia [36]. Lastly, even if phasmids have PCWDE genes, their expression is not a guarantee, nor are they necessarily expressed in the gut region where they are most active. The phasmid midgut is physically differentiated into two sections: an anterior midgut (AMG) marked by circular pleating and folding, and a posterior midgut (PMG) studded irregularly by hollow bulbs with filamentous tubules called the “appendices of the midgut” that open into the midgut lumen [37]. The appendices may have an excretory or secretory function, and the possibility exists that they produce digestive enzymes that are carried forward into the AMG via countercurrent flow [38]. In the face of all these unknowns, next-generation sequencing is the best resource for answering questions of phasmid digestive physiology efficiently and effectively.

Here we used de novo transcriptome assembly to identify the genes expressed in the midguts of six species of Phasmatodea from four families, while greatly increasing the publicly available genetic resources for the order. We also used RNA-Seq on one exemplar species, Peruphasma schultei (Pseudophasmatidae) to quantitatively compare transcript expression between the AMG and PMG, and produced a genomic DNA library from the symbiont-free phasmid brain to confirm that identified transcripts were encoded by the insect itself. Our main goal was to identify the production organ of the Phasmatodea endogenous cellulase, while simultaneously creating an inventory of expressed PCWDE and other digestive genes in phasmids and generating hypotheses on their evolutionary origins and the putative functions of the midgut sections. This study serves as a necessary preliminary for more targeted molecular work. More broadly, our transcriptomes are useful for evolutionary analyses of non-cellulase PCWDEs in insects and identifying potential genes with biotechnological applications such as in processing biofuel feedstock or improving its rheology [39, 40].


Insects and microscopy

Insects used were Peruphasma schultei (Pseudophasmatidae), Sipyloidea sipylus (Diapheromeridae), Aretaon asperrimus (Heteropterygidae), and Extatosoma tiaratum, Medauroidea extradentata, and Ramulus artemis (Phasmatidae) cultured at room temperature in the Bohart Museum of Entomology, University of California, Davis. Phasmids were fed an ad libitum diet of privet (Ligustrum sp.) for Peruphasma, Eucalyptus for Extatosoma, and Rosa sp. for the others.

Library prep and sequencing

The RNA-Seq study of Peruphasma schultei made use of three biological replicates for both the anterior and the posterior midguts (AMG and PMG respectively). For each replicate, the guts of five fed, surface-sterilized, adult, male and female phasmids were removed under sterile conditions and emptied of their contents in several washes of 70% ethanol. Then the anterior and posterior sections were separately pooled and homogenized in TRIzol® Reagent. RNA was extracted according the Trizol-Plus protocol, which includes an on-column DNAase digestion step. Total RNA quality (and subsequent library quality) was checked with the Bioanalyzer 2100. Libraries were made using the Illumina TruSeq v2 kit according to the manufacturer’s instructions.

Hundred base pair paired-end sequencing was performed on the HiSeq 2000 and the raw data uploaded to the NCBI SRA Database [GenBank:SRP030474]. For quality control, low quality bases and adapter contamination were removed with the fastx toolkit [41] and the cutadapt software packages [42]. FastQC [43] was used to check the final quality of reads prior to de novo assembly. The number of reads generated for each biological replicate is shown in Table 1.

Table 1 Total reads and trinity results for each transcriptomic or genomic library

For gut transcriptomes of the other five species, the same method was used as for P. schultei with a few changes. For each species, only one biological replicate was produced for both the anterior and the posterior midgut. This library was made of pooled midguts (all females for E. tiaratum, M. extradentata, R. artemis, and S. sipylus, and a mixture of males and females for A. asperrimus). RNA was successfully extracted for all tissues with the exception of the S. sipylus PMG, for which the extraction failed and for which no additional specimens could be obtained. RNA-extraction and quality control were as for Peruphasma, but libraries were made using the NEBNext® Ultra™ RNA Library Prep Kit for Illumina kit according to the manufacturer’s instructions. Sequencing was on an Illumina HiSeq 2000, and the data uploaded to the NCBI SRA Database [GenBank:SRP038202]. The numbers of reads produced for each sample are shown in Table 1.

de novotranscriptome assembly

The Trinity assembler with the default parameters was used to generate de novo transcriptomes for all species using quality controlled reads [3]. TopHat (v2.04) was used for aligning reads to the transcriptome [44]. HTSeq [45] was used to quantify the number of reads aligning to each transcript. Gene and isoform abundances and expression levels from the P. schultei RNA-Seq data were quantified using RSEM (RNA-Seq by Expectation Maximization) [46]. This program was chosen over other programs as it does not rely on reference genomes, of which there are none for the Phasmatodea. For the P. schultei RNA-Seq, Trinity assembly yielded 135,622 transcripts (N50 contig length=1669). Differentially expressed genes were identified using EBSeq, an R package that compares isoform expression across two or more biological conditions, in this case AMG and PMG, using a Bayesian heirarchical model [47]. Differentially expressed genes (DEGs) were those with an adjusted p-value <0.05.

Transcriptome annotation and PCWDE identification

Due to the lack of closely related species with well-annotated genomes, or even consensus as to what is the most closely related order to the Phasmatodea, we used several methods to annotate the assembled transcripts. For all P. schultei transcripts and the top 500 most highly expressed transcripts for the other species, we used Blast2GO’s [48] tblastx program to compare each sequence to the NCBI translated nucleotide collection (nr) database, with an expect value threshold of e-6. Contigs with highly significant BLAST [49] hits were mapped to the Gene Ontology (GO) database and annotated using Blast2GO with an expect value threshold of e-6. InterPro annotations were performed using the Blast2GO remote connection to the InterProEBI server [48]. GO terms were modulated using ANNEX and GOSlim, using the “generic” mapping (goslim_generic.obo) available in Blast2GO (Figure 1). Potential metabolic pathways represented in the transcriptome were identified using the Kyoto Encyclopedia of Genes and Genomes (KEGG) [50] database via Blast2GO (Additional files 1, 2, 3, 4, 5 and 6). Enrichment analysis (Fisher’ s Exact Test via Blast2GO) was used to find enriched GO terms, with term filter value below 0.05, term filter mode “FDR,” and two-tailed test options selected (Figure 2). Annotations were added to those provided by RSEM.

Figure 1
figure 1

Comparisons of the GO terms expressed in the anterior and posterior midguts of the Phasmatodea. The top 500 most expressed transcripts are seen for each of the anterior (left) and posterior (right) midguts. A) Aretaon asperrimus. B) Extatosoma tiaratum. C) Medauroidea extradentata. D) Peruphasma schultei. E) Ramulus artemis.

Figure 2
figure 2

GO categories enriched for the most differentially expressed genes in each Peruphasma schultei midgut segment. Values are relative to the overall transcriptome as per Fisher’s exact test. A) Anterior midgut (posterior midgut values in red). B) Posterior midgut (anterior midgut values in red).

To specifically identify PCWDE-encoding transcripts, we downloaded nucleotide sequences for representative PCWDEs from the NCBI database, selecting known, endogenous insect proteins as well as fungal, bacterial, and protozoan proteins. The query sequences from NCBI were blast-ed against the full transcriptomes after removing low-quality reads, with an expect value threshold of e-10. Only transcriptome isoforms that aligned to at least 75% of the representative gene downloaded from NCBI were included in later transcript number analyses.

Amino acid alignment and phylogenetic analysis

For the putative PCWDEs, the transcript sequences from the phasmids were converted to amino acid sequences using the ExPASy online translation tool [51]. A representative sequence from each isogroup (comp#_c#) was selected based on E-value and Sim mean when compared to known enzymes in the NCBI database. The number of isogroups and isotigs (sequences) within each group is listed in Table 2, and Additional file 7: Table S7 shows these sequence names. Other known protein sequences for these enzymes were collected from the NCBI database from a diversity of organisms including bacteria, fungi, plants, other insects, nematodes, and, when available, protists, chordates, and other invertebrates (Additional file 8: Tables S8, S9, S10 and S11). The amino acid sequences were aligned using MUSCLE [52] and manually curated using Mesquite [53]. Further alignment and production of consensus sequences for clades were done using JalView [54]. These alignments were then searched for the known conserved regions for the active/catalytic sites for each enzyme type, identified using the Catalytic Site Atlas [55], with Blast alignment to confirm their presence in the phasmid transcripts.

Table 2 Number of PCWDE isogroups and isotigs in the phasmid midgut

MUSCLE-aligned sequences curated on Mesquite were converted to Phylip format [56]. For neighbor-joining trees, the Phylip program “seqboot” was run to make multiple datasets for bootstrapping, and the results run through “protdist” and “neighbor”, then the trees combined with “consense”. For parsimony trees, the “seqboot” datasets were run through “protpars” and “consense”. For maximum likelihood trees, the MUSCLE-alignment file as uploaded to the CIPRES portal ( [57] and run on RAxML-HPC2 on XSEDE for 1000 bootstrapping runs [58]. For Bayesian analysis, Mr.Bayes 3.2.2 [59] was run with the CIPRES datasets with 500000 generations for bootstrapping. Consensus trees were viewed and prepared for figures using FigTree 1.4.2 [60]. The Maximum Likelihood tables were chosen as the figures for this manuscript.

Testing for endogenous production of PCWDEs

The possibility existed that some transcripts came from microbial symbionts or contaminants. Table S7 shows which PCWDE encoding isogroups contained poly-adenylation signals, a feature predominantly of eukaryotic mRNA that that, unlike bacterial RNA, will pass in large amount through our cDNA synthesis method, and whose presence is used to suggest endogenicity [6163]. We also tested the translated transcripts for the presence of eukaryote-specific signal peptides using SignalP 4.1 ( [64], which is also evidence against bacterial origins for the transcripts. Such methods however cannot differentiate between enzymes produced by protozoan or fungal symbionts and those produced by insects, including insect-produced proteins whose genes were acquired from a eukaryote via horizontal gene transfer as has happened in beetles [23, 30, 65].

To further show that particular PCWDE genes are endogenous to the phasmid genome and not produced by gut symbionts or contaminants, we extracted DNA from non-gut tissue for next generation sequencing. Finding a gene encoding a transcript protein in the genome is a strong and frequently used indicator of endogenicity [27, 6668]. By using non-alimentary tissue we avoided aspecific amplication of microbial genes and recovered insect DNA alone, as has been done in other insects to show microbe-like genes are endogenously produced [6971], including the first discovery of endogenously produced cellulase in an insect [72]. We dissected out the brain under sterile conditions from one P. schultei individual and extracted DNA using the ChargeSwitch® gDNA Mini Tissue Kit (including the RNAase digestion step). Genomic Illumina libraries for paired-end 100 bp sequencing were then produced using the Illumina Truseq kit and validated using the bioanalyzer 2100. Sequencing was conducted on the Illumina HiSeq 2000, and the data uploaded to the NCBI SRA Database [GenBank:SRP030474]. The numbers of reads generated for the sample are shown in Table 1. We tested if the P. schultei cellulase and pectinase genes were endogenous in origin by mapping all genomic reads from the brain tissue back to our de novo midgut transcriptome assembly. We then took all the genomic reads that mapped to each PCWDE gut transcriptome gene and blasted them to the entire gut transcriptome to narrow the list of reads down to those that uniquely map to a single PCWDE gene (Table 3).

Table 3 Peruphasma schultei pectinase and cellulase genomic reads uniquely aligned to transcriptome isotigs

As a final test, we blasted all PCWDE transcripts to the draft genome for Timema cristinae (Phasmatodea: Timematidae), which was under development but available at the webpage of the Nosil Lab of Evolutionary Biology at the university of Sheffield, UK ( The Timematidae are considered the sister group to all other Phasmatodea [36]. While the Timema genome is not guaranteed to have the same genes as the species we examined, finding our transcripts within the Timema genome would be strong evidence that the gene is both endogenous in and ancestral to Phasmatodea.


Phasmid midgut de novotranscriptome assemblies

From the extracted RNA libraries of the pooled AMGs or PMGs of each phasmid species we generated approx 54 million high quality, 100 bp, paired-end sequence reads (Table 1), with the exception of Sipyloidea sipylus (Diapheromeridae), from which we were only able to successfully extract RNA from the anterior midguts. de novo assembly of each midgut section’s library with Trinity [3] produced ~114-170 thousand transcript contigs per species (Table 1). All reads and the final transcriptome for Peruphasma schultei are available under BioProject accession PRJNA221630, and for all other phasmids under PRJNA238833.

Annotation of the P. schulteitranscriptomes

Approximately 30323 (22%) of the 135622 transcriptomic sequences had BLAST hits (Additional file 9: Table S9), most of which were homologous to sequences from other insects and arthropods (Additional file 10: Figure S1). More genes were homologous to the red flour beetle Tribolium castaneum than to insects from more closely related orders, reflecting the relative dearth of available genetic information from insects in the Polyneoptera clade and the relatively recent sequencing of Tribolium. The high percentage of sequences with no blast hits (orthologs) is unsurprising given the lack of an annotated, sequenced Phasmatodea genome. The sequences may have represented noncoding regions, wrongly-assembled contigs, or novel genes whose significance is unknown.

Differential gene expression across the phasmid midgut

RNA-Seq analysis of P. schultei suggested compartmentalization of gut function (Figure 2), as found in other insects [24] and plant cell wall consuming organisms. Additional files 11 and 12 list all differentially expressed genes (DEGs) between the P. schultei midgut sections, defined as genes with a Posterior Probability of Differential Expression (PPDE) >0.95 [47]. We also defined genes as being highly expressed if their expression levels were 10× higher than the mean for that midgut segment. Over 4000 genes were differentially expressed in each gut section, with 2318 genes expressed only in the AMG and 1309 expressed only in the PMG. We found 318 highly expressed genes for the AMG and 648 for the PMG (Additional files 13 and 14).

Analysis of the most highly expressed sequences in each midgut section for all species further suggested compartmentalization of digestion, or at least enzyme gene expression. All species showed similar GO category profiles for each midgut section (Figure 1), with nearly 50% reduction in hydrolase gene expression in the PMG relative to the AMG. Enzyme-encoding transcripts that break down polymers at internal sites, such as serine proteases, lipases, and PCWDEs [73] were more abundant in the AMG, as were carboxylesterases transcripts and sugar hydrolases. Transcripts abundant in the PMG encoded enzymes that break down dimers and monomers, such as dipeptidases, phospholipases, and trehalase, as well as some cell membrane receptor proteins and cytochrome P450s.

Phasmatodea midgut PCWDEs

Among the sugar hydrolases were several isogroups of PCWDEs including cellulases in the GH9 family and cellobiase, which together can digest cellulose polymers completely into sugar, and the pectinase endopolygalacturonase. We also found transcripts encoding beta-1,3-glucanase (EC:, a polysaccharide-degrading enzyme family known mostly from Lepidoptera larval midguts and expressed in response to feeding on a diet containing bacteria [74]. These four enzymes (cellulases, cellobiases, pectinases, and beta-1,3-glucanases) were used in the manual annotations.

Amino acid alignment and phylogenetic analysis

The phasmid cellulases aligned most closely with other Polyneopteran cellulases — including the known, active, endogenous cellulases isolated from the phasmids Eurycantha calcarata and Entoria okinawaensis[27] — as well as those of other invertebrates, tunicates, plants, and actinomycete bacteria, but not with nematode or beetle cellulases. Phasmid endoglucanases are of the GH9 family thought to be ancestral to all animal life [21], as opposed to the GH5, 45, or 48 cellulases found in nematodes and beetles [23]. The phasmid transcripts either themselves included or were homologous to transcripts including the known active sites invariant in GH9 cellulases, based on work on Thermobifida/Thermomonospora fusca (PDB: 1js4) [56, 57]: namely two conserved Asp’s (D55, D58) functioning in catalytic base activity and a Glu residue (E461) that functions as the catalytic acid (Figure 3). Phylogenetic analysis could not resolve domain or phylum-level relationships among the sequences tested (bootstrap values <10). Every cellulase transcript we isolated was homologous to two sequences from the Timema genome.

Figure 3
figure 3

Sections of cellulase amino acid sequence alignments containing conserved/active site residues. Catalytic sites are identified by the grey arrows [75, 76]. List of Phasmatodea sequences in Additional file 7, and all others in Additional file 7. Similar sequences from groups of related species were combined into consensus sequences. Red letters show identity with the overall consensus sequence based on conservation of physiochemical properties. Quality is the inverse likelihood of observing mutations based on the BLOSUM62 matrix [54]. Strongylocentrotus purpuratus and Flavobacterium branchiophilum are abbreviated.

The phasmid pectinase sequences aligned most closely with those of gamma proteobacteria, rather than other insects or eukaryotes. The alignment also showed Hemipteran and beetle pectinases as most similar to fungal pectinases, the latter homology already noted in the literature [30, 65]. Nearly all phasmid pectinase enzymes contained the four conserved regions of the catalytic sites, based on work on Erwinia carotovora (PDB: 1bhe) [77]: Asn226-Thr227-Asp228, Gly248-Asp249-Asp250, Gly274-His275-Gly276, and Arg305-Ile306-Lys307 (Figure 4). An exception is the transcripts from A. asperrimus, whose Arg residue is replaced with a Tyrosine (Y305). Phylogenetic analysis suggested the phasmid polygalacturonases are a monophyletic group within those of the gamma proteobacteria (Figure 5).

Figure 4
figure 4

Section of pectinase amino acid sequence alignments containing conserved/active site residues. See caption to Figure 3. Catalytic residues identified with grey bars [77]. Thermoanaerobacterium thermosaccharolyticum is abbreviated.

Figure 5
figure 5

Maximum likelihood tree for the pectinases. Phylogenetic analysis of amino acid sequences made with RAxML-HPC2 on the XSEDE system [57]. Numbers are bootstrap values (1000 runs). Branch widths based on bootstrap value, branch colors based on clade. Branch lengths based on the mean number of nucleotide substitutions per site (Scale Bar =0.9). All Phasmatodea sequences (Additional file 7) were a monophyletic group among the gamma proteobacteria. Thermoanaerobacterium thermosaccharolyticum is abbreviated.

The possibility exists that the pectinase transcripts came from gut bacteria or contaminants rather than phasmid genes. However, many of the pectinase transcripts had poly-A tails, as did those of other PCWDEs (†in Table S7), and all the PCWDE transcripts that were not truncated at the 5’ end had eukaryotic-specific signal peptides. This includes transcripts that contained complete open reading frames (*in Table S7) as well as those that were 3’ truncated. Each pectinase-encoding contig also had multiple (in most cases very many) matching genomic reads from brain tissue that uniquely aligned to them (Table 3). The same matching genomic reads could be found for cellulases, which have been demonstrated to be endogenously produced in phasmids [27] and are endogenously produced in many other insects [17] and metazoans [21]. However, none of the pectinase transcripts had homologues in the Timema genome.

The phasmid beta-glucosidases/cellobiases were in the GH1 family and aligned most with those of other insects. The phasmid transcripts mostly had the conserved residues of beta-glucosidases, including the catalytic sites, based on work with white clover, Trifolium repens (PDB: 1cbg) [78]: Arg75, His119, Asn163, Glu164, Asn306, Tyr308, Glu378, and Trp420 (Figure 6). Phylogenetic analysis suggested the phasmid cellobiases are nearly all monophyletic, except a strongly-supported clade consisting of one isogroup from each species but S. sipylus (Figure 7). Analysis could not determine interclass relationships among insect beta-glucosidases, but suggested the enzyme existed in the common ancestor of the Insecta. Every beta-glucosidase transcript we isolated was homologous to four sequences from the Timema genome.

Figure 6
figure 6

Sections of cellobiase amino acid sequence alignments containing conserved/active site residues. See caption to Figure 3. Catalytic and conserved residues identified with grey arrows [78].

Figure 7
figure 7

Maximum likelihood tree for the beta-glucosidases. Phylogenetic analysis of amino acid sequences made with RAxML-HPC2 on the XSEDE system [57]. Numbers are bootstrap values (1000 runs). Branch widths based on bootstrap value, branch colors based on clade. Branch lengths based on the mean number of nucleotide substitutions per site (Scale Bar =0.7). The Phasmatodea sequences (Additional file 7) formed two strongly supported monophyletic group with weak relationships to other insect groups.

The phasmid beta-1,3-glucanases were in the GH16 family and aligned most closely with other insect enzymes (Figure 8), however this is a relatively recently described enzyme with few recorded sequences in the literature or NCBI database. This is the first known record of endogenous beta-1,3-glucanase in the Polyneoptera. The phasmid beta-1,3-glucanases could be divided into four clear, monophyletic groups (Figure 9) with no more than one representative isogroup per each of the six phasmid species. Each group differed in their homology with the known consensus pattern for catalytically active beta-1,3-glucanases based on work on Bacillus licheniformis[79]: E-[LIV]-D-[LIVF]-x(0,1)-E-x(2)-[GQ]-[KRNF]-x-[PSTA] (Figure 8: 342–353). One group of six sequences had 11/12 amino acids conserved, a group of four had 10/12, another group of six had 8/12, and the final group consisting of one A. asperrimus sequence had 6/12. The last amino acid in the Bacillus region was not conserved among any phasmids, nor is it conserved among the Lepidoptera sequences, which were also 11/12, or many other organism sequences sampled. Every beta-1,3-glucanase transcript we isolated had six to eight homologues in the Timema genome.

Figure 8
figure 8

Sections of beta-1,3-glucanase amino acid sequence alignments containing conserved/active site residues. See caption to Figure 3. Conserved, twelve amino acid region identified with grey bar [79]. The four groups of Phasmatodea gene are separated by black lines.

Figure 9
figure 9

Maximum likelihood tree for the beta-1,3-glucanases. Phylogenetic analysis of amino acid sequences made with RAxML-HPC2 on the XSEDE system [57]. Numbers are bootstrap values (1000 runs). Branch widths based on bootstrap value, branch colors based on clade. Branch lengths based on the mean number of nucleotide substitutions per site (Scale Bar =0.7). The Phasmatodea sequences formed four strongly supported groups.


Using high coverage sequencing of RNA expressed in their midguts, we were able to produce high quality transcriptomes of several Phasmatodea species. This new data doubles the genera of phasmids with publicly available genetic resources on the NCBI databases, increasing the amount of annotated genes available for future work not only on Phasmatodea, but also on the Polyneoptera in general. Covering six species in four families, while drawing from the draft genome of a seventh species in a fifth family, the data suggests the differential expression and enzyme gene diversity of the phasmid midgut sections is mostly conserved throughout the order. Our findings will serve as a reference set for studying phasmid digestion and a jumping point for future proteomic and biochemical assays. The abundance of PCWDE isogroups in phasmids is relatively high, and the diversity of PCWDE types is comparable to those in certain leaf beetles (Chrysomelidae) like Phaedon cochleariae[69, 71] or wood-boring beetles (Cerambycidae) like Anoplophora glabripennis[80]. The current record is likely Diabrotica virgifera virgifera (Chrysomelidae) with seventy-eight genes putatively encoding proteins from the same four enzyme classes studied here [55]. As Phasmatodea and Chrysomelidae are among the few insect groups to be exclusively folivorous, a possible correlation exists between that dietary niche and a diverse PCWDE complement.

For de novo transcriptomes, assemblers such as Trinity often cannot differentiate between homologous genes and isoforms or allelic variants of the same gene. They can potentially overestimate the number of isotigs (single or groups of contigs that should each constitute one splice variant) within an isogroup (all isotigs for one gene, identified by Trinity’s output as comp#_c#) [81]. Combined with relatively low genetic resource availability for closely related insects and relatively high representation of species like the aforementioned beetles, we cannot be certain whether phasmids express more or fewer PCWDEs than the average herbivorous insect. In addition to using programs like RSEM designed to reduce such errors [46], comparing the number of reads mapping to a locus on the genome can be used to infer the true isoform number and account for inflation [82]. That most P. schultei transcriptome sequences (contigs) had more matching genome reads (Table 3) than their corresponding isogroup has members (Table 2) suggests that our contig numbers represent true isoforms within each isogroup, rather than an overestimation due to mis-assembly [82, 83]. These phasmid isoforms may reflect multiple gene copies or alternatively spliced genes, either case suggesting a diverse complement of proteins working together to fully digest multiple varieties of carbohydrate polymer. a highly derived genetic capacity for plant cell wall breakdown [84]. However, because some isotigs were truncated at the 3’ or 5’ end, the possibility exists that certain transcripts represent different ends of a single gene. Future work using RACE-PCR from primers based on the transcripts identified here would produce full-length cDNA sequences that will determine which transcripts represent unique genes and which are fragments, Such genes could then be expressed into insect cell cultures for use in downstream enzymatic activity assays [85].

Previous research has confirmed that the endogenous cellulase genes we demonstrated are most highly expressed in the anterior midgut are also most highly active in the anterior midgut [27], making it the site of both cellulase translation and action. The physical structure of the AMG supports this hypothesis: the pleating and folding serves to greatly increase the available surface area of the AMG while slowing down the transit speed of food, increasing the amount of time and space available for cellulase enzymes to hydrolyze ingested plant material. Cellulase activity falls to nearly nothing in the PMG, tracking with cellulase gene expression. If we extend the results of cellulases to those of the other PCWDEs, then we hypothesize that phasmid digestive enzymes are active in the same region of the gut where they are expressed, making the pleated AMG the site of primary plant cell wall and polymer digestion and the PMG the site of secondary digestion of smaller oligomers at most.

We also hypothesize that phasmids can fully digest cellulose into glucose, as they have the two enzymesnecessary to do so, and also actively degrade pectin into galacturonic acid. Such digestive abilities could explain how phasmids survive on otherwise uncommon, obligately folivorous diets: by fully breaking down plant cell walls into assimilatable nutrients rather than just degrading the walls to access the nutrient-rich cytoplasm within. As transcriptomics only demonstrates gene expression, not translation or activity, these hypotheses cannot be confirmed with this data alone. However, the fact that phasmids have active cellulases [27] and the presence of the relevant catalytic residues on the phasmid cellulase, pectinase, and cellobiase transcripts (Figures 3,4,6) and some beta-1,3-glucanase transcripts (Figure 8) suggests the transcripts code for functional enzymes, supporting the hypothesis that these enzymes are indeed actively degrading plant cell walls. We have thus provided the necessary preliminary work justifying biochemical and proteomic assays into cellobiase, pectinase, and beta-glucanase activity in the phasmid gut.

Phasmid pectinases are all endopolygalacturonases in the GH28 group, known in insects only from the beetles and the Hemiptera, but they show homology and align to those from gamma proteobacteria (Figure 4). Pectinase genes were also absent in the Timema genome. Our pectinase transcripts may have come from a bacterial symbiont. The successful mapping of all P. schultei pectinase and cellulase transcripts to genes in the P. schultei genome, the presence of eukaryote-specific poly-A tails and signal peptides on phasmid transcripts with complete open reading frames, previous studies with P. schultei and R. artemis suggesting their digestion is symbiont independent [33], and the absence of characteristic paunches for microbial fermentation [37], all tentatively suggest these pectinases are encoded in the phasmids own genome and not produced by gut microbes. However, the possibility remains that the samples were contaminated by a non-symbiotic microbe: either a rare bacteria that poly-adenylates its RNA or a fungal symbiont that acquired a bacterial gene via horizontal transfer.

A more parsimonious hypothesis is that the phasmid pectinase gene was acquired through horizontal transfer from a bacterial ancestor, much as the beetle pectinases are thought to have been acquired though horizontal transfer from an Ascomycete fungus [23, 30, 65], or as leaf beetle xylanases may have also transferred from a gamma proteobacteria [69]. The absence of the genes in Timema would suggest either that the transfer event occurred after the split between the Timematidae and the other Phasmatodea, or that the pectinase genes are ancestral to both and lost in Timema. Lastly, the similarity between phasmid and bacterial sequences could simply be an artefact of the over-representation of microbial and dearth of animal pectinases in the NCBI database at this time [86]. Using long range or RACE PCR to clone entire genes from phasmid genomic DNA and get introns would conclusively demonstrate whether or not the pectinases are endogenously produced or not, and such work on all six species studied here is underway.

Whether pectinase genes exist in other Polyneoptera remains to be seen, but would help determine when the horizontal transfer event could have occurred. Failure to find endogenous pectinases in closely related insects would mean the transfer occurred in an early Phasmatodea ancestor: a development that would have expanded the digestive abilities of the order and may have played a significant role in their evolution of obligate folivory. PCWDE diversity could also be correlated to the development of the longer and larger body sizes of the Euphasmatodea. Broader, multi-phyla, phylogenetic analysis for pectinolytic enzyme genes in the Animalia can answer this question [35].

Phasmid cellobiases are GH1, as are those of other insects who produce them endogenously like the higher termites [87]. Such species can break down cellobiose independently, unlike the lower termites that have symbiotic microorganisms to produce their cellobiases for them. The beta-1,3-glucanases are GH16, similar to those found in Lepidoptera [74], however the prevalence of this recently described gene in animals has not been sufficiently examined. A greater sampling of this gene's presence in other animal, fungal, and bacterial species, as well as biochemical studies to determine the conserved catalytic residues for the protein, are needed before ordinal-level hypotheses can be made for the enzyme's evolutionary history. So far, the animal enzymes appear most closely related, and we hypothesize that at least three distinct beta-1,3-glucanase gene families existed in an early Phasmatodea ancestor, including the ancestor of the Timematidae.

Our work promotes phasmids as a potentially high-value source of novel PCWDEs for biotechnological applications [70]. Cellulases and pectinases are highly sought after by the biofuel industry to degrade feedstock into the monomers later converted into fuel, or to improve the flow rate of the material by reducing the amount of solid matter [39]. Pectinases are also used in the production of coffee, tea, and juice, and in waste-water treatment [88]. Phasmid PCWDEs could be introduced into bacteria or fungi like Trichoderma reesei, for industrial-scale enzyme production or direct use in bioreactors for wastewater treatment or biofuel production [89].

Our de novo midgut transcriptomes enabled us to survey all expressed PCWDEs of the Phasmatodea at once and identify conserved catalytic domains, justifying downstream translation and activity level analyses. A benefit of this is system is the increased speed and efficiency compared to the converse [90]: running chemical assays to identify enzyme activity, using proteomics to identify the amino acid sequence of isolated enzymes, and working backwards from there to design a primer for the enzyme-encoding gene and hope it exists within the target organism’s genome itself and not a symbiont or contaminant [91]. Another benefit is that transcriptomics can reveal genes useful for phylogenetic analysis but that are not translated or whose proteins are modified post-translation such that the standard biochemical tests do not detect their function. An associated drawback is that expressed genes are not necessarily translated into active proteins, nor are they necessarily active at the site of expression [92]. However, when a reference genome is not available, a transcriptome can provide large sets of potential genes for study and, combined with genomic data, can determine whether or not they are endogenous to the target organism, which cannot be determined by homology alone. de novo transcriptome assembly combined with RNA-Seq is a powerful tool for suggesting putative functions for unknown tissues in understudied organisms and directions for future study.


The folivorous Phasmatodea are an ideal system to study the evolution of obligate herbivory, yet a paucity of genetic resources and poorly understood basic biology impede such work. Using RNA-Seq, we demonstrated a diversity of plant cell wall degrading enzymes expressed differentially in the anterior section of the phasmid midgut. Of these, the cellulases, cellobiases, and beta-1,3-glucanases are likely all encoded in the insect’s own genome, as are the pectinases, though we could not definitively rule out a microbial source for the latter. Such an abundance of endogenous enzymes was not expected from the Polyneoptera, raising important questions on their evolutionary history. The efficiency by which our de novo transcriptomes generated new genomic resources and hypotheses for future research on Polyneopteran digestion demonstrate the power of such methods to analyze organisms lacking sequenced genomes. Our findings strongly encourage expanding the searches for PCWDEs, most notably the pectinases and beta-1,3-glucanases, into other, lower Polyneopteran insects.

Availability of supporting data

All reads and sequence files described in the manuscript are available under BioProject accessions PRJNA221630 for P. schultei and PRJNA238833 for the other phasmids.



Anterior midgut


Cyberinfrastructure for phylogenetic research


Differentially expressed genes


Enzyme commission


Glycoside hydrolase


Gene ontology


Highly expressed gene


Kyoto encyclopedia of genes and genomes


National center for biotechnology information


Plant cell wall degrading enzyme


Protein data bank


Posterior midgut


Posterior probability of differential expression


Reads per kilobase per million


RNA-Seq by expectation maximization


Sequence read archive


Extreme science and engineering discovery environment.


  1. Ekblom R, Galindo J: Applications of next generation sequencing in molecular ecology of non-model organisms. Heredity (Edinb). 2011, 107 (1): 1-15. 10.1038/hdy.2010.152.

    Article  CAS  Google Scholar 

  2. Metzker ML: Sequencing technologies - the next generation. Nat Rev Genet. 2010, 11 (1): 31-46. 10.1038/nrg2626.

    Article  CAS  PubMed  Google Scholar 

  3. Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson DA, Amit I, Adiconis X, Fan L, Raychowdhury R, Zeng Q, Chen Z, Mauceli E, Hacohen N, Gnirke A, Rhind N, di Palma F, Birren BW, Nusbaum C, Lindblad-Toh K, Friedman N, Regev A: Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotechnol. 2011, 29 (7): 644-652. 10.1038/nbt.1883.

    Article  CAS  PubMed Central  PubMed  Google Scholar 

  4. Xia Z, Xu H, Zhai J, Li D, Luo H, He C, Huang X: RNA-Seq analysis and de novo transcriptome assembly of Hevea brasiliensis. Plant Mol Biol. 2011, 77 (3): 299-308. 10.1007/s11103-011-9811-z.

    Article  CAS  PubMed  Google Scholar 

  5. Francis WR, Christianson LM, Kiko R, Powers ML, Shaner NC, Haddock SH D: A comparison across non-model animals suggests an optimal sequencing depth for de novo transcriptome assembly. BMC Genomics. 2013, 14: 167-10.1186/1471-2164-14-167.

    Article  CAS  PubMed Central  PubMed  Google Scholar 

  6. Poelchau MF, Reynolds JA, Denlinger DL, Elsik CG, Armbruster PA: A de novo transcriptome of the Asian tiger mosquito, Aedes albopictus, to identify candidate transcripts for diapause preparation. BMC Genomics. 2011, 12: 619-10.1186/1471-2164-12-619.

    Article  CAS  PubMed Central  PubMed  Google Scholar 

  7. Hull JJ, Geib SM, Fabrick JA, Brent CS: Sequencing and de novo assembly of the western tarnished plant bug (Lygus hesperus) transcriptome. PLoS One. 2013, 8 (1): e55105-10.1371/journal.pone.0055105.

    Article  PubMed Central  PubMed  Google Scholar 

  8. McCarthy CB, Santini MS, Pimenta PF, Diambra LA: First comparative transcriptomic analysis of wild adult male and female Lutzomyia longipalpis, vector of visceral leishmaniasis. PLoS One. 2013, 8 (3): e58645-10.1371/journal.pone.0058645.

    Article  CAS  PubMed Central  PubMed  Google Scholar 

  9. Yazawa T, Kawahigashi H, Matsumoto T, Mizuno H: Simultaneous transcriptome analysis of Sorghum and Bipolaris sorghicola by using RNA-seq in combination with de novo transcriptome assembly. PLoS One. 2013, 8 (4): e62460-10.1371/journal.pone.0062460.

    Article  CAS  PubMed Central  PubMed  Google Scholar 

  10. Lehnert EM, Mouchka ME, Burriesci MS, Gallo ND, Schwarz JA, Pringle JR: Extensive differences in gene expression between symbiotic and aposymbiotic Cnidarians. G3 (Bethesda). 2014, 4 (2): 277-295. 2014.

    Article  CAS  Google Scholar 

  11. Bedford GO: Biology and ecology of the phasmatodea. Annu Rev Entomol. 1978, 23: 125-149. 10.1146/annurev.en.23.010178.001013.

    Article  Google Scholar 

  12. Kasenene JM: Forest association and phenology of wild coffee in Kibale National Park, Uganda. Afr J Ecol. 1998, 36: 241-250. 10.1046/j.1365-2028.1998.00142.x.

    Article  Google Scholar 

  13. Readshaw JL: Phasmatid outbreaks revisiting?. Aust J Zool. 1990, 38: 343-346. 10.1071/ZO9900343.

    Article  Google Scholar 

  14. Jurskis V, Turner J: Eucalypt dieback in eastern Australia: a simple model. Aust For. 2002, 65 (2): 87-98. 10.1080/00049158.2002.10674859.

    Article  Google Scholar 

  15. Headrick D, Wilen CA: Indian Walking Stick. Pest Notes. 2011, 74157: 1-3.

    Google Scholar 

  16. Borges PA, Reut M, Ponte NB, Quartau JA, Fletcher M, Sousa AB, Pollet M, Soares AO, Marcelino J, Rego C: New records of exotic spiders and insects to the Azores, and new data on recently introduced species. Arquipélago Life and Marine Sciences. 2013, 30: 57-70.

    Google Scholar 

  17. Calderón-Cortés N, Quesada M, Watanabe H, Cano-Camacho H, Oyama K: Endogenous plant cell wall digestion: a key mechanism in insect evolution. Annu Rev Ecol Evol Syst. 2012, 43: 45-71. 10.1146/annurev-ecolsys-110411-160312.

    Article  Google Scholar 

  18. Whitman DW, Blum MS, Slansky F: Carnivory in Phytophagous Insects. Functional Dynamics of Phytophagous Insects. Edited by: Ananthakrishnan TN. 1994, New Delhi: Oxford & IBH Publishing Co. Pvt. Ltd, 161-205.

    Google Scholar 

  19. Whitman DW, Richardson ML: Necrophagy in grasshoppers: Taeniopoda eques feeds on mammal varrion. J Orthopt Res. 2010, 19 (2): 377-380. 10.1665/034.019.0228.

    Article  Google Scholar 

  20. Lo N, Watanabe H, Sugimura M: Evidence for the presence of a cellulase gene in the last common ancestor of bilaterian animals. Proc Biol Sci. 2003, 270 (Suppl 1): S69-72.

    Article  CAS  PubMed Central  PubMed  Google Scholar 

  21. Davison A, Blaxter M: Ancient origin of glycosyl hydrolase family 9 cellulase genes. Mol Biol Evol. 2005, 22 (5): 1273-1284. 10.1093/molbev/msi107.

    Article  CAS  PubMed  Google Scholar 

  22. Calderón-Cortés N, Watanabe H, Cano-Camacho H, Zavala-Páramo G, Quesada M: cDNA cloning, homology modelling and evolutionary insights into novel endogenous cellulases of the borer beetle Oncideres albomarginata chamela (Cerambycidae). Insect Mol Biol. 2010, 19 (3): 323-336. 10.1111/j.1365-2583.2010.00991.x.

    Article  PubMed  Google Scholar 

  23. Eyun SI, Wang H, Pauchet Y, Ffrench-Constant RH, Benson AK, Valencia-Jimenez A, Moriyama EN, Siegfried BD: Molecular evolution of glycoside hydrolase genes in the western corn rootworm (Diabrotica virgifera virgifera). PLoS One. 2014, 9 (4): e94052-10.1371/journal.pone.0094052.

    Article  PubMed Central  PubMed  Google Scholar 

  24. Fischer R, Ostafe R, Twyman RM: Cellulases from insects. Adv Biochem Eng Biotechnol. 2013, 136: 51-64.

    CAS  PubMed  Google Scholar 

  25. Treves DS, Martin MM: Cellulose digestion in primitive hexapods: effect of ingested antibiotics on gut microbial populations and gut cellulase levels in the firebrat, Thermobia domestica (Zygentoma, Lepismatidae). J Chem Ecol. 1994, 20 (8): 2003-2020. 10.1007/BF02066239.

    Article  CAS  PubMed  Google Scholar 

  26. Hong SM, Sung HS, Kang MH, Kim C-G, Lee Y-H, Kim D-J, Lee JM, Kusakabe T: Characterization of cryptopygus antarcticus endo-β-1, 4-glucanase from bombyx Mori expression systems. Mol Biotechnol. 2014, 56 (10): 878-889. 10.1007/s12033-014-9767-8.

    Article  CAS  PubMed  Google Scholar 

  27. Shelomi M, Watanabe H, Arakawa G: Endogenous cellulase enzymes in the stick insect (Phasmatodea) gut. J Insect Physiol. 2014, 60: 25-30.

    Article  CAS  PubMed  Google Scholar 

  28. Watanabe H, Tokuda G: Cellulolytic systems in insects. Annu Rev Entomol. 2010, 55: 609-632. 10.1146/annurev-ento-112408-085319.

    Article  CAS  PubMed  Google Scholar 

  29. Willis JD, Oppert C, Jurat-Fuentes JL: Methods for discovery and characterization of cellulolytic enzymes from insects. Insect Sci. 2010, 17: 184-198. 10.1111/j.1744-7917.2010.01322.x.

    Article  CAS  Google Scholar 

  30. Kirsch R, Gramzow L, Theissen G, Siegfried BD, Ffrench-Constant RH, Heckel DG, Pauchet Y: Horizontal gene transfer and functional diversification of plant cell wall degrading polygalacturonases: key events in the evolution of herbivory in beetles. Insect Biochem Mol Biol. 2014, 52C: 33-50.

    Article  Google Scholar 

  31. Trautwein MD, Wiegmann BM, Beutel R, Kjer KM, Yeates DK: Advances in insect phylogeny at the dawn of the postgenomic era. Annu Rev Entomol. 2012, 57: 449-468. 10.1146/annurev-ento-120710-100538.

    Article  CAS  PubMed  Google Scholar 

  32. Letsch H, Simon S: Insect phylogenomics: new insights on the relationships of lower neopteran orders (Polyneoptera). Syst Entomol. 2013, 38 (4): 783-793. 10.1111/syen.12028.

    Article  Google Scholar 

  33. Shelomi M, Lo WS, Kimsey LS, Kuo CH: Analysis of the gut microbiota of walking sticks (Phasmatodea). BMC Res Notes. 2013, 6 (1): 368-10.1186/1756-0500-6-368.

    Article  PubMed Central  PubMed  Google Scholar 

  34. Terry MD, Whiting MF: Mantophasmatodea and phylogeny of the lower neopterous insects. Cladistics. 2005, 21: 240-257. 10.1111/j.1096-0031.2005.00062.x.

    Article  Google Scholar 

  35. Letsch HO, Meusemann K, Wipfler B, Schutte K, Beutel R, Misof B: Insect phylogenomics: results, problems and the impact of matrix composition. Proc Biol Sci. 2012, 279 (1741): 3282-3290. 10.1098/rspb.2012.0744.

    Article  PubMed Central  PubMed  Google Scholar 

  36. Plazzi F, Ricci A, Passamonti M: The mitochondrial genome of Bacillus stick insects (Phasmatodea) and the phylogeny of orthopteroid insects. Mol Phylogenet Evol. 2011, 58 (2): 304-316. 10.1016/j.ympev.2010.12.005.

    Article  CAS  PubMed  Google Scholar 

  37. Shelomi M, Kimsey LS: Vital staining of the stick insect digestive system identifies appendices of the midgut as novel system of excretion. J Morphol. 2014, 275 (6): 623-633. 10.1002/jmor.20243.

    Article  CAS  PubMed  Google Scholar 

  38. Monteiro EC, Tamaki FK, Terra WR, Ribeiro AF: The digestive system of the "stick bug" Cladomorphus phyllinus (Phasmida, Phasmatidae): a morphological, physiological and biochemical analysis. Arthropod Struct Dev. 2014, 43 (2): 123-134. 10.1016/j.asd.2013.11.005.

    Article  PubMed  Google Scholar 

  39. Geddes CC, Nieves IU, Ingram LO: Advances in ethanol production. Curr Opin Biotechnol. 2011, 22 (3): 312-319. 10.1016/j.copbio.2011.04.012.

    Article  CAS  PubMed  Google Scholar 

  40. Jurat-Fuentes JL, Oppert C, Klingeman W, Oppert B: Identification and Characterization of Insect Cellulolytic Systems for Plant Biomass Degradation. Sun Grant Initiative: Southeastern Regional Center. 2011, Knoxville, TN: University of Tennessee, 24-,

    Google Scholar 

  41. Hannon G: FASTX-toolkit. 2012, [http://hannonlabcshledu/fastx_toolkit]

    Google Scholar 

  42. Martin M: Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet journal. 2011, 17 (1): 10-12. 10.14806/ej.17.1.200.

    Article  Google Scholar 

  43. Andrews S: FastQC: A Quality Control Tool for High Throughput Sequence Data. 2010, Cambridge, UK: Babraham Bioinformatics

    Google Scholar 

  44. Trapnell C, Pachter L, Salzberg SL: TopHat: discovering splice junctions with RNA-Seq. Bioinformatics. 2009, 25 (9): 1105-1111. 10.1093/bioinformatics/btp120.

    Article  CAS  PubMed Central  PubMed  Google Scholar 

  45. Anders S: HTSeq: analysing high-throughput sequencing data with python. 2010, []

    Google Scholar 

  46. Li B, Dewey CN: RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC bioinformatics. 2011, 12: 323-10.1186/1471-2105-12-323.

    Article  CAS  PubMed Central  PubMed  Google Scholar 

  47. Leng N, Dawson JA, Thomson JA, Ruotti V, Rissman AI, Smits BMG, Haag JD, Gould MN, Stewart RM, Kendziorski C: EBSeq: an empirical Bayes hierarchical model for inference in RNA-seq experiments. Bioinformatics. 2013, 29 (8): 1035-1043. 10.1093/bioinformatics/btt087.

    Article  CAS  PubMed Central  PubMed  Google Scholar 

  48. Conesa A, Gotz S: Blast2GO: a comprehensive suite for functional analysis in plant genomics. Int J Plant Genom. 2008, 2008: 619832-

    Google Scholar 

  49. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol. 1990, 215 (3): 403-410. 10.1016/S0022-2836(05)80360-2.

    Article  CAS  PubMed  Google Scholar 

  50. Kanehisa M, Araki M, Goto S, Hattori M, Hirakawa M, Itoh M, Katayama T, Kawashima S, Okuda S, Tokimatsu T, Yamanishi Y: KEGG for linking genomes to life and the environment. Nucleic Acids Res. 2008, 36 (Database issue): D480-D484.

    CAS  PubMed Central  PubMed  Google Scholar 

  51. Artimo P, Jonnalagedda M, Arnold K, Baratin D, Csardi G, de Castro E, Duvaud S, Flegel V, Fortier A, Gasteiger E, Grosdidier A, Hernandez C, Ioannidis V, Kuznetsov D, Liechti R, Moretti S, Mostaguir K, Redaschi N, Rossier G, Xenarios I, Stockinger H: ExPASy: SIB bioinformatics resource portal. Nucleic Acids Res. 2012, 40 (W1): W597-W603. 10.1093/nar/gks400.

    Article  CAS  PubMed Central  PubMed  Google Scholar 

  52. Edgar RC: MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004, 32 (5): 1792-1797. 10.1093/nar/gkh340.

    Article  CAS  PubMed Central  PubMed  Google Scholar 

  53. Maddison WP, Maddison DR: Mesquite: a modular system for evolutionary analysis. Version 2.75. 2011, []

    Google Scholar 

  54. Waterhouse AM, Procter JB, Martin DM, Clamp M, Barton GJ: Jalview Version 2–a multiple sequence alignment editor and analysis workbench. Bioinformatics. 2009, 25 (9): 1189-1191. 10.1093/bioinformatics/btp033.

    Article  CAS  PubMed Central  PubMed  Google Scholar 

  55. Furnham N, Holliday GL, de Beer TA, Jacobsen JO, Pearson WR, Thornton JM: The catalytic site atlas 2.0: cataloging catalytic sites and residues identified in enzymes. Nucleic Acids Res. 2014, 42 (Database issue): D485-489.

    Article  CAS  PubMed Central  PubMed  Google Scholar 

  56. Felsenstein J: PHYLIP (Phylogeny Inference Package) version 3.6. Distributed by the author. 2005, Seattle: Department of Genome Sciences, University of Washington

    Google Scholar 

  57. Miller MA, Pfeiffer W, Schwartz T: Creating the CIPRES Science Gateway for Inference of Large Phylogenetic Trees. Gateway Computing Environments Workshop (GCE). 2010, 1-8. IEEE

    Chapter  Google Scholar 

  58. Stamatakis A: RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics. 2006, 22 (21): 2688-2690. 10.1093/bioinformatics/btl446.

    Article  CAS  PubMed  Google Scholar 

  59. Ronquist F, Teslenko M, van der Mark P, Ayres DL, Darling A, Höhna S, Larget B, Liu L, Suchard MA, Huelsenbeck JP: MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Syst Biol. 2012, 61 (3): 539-542. 10.1093/sysbio/sys029.

    Article  PubMed Central  PubMed  Google Scholar 

  60. Rambaut A: FigTree, a graphical viewer of phylogenetic trees. 2007, [http://treebioedacuk/software/figtree]

    Google Scholar 

  61. Perl A, Rosenblatt JD, Chen IS, DiVincenzo JP, Bever R, Poiesz J, Abraham GN: Detection and cloning of new HTLV-related endogenous sequences in man. Nucleic Acids Res. 1989, 17 (17): 6841-6854. 10.1093/nar/17.17.6841.

    Article  CAS  PubMed Central  PubMed  Google Scholar 

  62. Edmonds M: A history of poly A sequences: from formation to factors to function. Prog Nucleic Acid Res Mol Biol. 2002, 71: 285-389.

    Article  CAS  PubMed  Google Scholar 

  63. Davis R, Shi Y: The polyadenylation code: a unified model for the regulation of mRNA alternative polyadenylation. J Zhejiang Univ Sci B. 2014, 15 (5): 429-437. 10.1631/jzus.B1400076.

    Article  CAS  PubMed Central  PubMed  Google Scholar 

  64. Petersen TN, Brunak S, von Heijne G, Nielsen H: SignalP 4.0: discriminating signal peptides from transmembrane regions. Nat Methods. 2011, 8 (10): 785-786. 10.1038/nmeth.1701.

    Article  CAS  PubMed  Google Scholar 

  65. Shen Z, Denton M, Mutti N, Pappan K, Kanost MR, Reese JC, Reeck GR: Polygalacturonase from Sitophilus oryzae: possible horizontal transfer of a pectinase gene from fungi to weevils. J Insect Sci. 2003, 3: 24-

    Article  PubMed Central  PubMed  Google Scholar 

  66. Kim N, Choo YM, Lee KS, Hong SJ, Seol KY, Je YH, Sohn HD, Jin BR: Molecular cloning and characterization of a glycosyl hydrolase family 9 cellulase distributed throughout the digestive tract of the cricket Teleogryllus emma. Comp Biochem Physiol B Biochem Mol Biol. 2008, 150: 368-376. 10.1016/j.cbpb.2008.04.005.

    Article  PubMed  Google Scholar 

  67. Pauchet Y, Saski CA, Feltus FA, Luyten I, Quesneville H, Heckel DG: Studying the organization of genes encoding plant cell wall degrading enzymes in Chrysomela tremula provides insights into a leaf beetle genome. Insect Mol Biol. 2014, 23 (3): 286-300.

    CAS  PubMed  Google Scholar 

  68. Chauhan R, Jones R, Wilkinson P, Pauchet Y: Cytochrome P450‒encoding genes from the Heliconius genome as candidates for cyanogenesis. Insect Mol Biol. 2013, 22 (5): 532-540. 10.1111/imb.12042.

    Article  CAS  PubMed  Google Scholar 

  69. Pauchet Y, Heckel DG: The genome of the mustard leaf beetle encodes two active xylanases originally acquired from bacteria through horizontal gene transfer. Proc Biol Sci. 2013, 280 (1763): 20131021-10.1098/rspb.2013.1021.

    Article  PubMed Central  PubMed  Google Scholar 

  70. Busconi M, Berzolla A, Chiappini E: Preliminary data on cellulase encoding genes in the xylophagous beetle, Hylotrupes bajulus (Linnaeus). Int Biodeterior Biodegradation. 2014, 86: 92-95.

    Article  CAS  Google Scholar 

  71. Kirsch R, Wielsch N, Vogel H, Svatoš A, Heckel DG, Pauchet Y: Combining proteomics and transcriptome sequencing to identify active plant-cell-wall-degrading enzymes in a leaf beetle. BMC Genomics. 2012, 13: 587-10.1186/1471-2164-13-587.

    Article  CAS  PubMed Central  PubMed  Google Scholar 

  72. Watanabe H, Tokuda G: Animal cellulases. Cell Mol Life Sci. 2001, 58 (9): 1167-1178. 10.1007/PL00000931.

    Article  CAS  PubMed  Google Scholar 

  73. Ortego F: Physiological Adaptations of the Insect Gut to Herbivory. Arthropod-Plant Interactions: Novel Insights and Approaches for IPM. Edited by: Smagghe G, Diaz I. 2012, Dordrecht, NY: Springer, 14: 75-88.

    Chapter  Google Scholar 

  74. Pauchet Y, Freitak D, Heidel-Fischer HM, Heckel DG, Vogel H: Immunity or digestion: glucanase activity in a glucan-binding protein family from Lepidoptera. J Biol Chem. 2009, 284 (4): 2214-2224. 10.1074/jbc.M806204200.

    Article  CAS  PubMed  Google Scholar 

  75. Sakon J, Irwin D, Wilson DB, Karplus PA: Structure and mechanism of endo/exocellulase E4 from Thermomonospora fusca. Nat Struct Biol. 1997, 4 (10): 810-818. 10.1038/nsb1097-810.

    Article  CAS  PubMed  Google Scholar 

  76. Zhou W, Irwin DC, Escovar-Kousen J, Wilson DB: Kinetic studies of Thermobifida fusca Cel9A active site mutant enzymes. Biochemistry. 2004, 43 (30): 9655-9663. 10.1021/bi049394n.

    Article  CAS  PubMed  Google Scholar 

  77. Pickersgill R, Smith D, Worboys K, Jenkins J: Crystal structure of polygalacturonase from Erwinia carotovora ssp. carotovora. J Biol Chem. 1998, 273 (38): 24660-24664. 10.1074/jbc.273.38.24660.

    Article  CAS  PubMed  Google Scholar 

  78. Barrett T, Suresh CG, Tolley SP, Dodson EJ, Hughes MA: The crystal structure of a cyanogenic beta-glucosidase from white clover, a family 1 glycosyl hydrolase. Structure. 1995, 3 (9): 951-960. 10.1016/S0969-2126(01)00229-5.

    Article  CAS  PubMed  Google Scholar 

  79. Juncosa M, Pons J, Dot T, Querol E, Planas A: Identification of active site carboxylic residues in Bacillus licheniformis 1,3-1,4-beta-D-glucan 4-glucanohydrolase by site-directed mutagenesis. J Biol Chem. 1994, 269 (20): 14530-14535.

    CAS  PubMed  Google Scholar 

  80. Scully ED, Hoover K, Carlson JE, Tien M, Geib SM: Midgut transcriptome profiling of Anoplophora glabripennis, a lignocellulose degrading cerambycid beetle. BMC Genomics. 2013, 14: 850-10.1186/1471-2164-14-850.

    Article  PubMed Central  PubMed  Google Scholar 

  81. O'Neil ST, Emrich SJ: Assessing De Novo transcriptome assembly metrics for consistency and utility. BMC Genomics. 2013, 14: 465-10.1186/1471-2164-14-465.

    Article  PubMed Central  PubMed  Google Scholar 

  82. Jiang H, Wong WH: Statistical inferences for isoform expression in RNA-Seq. Bioinformatics. 2009, 25 (8): 1026-1032. 10.1093/bioinformatics/btp113.

    Article  CAS  PubMed Central  PubMed  Google Scholar 

  83. Vijay N, Poelstra JW, Kunstner A, Wolf JB: Challenges and strategies in transcriptome assembly and differential gene expression quantification. A comprehensive in silico assessment of RNA-seq experiments. Mol Ecol. 2013, 22 (3): 620-634. 10.1111/mec.12014.

    Article  CAS  PubMed  Google Scholar 

  84. Scharf ME, Karl ZJ, Sethi A, Sen R, Raychoudhury R, Boucias DG: Defining host-symbiont collaboration in termite lignocellulose digestion. Commun Integr Biol. 2011, 4 (6): 761-763. 10.4161/cib.17750.

    Article  CAS  PubMed Central  PubMed  Google Scholar 

  85. Pauchet Y, Kirsch R, Giraud S, Vogel H, Heckel DG: Identification and characterization of plant cell wall degrading enzymes from three glycoside hydrolase families in the cerambycid beetle Apriona japonica. Insect Biochem Mol Biol. 2014, 49: 1-13.

    Article  CAS  PubMed  Google Scholar 

  86. Pauchet Y, Wilkinson P, Chauhan R, Ffrench-Constant RH: Diversity of beetle genes encoding novel plant cell wall degrading enzymes. PLoS One. 2010, 5 (12): e15635-10.1371/journal.pone.0015635.

    Article  PubMed Central  PubMed  Google Scholar 

  87. Tokuda G, Watanabe H, Hojo M, Fujita A, Makiya H, Miyagi M, Arakawa G, Arioka M: Cellulolytic environment in the midgut of the wood-feeding higher termite Nasutitermes takasagoensis. J Insect Physiol. 2012, 58 (1): 147-154. 10.1016/j.jinsphys.2011.10.012.

    Article  CAS  PubMed  Google Scholar 

  88. Kashyap DR, Vohra PK, Chopra S, Tewari R: Applications of pectinases in the commercial sector: a review. Bioresour Technol. 2001, 77 (3): 215-227. 10.1016/S0960-8524(00)00118-8.

    Article  CAS  PubMed  Google Scholar 

  89. Dashtban M, Schraft H, Qin W: Fungal bioconversion of lignocellulosic residues; opportunities & perspectives. Int J Biol Sci. 2009, 5 (6): 578-

    Article  CAS  PubMed Central  PubMed  Google Scholar 

  90. Li LL, McCorkle SR, Monchy S, Taghavi S, van der Lelie D: Bioprospecting metagenomes: glycosyl hydrolases for converting biomass. Biotechnol Biofuels. 2009, 2: 10-10.1186/1754-6834-2-10.

    Article  PubMed Central  PubMed  Google Scholar 

  91. Oppert C, Klingeman WE, Willis JD, Oppert B, Jurat-Fuentes JL: Prospecting for cellulolytic activity in insect digestive fluids. Comp Biochem Physiol Biochem Mol Biol. 2010, 155: 145-154. 10.1016/j.cbpb.2009.10.014.

    Article  Google Scholar 

  92. Góngora-Castillo E, Buell CR: Bioinformatics challenges in de novo transcriptome assembly using short read sequences in the absence of a reference genome sequence. Nat Prod Rep. 2013, 30 (4): 490-500. 10.1039/c3np20099j.

    Article  PubMed  Google Scholar 

Download references


Thanks to Dr. Steve Heydon and the Bohart Museum of Entomology staff and volunteers for insect rearing, to the Dr. Patrik Nosil Lab of the University of Sheffield, UK, for their work on the Timema genome and making it available online, and to Dr. Yannick Pauchet of the Max Planck Institute of Chemical Ecoogy in Jena, Germany, for general advising. MS was supported by the National Science Foundation (USA) Graduate Research Fellowship under Grant No. 1148897, the UC Davis & Humanities Graduate Research Fellowship in Entomology for 2012–13 and 2013–14, and the McBeth Memorial Scholarship. The research was supported by funds from the University of California at Davis. Thanks also to the editors and reviewers who contributed for their feedback.

Author information

Authors and Affiliations


Corresponding author

Correspondence to Matan Shelomi.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

Authors MS and BRJ conceived the study and contributed equally to the work. LSK and BRJ provided resources and expertise. MS and LSK reared the insects. WCJ, JA, and BRJ carried out the RNA Seq and de novo transcriptome assembly. MS, WCJ, JA, and BRJ analyzed and interpreted data. All authors read and approved the final manuscript.

Matan Shelomi and Brian R Johnson contributed equally to this work.

Electronic supplementary material


Additional file 1: Table S1: KEGG Table for the most highly expressed genes of the Aretaon asperrimus midgut. Top 500 most highly expressed genes in each of the anterior midgut (AMG) and posterior midgut (PMG) used. (XLS 27 KB)


Additional file 2: Table S2: KEGG Table for the most highly expressed genes of the Extatosoma tiaratum midgut. Top 500 most highly expressed genes in each of the anterior midgut (AMG) and posterior midgut (PMG) used. (XLS 30 KB)


Additional file 3: Table S3: KEGG Table for the most highly expressed genes of the Medauroidea extradentata midgut. Top 500 most highly expressed genes in each of the anterior midgut (AMG) and posterior midgut (PMG) used. (XLS 25 KB)


Additional file 4: Table S4: KEGG Table for the most highly expressed genes of the Peruphasma schultei midgut. Top 500 most highly expressed genes in each of the anterior midgut (AMG) and posterior midgut (PMG) used. (XLS 42 KB)


Additional file 5: Table S5: KEGG Table for the most highly expressed genes of the Ramulus artemis midgut. Top 500 most highly expressed genes in each of the anterior midgut (AMG) and posterior midgut (PMG) used. (XLS 24 KB)


Additional file 6: Table S6: KEGG Table for the most highly expressed genes of the Sipyloidea sipylus midgut. Top 500 most highly expressed genes in each of the anterior midgut (AMG) only. (XLS 11 KB)


Additional file 7: Table S7: Representative isotigs (sequences) for each PCWDE isogroup for the phasmatodea. Data is from the full midgut transcriptomes with short sequences removed. Transcripts were identified as PCWDEs based on amino acid alignment to known proteins from the NCBI database. * = The sequence contained a complete open-reading frame. Other sequences could have been truncated at the 5’ or 3’ or both. † = The sequence or another in that isogroup (_c#) had a poly-A tail. (XLS 22 KB)


Additional file 8: Table S8: – Species and NCBI Accession No's for cellulase (Beta-1,4-endoglucanase) proteins compared to phasmid proteins. Sequences with *were not included in the alignment figure (Figure 4) due to poor or absent overlap with other sequences at that region. Table S9. – Species and NCBI Accession No's for pectinase (polygalacturonase) proteins compared to phasmid proteins. Sequences with *were not included in the alignment figure (Figure 5) due to poor or absent overlap with other sequences at that region. Table S10. – Species and NCBI Accession No's for cellobiase (beta-glucosidase) proteins compared to phasmid proteins. Sequences with *were not included in the alignment figure (Figure 7) due to poor or absent overlap with other sequences at that region. Table S11. – Species and NCBI Accession No's for beta-1,3-glucanase proteins compared to phasmid proteins. Sequences with *were not included in the alignment figure Figure 9) due to poor or absent overlap with other sequences at that region. (XLS 60 KB)


Additional file 9: Table S9: Number of Peruphasma schultei midgut transcriptome sequences with successful Blast search, mapping, and annotation. (TXT 163 bytes)


Additional file 10: Figure S1: Species distribution for top-hit Blast results of P. schultei midgut transcriptome. (PDF 4 KB)


Additional file 11: Table S12: The most differentially expressed genes (DEGs) in the P. schultei AMG. Includes genes found in both or only one tissue type. Means measured in RPKM. PPDE = Posterior Probability of Differential Expression. Annotations made with Blast2GO. (XLS 3 MB)


Additional file 12: Table S13: The most differentially expressed genes (DEGs) in the P. schultei PMG. Includes genes found in both or only one tissue type. Means measured in RPKM. PPDE = Posterior Probability of Differential Expression. Annotations made with Blast2GO. (XLS 3 MB)


Additional file 13: Table S14: The most highly expressed genes of the P. schultei AMG. Identified as genes with expression levels ten times greater than the mean for that section. Means measured in RPKM. Annotations made with Blast2GO. (XLS 3 MB)


Additional file 14: Table S15: The most highly expressed genes of the P. schultei PMG. Identified as genes with expression levels ten times greater than the mean for that section. Means measured in RPKM. Annotations made with Blast2GO. (XLS 3 MB)

Authors’ original submitted files for images

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver ( applies to the data made available in this article, unless otherwise stated.

Reprints and Permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Shelomi, M., Jasper, W.C., Atallah, J. et al. Differential expression of endogenous plant cell wall degrading enzyme genes in the stick insect (Phasmatodea) midgut. BMC Genomics 15, 917 (2014).

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: