- Research article
- Open Access
Comparative RNA seq analysis of the New Zealand glowworm Arachnocampa luminosa reveals bioluminescence-related genes
BMC Genomicsvolume 16, Article number: 825 (2015)
The New Zealand glowworm is the larva of a carnivorous fungus gnat that produces bioluminescence to attract prey. The bioluminescent system of the glowworm is evolutionarily distinct from other well-characterised systems, especially that of the fireflies, and the molecules involved have not yet been identified. We have used high throughput sequencing technology to produce a transcriptome for the glowworm and identify transcripts encoding proteins that are likely to be involved in glowworm bioluminescence.
Here we report the sequencing and annotation of the first transcriptome of the glowworm, and a differential analysis of expression from the glowworm light organ compared with non-light organ tissue. The analysis identified six transcripts encoding proteins that are potentially involved in glowworm bioluminescence. Three of these proteins are members of the ANL superfamily of adenylating enzymes, with similar amino acid sequences to that of the luciferase enzyme found in fireflies (31 to 37 % identical), and are candidate luciferases for the glowworm bioluminescent system. The remaining three transcripts encode putative aminoacylase, phosphatidylethanolamine-binding and glutathione S-transferase proteins.
This research provides a basis for further biochemical studies into how the glowworm produces light, and a source of genetic information to aid future ecological and evolutionary studies of the glowworm.
In caves and forested river gorges across New Zealand, an abundance of star-like lights can be seen when it is dark. The creature responsible for this light, known locally as a glowworm, is the carnivorous larva of a fungus gnat, Arachnocampa luminosa. The larva lies in a mucous hammock, hanging long sticky silk threads below it like fishing lines, and luminesces from a small light organ located at the end of its tail. Small flying insects are attracted by the light, become entangled in the sticky lines, and are then consumed by the glowworm [1, 2].
Despite their common name, glowworms are actually a type of fly (order Diptera, family Keroplatidae) . Confusingly, some firefly larvae and adults are also referred to as glowworms, but the well-studied bioluminescent beetles (including fireflies, click beetles and railroad worms) are members of a different order – Coeloptera (superfamily Elateroidea) . Diptera and Coleoptera diverged about 330 million years ago, with no known bioluminescent species intervening [5–7], which indicates that bioluminescence evolved independently in these insects.
The ability to bioluminesce has evolved many times. One recent estimate suggests that bioluminescence has evolved at least 40 times across extant organisms, possibly more than 50 times, when counting the number of distinct light-producing chemical mechanisms across monophyletic lineages . Despite their differences and separate evolutionary origins, all bioluminescent systems that have been studied produce light by oxidation of a light-emitting substrate (generically referred to as a luciferin) catalyzed by an enzyme (a luciferase). Luciferase enzymes have extremely varied structures, mechanisms and substrate specificities .
Researchers have studied the biochemistry used by Arachnocampa to produce light [9–11], but many details remain elusive, including the identities of the luciferase and luciferin. Although both the glowworm and firefly systems use ATP to bioluminesce, the chemistries of bioluminescence in the two creatures are distinct. When mixed, the substrate for the firefly system (luciferin) and the glowworm luciferase do not produce light , which implies that they use different luciferase enzymes and substrates. The physiology of the glowworm light organ is also unique. It is made up of the swollen distal tips of the four Malpighian tubules [12, 13]. Malpighian tubules are part of the insect excretory system, analogous to the vertebrate kidneys, and are not part of any other insect bioluminescence system .
The ability of firefly, bacteria (such as Vibrio harveyi) and sea pansy (Renilla) luciferases and luciferins to produce easily-measured light has led to the use of these systems as tools in biomedical and biological research, for example as genetic reporters, drug screening assays, bioluminescence imaging, and assays for the presence of ATP or calcium [15, 16]. Understanding the molecular basis of light generation in glowworms will not only expand our understanding of how bioluminescence works, but may also lead to novel bioluminescent applications. For example, the glowworm system uses a different luciferin substrate and produces a different bioluminescent spectra maximum than currently used bioluminescent research tools, therefore the glowworm system could be used in conjunction with existing bioluminescent applications, for example, to detect several compounds in one sample or monitor expression of multiple genes simultaneously.
One approach to revealing the molecular physiology of glowworm bioluminescence is to sequence the transcriptome of the organism. Transcriptome sequencing is a relatively cheap and easy way of providing genome-wide sequence data for non-model organisms for which no genome sequence data is available [17, 18].
Sequencing on a genome-wide scale is still a new approach for investigating bioluminescence. Although reported high-throughput sequencing of bioluminescent creatures so far include four genomes (the ctenophore Mnemiopsis leidyi , various strains of V. fischeri [20–22], the luminous mushroom Mycena chloropho , and the European brittle star Amphiura filiformis ), and several transcriptomes (the European brittle star , cypridinid ostracods or seed shrimp , the Oplophorid shrimp , the luminous mushroom , and the dinoflagellate Lingulodinium polyedrum [27, 28]), none of these studies have reported any detailed analyses specifically looking at the differences in transcripts produced between luminous and non-luminous tissues. It should be noted that a limited transcriptional profiling study has been carried out on A. luminosa . The authors of this study sequenced 537 cDNAs that were constructed from light organ expressed sequence tags using the Sanger method, and did not include a comparison between tissues. The same research group also carried out a small transcriptional survey on a cDNA library from Macrolampis sp2 firefly lanterns .
Here we present the first in-depth sequencing of polyadenylated RNAs from the New Zealand glowworm, and a detailed analysis at the transcriptomic level of luminescent versus nonluminescent tissue, through which we have identified six proteins that are likely to be involved in bioluminescence. This research provides a basis for biochemical studies into how the glowworm produces light, and a source of genetic information to aid future ecological and evolutionary studies of the glowworm.
Results and discussion
We carried out two separate transcriptome sequencing experiments, on biological replicates, using the most appropriate bioinformatic processing and analyses for each approach. We first performed 454 GS-FLX sequencing because there was no genomics information available for the species, and then used Illumina sequencing to validate those results, and to provide greater sequencing depth to support a differential gene expression analysis. RNA was extracted from the light organ and from the rest of the body (non-light organ). The two experiments differed in the use of biological replicates, sequencing platform, and analysis software (Additional file 1: Figure S1).
Sequencing, read cleaning and de novo assembly
mRNA was isolated from two samples: one prepared from non-light organ tissue (~200 mg of tissue from eight glowworms) and one prepared from light organ tissue (~420 mg of tissue from 172 glowworms). After cDNA synthesis, 454 GS-FLX sequencing for each library was carried out on one-half of a pico-titer plate. A total of about 1.12 million high quality reads were obtained, amounting to almost 400 Mbp (Table 1; Fig. 1). Reads, including singletons, were merged from both libraries for de novo transcript assembly, which was carried out using CLC Genomics Workbench 5.1, and yielded a reference transcriptome of 18,794 transcripts, with an N50 of 897 (Table 2).
We sequenced six cDNA libraries using the Illumina HiSeq-2000 sequencer, each prepared from either the light organ or non-light organ mRNA of three individual glowworms. The increased number of biological replicates in the second experiment provided it with increased statistical power for the subsequent RNA-seq analysis . The Illumina platform also provided us with information on strand origin, i.e. from which of the two DNA strands a given RNA transcript was derived. This information can increase the percentage of alignable reads, thereby improving transcript reconstruction compared with non-strand specific data of known strand origin .
The Illumina machine generated 37.1 to 44.7 million pairs of 200 base length paired-end reads for each library (Table 3). In order to improve the quality of the data, we removed adapter sequences, trimmed low quality bases (Q < 20) from both ends of reads and discarded reads less than 50 bases in length. The resulting 29.5 to 35.7 million high quality reads per library (79 to 80 % of total raw reads) were merged together for de novo transcript assembly using the Trinity package, producing a de novo assembly containing 187,289,921 bases and a total of 196,766 transcripts (Table 2). A graph of the contig length distribution highlights the differences in contig size and number between the two assemblies (Fig. 2).
Functional annotation and gene ontology of the glowworm transcriptome
In order to provide descriptions of the functions and properties of as many glowworm gene products as possible, BLASTX searches were performed for each transcript above 1 kb in size from the Illumina/Trinity transcriptome assembly against the Drosophila RefSeq non-redundant database at the National Centre for Biotechnology Information (NCBI). 38,259 of the 55,997 transcripts matched to a known protein within the database with a score of E < 10−6). 32 % of the transcripts (17,738) did not have a BLAST result. Some of these sequences may not have a homolog in Drosophila; others may be from non-coding RNA sequences that are polyadenylated. Some sequences may be unmatched due to assembly errors. We used Blast2GO to assign Gene Ontology (GO) terms to transcripts with BLASTX matches. 34,332 transcripts were assigned GO terms (Fig. 3).
We also carried out an analysis to find out which metabolic processes predominate in the light organ. First of all we mapped all of the Illumina light organ reads back to the full Illumina transcriptome, then used an FDR adjusted enrichment test for the light organ over the whole transcriptome with 10 % as the cutoff (so that transcripts with less than 10 % of their length mapped by fragments were counted as not expressed). We used Blast2GO to assign GO terms to this subset of light organ transcripts. The results showed that there are a large range of metabolic processes occurring in the light organ, although none that could be confidently linked to its bioluminescent function (see Additional file 2: Table S1).
Read mapping, measurement of gene expression and differential expression analyses
For each experiment we assembled a database of transcripts de novo, since there is no reference genome available for Arachnocampa.
90.78 and 87.41 % of reads from the 454 light organ and non-light organ libraries, respectively, uniquely mapped to transcripts in the 454/CLC Genomics Workbench assembly. We normalized expression values for each sample by scaling so that the median values were made equal. A comparison of normalized expression values provided us with a list of 34 transcripts found at ≥ 10-fold higher levels in the light organ sample than in the rest of the glowworm body sample (Table 4). In addition, 4616 different transcripts were expressed in the light organ sample, but not in the rest of the glowworm body sample. The top 30 of these, ranked according to normalised expression levels, are listed in Table 5.
We mapped reads for each of the six samples in this experiment onto the Illumina/Trinity reference transcriptome assembly. 91 % of the reads from all six samples were matched to transcripts from the assembly set. Since there were three separate samples for each tissue type, we performed inter-sample normalization, so that cross sample comparison could be carried out without being biased by sequencing depth. We used a TMM method (trimmed-mean of M values) to accommodate the difference in sequencing depth between replicates by finding a scaling factor for each library that minimizes the log-fold changes between the samples. The scaling factor is used to normalise the expression values for each sample.
Differential expression analysis was carried out on transcript expression values from all six Illumina-sequenced samples after adjusting for library size. Only six transcripts were considered to be expressed to a significantly higher level in the light organ than in the non-light organ tissue (false discovery rate of < 0.1); these are listed in Table 6.
Differential expression analysis validation
When comparing the sequences of differentially expressed transcripts in the two experiments, it became apparent that the six transcripts from the Illumina experiment were all found to be in the top eight ranked transcripts from the 454 experiment (Table 7), with one transcript (annotated as 62762 in the Illumina experiment) listed twice in the top eight. The close matching of the results from these two separate experiments, despite differences in samples, sequencing platforms and analytical algorithms, effectively validates these results.
The number of differentially expressed transcripts is small in both analyses. This may be because there are Malpighian tubules in both tissue types: the light organ tissue sample contains the modified Malpighian tubule tips that produce light, and the non-light organ sample contains the remaining non-luminescent parts of the Malpighian tubules. Presuming the modified Malpighian tubule tips retain some of the same functionality as the remainder of the tubules, both samples would share many of the same transcripts.
Functional annotation of genes highly expressed in the light organ
The proteins encoded by these six transcripts are likely to play important roles in the bioluminescence of the glowworm, assuming that the transcript levels are equivalent to protein levels. In order to find out what these roles might be, we searched for annotated sequence homologs in the publically available non redundant Genbank protein sequence database using the BLASTX algorithm. The resulting annotations will need to be confirmed using biochemical investigation of both the native and recombinant forms of the encoded proteins.
64201-seq1, 64201-seq2, 62762
The proteins encoded by these three transcripts all display the signature motifs of the ANL superfamily of adenylating enzymes  (Fig. 4). The three main subfamilies in the ANL superfamily include the Acyl-CoA synthetases, the NRPS adenylation domains, and the beetle (firefly) Luciferase enzymes. Despite catalyzing a wide range of different overall reactions, ANL enzymes all use a two-step reaction where the first step is always the activation of a carboxylate substrate with ATP to form an adenylate intermediate. These three glowworm proteins are very similar to firefly luciferase (31 to 37 % identical with luciferase from Photinus pyralis). The glowworm proteins are very similar and two appear to be isoforms. 64201_seq1 and 64201_seq2 are 79 % identical (the Trinity software labelled them with the same number and ‘seq1’ or ‘seq2’ to reflect this), and 62762 is 43 and 45 % identical to 64201_seq2 and 64201_seq1, respectively. The differences between these three proteins do not appear to be due to alternative splicing, since the differences in sequence are scattered throughout the proteins (Fig. 4).
The protein encoded by this transcript has 44 % amino acid sequence identity with human aminoacylase-1. Amino acids can be stored with an acyl group attached to their N-terminus, which makes them more stable. Aminoacylase-1 removes the acyl group, making the amino acid available for protein synthesis and other metabolic roles [34, 35], and acts specifically on mercapturic acids (S-conjugates of N-acetyl-L-cysteine) and neutral aliphatic N-acyl-α-amino acids. If the glowworm luciferin is revealed to be a derivative of an amino acid, as it appears to be for the unrelated Siberian luminous earthworm Fridericia heliota , it is possible that the glowworm might store the luciferin substrate in a stable acylated form and 60014 could deacetylate the substrate, making it available for the bioluminescent reaction. There is, however, no other evidence for this involvement, and the identity of the luciferin is unclear at this stage.
This transcript encodes a member of the phosphatidylethanolamine-binding protein superfamily. Proteins in this family generally play roles in modulating cellular signaling . At a molecular level they have been found to bind nucleotides, opioids and phosphatidylethanolamine. They can also bind kinases, leading to inhibition or activation of signalling pathways. From this information we can infer that 51138 may be involved in the modulation of a bioluminescence signaling pathway.
This protein is a member of the glutathione S-transferase (GST) family of proteins, which play an important role in insecticide resistance and protection against oxidative stress. Members of this family catalyze the conjugation of reduced glutathione to a variety of exogenous and endogenous hydrophobic electrophiles for the purpose of detoxification . 56768 has closest homology with the Delta class of insect GSTs, and has 45 % identity with a mosquito Delta GST that has DDT dehydrochlorinase activity . Therefore glutathione may play a role in glowworm bioluminescence, either directly or indirectly, although it is unclear at this stage what this role might be.
Evolution of bioluminescence in glowworms
It is interesting that the firefly and glowworm luciferase enzymes belong to the same family of enzymes (assuming that one or more of the three ANL proteins from the light organ are confirmed biochemically to be the glowworm luciferases). In one way it is not unexpected as both are from the same class (Insecta), but because of the evolutionary distance between the glowworm and the firefly, and because the two bioluminescent systems use different substrates , it seemed likely that the proteins would differ significantly. However, these differential transcriptomic analyses, and the observation that both the glowworm and the firefly use ATP as a cofactor , suggest that the two luciferases may indeed have evolved from the same ancestral non-bioluminescent enzyme. Other explanations for both glowworms and fireflies having a similar luciferase enzyme are unlikely, such as horizontal gene transfer, or the possibility that the insect ancestral to both Coleoptera and Diptera was bioluminescent and passed the bioluminescent gene to both fireflies and glowworms but not to the vast majority of other non-bioluminescent Coleoptera and Diptera that exist today. The independent evolution of the two bioluminescent enzymes from a non-bioluminescent ancestral acyl-CoA synthetase enzyme is most likely, because it has been well established that the beetle luciferases evolved from non-bioluminescent acyl-CoA synthetases [40–42]. In addition, acyl-CoA synthetases from two nonluminous insects, the mealworm Zophobas morio  and the fruit fly Drosophila melanogaster , have been shown to bioluminesce faintly in the presence of either the firefly luciferin substrate or a synthetic analog of this luciferin. A firefly luciferase-like gene has also been identified from an animal unrelated to insects: the siliceous sponge Suberites domuncula ; however, this claim needs to be confirmed , especially since the native luciferase protein itself has not yet been isolated from the sponge tissue.
We carried out a phylogenetic analysis of the three glowworm luciferase-like proteins along with known firefly luciferases and other luciferase-like proteins from various insects. Notably, the three glowworm proteins are grouped together and are placed closer to proteins from non-bioluminescent dipteran insects (D. melanogaster and A. aegypti) than to firefly luciferases (see Fig. 5 and Table 8).
It is uncertain why there are three isoforms of the firefly-luciferase-like protein expressed in the glowworm light organ. The firefly Luciola cruciata expresses three isoforms, where only one has bioluminescent activity , so it is possible that the glowworm may also follow a similar pattern of gene duplication. Gene duplication followed by enzyme diversification and the development of novel functions has been a feature previously observed in vertebrate acyl-CoA synthetase genes . There appear to be many duplication events in the evolution of the acyl-CoA synthetase enzyme family in insects. We used tBLASTn to search the total glowworm transcriptome from the Illumina experiment and found that the three glowworm proteins have numerous paralogs: between 65 and 68 transcripts have 20 to 50 % identity with the three protein sequences. In addition, the limited transcriptional profiling studies by Silva et al. mentioned previously found about 1 % of the 537 A. luminosa light organ cDNAs and 2 % of the 572 firefly lantern cDNAs sequenced were members of the acyl-CoA synthetase enzyme family (also known as AMP/CoA-ligases) [29, 30].
This report describes the first high-throughput transcriptome sequencing of the New Zealand glowworm, and the use of comparative RNAseq to identify genes expressed in luminescent tissue that are involved in bioluminescence. Two separate differential expression analyses have identified six genes that are significantly more highly expressed in the light organ than in non-luminescent tissue. These genes encode putative aminoacylase, GST and phosphatidylethanolamine-binding proteins, and, most notably, three proteins that are homologs of firefly luciferase, at least one of which we expect to be the glowworm luciferase.
Interestingly, in the Silva et al. study of glowworm light organ cDNAs , one of the members of the acyl-CoA synthetase enzyme family sequenced showed 44–45 % identity with railroad worm luciferases, and 2.1 % of the transcripts sequenced were putative GST proteins. This, combined with our research underlines the potential importance of these sequences in glowworm bioluminescence.
Further biochemical studies are required to confirm that one or more of the candidate luciferases are able to produce light. These studies should include two approaches: firstly, produce these proteins in a recombinant form and assay them for bioluminescent activity using the native luciferin substrate extracted from the glowworm; secondly, isolate the native luciferase protein(s) from the light organ tissue, using the same assay to track bioluminescent activity, and identify the isolated protein(s) using mass spectrometry and the transcriptome database generated in the current study. If the candidate luciferase(s) is/are confirmed, then this will show that this enzyme has independently evolved the ability to produce light at least twice in extant organisms, in New Zealand glowworms and in fireflies, but with different substrates. Once the identity of the glowworm enzyme has been confirmed, and the chemistry of the glowworm substrate has been revealed, potential applications of the novel glowworm bioluminescence system can be explored.
Sample collection and RNA extraction
A. luminosa is an endemic species and is of particular interest to the indigenous Maori people of New Zealand. Therefore local Maori were consulted about this research through Te Komiti Rakahau ki Kāi Tahu (the Ngāi Tahu Research Consultation Committee). As glowworms are invertebrates, the approval of an ethics committee was not required. A. luminosa is not protected, and no experiment was performed on living animals.
A. luminosa specimens were collected from locations around New Zealand, including Dunedin, Te Anau and Palmerston North. Live glowworms were snap frozen by being placed on foil above dry ice, and then stored until required at −80 °C. Using a razor, light organs were removed from the glowworm bodies while still frozen. Light organ samples contained only white light organ tissue, and non-light organ samples contained the rest of the glowworm body and head (darker tissue) with white light organ tissue removed entirely. Tissues were homogenised with TRIzol® Reagent (Invitrogen) and a glass dounce homogeniser; total RNA was extracted using UltraPure™ (Phenol: Chloroform:Isoamyl Alcohol; Invitrogen), and then further purified using the RNeasy Kit (Qiagen). RNA quantification and integrity assessment were performed for each sample on an RNA chip (Bioanalyzer 2100, Agilent Technologies).
cDNA library construction, sequencing and quality control
Two hundred μg of non-light organ RNA (prepared from ~200 mg of non-light organ tissue from eight glowworms) and 65 μg of light organ RNA (prepared from ~420 mg of light organ tissue from 172 glowworms) were sent to the 454 Life Sciences Sequencing Center (Roche, Branford, Connecticut, USA) for mRNA enrichment, cDNA library construction (including mRNA fragmentation using a zinc chloride solution, cDNA synthesis using random hexamer primers and adaptor ligation) and subsequent 454 sequencing on a Roche GS FLX Titanium Genome Sequencer. Each sample was sequenced on one half of a picotiter plate. Low quality reads were discarded (quality limit of 0.05). We also removed: ambigious nucleotides (maximum of 5 nucleotides allowed), terminal nucleotides (1 each from both the 5′ and 3′ ends), adapter sequences and sequences less than 20 nucleotides in length.
Light organ and non-light organ total RNA was prepared separately from three individual glowworms (0.6 to 1.2 μg of RNA for each light organ sample, and 8.8 to 27.0 μg of RNA for each non-light organ sample). The six samples, each with an RNA Integrity Number (RIN) of over 6, were sent to the Otago Genomics and Bioinformatics Facility, where mRNA was isolated using oligo-dT magnetic beads, and cDNA libraries were constructed using the Illumina TruSeq Stranded mRNA Sample Preparation Kit. Sequencing was carried out on an Illumina HiSeq-2000 sequencer, generating 100 bp paired-end reads. Each sample was run on one eighth of a sequencing lane. Adaptor sequences were trimmed from reads using fastq-mcf , and bases with low quality phred scores trimmed (cut-off phred score of Q20). Adapter and quality trimmed reads less than 50 nucleotides in length were discarded using the SolexaQA package . Reads were quality assessed using FASTQC (http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc).
De novo assembly
We used the CLC Genomics Workbench v5.1 (http://www.clcbio.com) to de novo-assemble 454 reads from the combined light organ and non-light organ samples into a single, non-redundant contig dataset. Singletons were merged into the contig set. Assembly parameters were set at the program default settings.
Reads combined from all light organ and non-light organ samples were assembled using the Trinity software package . Assembly parameters were adjusted for Illumina stranded paired end sequencing (i.e. –left and –right for both R1 and R2 using --SS_lib_type RF for the stranded library type).
Read mapping and measurement of gene expression
We used the CLC Genomics Workbench to map reads from each of the 454 light organ and non-light organ samples back onto the CLC Genomics Workbench/454 assembly dataset. Expression of genes was represented by the abundance of reads uniquely mapped to each contig in the assembly. Abundance was expressed as RPKM (reads per kilobase of exon model per million mapped reads) or normalized expression values (the contig read counts were scaled so that the median values were made equal).
For each of the six Illumina samples, reads were mapped onto the Trinity/Illumina assembly using Bowtie 2 , and transcript abundance for each sample was estimated using the RSEM package . Per sample relative abundance was estimated using FPKM (fragments per kilobase of transcript per million fragments mapped), EC (expected counts) and TPM (transcripts per million). For cross-sample comparisons, we normalized raw read counts for these samples using the TMM method (trimmed-mean of M values; a weighted trimmed mean of the log expression ratios ) as implemented in the edgeR package  (http://www.bioconductor.org).
Differential expression analyses
To identify candidate bioluminescence-related genes from our datasets, we compared normalized transcript abundance values between corresponding light organ and non-light organ samples.
In order to detect the differentially expressed genes between the two 454-sequenced samples, a two-group differential analysis was performed in CLC Genomics Workbench on normalized expression values.
Genes with significantly different levels of expression between light organ and non-light organ samples (a single factor design) were identified using the quantile-adjusted Conditional Maximum Likelihood method (qCML) in the edgeR package . We considered transcripts to be differentially expressed to a significant level at a false discovery rate at < 0.1.
Transcripts that were expressed at significantly higher levels in light organ compared with non-light organ tissue were annotated by searching the GenBank non-redundant database at the NCBI (http://www.ncbi.nlm.nih.gov) using the BLASTX software  with default parameters. Automated annotation of the transcriptome database was carried out using BLASTX searches to the NCBI non-redundant database within the Blast2GO program v2.8.0 (http://www.blast2go.com) . An E-value cut-off of 1E−6 was used to identify similar annotated proteins from which function could be inferred.
Multiple alignments were produced using MUSCLE [58, 59]. A phylogenetic analysis was carried out in MrBayes , using the WAG model of protein evolution  which was found to be the most appropriate after tests with mixed models. Monte-Carlo Markov chains were run for 1 000 000 generations with the initial 25 % of trees discarded as burn-in. The consensus trees produced were visualized with CLC Genomics Workbench.
Availability of supporting data
The raw sequence data from the Illumina experiment was submitted in FASTQ format to the NCBI Sequence Read Archive (SRA) database (accessions: SRR2241413, SRR2283829, SRR2283830, SRR2283831, SRR2283975 and SRR2284246) and are also accessible through the BioProject accession PRJNA290397 (http://www.ncbi.nlm.nih.gov/bioproject/290397). The Illumina/Trinity transcriptome assembly has been deposited at the NCBI Transcriptome Shotgun Assembly (TSA) database and is available at the DNA DataBank of Japan (DDBJ), the European Molecular Biology Laboratory (EMBL), and GenBank at NCBI under the accession GDQV00000000.
Reads per kilobase per million
Willis RE, White CR, Merritt DJ. Using light as a lure is an efficient predatory strategy in Arachnocampa flava, an Australian glowworm. J Comp Physiol B. 2010;181(4):477–86.
Meyer-Rochow VB. Glowworms: a review of Arachnocampa spp. and kin. Luminescence. 2007;22(3):251–65.
Baker CH, Graham GC, Scott KD, Cameron SL, Yeates DK, Merritt DJ. Distribution and phylogenetic relationships of Australian glow-worms Arachnocampa (Diptera, Keroplatidae). Mol Phylogenet Evol. 2008;48(2):506–14.
Amaral DT, Arnoldi FGC, Rosa SP, Viviani VR. Molecular phylogeny of neotropical bioluminescent beetles (Coleoptera: Elateroidea) in southern and central Brazil. Luminescence. 2014;29(5):412–22.
Misof B, Liu S, Meusemann K, Peters RS, Donath A, Mayer C, et al. Phylogenomics resolves the timing and pattern of insect evolution. Science. 2014;346(6210):763–7.
Haddock SHD, Moline MA, Case JF. Bioluminescence in the sea. Ann Rev Mar Sci. 2010;2(1):443–93.
Lloyd JE. Bioluminescence and communication in insects. Annu Rev Entomol. 1983;28(1):131–60.
Sharpe ML, Hastings JW, Krause KL. Luciferases and light-emitting accessory proteins: structural biology. In: eLS. John Wiley & Sons, Ltd; 2014, doi: 10.1002/9780470015902.a0003064.pub2.
Lee J. Bioluminescence of the Australian glow-worm Arachnocampa richardsae Harrison. Photochem Photobiol. 1976;24(3):279–85.
Shimomura O, Johnson FH, Haneda Y. Observation on the biochemistry of luminescence in the New Zealand glowworm Arachnocampa luminosa. In: Johnson FH, Haneda Y, editors. Bioluminescence in progress. Princeton: Princeton University Press; 1966. p. 487–94.
Viviani VR, Hastings JW, Wilson T. Two bioluminescent diptera: the North American Orfelia fultoni and the Australian Arachnocampa flava. Similar niche, different bioluminescence systems. Photochem Photobiol. 2002;75(1):22–7.
Gatenby JB. Notes on the New Zealand glow-worm, Bolitophila (Arachnocampa) luminosa. Trans Proc R Soc N Z. 1959;87:291–314.
Green LF. The fine structure of the light organ of the New Zealand glow-worm Arachnocampa luminosa (Diptera: Mycetophilidae). Tissue Cell. 1979;11(3):457–65.
Shimomura O. Bioluminescence: chemical principles and methods. Singapore: World Scientific Publishing Co. Pte. Ltd.; 2006.
Frank L, Krasitskaya V. Application of enzyme bioluminescence for medical diagnostics. Adv Biochem Eng Biotechnol. 2014;144:175–97.
Michelini E, Cevenini L, Calabretta M, Calabria D, Roda A. Exploiting in vitro and in vivo bioluminescence for the implementation of the three Rs principle (replacement, reduction, and refinement) in drug discovery. Anal Bioanal Chem. 2014;406(23):5531–9.
Ekblom R, Galindo J. Applications of next generation sequencing in molecular ecology of non-model organisms. Heredity. 2011;107(1):1–15.
Finseth FR, Harrison RG. A comparison of next generation sequencing technologies for transcriptome assembly and utility for RNA-Seq in a non-model bird. PLoS One. 2014;9(10):e108550.
Schnitzler CE, Pang K, Powers ML, Reitzel AM, Ryan JF, Simmons D, et al. Genomic organization, evolution, and expression of photoprotein and opsin genes in Mnemiopsis leidyi: a new view of ctenophore photocytes. BMC Biol. 2012;10:107.
Ruby EG, Urbanowski M, Campbell J, Dunn A, Faini M, Gunsalus R, et al. Complete genome sequence of Vibrio fischeri: a symbiotic bacterium with pathogenic congeners. Proc Natl Acad Sci U S A. 2005;102(8):3004–9.
Mandel MJ, Stabb EV, Ruby EG. Comparative genomics-based investigation of resequencing targets in Vibrio fischeri: focus on point miscalls and artefactual expansions. BMC Genomics. 2008;9:138.
Wang Z, Hervey WJ, Kim S, Lin B, Vora GJ. Complete genome sequence of the bioluminescent marine bacterium Vibrio harveyi ATCC 33843 (392 [MAV]). Genome Announc. 2015;3(1):e01493–01414.
Tanaka Y, Kasuga D, Oba Y, Hase S, Sato K, Oba Y, et al. Genome sequence of the luminous mushroom Mycena chlorophos for searching fungal bioluminescence genes. In: 18th International Symposium on Bioluminescence and Chemiluminescence: 2014; Uppsala, Sweden. 2014. Luminescence 29 (Suppl. 21): 47–48.
Delroisse J, Flammang P, Mallefet J. Marine luciferases: are they really taxon-specific? A putative luciferase evolved by co-option in an echinoderm lineage. In: 18th International Symposium on Bioluminescence and Chemiluminescence: 2014; Uppsala, Sweden. 2014. Luminescence 29 (Suppl. 21): 15–16.
Leung NL, Taketa DA, Torres E, Oakley TH. Origin of luciferase genes in cypridinid ostracods (Crustacea). In: Society for Integrative and Comparative Biology: 2013; San Francisco, CA. 2013. P1.163.
Wong JM, Pérez-Moreno JL, Chan T-Y, Frank TM, Bracken-Grissom HD. Phylogenetic and transcriptomic analyses reveal the evolution of bioluminescence and light detection in marine deep-sea shrimps of the family Oplophoridae (Crustacea: Decapoda). Mol Phylogenet Evol. 2015;83:278–92.
Roy S, Letourneau L, Morse D. Cold-induced cysts of the photosynthetic dinoflagellate Lingulodinium polyedrum have an arrested circadian bioluminescence rhythm and lower levels of protein phosphorylation. Plant Physiol. 2014;164(2):966–77.
Roy S, Beauchemin M, Dagenais-Bellefeuille S, Letourneau L, Cappadocia M, Morse D. The Lingulodinium circadian system lacks rhythmic changes in transcript abundance. BMC Biol. 2014;12(1):107.
Silva JR, Amaral DT, Hastings JW, Wilson T, Viviani VR. A transcriptional and proteomic survey of Arachnocampa luminosa (Diptera: Keroplatidae) lanterns gives insights into the origin of bioluminescence from the Malpighian tubules in Diptera. Luminescence. 2015, doi: 10.1002/bio.2850 [Epub ahead of print].
Viviani VR, Carmargo IA, Amaral DT. A transcriptional survey of the cDNA library of Macrolampis sp2 firefly lanterns (Coleoptera: Lampyridae). Comp Biochem Physiol Part D Genomics Proteomics. 2013;8(1):82–5.
Liu Y, Zhou J, White KP. RNA-seq differential expression studies: more sequence or more replication? Bioinformatics. 2013;30(3):301–4.
Haas BJ, Papanicolaou A, Yassour M, Grabherr M, Blood PD, Bowden J, et al. De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat Protoc. 2013;8(8):1494–512.
Gulick AM. Conformational dynamics in the acyl-CoA synthetases, adenylation domains of non-ribosomal peptide synthetases, and firefly luciferase. ACS Chem Biol. 2009;4(10):811–27.
Lindner HA, Lunin VV, Alary A, Hecker R, Cygler M, Ménard R. Essential roles of zinc ligation and enzyme dimerization for catalysis in the aminoacylase-1/M20 family. J Biol Chem. 2003;278(45):44496–504.
Perrier J, Durand A, Giardina T, Puigserver A. Catabolism of intracellular N-terminal acetylated proteins: involvement of acylpeptide hydrolase and acylase. Biochimie. 2005;87(8):673–85.
Petushkov VN, Dubinnyi MA, Tsarkova AS, Rodionova NS, Baranov MS, Kublitski VS, et al. A novel type of luciferin from the Siberian luminous earthworm Fridericia heliota: structure elucidation by spectral studies and total synthesis. Angew Chem Int Ed. 2014;53(22):5566–8.
Al-Mulla F, Bitar MS, Taqi Z, Yeung KC. RKIP: Much more than Raf Kinase inhibitory protein. J Cell Physiol. 2013;228(8):1688–702.
Ranson H, Hemingway J. Mosquito glutathione transferases. Methods Enzymol. 2005;401:226–41.
Ranson H, Prapanthadara LA, Hemingway J. Cloning and characterization of two glutathione S-transferases from a DDT-resistant strain of Anopheles gambiae. Biochem J. 1997;324(1):97–102.
Viviani VR. The origin, diversity, and structure function relationships of insect luciferases. Cell Mol Life Sci. 2002;59(11):1833–50.
Wood KV. The chemical mechanism and evolutionary development of beetle bioluminescence. Photochem Photobiol. 1995;62(4):662–73.
Oba Y, Sato M, Ohta Y, Inouye S. Identification of paralogous genes of firefly luciferase in the Japanese firefly, Luciola cruciata. Gene. 2006;368:53–60.
Viviani VR, Prado RA, Arnoldi FCG, Abdalla FC. An ancestral luciferase in the malpighi tubules of a non-bioluminescent beetle! Photochem Photobiol Sci. 2009;8(1):57–61.
Mofford DM, Reddy GR, Miller SC. Latent luciferase activity in the fruit fly revealed by a synthetic luciferin. Proc Natl Acad Sci U S A. 2014;111(12):4443–8.
Müller WEG, Kasueske M, Wang X, Schröder HC, Wang Y, Pisignano D, et al. Luciferase a light source for the silica-based optical waveguides (spicules) in the demosponge Suberites domuncula. Cell Mol Life Sci. 2009;66(3):537–52.
Hastings JW. Progress and perspectives on bioluminescence: from luminous organisms to molecular mechanisms. In: Roda A, editor. Chemiluminescence and bioluminescence: past, present and future. Cambridge, UK: The Royal Society of Chemistry; 2011. p. 91–112.
Lopes-Marques M, Cunha I, Reis-Henriques M, Santos M, Castro LF. Diversity and history of the long-chain acyl-CoA synthetase (Acsl) gene family in vertebrates. BMC Evol Biol. 2013;13(1):271.
Aronesty E. ea-utils: Command-line tools for processing biological sequencing data. 2011. http://code.google.com/p/ea-utils. Accessed 2014.
Cox MP, Peterson DA, Biggs PJ. SolexaQA: at-a-glance quality assessment of Illumina second-generation sequencing data. BMC Bioinform. 2010;11:485.
Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson DA, Amit I, et al. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotechnol. 2011;29(7):644–52.
Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods. 2012;9(4):357–9.
Li B, Dewey CN. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinform. 2011;12:323.
Kvam VM, Liu P, Si Y. A comparison of statistical methods for detecting differentially expressed genes from RNA-seq data. Am J Bot. 2012;99(2):248–56.
Robinson MD, Oshlack A. A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol. 2010;11(3):R25.
Robinson MD, McCarthy DJ, Smyth GK. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010;26(1):139–40.
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol. 1990;215(3):403–10.
Conesa A, Gotz S, Garcia-Gomez JM, Terol J, Talon M, Robles M. Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics. 2005;21(18):3674–6.
Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004;32(5):1792–7.
Edgar RC. MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinform. 2004;5:113.
Ronquist F, Huelsenbeck JP. MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics. 2003;19(12):1572–4.
Whelan S, Goldman N. A general empirical model of protein evolution derived from multiple protein families using a maximum-likelihood approach. Mol Biol Evol. 2001;18(5):691–9.
Branchini BR, Magyar RA, Murtiashaw MH, Portier NC. The role of active site residue arginine 218 in firefly luciferase bioluminescence. Biochemistry. 2001;40(8):2410–8.
Branchini BR, Southworth TL, Murtiashaw MH, Boije H, Fleet SE. A mutagenesis study of the putative luciferin binding site residues of firefly luciferase. Biochemistry. 2003;42(35):10429–36.
Branchini BR, Southworth TL, Murtiashaw MH, Wilkinson SR, Khattak NF, Rosenberg JC, et al. Mutagenesis evidence that the partial reactions of firefly bioluminescence are catalyzed by different conformations of the luciferase C-terminal domain. Biochemistry. 2005;44(5):1385–93.
Inouye S. Firefly luciferase: an adenylate-forming enzyme for multicatalytic functions. Cell Mol Life Sci. 2010;67(3):387–404.
Xu X, Gopalacharyulu P, Seppänen-Laakso T, Ruskeepää A-L, Aye CC, Carson BP, et al. Insulin signaling regulates fatty acid catabolism at the level of CoA activation. PLoS Genet. 2012;8(1):e1002478.
Oba Y, Sato M, Inouye S. Cloning and characterization of the homologous genes of firefly luciferase in the mealworm beetle, Tenebrio molitor. Insect Mol Biol. 2006;15(3):293–9.
Funding for this study was provided by the Foundation for Research, Science and Technology (now the New Zealand Ministry of Business, Innovation, and Employment) and the Division of Health Sciences, University of Otago. New Zealand government subsidised sequencing services were provided by New Zealand Genomics Limited (www.nzgenomics.co.nz).
The authors declare that they have no competing interests.
MS carried out sample collection and RNA extraction, wrote the paper and revised the manuscript. PD contributed to experiment design, carried out bioinformatics analysis of the sequence data and revised the manuscript. GG contributed to experiment design and carried out bioinformatics analysis of the sequence data. KK proposed and supervised the study, contributed to experiment design and revised the manuscript. All authors read and approved the final manuscript.