The mitochondrial genome of the ascalaphid owlfly Libelloides macaronius and comparative evolutionary mitochondriomics of neuropterid insects

Background The insect order Neuroptera encompasses more than 5,700 described species. To date, only three neuropteran mitochondrial genomes have been fully and one partly sequenced. Current knowledge on neuropteran mitochondrial genomes is limited, and new data are strongly required. In the present work, the mitochondrial genome of the ascalaphid owlfly Libelloides macaronius is described and compared with the known neuropterid mitochondrial genomes: Megaloptera, Neuroptera and Raphidioptera. These analyses are further extended to other endopterygotan orders. Results The mitochondrial genome of L. macaronius is a circular molecule 15,890 bp long. It includes the entire set of 37 genes usually present in animal mitochondrial genomes. The gene order of this newly sequenced genome is unique among Neuroptera and differs from the ancestral type of insects in the translocation of trnC. The L. macaronius genome shows the lowest A+T content (74.50%) among known neuropterid genomes. Protein-coding genes possess the typical mitochondrial start codons, except for cox1, which has an unusual ACG. Comparisons among endopterygotan mitochondrial genomes showed that A+T content and AT/GC-skews exhibit a broad range of variation among 84 analyzed taxa. Comparative analyses showed that neuropterid mitochondrial protein-coding genes experienced complex evolutionary histories, involving features ranging from codon usage to rate of substitution, that make them potential markers for population genetics/phylogenetics studies at different taxonomic ranks. The 22 tRNAs show variable substitution patterns in Neuropterida, with higher sequence conservation in genes located on the α strand. Inferred secondary structures for neuropterid rrnS and rrnL genes largely agree with those known for other insects. For the first time, a model is provided for domain I of an insect rrnL. The control region in Neuropterida, as in other insects, is fast-evolving genomic region, characterized by AT-rich motifs. Conclusions The new genome shares many features with known neuropteran genomes but differs in its low A+T content. Comparative analysis of neuropterid mitochondrial genes showed that they experienced distinct evolutionary patterns. Both tRNA families and ribosomal RNAs show composite substitution pathways. The neuropterid mitochondrial genome is characterized by a complex evolutionary history.


Background
Insect mitochondrial genomes (mtDNAs) are usually a double-strand circular molecule containing 13 proteincoding genes (PCGs), 22 transfer RNAs (tRNAs) and 2 ribosomal RNAs, i.e., the small and large subunits [1]. A notable exception is represented by the mtDNA of the human body louse Pediculus humanus, which consists of 18 miniature circular chromosomes containing one to three genes [2]. Insect mtDNA size is typically in the range of 14 to 20 kbp, but genomes exceeding these values are known [3]. In these latter cases, the size increase is connected to the expansion of the main noncoding region, named the AT-rich or control region [3]. The gene order is a feature of mtDNA that can provide important evidence to establish evolutionary relationships among taxa at high and/or low taxonomic levels [4,5]. The most widespread gene order in insect mtDNAs is shown in Figure 1. This gene order, initially determined for Drosophila yakuba mtDNA, is considered ancestral for the entire class Insecta [5][6][7]. Several gene orders departing from the ancestral arrangement exist and can be restricted to single species as in the strepsipteran Mengenilla australiensis or common to whole groups of higher taxonomic rank, ranging from family (e.g., Culicidae) to order (e.g., Lepidoptera) (Figure 1) [8][9][10][11][12]. In the latter cases, the peculiar gene order becomes an important marker to delimit taxonomic boundaries and constitutes a major signature of mtDNA. To date, full-length mtDNAs in insects have been sequenced in an imbalanced manner, with whole orders still lacking any published information or being poorly represented in data banks. In this respect, the neuropterid orders Megaloptera, Neuroptera (also named Planipennia), and Raphidioptera are underrepresented taxa. Only recently have sequences become available for these taxa [13][14][15]. Currently, full-length mtDNAs are known for the megalopterans Corydalus cornutus and Protohermes concolorus, both members of the Corydalidae family, and Sialis hamata, of the Sialidae [13][14][15]. Three mtDNAs are available for the neuropterans Ascaloptynx appendiculatus (Ascalaphidae), Ditaxis latistyla (Mantispidae), and Polystoechotes punctatus (Polystoechotidae) [13,14]. A complete mtDNA sequence exists for the raphidiopteran Mongoloraphidia harmandi (Raphidiidae) [14]. Finally, partial mtDNA sequences are available for the neuropteran Myrmelon immaculatus (Myrmelontidae) and the raphidiopteran Agulla sp. (Raphidiidae) [13].
Here we determined the complete mtDNA sequence of the ascalaphid owlfly Libelloides macaronius (Scopoli, 1763) (Neuroptera, Ascalaphidae). The newly determined  [53]. Black name, order for which there is no available full-length mtDNA; dark-green name, order exhibiting the insect ancestral gene order; variously colored name, order that has a gene order different from the insect ancestral gene order. **, ***, several distinct gene orders differing from the insect ancestral gene order. In the case of Diptera, only the family Culicidae has a gene order different from the ancestral arrangement. In the case of Coleoptera the gene order change is restricted to Mordella atrata [14]. Genes coded on the α strand (clockwise orientation) are light/ deep-green. Genes coded on the β strand (counterclockwise orientation) are red or orange. Alternation of colors was applied for clarity. Gene names are the standard abbreviations used in this paper; tRNA genes are indicated by the single-letter IUPAC-IUB abbreviation for their corresponding amino acid in the figure. On the branch leading to the relative holometabolous order, the genomic change characterizing the gene order is shown. In case of multiple gene orders within a single order, the changes were not depicted to avoid overcrowding. genome was compared with available neuropterid mtDNAs (complete and partial) as well as with genomes obtained from other holometabolous insects (Table 1) [6,. The analyses were performed following a comparative and evolutionary perspective.

Genome structure
The mtDNA genome of L. macaronius is a circular molecule 15,890 bp long ( Figure 2). It contains the entire set of 37 genes usually present in animal mtDNAs and 14 non-coding portions. Three of these spanning at least 15 bp are labeled as intergenic spacers (s1-s3) in Figure 2.
Atp8 and atp6 are the only genes, located on the same strand, that overlap. The atp8-atp6 partial superimposition is a common feature of neuropterid mtDNAs and, in general, of animal mtDNAs [1, 13,15]. Other genes located on the same strand (both α and β) are contiguous (e.g., nad4L and nad4) or separated by a few nucleotides (e.g., nad3 and trnA). Genes on opposite strands overlap (e.g., nad2 and trnC), are contiguous (e. g., trnQ and trnM), or are separated by some nucleotides (e.g., trnT and trnP) or intergenic spacers (e.g., trnS2 and nad1) ( Figure 2). Similar patterns can be observed in other neuropterid mtDNAs. It must be noted that nad4L and nad4 are contiguous in A. appendiculatus and L. macaronius (both members of the family Ascalaphidae). Conversely, in all other sequenced neuropterid mtDNAs, nad4L and nad4 overlap by 7 bp [13,15].
The gene order of L. macaronius mtDNA differs from the insect ancestral gene order in the translocation of trnC, which is located upstream of trnW in contrast to its traditional location downstream of trnW (Figures 1, 2). This translocation is shared by and exclusive to all neuropteran mtDNAs that are fully or partly sequenced to date. Conversely, Megaloptera and Raphidioptera exhibit the typical insect ancestral gene order as shown in Figure  1 [13,15]. This figure presents the distribution of various gene orders in holometabolous insects, mapped on the tree obtained by Wiegmann et al. [53], but alternative phylogenetic hypotheses exist ([e.g., [54]]).
Further work is needed to establish whether the trnC translocation represents a strong molecular signature for the entire Neuroptera clade.

Base composition and AT-and GC-skews in the mitochondrial genomes of endopterygote insects
The nucleotide-compositional behavior of mitochondrial genomes is usually investigated through the calculation of the three parameters: AT-skew, GC-skew, and A+T content (AT%), expressed in percentages (e.g., [8,12]). Here these parameters were studied for 84 complete endopterygotan mtDNAs (Table 2; Additional file 1, Table S1) and combined, for the first time, in a single three-dimensional scatter-plot analysis ( Figure 3). Arbitrary partitioning schemes, further detailed below, were applied to the estimated values to facilitate the presentation/discussion of the results. Each empirical clustering scheme was defined in order to maximize the appreciation of the uneven distribution of various mtDNAs within the continuum of variation of the considered parameter.

A+T content
The range of variation of AT% spanned from 66.99 to 87.41, with the average value equal to 77.77. Different mtDNAs exhibited an uneven distribution within this range. Five principal clusters (A1-A5) were obtained that divided the whole range of AT% into intervals encompassing 5% of the variation.

AT-skews
The average AT-skew was 0.039, while the whole range of variation extended from -0.047 to 0.248. The endopterygote mtDNAs were grouped into six clusters (B1-B6) through the partitioning of the whole range of ATskews in intervals spanning 0.050.

GC-skews
The average GC-skew value was -0.213, and all 84 mtDNAs displayed negative skews. The whole set of genomes was grouped into six clusters (C1-C6), each spanning 0.050 of the range of GC-skew variation.

Overall comparison
When the three variables analyzed above were considered together, some patterns emerged. Even if the  parameter distribution followed a continuum range of variation, most of the sequenced endopterygote mtDNAs exhibited A+T contents in the range 70%-80%, with the highest density in the interval 75%-80%; ATskews in the range 0.003-0.010; and GC-skews in the interval -0.300-0.150. This behavior can be represented by the cluster formula: A2/A3,B2/B3,C3-C5. Most neuropterid mtDNAs are included in this formula, with the exceptions of P. punctatus and P. concolorus (B1). While no genome exhibited extreme values for any of the three parameters, some mtDNAs markedly departed from the most represented ranges for the behavior of one or two parameters. Very low AT content was coupled with high AT-skew in some coleopteran mtDNAs. Very high A+T contents (>84%) were restricted to a small group of hymenopterans and dipterans. Highly negative GC-skews (< -0.300) were found mostly in hymenopterans, plus the moth O. lunifer and the beetle T. castaneum.
The comparisons performed at the intra-ordinal level showed that in several cases the extremes of the ranges of variation of the analyzed parameters were occupied by the same mtDNAs. This behavior could be the result of limited taxon sampling. Thus, further data are necessary to verify if the observed patterns are consistent over a broader taxon sampling.
Generally, the A+T content, AT-skew, and GC-skew exhibited complex behaviors that do not appear to be tightly linked, at least at the order level.

Start/stop codons in neuropterid PCGs
The inferred start/stop codons for neuropterid PCGs are listed in Table 3. ATN, GTG, TTG, and GTT are accepted canonical mitochondrial start codons for invertebrate mtDNs and most of PCGs exhibited these start codons [64].
The cox1 start codon for L. macaronius was inferred to be the triplet ACG, a non-canonical start codon for mtDNA genes. Despite the fact that cox1 is one of more conserved mitochondrial genes, the identification of its start codon has been problematic (see [13]). An ATC codon located 27 bp upstream of ACG, within the trnY gene that is encoded on the β strand, could be a start codon for L. macaronius cox1. However, amino acid sequence comparison with a broad selection of endopterygote insects led us to the more probable conclusion that the ACG triplet was the start codon for L. macaronius cox1. An ACG start codon for cox1 has also been  Table 2.
proposed for M. immaculatus [13]. The non-canonical start codons TTA and TCG have previously been inferred for cox1 in A. appendiculatus and P. punctatus [13]. The remaining neuropterid cox1 genes exhibited an ATT start codon. Non-canonical start codons in cox1 have been inferred repeatedly for endopterygote insects (e.g., [12,65,66]). Stop codons for the 13 neuropterid PCGs were almost invariably complete (TAA) or incomplete TAa/Taa. The only exception was observed in nad1, where the two ascalaphids L. macaronius and A. appendiculatus and P. punctatus exhibited TAG as stop codon.

Codon usage and amino acids abundance
All analyzed neuropterid mtDNAs use the standard invertebrate mitochondrial genetic code. This result was determined using the server GenDecoder [67,68]. The behavior of codon families in PCGs was investigated and the results are presented in Figures 4,5. First codons, as well as stop codons, complete and incomplete, were excluded from the analysis to avoid biases due to unusual putative start codons and incomplete stop codons. The abundance of codon families and relative synonymous codon usage (RSCU) were calculated together with the other statistics listed below.
The A+T content, AT-skew, G+C content and GCskew, which were calculated in pooled-α + β, pooled-α and pooled-β PCGs, exhibited behaviors identical, similar to or contrasting with those observed for the whole genomes (see Additional file 2, Table S2).
The total number of codons was different in analyzed species ( Figure 4; Additional file 2, Table S2). The codon families exhibiting more than 50 codons per thousand cds varied (7-9) and the majority of them were two-fold degenerate in codon usage and AT-rich. In almost all mtDNAs, the four most represented codon families were Leu2, Ile, Phe and Met with Leu2 being the most common. The only exception was P. punctatus, in which Ser2 was the fourth most abundant codon family, while Met family occupied only the seventh position. Leu, Ile, Ser and Phe were the most abundant amino acids, and their sum varied from 42.75% (L. macaronius) to 46.62% (M. harmandi) of the total number of amino acids.
Four-fold degenerate codon usage was A/T biased in the third position, and T was the preferred nucleotide, except in the Arg, Gly and Ser1 codon families, where A was the most common nucleotide in the third position. Two-fold degenerate codon usage showed a similarly biased pattern, with A/T favored over G/C in the third position ( Figure 5). Both patterns were in agreement with the AT-biased content exhibited by pooled-α+β PCGs.
In pooled-α PCGs, the four-fold degenerate codon usage followed a pattern similar to that of the whole PCG set, with the exceptions of Ala, Pro and Val codon families ( Figure 5 vs. Additional file 3, Figure S1).
In the pooled-β PCGs, the four-fold degenerate codon usage partly mirrored the pattern observed for pooled-α +β PCGs. Among the discrepancies, the loss of synonymous GC-rich codons was the easiest to detect ( Figure  5 vs. Additional file 4, Figure S2).
The behavior of pooled-α+β PCGs in different species varied with respect to the usage of the whole set (62) of codons encoding amino acids. All missed codons were GC-rich. Codon usage could not be directly linked to total number of codons nor to A+T content ( Figure 5; Additional file 2, Table S2). The reduction in codon usage could be possibly linked to AT/GC-skews. Indeed, P. punctatus, which used the smallest number of codons, also exhibited the most negative AT-skew α+β and the highest GC-skew α+β . The number of used codons decreased when PCGs encoded on a single strand were analyzed (Additional file 2, Table S2; Additional file 3, Figure S1; Additional file 4, Figure S2). The pattern observed for pooled-α+β PCGs was mostly mirrored by pooled-α PCGs. On the β strand, codon reductions appeared to be species specific. The lost codons belonged to GC-rich and comparatively A-rich codonfamilies. Finally, on the β strand, the pattern of codon  CCT  GTT  ACT  TCT  AAT GAT  CAT  GAA  ATA  TGA  TTA  CAA  GCC CGC  CCC  GTC  ACC  TCC  AAC GAC  CAC  GAG  ATG  TGG  TTG  CAG  GCA CGA  CCA  GTA  ACA  TCA  TGT  AAA  ATT  TTT  TAT   GCG CGG   CTT  CTC  CTA  CTG  CCG  GTG   AGT  AGC  AGA  AGG  ACG  TCG   GGT  GGC  GGA  GGG  TGC  AAG  ATC  TTC  loss appeared to be linked to the single/combined effects of GC content and AT-/GC-skews. The total number of α-codons vs. the total number of β-codons (ratio α/β ) was calculated for every species (Table 4). The overall (o) average (avg) ratio α/β (1.59 ± 4 × 10 -3 ) was used as a measure to identify possible distributional biases of different codon families. Several codon families exhibited ratio α/β values close to oavgratio α/β , suggesting that their distribution is not markedly diverse in the PCGs coded on different strands. Codon families His, Leu1, Pro, and Thr exhibited ratio α/ β values strongly biased toward the α strand. Conversely, codon families Cys, Leu2 and Ser1 presented ratio α/β values clearly β strand-oriented.

GCT CGT
The most obvious bias was due to structural/functional requirements of PCGs. This point is well represented by the distribution of Cys codon family. Cys was the rarest amino acid in mtDNA proteins and was more abundant in polypeptides (NADH subunits) encoded on the β strand than in the proteins encoded on the α strand. The Cys residues can produce intra-and interchain disulfide bridges (S-S) that play a fundamental structural role. Disulfide bonds and interactions between pairs of Cys residues have been postulated to have an important structural role in NADH subunits of primate mtDNA [69]. The abundance of Cys residues in neuropterid NADHs, and in general in endopterygotes, appears to agree with findings derived from primates. Several marked departures from oavg-ratio α/β can be explained in terms of structural/functional requirements of single polypeptides and are not explored here.
A compositional bias effect was evident in the behavior of Leu1 (CTN) and Leu2 (TTR) codon families. As mentioned above, Leu was the most abundant amino acid in the PCGs. The pooled superfamily Leu = Leu1 + Leu2 exhibited an avg-ratio α/β = 1.28 ± 0.11. This result supports a moderate β strand bias in the distribution of Leu. However, Leu1 and Leu2 codon families exhibited an opposite bias in term of strand distribution. These strand biases can be linked to the AT-skew and GC content shown by pooled-α and pooled-β PCGs.

Behavior of PCGs in neuropterid mtDNAs
Different parameters have been used to evaluate the performance of mitochondrial single/combined PCGs for phylogenetic, population genetics, and taxon molecular identification purposes (e.g., [11,12]). The suitability of different PCGs is linked to, influenced by the evolutionary history of each molecular marker. The calculation of p-distance values has been used to test the level of divergence within each PCG (e.g., [11]). Recently, codon usage has also been investigated to study the differentiation of mitochondrial PCGs [12]. In the present paper, the evolutionary behavior of PCGs, at both the nucleotide and amino acid levels, was studied in a combined analysis based on the calculation/estimation of p-distances, effective number of codon usage (ENC), phylogenetic signal and the saturation of the substitution process present in single/combined PCGs.
The percent of fully resolved quartets (%FRQ), obtained in a maximum likelihood mapping analysis, was used as a measure of phylogenetic signal [70]. The saturation of the substitution process was established by calculating the m-slope of the regression line passing through the origin in a scatter plot analysis of p-distances vs. GTR+I+G -mtREV24/JTT + G distances (see Methods for further details).
Parameters were estimated for pooled-α+β, pooled-α, pooled-β, and single PCGs at both the nucleotide and amino acid levels, with the exception of atp8. This gene was not analyzed individually because it contains a limited number of codons (< 55) to provide reliable estimations of ENC [71]. Conversely, atp8 was included in the pooled-α+β/-α sets. The results of the global analysis are summarized in Figure 6 and Table 5.
As a general rule, proteins, individually or pooled, exhibited levels of saturation of the substitution process lower than those observed in the nucleotidic counterparts. This result can be explained in terms of the Oavg, overall (o) average(avg) ratio α/β ; avg-ratio α/β , average-ratio α/β ; STD, standard deviation substitution process, which permits more alternatives in the amino acid replacements. The m-slope values show that all PCGs/proteins experienced some degree of saturation of the substitution process. This phenomenon was particularly marked at the third codon positions of some PCGs, where the GTR+I+G-distance values largely exceeded 1 (data not shown), a result strongly supporting a complete saturation of the substitution process [72].
Pooled-α+β PCGs/proteins and pooled-α PCGs/proteins showed the best phylogenetic signals and clearly surpassed pooled-β PCGs/proteins. Because pooled-α+β and pooled-α proteins also exhibited a less pronounced saturation of the substitution process than their nucleotide counterparts, they represented the first choice as markers to investigate the phylogenetic relationships among Neuropterida at the inter-order taxonomic level.   Table 5. A) ENC vs. avg-p-distance vs. m-slope; B) ENC vs. avg-p-distance vs. % of fully resolved quartets; C) % of fully resolved quartets vs. m-slope vs. avg-p-distance.
Pooled and single PCGs having the best phylogenetic signals usually exhibited a combination of high avg-ENCs, small avg-pds, and high m-slopes in their protein counterparts (e.g., cox1), but exceptions occurred. Conversely, PCGs with low phylogenetic signals usually showed high avg-pds and low values of m-slope at both the nucleotide and amino acid levels (e.g., nad2).
Reduction of phylogenetic signal with respect to the best-performing PCGs can be explained in terms of three different processes/properties: a) accelerated evolution; b) retarded evolution; and c) possession of an intrinsically lower phylogenetic signal.
A marked accelerated evolution mechanism can be invoked for nad2, nad3 and nad6. These PCGs showed avg-pds, at both the nucleotide and amino acid levels, which were clearly higher than those observed in cox1, cox3 and cob. Furthermore, their m-slope values favored the occurrence of a more pronounced saturation of the substitution process, a finding corroborated by the lowest values of fully resolved quartets. All of these results are compatible only with an acceleration of the substitution process. Thus, nad2, nad3 and nad6 should be excellent markers for studies performed at medium/very low taxonomic levels (i.e., family, genus, species).
The mechanism of accelerated evolution, but less pronounced than that described above, can be invoked also for atp6, nad5 and nad4. These genes showed avg-pds higher than cox1, cox3 and cob, at both the nucleotide and amino acid levels. They exhibited phylogenetic signals higher than nad2, nad3 and nad6 but lower than the best-performing PCGs. Finally, their m-slope values were comparable to those of cox1, cox3 and cob. All these findings suggest that atp6, nad5 and nad4 evolved slightly faster than the best-performing PCGs and that these genes can be optimally used at the order or lower taxonomic levels.
The nad1 and cox2 genes had avg-pds similar to or even smaller than cox1, cox3 and cob, while their levels of saturation of the substitution process appeared to be less marked than PCGs exhibiting the highest %FRQ values. These results suggest that the reduction of phylogenetic signal in cox2 and nad1 was probably due to a slower rate of divergence of these two genes in comparison with cox1, cox3 and cob. Thus, they should be considered favored markers at high/very high taxonomic ranks.
The behavior of nad4L provided contradictory evidence. The low %FRQ and relatively high avg-pd value suggested that this is a fast evolving gene and that the loss of phylogenetic signal was due to an accelerated rate of multiple substitutions. However, the m-value was the highest obtained in the analysis and did not favor a strong saturation of the substitution process in the evolution of this gene. Thus, it could be that even evolving in a way not dissimilar to cox1, cox3 and cob, nad4L has an intrinsically lower phylogenetic signal due to its reduced size.
Taken together, these data indicate that the 13 PCGs encoded by mtDNA exhibit complex evolutionary trajectories that can be investigated using the combination of Pcg-PROTEIN, protein coding gene/relative protein; %FRQ, percent of fully resolved quartets obtained in a maximum likelihood mapping analysis [70]; m-slope, slope of the regression line passing through the origin in a scatter plot analysis of p-distances vs. GTR+I+G -mtREV24/JTT + G + G distances distances; avg-pd, average p-distance; avg-ENC, average-ENC (effective number of codon usage); n.a., not applicable calculation.   Figure 7 Secondary structure of tRNA families (trnA-trnL1) in neuropterid mtDNAs. The nucleotide substitution pattern for each tRNA family was modeled using as reference the structure determined for L. macaronius.   Figure 8 Secondary structure of tRNA families (trnL2-trnV) in neuropterid mtDNAs. The nucleotide substitution pattern for each tRNA family was modeled using as reference the structure determined for L. macaronius. the parameters considered above. No single parameter seems able to fully describe the behavior of PCGs.

Comparison of tRNA structure in neuropterid genomes
The mtDNA of L. macaronius contained all 22 tRNAs genes found in other neuropterid mtDNAs and typical of animal mitochondrial genomes [1]. All neuropterid mtDNAs had 14 α-strand tRNAs and 8 β-strand tRNAs (Figures 1, 2). Among the 22 tRNAs, only trnS1 did not exhibit the common cloverleaf structure, due to the absence of DHU stem. Loss of this stem in trnS1 is a feature common to many insect mtDNAs (e.g., [12]).
The results of comparative analyses on secondary structures of neuropterid tRNAs are provided in Figures 7, 8. Multiple alignments of tRNAs genes were produced, and the percent of identical nucleotides (%INUC) was calculated for each group of orthologous sequences (Additional file 5, Table S3). The pattern of nucleotide conservation was markedly α strand-biased. Indeed, trnG, trnK, trnN, trnM and trnE, which showed the highest levels of nucleotide conservation (%INUC ≥ 70), were all located on the α strand.
The tRNAs closest to the control region (i.e., trnV and trnI), which is one of the start points of mtDNA replication, were not among most/less conserved. The same applied to trnS2, immediately upstream of the s2 spacer, where another center of mtDNA replication was located (see below). The abundance of codon families did not appear to be linked to the level of tRNA conservation. Indeed, none of the most abundant codon families (Leu2, Ile and Phe) exhibited the highest %INUCs.
Irrespective of the level of conservation, some tRNAs presented mismatched pairs in stems (e.g., A-A in the anticodon stem of trnK; T-T in the acceptor stem of trnM). These mismatches are common in arthropod mtDNAs and can be corrected through editing processes (e.g., [73]) or may represent unusual pairings [74]. Further research at the transcript level is necessary to better characterize this feature in neuropterid mtDNAs.
In the most conserved tRNAs the nucleotide substitutions are largely restricted to TΨC and DHU loops and extra arms (Figures 7, 8), with changes on acceptor and anticodon stems reduced to 0-3 fully compensatory base changes (cbcs) (e.g., G-C vs. A-T in the anticodon stem of trnG) and/or hemi-cbc (e.g., G-T vs. A-T on the acceptor stem of trnM) [75]. Note that in trnG, the most conserved tRNA, it was not possible to model the substitution pattern in the TΨC loop due to a high level of variation among orthologous sequences (Figure 7). With the increased variation, the number of cbcs and hemi-cbcs increased in stems, usually with a more marked substitution process in the TΨC stem (e.g., trnI, trnV). In several tRNAs it was not possible to model properly the substitution patterns in TΨC and DHU loops due to the high level of divergence among sequences. Cbcs and hemi-cbcs were restricted to a single species or characterized taxa at a higher taxonomic rank (family/order). An example of the first type is the G-C pair found in the trnH acceptor harm of P. concolorus, which was mirrored by A-T in all other neuropterids (Figure 7). An example of a full cbc characterizing a unique family is the G-C pair found in the acceptor stem of trnEs of family Ascalaphidae (A. appendiculatus and L. macaronius), while other taxa exhibited the A-T pair (Figure 7). A substitution pattern involving two full cbcs characterizing the trnS1 acceptor arm of C. cornutus and P. concolorus (Corydalidae megalopterans) is another example of high-taxonomic rank cbcs ( Figure  8). The presence of hemi-cbcs and, particularly, full cbcs characterizing taxa at different taxonomic levels underscores the potential phylogenetic value of tRNA sequences, especially when secondary structures are taken into account. Until now, tRNAs have been rarely considered in the study of insect phylogenetic relationships [76,77]. Our results suggest that these markers deserve much more attention and should be more routinely used, as demonstrated by analyses performed in other animal groups (e.g., [78]).
The study of the level of conservation in tRNA families was extended to other endopterygotan orders in which at least two fully sequenced mtDNAs exist. The results of this analysis are summarized in Figure 9.
Neuropterid %INUCs were in some cases very similar (e.g., trnA, trnH, trnM) or even surpassed (trnE, trnG, trnS1) their lepidopteran counterparts (Figure 9). This finding is striking because it indicates that the level of variation among the three neuropterid orders is very low, taking into account that Lepidoptera have very conserved tRNAs among Insecta [11]. Conversely, Hymenoptera and Coleoptera had diverging codon families. Indeed, all hymenopteran tRNAs showed %INUC values ≤ 47.14, while no coleopteran tRNA exhibited %INUC > 58.21. The %INUC scores indicate a clear-cut discrepancy between the morphological diversity observed in neuropterid orders and the low rate of substitution that characterizes the evolution of their codon families. If the level of divergence is considered a measure of taxonomic diversity, then the %INUC values would suggest that Neuroptera, Megaloptera and Raphidioptera should be treated as members of a single order.

Comparison of neuropterid rrnS and rrnL structures
The inferred secondary structure model for rrnS of L. macaronius is provided in Figure 10. This is the first prediction for a neuropterid insect. The overall structure, including three domains and 34 helices (progressively numbered in Figure 10), is largely in agreement with those proposed for other endopterygote orders (i.e. Coleoptera, Diptera, Hymenoptera, and Lepidoptera) [11,17,74,79]. A limited number of non-canonical pairings (e.g., G-A on helix H23-434) were present in the rrnS of L. macaronius. The multiple alignment of neuropterid rrnSs spanned 812 positions and contained 353 conserved (43.47%) and 459 variable (56.53%) positions, respectively. Nucleotide variability among domains and helices was unevenly distributed (Figure 10).
The inferred secondary structure model for the rrnL of L. macaronius is provided in Figure 11. This is the first prediction for a neuropterid insect. The structure of this rrnL largely overlaps with previously published structures for endopterygote insects (i.e. Coleoptera, Diptera, Hymenoptera, and Lepidoptera) [11,17,74,79]. It presents the five canonical domains (I-II, IV-VI) of insect rrnLs that do not contain domain III [74]. In the present paper domain I was fully modeled. This is the first time that a secondary structure has been provided for this domain, which in previous studies has been left unmodeled (e.g., [8,11,56,74,79]). The predicted secondary structure of domain I included eight helices and is consistent among all neuropterid taxa (Additional file 6, Figure S3). The proposed structure can also be fitted to rrnL of D. melanogaster (Additional file 6, Figure S3), further extending the taxonomic range. Domain I contained a limited number of non-canonical pairings (Additional file 6, Figure S3). More studies will be necessary to corroborate the validity of the new structure presented here. Taking into account the secondary structure of domain I, L. macaronius rrnL includes 50 helices. The multiple alignment of neuropterid rrnLs extended over 1376 positions and contains 591 conserved (42.95%) and 785 variable sites (57.05%). Conserved nucleotides were unevenly distributed throughout the rrnL secondary structure, with highest level of invariable positions located on domain IV and lowest level observed in domains I-II ( Figure 11). Hymenoptera Lepidoptera Neuropterida Figure 9 Conservation of tRNA families in Endopterygota mtDNAs. The percentage of identical nucleotides for each tRNA family was inferred from a multiple alignment produced with ClustalW [90] and refined manually, taking into account the secondary structure.

Comparison of neuropterid mtDNA genomic spacers
The L. macaronius mtDNA contained 14 non-coding portions (ncps), extending from 1 to 1049 nucleotides, interspersed through PCGs, tRNAs and rrnL and rrnS genes ( Figure 2). Three of them spanned more than 15 bp and are labeled as spacers s1-s3 in Figures 2 and 12.
More or less similar patterns, but not fully overlapping, were observed in other neuropteran mtDNAs, where the ncps varied from a maximum of 15 (D. biseriata) to a minimum of 10 (P. punctatus), with 13 non-coding portions found in A. appendiculatus. The dobsonflies C. cornutus and P. concolorus exhibited the same number (10) of ncps even if their distributional patterns did not coincide. Conversely, the other megalopteran, S. hamata, had 12 ncps. Finally, the raphidiopteran M. harmandi exhibited the most compact mtDNA, having only six npcs ( Figure 12). The spacers of L. macaronius are described more in detail below. The s1 spacer was located between trnI and trnQ and spanned 55 bp. Spacer s1 was present in all partly/fully sequenced mtDNAs available for Neuroptera, i.e., A. appendiculatus, P. punctatus, M. immaculatus and D. biseriata [13,14], present paper)]. It was also found in the megalopteran S. hamata, while it was absent in the other megalopterans C. cornutus and P. concolorus and in the raphidiopteran M. harmandi (Figure 12) [13][14][15]. The L. macaronius s1 can be aligned confidently only with the counterpart found in A. appendiculatus.
The s1 sequences of L. macaronius and A. appendiculatus shared a CCCCCC repeat located near the boundary with trnI and the CAA(A/G)TTAA(A/C)TAAAT (TA/GT)A(C/T)GCA motif adjacent to trnQ. A. appendiculatus and L. macaronius belong to the same family, Ascalaphidae. Thus, it seems that s1 sequences, when present, diverge very fast among different families of the same order.
The L. macaronius s2 was 20 bp long and was located between trnS2 and nad1. The s2 spacer was present in all partly or fully sequenced neuropterid mtDNAs R S1 S1 S1 S1 S1 S1 S1 S1  Figure 12 Distribution of non-coding portions in neuropterid mtDNAs. The non-coding portions are presented as cyan/yellow dots. s1-s3, spacers 1-3. Gene nomenclature is as in Figures 1 and 2. Genomes, genes and non-coding portions are not drawn to scale, for reasons of clarity.
The unknown portions of partial mtDNAs are gray.
( Figure 12) [ [13][14][15], present paper]. The s2 spacer is supposed to contain [13] the binding site for the DmTFF bidirectional transcription termination factor [80]. Spacer s2 is a common feature of insect mtDNAs (e.g., [12]). The L. macaronius s3 spacer was composed of a 1049bp AT-rich region. The A+T content was 84.46%. This finding implies that s3 contained 10% more A+T than the whole α strand. Conversely, s3 was much less ATskewed than the complete α strand (0.029 vs. 0.072). Strings of repeated single nucleotides (T) 3-8 , (A) 3-11 , (C) 2-5 , (G) 2-5 characterized the s3 sequence. Furthermore, the abundance of A/T was mirrored by the presence of repeated motifs containing mostly/only A/T, a common feature of insect control regions [81]. These motifs were represented by shorter identical strings as well as longer consensus patterns sharing 75% of identical positions. The motif AT was the most abundant, occurring 192 times. ATA occurred 70 times, ATAT 29, AATAATTA 5, and ATATAAATA 6. Finally, five distinct A/T-rich 12-mers were repeated twice each (data not shown). When a 75% minimum identity was allowed among repeated motifs, two strings were identified, each containing 37 nucleotides and extending, respectively, from base 551 to base 591 and from 811 to 847. Their consensus sequence was ATATTA (C/T) ATAT (GT/AA) ATA (AT/TG) TA (T/A) TTATTA (A/T) TATATAAAT. Thus, the L. macaronius AT-rich region contains several repeated motifs of different sizes. A similar trend also occurs in other neuropterids (data not shown).
Pair-wise as well as multiple alignments between/ among neuropterid AT-rich regions were performed by applying very relaxed alignment parameters (i.e., gapopening = 1 vs. standard 15; and gap-extending = 3 vs. standard 6.6). This strategy was applied to maximize the matches of these fast-evolving genomic portions (see below). This approach allowed us to align quite diverse sequences but did not guarantee that the positional homology principle was retained when the more diverging sequences were compared. The obtained results are summarized in Table 6.
What is immediately evident is that the level of conserved positions dropped rapidly when comparisons were extended above the taxonomic rank of family. Indeed, L. macaronius and A. appendiculatus, both members of the family Ascalaphidae, shared 71.59% nucleotide identity and a common GC-rich motif TCCCCGGCCCCCCAG-GAT located 96 bp downstream of the 5'-end of rrnS in L. macaronius mtDNA. This motif could be a molecular signature for the whole family, but a better taxon sampling is necessary to assess this point.
When all Neuroptera were considered, the identical positions diminished to 30.15% in the multiple alignment. Thus, a substantial drop in nucleotide identity shared by AT-rich regions occurs at the order level, even permitting very permissive gap costs.
Similarly, C. cornutus and P. concolorus, both members of the family Corydalidae, shared 63.50% nucleotide identity in their control regions. Conversely, the identity level decreased to 45.91% when all megalopteran ATrich regions were compared. Finally, the alignment at the interordinal taxonomic rank of neuropterid control regions produced only 14.80% identical positions. Provided that gaps are quite numerous, this result leads to the conclusion that the AT-rich region is a fast-evolving genomic region. This behavior of neuropterid control regions is a common feature of insect AT-rich regions (e.g., [12]) and is shared with other animal groups. In this respect, a well-documented group is represented by mammals (e.g., [82,83]).
The utility of the AT-rich region as a phylogenetic marker should be most effective at low taxonomic rank (family level and below).

Conclusions
The mtDNA of L. macaronius, the fourth genome sequenced for the order Neuroptera, exhibits the peculiar translocation of the trnC gene with respect to the ancestral gene order of insects. This structural modification represents an exclusive feature of partly or fully sequenced neuropteran mtDNAs and could be a peculiar genomic marker characterizing the entire order Neuroptera.
The analysis of the AT%, AT-skew and GC-skew parameters, which were performed on a set of 84 Table 6 Pairwise and multilple alignments of neuropterid control regions holometabolous insect mtDNAs, shows that these characteristics exhibit complex patterns of variations. Such patterns can be linked or completely unlinked in their variation process.
The neuropterid mtDNAs sequenced to date use the standard invertebrate mitochondrial genome. The distribution of codon families in the PCGs located on the α and β strands is influenced by both structural/functional requirements of the corresponding proteins and by the base composition and AT-/GC-skews. The comparison among orthologous neuropterid PCGs shows that different genes have been subject to different rates of molecular evolution and form a pool of markers suitable for different phylogenetic purposes.
The evolution of neuropterid tRNAs seems to have been variable both in terms of sequence conservation and nucleotide substitution processes.
Neuropterid rrnL and rrnS structures, as demonstrated by the models produced for L. macaronius, appear similar to those determined for other insects. In their structural domains, they show diverse levels of conservation, influenced by different rates of substitution.
An important advance in our comprehension of the structure of rrnL is the production of a model for its domain I. To our knowledge, this is the first time that such a structural modeling has been attempted for an arthropod rrnL.
Neuropterid mtDNAs are punctuated by non-coding portions highly variable in size. Among them, the most extended segment is the control region (AT-rich region), which appears to be a fast-evolving genomic region characterized by short to medium-size repeated motifs/AT-rich patterns.

Sample origin and DNA extraction
The L. macaronius specimen used to determine the mtDNA was collected in June 2007 by EN on arid prairies (45°47'35.75"N 13°35'50.63"E, 98 m elevation) located in the Triestine karst along the road connecting San Giovanni al Timavo to Medeazza (Friuli Venezia Giulia region, northeastern Italy).
Total DNA was purified using the Invisorb DNA extraction kit (Invitec). The quality of DNA was assessed through electrophoresis in a 1% agarose gel and staining with SYBR-safe DNA gel stain (Invitrogen).
PCR amplification and sequencing of L. macaronius mtDNA PCR amplification was performed using a mix of insect universal primers and primers specifically designed against the L. macaronius sequences [84,85]. The complete list of successful primers is available upon request. The PCR products were visualized by electrophoresis in a 1% agarose gel and staining with SYBR-safe DNA gel stain. Each PCR product represented by a single electrophoretic band was purified with the ExoSAP-IT kit (Amersham Biosciences) and directly sequenced. Sequencing of both strands was performed at the BMR Genomics service (http://www.bmr-genomics.it/) on automated DNA sequencers mostly employing the primers used for PCR amplification.

Sequence assembly and annotation
The mtDNA final consensus sequence was assembled using the SeqMan II program from the Lasergene software package (DNAStar, Madison, WI). Gene and strand nomenclature used in this paper were described by Negrisolo et al. [86].
Sequence analysis was performed as follows. Initially, the mtDNA sequence was translated in silico into putative proteins using the Transeq program available at the EBI web site. The true identity of these polypeptides was established using the BLAST program available at the NCBI web site [87,88]. Gene boundaries were determined as follows. The 5' ends of PCGs were inferred to be at the first legitimate in-frame start codon (ATN, GTG, TTG, GTT) in the open reading frame that was not located within the upstream gene encoded on the same strand [64,89]. The only exceptions were atp6 and nad4, which overlap with their upstream genes (atp8 and nad4L, respectively) in many mtDNAs (e.g., [13]). The PCG terminus was inferred to be at the first inframe stop codon encountered. When the stop codon was located within the sequence of a downstream gene encoded on the same strand, a truncated stop codon (T or TA) adjacent to the beginning of the downstream gene was designated the termination codon. This codon was thought to be completed by polyadenylation to a complete TAA stop codon after transcript processing. Finally, pair-wise comparisons with orthologous proteins were performed with ClustalW program to better define the limits of PCGs [90].
Irrespective of the real initiation codon, formyl-Met was assumed to be the starting amino acid for all proteins, as previously shown for other mitochondrial genomes [91,92].
The transfer RNA genes were identified using the tRNAscan-SE program or recognized manually as sequences having the appropriate anticodon and capability of folding into the typical cloverleaf secondary structure [64,93].
The boundaries of the ribosomal rrnL gene were assumed to be delimited by the ends of the trnV-trnL1 pair. The 3' end of the rrnS gene was assumed to be delimited by the start of trnV, while the 5' end was determined through comparison with orthologous genes of other previously sequenced neuropterid insect mtDNAs.
The published neuropterid genomes were re-annotated using the criteria listed above. This approach led us to propose different start/end positions for some genes.

Genomic analysis
Nucleotide composition was calculated with the EditSeq program included in the Lasergene software package. GC-skew = (G-C)/(G+C) and AT-skew = (A-T)/(A+T) were used to measure the base-compositional difference between the different strands or between genes coded on the alternative strands [94]. The prediction of the genetic code used by each mtDNA was performed using the GenDecoder web server [67,68].
Codon usage in neuropterid mtDNAs was studied by calculating the effective number of codon used index (ENC) with the INCA 2.1 program [71,96]. Multiple alignments of genes were produced with the ClustalW algorithm implemented in MEGA 4 [90,95]. Both p-distances and the numbers of different nucleotides in pairwise comparisons were calculated with the PAUP* program [97].
Sequence motifs in the AT-rich region were identified using the Spectral Repeat Finder program [98]. Not only simple motifs were searched but also longer consensus patterns with a minimum match of 75% among different strings.
Testing for phylogenetic signal and saturation of the substitution process An a priori estimation of the phylogenetic signal present in the multiple alignments was performed by maximum likelihood mapping [70]. The phylogenetic signal was evaluated using the TREEPUZZLE 5.2 program [99].
For the nucleotide sequences, the level of saturation of the substitution process was estimated by plotting uncorrected p-distances (based on observed substitutions) against GTR+I+G estimated distances for multiple alignment of single PCGs as well as concatenated PCGs. In the case of amino acid sequences, the level of saturation was estimated by plotting uncorrected p-distances against mtREV24/JTT + G models implemented in MEGA4 and TREEPUZZLE 5.2. After fitting a regression line, its slope (m) was used as a measure of saturation. If m = 1, no saturation is inferred, while for m <<1, a high level of saturation has occurred.
rrnL and rrnS homology modeling Secondary structures of rrnL and rrnS were produced through homology modeling using as templates published structures of Apis mellifera, Drosophila melanogaster and Drosophila virilis [74,79].