  • Research article
  • Open access

Genome-wide diversity in temporal and regional populations of the betabaculovirus Erinnyis ello granulovirus (ErelGV)



Erinnyis ello granulovirus (ErelGV) is a betabaculovirus infecting caterpillars of the sphingid moth E. ello ello (cassava hornworm), an important pest of cassava crops (Manihot esculenta). In this study, the genomes of seven field isolates of ErelGV were deep sequenced, and their inter- and intrapopulational sequence diversity was analyzed.


No events of gene gain/loss or translocations were observed, and indels were mainly found within highly repetitive regions (direct repeats, drs). A naturally occurring isolate from Northern Brazil (Acre State, an Amazonian region) proved to be the most diverse population, with a unique pattern of polymorphisms. Overall, non-synonymous substitutions were found all over the seven genomes, with no clustering of mutations in hotspot regions. Regardless of their size, some ORFs showed higher levels of non-synonymous changes than others. Non-core genes of known functions and structural genes were among the most diverse; as expected, core genes were the least variable. We observed remarkable differences in the diversity of paralogous genes, as in the multiple copies of p10, fgf, and pep. Another important contrast in sequence diversity was found among genes encoding complex subunits and/or involved in the same biological processes, such as late expression factors (lefs) and per os infectivity factors (pifs). Interestingly, several polymorphisms in coding regions lie in sequences encoding specific protein domains.


By comparing and integrating information about inter- and intrapopulational diversity of viral isolates, we provide a detailed description of how evolution operates on field isolates of a betabaculovirus. Our results revealed that 35–41% of the SNPs of ErelGV lead to amino acid changes (non-synonymous substitutions). Some genes, especially non-core genes of unknown functions, tend to accumulate more mutations, while core genes evolve slowly and are more conserved. Additional studies are necessary to understand the actual effects of such gene variations on viral infection and fitness.


Baculoviruses are large double-stranded DNA viruses infecting insects from three different orders [1]. The Baculoviridae family is divided into four genera [2]: the lepidopteran-specific Alphabaculovirus and Betabaculovirus, the hymenopteran-specific Gammabaculovirus, and the dipteran-specific Deltabaculovirus. The genomes of these viruses commonly show high levels of collinearity and compaction [3, 4], varying in size from approximately 81 to 178 kb and encoding from 89 to 183 ORFs [5]. Some baculoviruses have been used as efficient and sustainable bioinsecticides for controlling pest populations in forests and crops, and are safe alternatives to chemical pesticides [6, 7]. One of the features that enhances the applicability of these viruses as bioinsecticides is their high resistance to degradation in the environment, a characteristic provided by a paracrystalline protein matrix that naturally surrounds their viral particles, forming the occlusion bodies (OBs) [1]. Two of the best examples of baculoviral species used as insecticides are Anticarsia gemmatalis multiple nucleopolyhedrovirus (AgMNPV, an alphabaculovirus), in soybean crops [6, 8], and Cydia pomonella granulovirus (CpGV, a betabaculovirus), in fruit crops [9].

Erinnyis ello (Lepidoptera: Sphingidae) is a serious pest of cassava (Manihot esculenta) in the neotropics, with a broad geographic range extending from southern Brazil, Argentina, and Paraguay to the Caribbean basin and the southern United States [10]. This insect is also a severe pest of rubber tree (Hevea brasiliensis M. Arg.). Several natural enemies of this insect have been identified including parasites, predators, fungi, bacteria, and a virus (Erinnyis ello granulovirus, ErelGV). Because of the migratory behavior of hornworm adults, this abundance of natural enemies does not prevent periodic caterpillar outbreaks [11].

In Brazil, ErelGV has been used efficiently to control populations of the cassava hornworm (E. ello) in cassava plantations [12, 13], but very few studies on the biology and molecular characterization of this virus have been carried out [14, 15]. Cassava is native to Brazil and an important carbohydrate source for human consumption, especially in Africa, Asia, and Latin America [16]. Given the agricultural importance of cassava and the economic impact of E. ello caterpillars, a recent study characterized the virus morphology, genome sequence, and evolutionary history of ErelGV [14]. It revealed a 102,759 bp genome that lacks typical homologous regions (hrs) and encodes at least 130 ORFs. Like all baculoviruses, ErelGV encodes the set of 38 shared genes called ‘core genes’ [17]. Recently, Ardisson-Araujo et al. [15] constructed a recombinant Autographa californica multiple nucleopolyhedrovirus (AcMNPV) containing the tmk-dut fusion gene (erel5) of ErelGV, showing that the recombinant virus accelerated viral DNA replication as well as BV and OB production.

Field populations of betabaculoviruses are known to be composed of multiple genotypic variants [18]; however, few studies on the genome sequence diversity of baculovirus isolates have been carried out [3]. Inter-isolate comparative studies have shown that baculoviral genes have low levels of polymorphism, and that non-synonymous substitutions (NSS) tend to be located within highly variable genes or specific genomic regions [19, 20, 21]. Gain and loss of genomic fragments are observed especially in repetitive regions, such as homologous regions (hrs) and direct repeats (drs) [3, 18, 19].

In this study, we investigated the inter- and intra-isolate genetic variability of seven temporal and regional field populations of ErelGV. Aspects of ErelGV genomic organization and evolution are discussed, and we offer a detailed summary of polymorphisms in genes belonging to different functional categories.


Viral samples and granule purification

The samples “ErelGV-86”, “-94”, “-98”, “-99” and “-00” correspond to Brazilian field isolates of ErelGV sequentially collected in 1986, 1994, 1998, 1999 and 2000 from cassava crops in the Itajaí/Jaguaruna region (Santa Catarina State, Brazil), where these viruses were used for controlling populations of E. ello caterpillars (Fig. 1). The “ErelGV-AC” isolate was found in infected larvae on cassava plants collected at Cruzeiro do Sul (Acre State, Brazil). The “ErelGV-PA” isolate was found in infected larvae on rubber trees collected at Belem (Pará State, Brazil) (Table 1). Viral particles were purified according to [22]. In brief, virus-infected dead caterpillars were macerated in homogenization buffer (1% ascorbic acid, 2% SDS, 10 mM Tris, pH 7.8, 1 mM EDTA, pH 8.0), filtered through cheesecloth layers and centrifuged at 10,000 × g for 15 min at 4 °C. The pellet was suspended in 10 mL of TE buffer (10 mM Tris-HCl, pH 8.0 and 1 mM EDTA, pH 8.0) and subjected to another centrifugation step at 12,000 × g for 12 min at 4 °C. This new pellet was resuspended in TE buffer and applied onto sucrose gradients with densities varying from 1.17 g/mL to 1.26 g/mL. The gradients were centrifuged at 100,000 × g for 40 min at 4 °C. The granule-containing band was collected, diluted in TE buffer and centrifuged at 12,000 × g for 15 min at 4 °C. Finally, the viral particles (granules) were suspended in water and stored at −20 °C.

Fig. 1

Geographical locations where ErelGV samples were isolated (approximate coordinates). Red dots depict Brazilian isolates, and the green dot (top-left corner) represents a Colombian isolate shown here for illustrative purposes (see phylogenetic section for more details)

Table 1 Statistics of ErelGV isolates

DNA extraction

Granules (1.5 mL) were dissolved by adding sodium carbonate solution to a final concentration of 0.1 M and incubating at 37 °C for 30 min [23]. Viral disruption buffer (10 mM Tris, pH 7.6, 10 mM EDTA, pH 8.0, 0.25% SDS) containing 500 μg/mL Proteinase K was added to the sample, which was incubated overnight at 37 °C. Viral DNA was purified by sequential extractions with phenol, phenol:chloroform:isoamyl alcohol (25:24:1), and chloroform:isoamyl alcohol (24:1), according to [24]. The DNA was precipitated with absolute ethanol and 3 M sodium acetate, pelleted, and washed with 70% ethanol. After air drying, the DNA was suspended in TE buffer and kept at 4 °C. DNA was quantified by spectrophotometry using a Thermo Scientific NanoDrop™ 1000 spectrophotometer (220–750 nm).

Genome sequencing, assembly, and sequence analysis

ErelGV isolates were sequenced on a GS FLX Titanium platform (Roche 454 Life Sciences) at the ‘Centro de Genômica de Alto Desempenho do Distrito Federal’ (Brasília, Brazil), following the manufacturer’s recommendations (Sequencing Method Manual, GS FLX Titanium Series, Roche 454 Life Sciences). Genome assembly was performed de novo using Geneious R7 [25]. A single representative genome was reconstructed for each isolate under the following stringency: a minimum overlap of 150 base pairs among reads and a minimum sequence identity of 97%. Potential out-of-frame sequence errors observed after assembly were manually inspected and corrected in pairwise comparisons with the genome of ErelGV-86 [14], from which the annotations were transferred to the newly assembled genomes. To analyse the intra-populational diversity, each sequence dataset was mapped against its respective representative genome, and Geneious R7 was used to identify SNPs inside and outside the coding sequences of all isolates. Only variants supported by a minimum of five reads and showing a frequency of at least 1% were considered, and substitutions within tandem repeats were ignored. To assess the diversity of the whole ErelGV metapopulation, a large dataset of over 210,000 reads was created by combining sequences from all viral isolates, and this dataset was mapped against a final ErelGV consensus genome. Prior to SNP detection, errors in homopolymeric regions were identified and corrected using RC454 [26], and low-frequency variants (< 0.01) and those supported by fewer than 3 reads were removed. To ensure the reliability of our genetic diversity analysis, polymorphic sites identified in coding sequences were manually curated to avoid potential false-positive detections. Gene Ontology information (biological process, molecular function, cellular component) was retrieved from the records of ErelGV-86 available on UniProt [27].
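
As an illustration of the intra-isolate variant thresholds described above, the following minimal Python sketch applies a five-read minimum, a 1% frequency minimum and the tandem-repeat exclusion to a few candidate substitutions. It is not the Geneious R7 procedure used in the study; the `Variant` record, its field names and all example values are hypothetical and only serve to make the filtering rule concrete.

```python
from dataclasses import dataclass

# Thresholds quoted in the text for intra-isolate variant calling.
MIN_SUPPORTING_READS = 5
MIN_FREQUENCY = 0.01


@dataclass(frozen=True)
class Variant:
    """One candidate substitution (hypothetical record layout)."""
    position: int          # coordinate on the representative genome
    supporting_reads: int  # reads carrying the alternative base
    coverage: int          # total reads covering the position
    in_tandem_repeat: bool = False

    @property
    def frequency(self) -> float:
        # Guard against uncovered positions to avoid division by zero.
        return self.supporting_reads / self.coverage if self.coverage else 0.0


def passes_filters(variant: Variant) -> bool:
    """Keep variants with enough read support and frequency, outside tandem repeats."""
    return (
        not variant.in_tandem_repeat
        and variant.supporting_reads >= MIN_SUPPORTING_READS
        and variant.frequency >= MIN_FREQUENCY
    )


# Made-up candidates, one per rejection reason plus one that passes.
candidates = [
    Variant(position=1200, supporting_reads=12, coverage=400),  # 3.0%, kept
    Variant(position=5310, supporting_reads=4, coverage=90),    # too few reads
    Variant(position=8844, supporting_reads=6, coverage=900),   # 0.67%, too rare
    Variant(position=9901, supporting_reads=30, coverage=300, in_tandem_repeat=True),
]
kept = [v.position for v in candidates if passes_filters(v)]
print(kept)
```

Requiring both an absolute read count and a relative frequency guards against two different artefacts: sporadic sequencing errors at low coverage, and rare errors that accumulate at very high coverage.
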
The betabaculovirus maximum likelihood (ML) tree was inferred in PhyML [28], with 500 bootstrap replicates under a WAG + I + G model selected by ProtTest [29], using concatenated amino acid sequences of 38 core genes aligned with MAFFT [30]. The ErelGV-specific ML tree was inferred from concatenated partial sequences of the granulin, lef-9 and lef-8 genes, also aligned with MAFFT, using the FastTree method implemented in Geneious R7 [25]. The genomic circular map was plotted using Circos [31].
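
The concatenation step that precedes tree inference can be sketched in a few lines of Python. This is only an illustration of how per-gene alignments are joined into one supermatrix; the function name, the gap-filling rule for taxa missing a gene, and the toy sequences are assumptions and do not come from the study, where alignment and concatenation were handled with MAFFT and Geneious R7.

```python
def concatenate_alignments(alignments, gene_order):
    """Join per-gene alignments into one sequence per taxon.

    alignments maps gene name -> {taxon: aligned sequence}.
    A taxon missing from a gene alignment is padded with gaps (assumed rule).
    """
    taxa = sorted({taxon for gene in gene_order for taxon in alignments[gene]})
    parts = {taxon: [] for taxon in taxa}
    for gene in gene_order:
        aligned = alignments[gene]
        lengths = {len(seq) for seq in aligned.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: sequences are not aligned to one length")
        width = lengths.pop()
        for taxon in taxa:
            parts[taxon].append(aligned.get(taxon, "-" * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


# Toy alignments (not real ErelGV sequences).
toy = {
    "granulin": {"ErelGV-86": "ATGGGA", "ErelGV-AC": "ATGGGT"},
    "lef-9": {"ErelGV-86": "ATG-CA"},
    "lef-8": {"ErelGV-86": "ATGA", "ErelGV-AC": "ATGC"},
}
supermatrix = concatenate_alignments(toy, ["granulin", "lef-9", "lef-8"])
for taxon, sequence in supermatrix.items():
    print(taxon, sequence)
```
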


General aspects of ErelGV genomes

All isolates of ErelGV included in this study are collinear and no gene gain or loss was observed: all genomes encode at least 130 ORFs (minimum size of 150 bp), as observed for ErelGV-86 [14]. When compared to isolate 86, most genomes showed high nucleotide similarity (from 99.47 to 99.94%) and an average G + C content of 39.75% (Table 1). Genome length ranged from 102,616 to 102,764 bp, and insertions/deletions (10 or more base pairs) were found mainly within direct repeats (drs), i.e. tandem repeats located in coding and non-coding regions (Additional file 1).

Whole-genome alignments revealed large indels (12–88 bp), mainly in seven drs. Some of them were found within coding regions, leading to size variations in low-complexity regions, as observed in erel44 (dr8, with three size variants), p10 (dr10, with three size variants), and erel121 (dr15/16, with three size variants). Three other direct repeats with large indels were found in non-coding regions: downstream of erel11 (dr2, with an indel of up to 46 bp), downstream of erel19 (dr3, with an indel of up to 26 bp), and upstream of erel23 (dr5, with an indel of up to 88 bp). No promoter motifs were found in these intergenic drs, except for dr5, where a putative TATA box motif is disrupted by a deletion in ErelGV-98, -AC, and -PA.

Intrapopulational genetic diversity

To analyse the intra-isolate diversity, each read dataset from the sequenced isolates was mapped against its respective representative genome, and polymorphisms inside and outside the coding sequences were detected. On average, around 52% of the polymorphisms observed in each isolate corresponded to synonymous substitutions, 38.5% to non-synonymous substitutions, and 9.5% to substitutions in non-coding regions. The intra-isolate diversity was broadly similar in most samples, except for ErelGV-AC and ErelGV-PA, which showed extremely high and low levels of diversity, respectively. The total number of polymorphisms ranged from 20 (in ErelGV-PA) to 1267 (in ErelGV-AC) (Fig. 2A), and the number of SNPs observed for each isolate did not correlate with sequencing coverage (r = 0.015) (Fig. 2B and Additional file 2).
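
The kind of correlation reported above can be illustrated with a short Pearson's r calculation in Python. This is a sketch under two assumptions: that r denotes the Pearson coefficient (the text reports r without naming the estimator), and that the inputs are per-isolate coverage and SNP counts. The numbers below are placeholders, not the values from Additional file 2.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return covariance / math.sqrt(spread_x * spread_y)


# Placeholder values only: coverage and SNP counts for four imaginary isolates.
toy_coverage = [10, 20, 30, 40]
toy_snp_counts = [5, 9, 9, 5]
print(round(pearson_r(toy_coverage, toy_snp_counts), 3))
```

A coefficient near zero, as in this toy case, means that deeper sequencing does not systematically yield more detected SNPs, which is the argument made for Fig. 2B.
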

Fig. 2

Overview of SNPs in ErelGV genomes. a) Total number of SNPs, and frequency of synonymous substitutions, non-synonymous substitutions, and SNPs in non-coding regions of ErelGV isolates. b) Relationship between number of SNPs and sequencing coverage. As observed, these two variables are not correlated (r = 0.015), showing that sequencing depth did not influence the levels of diversity detected in each isolate. Additional statistics of the sequencing libraries generated in this study can be found in Additional file 2

ErelGV ORFs range from 153 to 3303 bp. To account for differences in ORF size, genetic diversity was estimated as the number of non-synonymous substitutions per base pair (NSS/bp). This approach revealed that most genes are either conserved (no NSS) or show low levels of diversity (from 1 to 3 × 10⁻³ NSS/bp) (see dark green dots in Fig. 3A and Additional file 3). Moreover, correlations between diversity and ORF size were low (−0.01 < r < 0.29). As expected, the exception was ErelGV-AC, which showed a moderate correlation between these variables (r = 0.516), with some genes showing high levels of diversity (greater than 9 × 10⁻³ NSS/bp) (see dark purple dots in Fig. 3A and Additional file 3).
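
The NSS/bp measure is a simple ratio, and a minimal Python sketch makes the normalisation explicit. The function names are hypothetical, and the 480-bp ORF length is an assumption chosen because, with nine substitutions, it yields 18.75 × 10⁻³ NSS/bp; it is not an ORF length taken from the annotation.

```python
def nss_per_bp(nss_count: int, orf_length_bp: int) -> float:
    """Non-synonymous substitutions per base pair of an ORF."""
    if orf_length_bp <= 0:
        raise ValueError("ORF length must be positive")
    if nss_count < 0:
        raise ValueError("NSS count cannot be negative")
    return nss_count / orf_length_bp


def as_per_thousand(value: float) -> float:
    """Express NSS/bp in the x 10^-3 units used in the text, rounded to 2 decimals."""
    return round(value * 1e3, 2)


# Assumed example: 9 NSS in a 480-bp ORF (illustrative length, not a reported value).
example = as_per_thousand(nss_per_bp(9, 480))
print(example)

# A gene without non-synonymous substitutions scores zero whatever its size.
print(as_per_thousand(nss_per_bp(0, 153)))
```

Dividing by ORF length keeps long genes from appearing more diverse simply because they offer more sites at which a substitution can occur.
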

Fig. 3

Non-synonymous substitutions, ORF size and functional categories. In these plots, each dot represents a gene depicted in Fig. 4. a) Number of NSS per base pair (× 10⁻³) and ORF size (bp) show a moderate correlation (r = 0.51). The grey area corresponds to the 95% confidence interval, and highly conserved or highly diverse genes are shown as labelled outliers. b) Scatter plot showing how genetic diversity (NSS/bp) relates to gene function. As shown, most highly diverse genes still have unknown functions. For the sake of clarity, the structural gene p10, which shows 53.85 NSS per base pair (× 10⁻³), is not included in this plot. See Additional file 8 for quantitative data

Polymorphisms of ErelGV

We also estimated the diversity of the ErelGV species as a whole. First, we treated all isolates as members of a hypothetical ErelGV metapopulation, and their reads were combined and mapped against a consensus genome. SNPs whose frequency or read support falls below the minimum thresholds within an individual isolate go undetected there; once all isolate-specific reads are combined, however, the cumulative support can lift such rare polymorphisms above the thresholds in the metapopulation. Conversely, low-frequency isolate-specific SNPs could not be detected when all reads were combined and mapped against the consensus, because pooling dilutes their frequency.
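
Both effects of pooling can be reproduced with a toy calculation. The Python sketch below is an illustration only: it assumes the same thresholds for isolate-level and pooled calls (five reads, 1% frequency), and the read counts are invented for the example.

```python
MIN_READS = 5         # minimum supporting reads (assumed for both levels)
MIN_FREQUENCY = 0.01  # minimum variant frequency


def detectable(alt_reads: int, coverage: int) -> bool:
    """True when a variant passes both the read-support and frequency thresholds."""
    return coverage > 0 and alt_reads >= MIN_READS and alt_reads / coverage >= MIN_FREQUENCY


def pool(site):
    """Sum (alt_reads, coverage) pairs from several isolates at one genome position."""
    return sum(alt for alt, _ in site), sum(cov for _, cov in site)


# Hypothetical counts: (reads with the variant, total reads) in three isolates.
rare_in_every_isolate = [(3, 200), (2, 150), (4, 250)]
private_to_one_isolate = [(6, 300), (0, 400), (0, 500)]

for name, site in [("rare everywhere", rare_in_every_isolate),
                   ("private variant", private_to_one_isolate)]:
    per_isolate = [detectable(alt, cov) for alt, cov in site]
    pooled = detectable(*pool(site))
    print(name, per_isolate, pooled)
```

In the first case no isolate reaches five supporting reads, but the pooled data do; in the second, a variant at 2% in one isolate drops to 0.5% once reads from isolates lacking it are added.
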

The full mapping against the consensus genome revealed at least 1893 substitutions in coding and non-coding regions. As expected, although non-coding regions cover only 6% of the ErelGV genomes, more than 21% of the substitutions (401) were found in such regions, a density of SNPs per base pair much higher than that of coding regions. Moreover, a total of 564 non-synonymous substitutions were found spread all over the genome, with variable genes interleaved among genes with few polymorphisms (inner ring, Fig. 4).
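
The contrast between coding and non-coding regions can be made explicit with a short calculation using the counts given above. The Python sketch below assumes that coding regions make up the remaining 94% of the genome and ignores overlapping ORFs, so the resulting fold difference is a rough estimate rather than a figure from the study.

```python
# Counts and fraction quoted in the text.
total_substitutions = 1893
noncoding_substitutions = 401
noncoding_genome_fraction = 0.06

# Assumption: everything that is not non-coding is coding, with no overlaps.
coding_genome_fraction = 1.0 - noncoding_genome_fraction
coding_substitutions = total_substitutions - noncoding_substitutions

share_noncoding = noncoding_substitutions / total_substitutions
share_coding = coding_substitutions / total_substitutions

# Relative density: share of substitutions divided by share of genome.
density_noncoding = share_noncoding / noncoding_genome_fraction
density_coding = share_coding / coding_genome_fraction
fold_enrichment = density_noncoding / density_coding

print(f"non-coding share of substitutions: {share_noncoding:.1%}")
print(f"fold enrichment of non-coding over coding: {fold_enrichment:.1f}")
```

Under these assumptions, substitutions are roughly four times denser per base pair in non-coding regions than in coding regions.
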

Fig. 4

Genetic diversity of ErelGV. On the outside of this circular map, ORFs are represented in positive and negative sense. Arrowheads highlight genes shared by all baculoviruses (core genes). From the first to the seventh ring, the heatmaps depict the number of NSS per gene. Finally, the inner ring summarizes the genetic diversity of ErelGV sp. when all reads are combined and mapped against a consensus genome. See Additional file 8 for quantitative data

The ErelGV scatterplot of ORF size against non-synonymous substitutions revealed a moderate correlation between these two variables (r = 0.514), and the distribution was similar to that observed for ErelGV-AC (Fig. 3A and Additional file 3). Even after combining all reads, at least 20 invariant genes were identified in the ErelGV species, and seven of them are core genes: ac78-like (erel98), vlf-1 (erel99), and genes encoding structural proteins such as odv-e18 (erel13), odv-e27 (erel89), p6.9 (erel79), erel95 (ac81-like), and erel109 (ac68-like). On the other hand, some genes showed high levels of diversity (more than 15 × 10⁻³ NSS/bp), such as the structural genes ac53-like (erel123), ac110-like (erel53), pif-3, erel17, p10, and pif-7 (the recently identified 38th core gene [17]), as well as 11 non-core genes of unknown function.

Diversity on functional groups

By grouping ErelGV genes based on their main function, we analysed the level of polymorphisms in genes belonging to the following categories: Auxiliary; Host modulation; Replication; Transcription; Structural, as well as core and non-core genes (Fig. 3B). Functional categories were assigned based on Cohen et al. [32].

Genes implicated in replication and DNA metabolism, such as alk-exo, dnapol, and dna-helicase-1, were the most conserved genes in the ErelGV genomes, with an average diversity of 3.03 × 10⁻³ NSS/bp. Among these genes, vlf-1, lef-3, and dbp showed no polymorphisms in any isolate, whereas lef-1 was the most variable. Genes implicated in host modulation and auxiliary functions showed similar, intermediate levels of polymorphism, averaging 3.49 and 3.84 × 10⁻³ NSS/bp, respectively (Fig. 3B). Genes involved in structural and transcriptional functions were among the most variable ErelGV genes of known function, with average diversities of 4.42 and 4.59 × 10⁻³ NSS/bp, respectively. Our analyses revealed that the structural genes pif-3, desmoplakin, pep-1, erel17, and erel123 (Ac53-like, a core gene), as well as the transcriptional regulatory gene lef-10, are among the most polymorphic genes of ErelGV. Conversely, at least six structural genes (erel95, erel109, granulin, odv-e18, odv-e27 and p6.9) and one gene involved in transcription regulation (lef-6) showed no polymorphisms in any viral isolate.

Interestingly, most genes with a high number of polymorphisms (> 15 × 10⁻³ NSS/bp) have unknown functions (Fig. 3B). Among them are three genes unique to ErelGV (erel53, erel59 and erel70) and seven others encoded only by betabaculoviruses (erel19, erel124, erel121, erel18, erel24, erel37 and erel29). On the other hand, some genes of unknown function were invariant in all isolates, such as erel2, erel35, erel69, erel98 (Ac78-like, a core gene), erel102 (unique to ErelGV), and erel116. An important difference in polymorphism was observed between core and non-core genes. On average, non-core genes have about 1.6 times as many non-synonymous substitutions per base pair as core genes (6.05 and 3.70 × 10⁻³ NSS/bp, respectively) (Fig. 3B), a characteristic already reported in field-isolated alphabaculoviruses [19].

Phylogenetic analysis

Our phylogenetic analysis with concatenated protein sequences of core genes revealed the evolutionary relationship of ErelGV isolates and other betabaculoviruses. As expected, all ErelGV isolates clustered together, having Choristoneura occidentalis granulovirus (ChocGV) as their most closely related taxon, as observed by [14]. The analysis also revealed two main clades of betabaculoviruses (Fig. 5 and Additional file 4), which showed a slightly different species composition compared to previous studies [33,34,35]. Instead of placing Plutella xylostella granulovirus (PlxyGV) and Agrotis segetum granulovirus (AgseGV) as basal taxa in clade A, the topology of the betabaculovirus subtree shows them in clade B, which includes CpGV and the ErelGV isolates.

Fig. 5

Maximum likelihood tree of Betabaculovirus and ErelGV isolates. The phylogeny was inferred using concatenated amino acid sequences of baculovirus core genes. ErelGV isolates are highlighted in grey. The baculoviral genera Alphabaculovirus (α = AcMNPV), Gammabaculovirus (γ = NeabNPV) and Deltabaculovirus (δ = CuniNPV) were included as outgroups, and the latter was used as root. Bootstrap values are indicated for each interior branch. The tree is shown as a cladogram for purposes of clarity only, and a phylogram can be found in Additional file 4

To better understand the evolutionary history of ErelGV isolates in South America, an additional analysis including the Colombian isolate ErelGV-M34 [36] was performed using concatenated partial sequences of the granulin, lef-9, and lef-8 genes. ErelGV isolates from the southern Brazilian state of Santa Catarina (SC, see Table 1 and Fig. 1) clustered together, having the northern isolates ErelGV-AC and -PA, and the Colombian isolate M34, as the most distantly related taxa (see Additional file 5).


PIF proteins have distinct evolutionary histories and different sequence diversities

ErelGV encodes at least eight genes known to act as per os infectivity factors (pif) (Table 2). A previous study proposed that the proteins P74, PIF1, PIF2 and PIF3 form a conserved complex shared by all baculoviruses, which plays essential roles in baculoviral entry into midgut cells [37, 38]. Our results revealed that, among these genes, the core gene pif-3 is by far the most variable. The protein it encodes shows a conserved hydrophobic (transmembrane) N-terminal sequence [39] and a domain of unknown function (DUF666, e-value: 6.78e-66) at its C-terminus, the region where most polymorphisms (6 NSS) were found. Interestingly, this domain is likely to be located externally to the ODV envelope, establishing interactions primarily with other PIFs, but is not directly involved in viral binding or fusion [38]. Furthermore, a recent study identified the core gene p95/vp91 (pif-8, erel93) as a novel PIF protein required for both ODV and BV nucleocapsid assembly, and for the formation of the PIF complex in ODV envelopes [17]. Together, pif-3, pif-8 and pif-0 are the three most diverse pif genes of ErelGV.

Table 2 The eight pif genes shared by all baculoviruses, and their diversity in ErelGV (sp.)

Paralogous genes evolve under distinct diversifying strategies

Two other putative structural genes were found to be highly variable in this study: erel17 (18.75 × 10⁻³ NSS/bp) and erel123 (15.04 × 10⁻³ NSS/bp). By homology, the product encoded by erel17 is suggested to be a viral capsid protein [GO:0019028], and most of its polymorphisms (7 out of 9) were found in its N-terminal region. This region corresponds to an NPV_P10 domain (e-value: 1.40e-07), commonly encoded by p10 (erel54) and responsible for aggregating P10-like proteins to form filaments and tubular structures [40]. Interestingly, the NPV_P10 domain encoded by p10 is rather conserved, being much less diverse than the domain encoded by its potential second copy, erel17. Another important difference between the peptides encoded by p10 and erel17 lies in their C-terminal regions: while P10 shows a proline-rich region composed of 6–8 copies of the motif “PEPEPESK” (inside dr10), EREL17 shows a serine/threonine-rich C-terminus, apparently not homologous to the one observed in P10. The function of erel17 is unclear; however, since P10-like proteins tend to interact with each other [40], further experimental studies investigating P10-like protein aggregation will be necessary to understand their role during host infection.

Another polymorphic structural gene of ErelGV is pep-1 (9.47 × 10⁻³ NSS/bp). This gene encodes a protein with a Baculo_PEP_N domain (e-value: 5.1e-35), and all pep-1 NSS were found within this region. Immediately downstream of this gene are two other genes encoding Baculo_PEP_N domains, with different levels of diversity: pep/p10 (6.82 × 10⁻³ NSS/bp) and pep-2 (2.21 × 10⁻³ NSS/bp). The PEP (polyhedron envelope protein) proteins are known to form protective layers that ensure OB integrity in the field [41]. As previously shown [42], alpha- and betabaculoviruses express different numbers of pep genes. While alphabaculoviruses encode PEP proteins with Baculo_PEP_N and Baculo_PEP_C domains, the betabaculoviruses show three copies of pep: pep-1 and pep-2, which encode a single Baculo_PEP_N domain; and pep/p10, which encodes both the N- and the C-terminal domains. Although PEP protein structures and their binding modes are still unclear [41], the presence of three PEPs with different domain compositions and levels of diversity in ErelGV may lead to changes in the OB calyx structure; however, further studies are required to demonstrate this conclusively.

Finally, viral homologs of the fibroblast growth factor (fgf) gene are shared by both alpha- and betabaculoviruses, and their product (FGF protein) has been suggested to act as a chemoattractant that enhances the migration of uninfected hemocytes towards infected tissues, both promoting viral spread through the larval circulatory system [43] and accelerating host mortality [44]. Interestingly, all betabaculoviruses encode three fgf paralogs (fgf-1, -2, and -3), which probably emerged via independent events of duplication or HGT occurring after the betabaculovirus speciation [42]. In ErelGV, fgf-2 and -3 showed similar levels of diversity (3.34 and 4.54 × 10⁻³ NSS/bp, respectively), while fgf-1 was the most diverse copy (7.31 × 10⁻³ NSS/bp). Interestingly, the ErelGV FGFs have different lengths, and most of their polymorphisms are located within the C-terminal region (Additional file 6). FGF proteins are known to contain a signal peptide at their N-terminus and a C-terminus of variable length [43].

Some core genes of structural function are highly polymorphic

The ac53-like (erel123) gene was the second most diverse core gene. This gene is suggested to be involved in both nucleocapsid assembly and transportation [45], and in ErelGV, erel123 has a Baculo_RING domain (e-value: 6.04e-44). Interestingly, 5 out of 6 polymorphisms found in this gene are located specifically in a zinc finger motif (IPR013083) of the Baculo_RING domain; however, the impact of such mutations on gene function is still unclear.

The high level of diversity observed for the structural gene desmoplakin (9.07 × 10⁻³ NSS/bp) agrees with previous results, in which this gene was identified as the most variable core gene in baculoviruses [21]. Desmoplakin acts in different viral processes, such as nucleocapsid egress from the nucleus, synthesis of pre-occluded virions, and OB formation [46]. In ErelGV, desmoplakin encodes a protein with a conserved Desmo_N domain at its N-terminus, but most NSS (17 out of 18) were found within its C-terminal region, which has no specific domains assigned.

Genes involved in DNA replication and genome processing: The diversity of LEF proteins

At least three lef genes (lef-1, -2 and -10) were the most diverse among the genes involved in DNA replication and genome processing. LEF-1 and LEF-2 proteins seem to form a complex, in which LEF-2 is responsible for binding both the DNA and LEF-1 [47]. All polymorphisms of LEF-1 and LEF-2 were located in their C-terminal regions (Additional file 7). While no specific protein domain was found in LEF-1, LEF-2 contains a single domain (Baculo_LEF-2) that spans its entire length. The lef-10 gene was the most polymorphic lef gene (12.92 × 10⁻³ NSS/bp). This gene encodes a short protein that is probably a component of a multisubunit RNA polymerase [48]. Although lef-10 shares about one third (134 bp) of its coding region with vp1054, only one of its five non-synonymous substitutions is found inside this overlapping region, where it also causes a synonymous change (CTG → CTA) at the 5′ end of vp1054 (Additional file 7). Since mutations in such regions are likely to impair both genes [49], this pattern points to an interesting constraint that can limit the adaptive space of some overlapping baculovirus genes.
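
The constraint on overlapping genes can be illustrated with a toy example in Python: one nucleotide change is read in two reading frames and may alter the protein in one frame while being silent in the other. The six-base sequence, the frame offsets and the function name below are hypothetical and are not taken from the lef-10/vp1054 overlap; only the CTG → CTA codon change mirrors the case described above. The codon table is the standard genetic code.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def codon_change(sequence, position, new_base, orf_start):
    """Effect of one substitution on the codon of an ORF starting at orf_start.

    position and orf_start are 0-based indices on the same strand.
    Returns (old codon, new codon, old residue, new residue, is_synonymous).
    """
    if position < orf_start:
        raise ValueError("position lies upstream of the ORF start")
    codon_start = orf_start + 3 * ((position - orf_start) // 3)
    if codon_start + 3 > len(sequence):
        raise ValueError("codon is truncated by the end of the sequence")
    mutated = sequence[:position] + new_base + sequence[position + 1:]
    old_codon = sequence[codon_start:codon_start + 3]
    new_codon = mutated[codon_start:codon_start + 3]
    old_aa, new_aa = CODON_TABLE[old_codon], CODON_TABLE[new_codon]
    return old_codon, new_codon, old_aa, new_aa, old_aa == new_aa


toy_sequence = "ACTGCA"  # hypothetical overlap region
frame_a = codon_change(toy_sequence, position=3, new_base="A", orf_start=0)
frame_b = codon_change(toy_sequence, position=3, new_base="A", orf_start=1)
print("frame A:", frame_a)
print("frame B:", frame_b)
```

In this toy case the G → A change replaces alanine with threonine in frame A but leaves leucine unchanged in frame B (CTG → CTA), the same kind of asymmetry reported for the single lef-10 substitution inside the overlap.
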


In this article, we presented a whole-genome study of the intra- and inter-isolate diversity of field betabaculovirus populations. ErelGV has been used as an important biological control agent on cassava crops, and the present study provides an extensive analysis of the evolution of this viral species. In terms of phylogenetics, the southern Brazilian isolates (86, 94, 98, 99 and 00) proved to be more closely related to each other than to the northern isolates (AC, PA). Our results also revealed that, among the SNPs detected by deep sequencing of these isolates, 35–41% correspond to mutations leading to amino acid changes (non-synonymous substitutions). No clear trend was found for genes clustering by diversity, as highly conserved and highly diverse genes are scattered all over the genome. ORF size and number of NSS were not always proportional, revealing that some genes, especially non-core genes of unknown function, tend to accumulate more mutations, while others, notably core genes, evolve slowly and accumulate few or no changes over time. Some genes of ErelGV are present in multiple copies (p10, fgf, pep), and each copy shows its own pattern of evolution. Interestingly, despite their names, sets of genes known to act in association and/or to be involved in the same biological processes, such as the lefs and pifs and their respective cognates, are not members of the same protein family, and also show distinct modes of evolution. More studies are necessary to understand the effects of such gene variations on viral infection and fitness.



Abbreviations

AC: Acre State, Brazil
bp: Base pair
CNV: Copy Number Variant
drs: Direct repeats
ErelGV: Erinnyis ello granulovirus
fgf: Fibroblast growth factor
HGT: Horizontal Gene Transfer
hr: Homologous region
lef: Late expression factor
NSS: Non-synonymous substitutions
ODV: Occlusion-derived virus
ORF: Open Reading Frame
PA: Pará State, Brazil
pep: Polyhedron envelope protein
pif: Per os infectivity factor
SNP: Single nucleotide polymorphism


  1. Baculovirus Molecular Biology [].

  2. Jehle JA, Blissard GW, Bonning BC, Cory JS, Herniou EA, Rohrmann GF, Theilmann DA, Thiem SM, Vlak JM. On the classification and nomenclature of baculoviruses: a proposal for revision. Arch Virol. 2006;151(7):1257–66.

  3. Chateigner A, Bezier A, Labrousse C, Jiolle D, Barbe V, Herniou EA. Ultra deep sequencing of a Baculovirus population reveals widespread genomic variations. Viruses. 2015;7(7):3625–46.

  4. Theze J, Cabodevilla O, Palma L, Williams T, Caballero P, Herniou EA. Genomic diversity in European Spodoptera exigua multiple nucleopolyhedrovirus isolates. J Gen Virol. 2014;95(Pt 10):2297–309.

    Article  Google Scholar 

  5. Brister JR, Ako-Adjei D, Bao Y, Blinkova O. NCBI viral genomes resource. Nucleic Acids Res. 43(Database issue):D571–7.

    Article  Google Scholar 

  6. Moscardi F. Assessment of the application of baculoviruses for control of Lepidoptera. Annu Rev Entomol. 1999;44:257–89.

    Article  CAS  Google Scholar 

  7. Moscardi F, Souza ML, Castro ME, Moscardi ML, Szewczyk B. Baculovirus pesticides: Present state and future perspectives. In: Ahmad I, Ahmad F, Pichtel J, editors. Microbes and microbial technology. New York: Springer; 2011. p. 415–45.

    Chapter  Google Scholar 

  8. Oliveira JV, Wolff JL, Garcia-Maruniak A, Ribeiro BM, de Castro ME, de Souza ML, Moscardi F, Maruniak JE, Zanotto PM: Genome of the most widely used viral biopesticide: Anticarsia gemmatalis multiple nucleopolyhedrovirus. J Gen Virol 2006, 87(Pt 11):3233–3250.

    Article  CAS  Google Scholar 

  9. Grzywacz D, Moore S. Chapter 7 - production, formulation, and bioassay of Baculoviruses for Pest control A2 - Lacey, Lawrence a. In: Microbial Control of Insect and Mite Pests. Amsterdam: Academic Press; 2016. p. 109–24.

    Chapter  Google Scholar 

  10. Bellotti AC, Smith L, Lapointe SL. Recent advances in cassava pest management. Annu Rev Entomol. 1999;44:343–70.

    Article  CAS  Google Scholar 

  11. Bellotti AC, Braun AR, Arias B, Castillo JA, Guerrero JM. Origin and management of neotropical cassava artropod pests. Afr Crop Sci J. 1994;2(4):407–17.

    Google Scholar 

  12. Fazolin M, Estrela JLV, Filho MDC, Santiago ACC, Frota FS. Manejo integrado do mandarová-da-mandioca Erinnyis ello (L.) (Lepidoptera: Sphingidae): conceitos e experiências na região do Vale do rio Juruá, Acre. Rio Branco: Embrapa; 2007.

    Google Scholar 

  13. Schmitt AT. Uso de Baculovirus erinnyis para el control biologico del gusano cachon de la yuca. Yuca Bol Inf (Colomb). 1988;12:1–4.

    Google Scholar 

  14. Ardisson-Araujo DM, de Melo FL, Andrade Mde S, Sihler W, Bao SN, Ribeiro BM, de Souza ML. Genome sequence of Erinnyis ello granulovirus (ErelGV), a natural cassava hornworm pesticide and the first sequenced sphingid-infecting betabaculovirus. BMC Genomics. 2014;15:856.

    Article  Google Scholar 

  15. Ardisson-Araujo DM, Lima RN, Melo FL, Clem RJ, Huang N, Bao SN, Sosa-Gomez DR, Ribeiro BM. Genome sequence of Perigonia lusca single nucleopolyhedrovirus: insights into the evolution of a nucleotide metabolism enzyme in the family Baculoviridae. Sci Rep. 2016;6:24612.

    Article  CAS  Google Scholar 

  16. Plucknett DL, Philipps TP, Kagbo RB. A global development strategy for cassava: transforming a traditional tropical root crop. In: The global cassava development strategy and implementation plan. Rome: FAO; 2001. p. 5–39.

    Google Scholar 

  17. Javed M, Biswas S, Willis L, Harris S, Pritchard C, van Oers M, Donly B, Erlandson M, Hegedus D, Theilmann D. Autographa californica multiple Nucleopolyhedrovirus AC83 is a per Os infectivity factor (PIF) protein required for occlusion-derived virus (ODV) and budded virus Nucleocapsid assembly as well as assembly of the PIF complex in ODV envelopes. J Virol. 2017;91(5).

  18. Erlandson MA. Genetic variation in field populations of baculoviruses: mechanisms for generating variation and its potential role in baculovirus epizootiology. Virol Sin. 2009;24(5):458.

    Article  Google Scholar 

  19. Brito AF, Braconi CT, Weidmann M, Dilcher M, Alves JM, Gruber A, Zanotto PM. The Pangenome of the Anticarsia gemmatalis multiple Nucleopolyhedrovirus (AgMNPV). Genome biology and evolution. 2015;8(1):94–108.

    Article  Google Scholar 

  20. Li L, Li Q, Willis LG, Erlandson M, Theilmann DA, Donly C. Complete comparative genomic analysis of two field isolates of Mamestra configurata nucleopolyhedrovirus-a. J Gen Virol. 2005;86(Pt 1):91–105.

    Article  CAS  Google Scholar 

  21. Miele SA, Garavaglia MJ, Belaich MN, Ghiringhelli PD. Baculovirus: molecular insights on their diversity and conservation. Int J Evol Biol. 2011;2011.

  22. Maruniak JE. Baculovirus structural proteins and protein synthesis. In: Granados RR, Federici BA, editors. The biology of baculovirus, vol. vol. 1. Boca Raton, Florida: CRC Press; 1986. p. 129–46.

    Google Scholar 

  23. O'Reilly DR, Miller LK, Luckow VA. Baculovirus expression vectors: a laboratory manual. New York: Oxford University Press; 1993.

    Google Scholar 

  24. Green MR, Sambrook J. Molecular cloning: a laboratory manual. 4th ed. Cold Spring Harbor, New York: Cold Spring Harbor Laboratory Press announces; 2012.

    Google Scholar 

  25. Kearse M, Moir R, Wilson A, Stones-Havas S, Cheung M, Sturrock S, Buxton S, Cooper A, Markowitz S, Duran C, et al. Geneious basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics. 2012;28(12):1647–9.

    Article  Google Scholar 

  26. Henn MR, Boutwell CL, Charlebois P, Lennon NJ, Power KA, Macalalad AR, Berlin AM, Malboeuf CM, Ryan EM, Gnerre S, et al. Whole genome deep sequencing of HIV-1 reveals the impact of early minor variants upon immune recognition during acute infection. PLoS Pathog. 2012;8(3):e1002529.

    Article  CAS  Google Scholar 

  27. UniProt C. UniProt: a hub for protein information. Nucleic Acids Res. 2015;43(Database issue):D204–12.

    Google Scholar 

  28. Guindon S, Delsuc F, Dufayard JF, Gascuel O. Estimating maximum likelihood phylogenies with PhyML. Methods Mol Biol. 2009;537:113–37.

    Article  CAS  Google Scholar 

  29. Darriba D, Taboada GL, Doallo R, Posada D. ProtTest 3: fast selection of best-fit models of protein evolution. Bioinformatics. 2011;27(8):1164–5.

    Article  CAS  Google Scholar 

  30. Katoh K, Misawa K, Kuma K, Miyata T. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res. 2002;30(14):3059–66.

    Article  CAS  Google Scholar 

  31. Krzywinski M, Schein J, Birol I, Connors J, Gascoyne R, Horsman D, Jones SJ, Marra MA. Circos: an information aesthetic for comparative genomics. Genome Res. 2009;19(9):1639–45.

    Article  CAS  Google Scholar 

  32. Cohen D, Marek M, Davies B, Vlak JM, van Oers M. Encyclopedia of Autographa californica Nucleopolyhedrovirus genes. Virol Sin. 2009;24(5):359–414.

    Article  Google Scholar 

  33. Ferrelli ML, Salvador R, Biedma ME, Berretta MF, Haase S, Sciocco-Cap A, Ghiringhelli PD, Romanowski V. Genome of Epinotia aporema granulovirus (EpapGV), a polyorganotropic fast killing betabaculovirus with a novel thymidylate kinase gene. BMC Genomics. 2012;13:548.

    Article  CAS  Google Scholar 

  34. Harrison RL, Rowley DL, Mowery J, Bauchan GR, Theilmann DA, Rohrmann GF, Erlandson MA. The complete genome sequence of a second distinct Betabaculovirus from the true armyworm, Mythimna unipuncta. PLoS One. 2017;12(1):e0170510.

    Article  Google Scholar 

  35. Liang Z, Zhang X, Yin X, Cao S, Xu F. Genomic sequencing and analysis of Clostera anachoreta granulovirus. Arch Virol. 2011;156(7):1185–98.

    Article  CAS  Google Scholar 

  36. Jehle JA, Lange M, Wang H, Hu Z, Wang Y, Hauschild R. Molecular identification and phylogenetic analysis of baculoviruses from Lepidoptera. Virology. 2006;346(1):180–93.

    Article  CAS  Google Scholar 

  37. Ferreira BC, Melo FL, Souza ML, Moscardi F, Bao SN, Ribeiro BM. High genetic stability of peroral infection factors from Anticarsia gemmatalis MNPV over 20years of sampling. J Invertebr Pathol. 2014;118:66–70.

    Article  Google Scholar 

  38. Peng K, van Oers MM, Hu Z, van Lent JW, Vlak JM. Baculovirus per os infectivity factors form a complex on the surface of occlusion-derived virus. J Virol. 2010;84(18):9497–504.

    Article  CAS  Google Scholar 

  39. Braunagel SC, Summers MD. Molecular biology of the baculovirus occlusion-derived virus envelope. Curr Drug Targets. 2007;8(10):1084–95.

    Article  CAS  Google Scholar 

  40. Carpentier DC, Griffiths CM, King LA. The baculovirus P10 protein of Autographa californica nucleopolyhedrovirus forms two distinct cytoskeletal-like structures and associates with polyhedral occlusion bodies during infection. Virology. 2008;371(2):278–91.

    Article  CAS  Google Scholar 

  41. Sajjan DB, Hinchigeri SB. Structural Organization of Baculovirus Occlusion Bodies and Protective Role of multilayered polyhedron envelope protein. Food and environmental virology. 2016;8(1):86–100.

    Article  CAS  Google Scholar 

  42. Yin F, Zhu Z, Liu X, Hou D, Wang J, Zhang L, Wang M, Kou Z, Wang H, Deng F, et al. The complete genome of a new Betabaculovirus from Clostera anastomosis. PLoS One. 2015;10(7):e0132792.

    Article  Google Scholar 

  43. Detvisitsakun C, Berretta MF, Lehiy C, Passarelli AL. Stimulation of cell motility by a viral fibroblast growth factor homolog: proposal for a role in viral pathogenesis. Virology. 2005;336(2):308–17.

    Article  CAS  Google Scholar 

  44. Detvisitsakun C, Cain EL, Passarelli AL. The Autographa californica M nucleopolyhedrovirus fibroblast growth factor accelerates host mortality. Virology. 2007;365(1):70–8.

    Article  CAS  Google Scholar 

  45. Liu C, Li Z, Wu W, Li L, Yuan M, Pan L, Yang K, Pang Y. Autographa californica multiple nucleopolyhedrovirus ac53 plays a role in nucleocapsid assembly. Virology. 2008;382(1):59–68.

    Article  CAS  Google Scholar 

  46. Ke J, Wang J, Deng R, Wang X. Autographa californica multiple nucleopolyhedrovirus ac66 is required for the efficient egress of nucleocapsids from the nucleus, general synthesis of preoccluded virions and occlusion body formation. Virology. 2008;374(2):421–31.

    Article  CAS  Google Scholar 

  47. Mikhailov VS, Rohrmann GF. Baculovirus replication factor LEF-1 is a DNA primase. J Virol. 2002;76(5):2287–97.

    Article  CAS  Google Scholar 

  48. Lu A, Miller LK. The roles of eighteen baculovirus late expression factor genes in transcription and DNA replication. J Virol. 1995;69(2):975–82.

    CAS  PubMed  PubMed Central  Google Scholar 

  49. Krakauer DC. Stability and evolution of overlapping genes. Evolution. 2000;54(3):731–9.

    Article  CAS  Google Scholar 

Download references


We also thank Dr. Renato Arcangelo Pegoraro (EPAGRI-SC), Dr. Murilo Fazolin (Embrapa Acre), and Dr. Orlando Shigueo Ohashi (UFRA) for providing the viral samples.


This work was supported by Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq, grant numbers 407908/2013–7 and 483677/2012–4), Fundação de Apoio à Pesquisa do Distrito Federal (FAPDF, grant number 193.001.532/2016), and EMBRAPA Recursos Genéticos e Biotecnologia. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Availability of data and materials

The genomes sequenced in this work are available in GenBank under the accession numbers: KX859079, KX859080, KX859081, KX859082, KX859083, and KX859084.

Author information



Conceived and designed the experiments: MLS, WS; Performed the experiments: MLS, WS; Analyzed the data: AFB, DMPAA, FLM; Contributed reagents/materials/analysis tools: BMR, MLS; Wrote the original manuscript draft: AFB, MLS; Reviewed and edited the manuscript: AFB, BMR, DMPAA, FLM, MLS. All authors read and approved the final manuscript.

Corresponding author

Correspondence to B. M. Ribeiro.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Additional files

Additional file 1:

Showing a list of drs (direct repeats) observed in isolates of ErelGV. CNVs stand for the total of ‘Copy Number Variants’ found for each repetitive region. Values assigned to each dr correspond to the number of repeat units. (PDF 1385 kb)

Additional file 2:

Showing statistics of the sequencing libraries generated in this study. (PDF 1377 kb)

Additional file 3:

Showing the correlations between non-synonymous substitutions and gene lengths. In these plots, each dot represents a gene depicted in Fig. 3 of the main manuscript. A) Intra-isolate diversity of ErelGV-86 genes. B) ErelGV-94. C) ErelGV-98. D) ErelGV-99. E) ErelGV-00. F) ErelGV-AC. G) ErelGV-PA. As shown, the number of NSS per base pair (× 10⁻³) and gene length (bp) show low to moderate correlation. The grey area corresponds to the 95% confidence interval, with highly conserved/diverse genes shown as outliers. (PDF 612 kb)

Additional file 4:

Showing a phylogram of Betabaculovirus and ErelGV isolates. The phylogeny was inferred using concatenated amino acid sequences of baculovirus core genes. ErelGV isolates are highlighted in grey. The baculoviral genera Alphabaculovirus (α = AcMNPV), Gammabaculovirus (γ = NeabNPV), and Deltabaculovirus (δ = CuniNPV) were included as outgroups, and the latter was used as root. Bootstrap values are indicated for each interior branch. (PDF 1353 kb)

Additional file 5:

Showing a maximum likelihood tree of South American ErelGV isolates from Brazil (red dots) and Colombia (ErelGV-M34) inferred using a concatenated alignment of partial sequences of granulin, lef-9 and lef-8. (PDF 1358 kb)

Additional file 6:

Showing the three paralogs (fgf-1, -2, and -3) encoded by ErelGV isolates. As observed, the fgfs have different lengths (here shown in bp), and most of their polymorphisms are located in the C-terminal region, and not in the region encoding the FGF domain. Their identities are low (20–33%) and mostly restricted to their central regions, which are responsible for encoding their main functional domain. Additional studies would be relevant to understand the roles of such proteins in ErelGV infection. (PDF 1408 kb)

Additional file 7:

Showing polymorphisms in lef genes. Non-synonymous mutations in lef-1 (A) and lef-2 (B) were mainly found in the C-terminal region of the proteins they encode. For LEF-10, polymorphisms were found mainly in its N-terminal region (C). The exception is a single non-synonymous substitution in a region overlapping with vp1054, which in turn causes a synonymous change (CTG → CTA) at the 5′ end of the latter. (PDF 1357 kb)

Additional file 8:

Showing quantitative data on the genetic diversity of ErelGV isolates, and their consensus genome (combined diversity). Core genes are highlighted in bold. NSS = Number of non-synonymous substitutions. NSS/bp is shown as a factor of 10⁻³. AUX = Auxiliary; MOD = Host modulation; REP = Replication; STR = Structural; TRA = Transcription; UNK = Unknown function. (PDF 1545 kb)
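The distinction between synonymous and non-synonymous substitutions used throughout these files, including the CTG → CTA change noted for Additional file 7, can be checked with a short sketch. The Python example below is illustrative only and is not the pipeline used in this study; it assumes the standard genetic code, and the function name `classify_substitution` is hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order
# (third position varies fastest); '*' marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(ref_codon, alt_codon):
    """Classify a codon change as 'synonymous' or 'non-synonymous'."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    return "synonymous" if ref_aa == alt_aa else "non-synonymous"


# CTG and CTA both encode leucine, so this change is synonymous.
print(classify_substitution("CTG", "CTA"))
# ATG (methionine) to ATA (isoleucine) changes the amino acid.
print(classify_substitution("ATG", "ATA"))
```

In overlapping genes, the same nucleotide change has to be classified separately in each reading frame, since it can be non-synonymous in one gene and synonymous in the other.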

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated.

About this article

Cite this article

Brito, A.F., Melo, F.L., Ardisson-Araújo, D.M.P. et al. Genome-wide diversity in temporal and regional populations of the betabaculovirus Erinnyis ello granulovirus (ErelGV). BMC Genomics 19, 698 (2018).
