- Research article
- Open Access
De novo profiling of RNA viruses in Anopheles malaria vector mosquitoes from forest ecological zones in Senegal and Cambodia
BMC Genomicsvolume 20, Article number: 664 (2019)
The Correction to this article has been published in BMC Genomics 2019 20:698
Mosquitoes are colonized by a large but mostly uncharacterized natural virome of RNA viruses, and the composition and distribution of the natural RNA virome may influence the biology and immunity of Anopheles malaria vector populations.
Anopheles mosquitoes were sampled in malaria endemic forest village sites in Senegal and Cambodia, including Anopheles funestus, Anopheles gambiae group sp., and Anopheles coustani in Senegal, and Anopheles hyrcanus group sp., Anopheles maculatus group sp., and Anopheles dirus in Cambodia. The most frequent mosquito species sampled at both study sites are human malaria vectors. Small and long RNA sequences were depleted of mosquito host sequences, de novo assembled and clustered to yield non-redundant contigs longer than 500 nucleotides. Analysis of the assemblies by sequence similarity to known virus families yielded 115 novel virus sequences, and evidence supports a functional status for at least 86 of the novel viral contigs. Important monophyletic virus clades in the Bunyavirales and Mononegavirales orders were found in these Anopheles from Africa and Asia. The remaining non-host RNA assemblies that were unclassified by sequence similarity to known viruses were clustered by small RNA profiles, and 39 high-quality independent contigs strongly matched a pattern of classic RNAi processing of viral replication intermediates, suggesting they are entirely undescribed viruses. One thousand five hundred sixty-six additional high-quality unclassified contigs matched a pattern consistent with Piwi-interacting RNAs (piRNAs), suggesting that strand-biased piRNAs are generated from the natural virome in Anopheles. To functionally query piRNA effect, we analyzed piRNA expression in Anopheles coluzzii after infection with O’nyong nyong virus (family Togaviridae), and identified two piRNAs that appear to display specifically altered abundance upon arbovirus infection.
Anopheles vectors of human malaria in Africa and Asia are ubiquitously colonized by RNA viruses, some of which are monophyletic but clearly diverged from other arthropod viruses. The interplay between small RNA pathways, immunity, and the virome may represent part of the homeostatic mechanism maintaining virome members in a commensal or nonpathogenic state, and could potentially influence vector competence.
Anopheles mosquitoes are the only vectors of human malaria, which kills at least 400,000 persons and causes 200 million cases per year, with the greatest impact concentrated in sub-Saharan Africa and South-East Asia . In addition to malaria, Anopheles mosquitoes also transmit the alphavirus O’nyong nyong (ONNV, family Togaviridae), which is the only arbovirus known to employ Anopheles mosquitoes as the primary vector [2, 3]. A recent review found reports of at least 51 viruses naturally associated with Anopheles , and Anopheles mosquitoes harbor a diverse natural virome of RNA viruses [4,5,6,7]. These viruses are comprised mainly of insect specific viruses (ISVs) thought to multiply only in insects, but also includes relatives of arboviruses that can replicate in both insects and vertebrate cells.
Anopheles viruses have been discovered by isolation from cultured cells exposed to mosquito extract, serology, specific amplification and sequencing, and more recently, deep sequencing and de novo assembly . Although this work has increased the number of ISVs discovered in Anopheles, there are probably many still unknown. Because Anopheles mosquitoes are not thought to be important arbovirus vectors, relatively little research has been done on their viruses. In contrast, culicine mosquitoes in the genera Aedes and Culex transmit multiple arboviruses such as dengue virus (DENV, family Flaviviridae) Zika virus (ZIKV, family Flaviviridae), chikungunya virus (CHIKV, family Togaviridae) and others, but do not transmit human malaria.
Here, we assembled small and long RNA sequences from wild Anopheles mosquitoes captured in forest ecologies in central and northern Cambodia and eastern Senegal. The collection sites are considered disease emergence zones, with high levels of fevers and encephalopathies of unknown origin.
It is important to study Anopheles viruses because persistent exposure to ISVs, rather than the relatively infrequent exposure to arboviruses such as ONNV, has probably been the main evolutionary pressure shaping Anopheles antiviral immunity. Anopheles resistance mechanisms against arbovirus infection may be particularly efficient, given their low level of known arbovirus transmission despite highly anthropophilic feeding behavior, including on viremic hosts. Nevertheless, ONNV transmission indicates that arbovirus transmission by Anopheles is at least possible, so it is worth asking why transmission by Anopheles is apparently restricted to a single known virus. Identifying the complement of natural viruses inhabiting the Anopheles niche will help clarify the biology underlying the apparent inefficiency of arbovirus transmission by Anopheles, and may suggest new translational tools to decrease arbovirus transmission by the more efficient Aedes and Culex vectors.
Mosquito species estimation
Metagenomic sequencing of long and small fractions of RNA was carried out for four biological replicate pools of mosquitoes from Ratanakiri and Kampong Chhnang provinces in central and northern Cambodia near the border with Laos, and four replicate pools from Kedougou in eastern Senegal near the border with the Republic of Guinea (Conakry). Mosquito species composition of sample pools was estimated using sequences of transcripts from the mitochondrial cytochrome c oxidase subunit 1 (COI) gene, which were compared with Anopheles sequences from the Barcode of Life COI-5P database (Fig. 1, Additional file 1: Table S1). In the Senegal samples, the most frequent mosquito species were Anopheles rufipes, Anopheles funestus, Anopheles gambiae group sp., and Anopheles coustani, which are all human malaria vectors, including the recently incriminated An. rufipes . In the Cambodia samples, the most frequent species were Anopheles hyrcanus group sp., Anopheles maculatus group sp., Anopheles karwari, Anopheles jeyporeisis, Anopheles aconitus and Anopheles dirus. All are considered human malaria vectors [9,10,11,12]. Elevated rates of human blood-feeding by a mosquito species is a prerequisite for malaria vectorial capacity , and therefore the main Anopheles species sampled for virome discovery in this study display consistently high levels of human contact in nature. In addition, a number of rare mosquito species calls represent species supported by less than 100 sequence reads and 1% frequency in the sample pool (Additional file 1: Table S1). These rare calls could result from sequencing technical artifacts, mutations of COI, errors in the COI-5P database, and/or undescribed mosquito taxa not in the database.
Virus discovery by de novo assembly and classification by sequence similarity
Small and long RNAseq reads were de novo assembled after removal of mosquito sequences. Non-redundant contigs longer than 500 nucleotides from assemblies of both sampling sites, Cambodia and Senegal, were translated into predicted peptide sequences and used to search the GenBank protein sequence database using BLASTX with an e-value threshold of 1e-10. BLASTX translates a DNA sequence in all six putative reading frames and searches against a protein database to identify peptide homology matches. This analysis pipeline allowed identification of 115 novel assembled virus sequences, 37 from the Senegal samples (virus ID suffix “Dak”, Table 1), and 78 from the Cambodia samples (virus ID suffix “Camb”, Table 2), possibly pointing to higher viral diversity in mosquitoes from Cambodia. Some of the 115 virus sequences showed remote similarity by BLASTX to 24 reference viruses in GenBank that include single-stranded RNA (ssRNA)-negative strand viruses of the families Orthomyxoviridae, Rhabdoviridae and Phenuiviridae, ssRNA positive-strand viruses of the families Virgaviridae, Flaviviridae and Bromoviridae, dsRNA viruses of the family Reoviridae and multiple unclassified viruses of both ssRNA and dsRNA types (Table 3). Most of these remote similarities were with viruses characterized in a recent virus survey of 70 different arthropod species collected in China , which emphasizes the importance of high throughput surveys of the arthropod virosphere in the identification of viruses associated with different arthropod species.
In order to place these 115 novel virus assemblies in an evolutionary context, phylogenetic trees were constructed from predicted peptide sequences of conserved regions of the RNA-dependent RNA polymerase (RdRP) gene annotated in the 115 virus sequences, along with RdRP peptide sequences of related virus sequences from GenBank. This allowed the placement of 44 of the 115 assembled viruses in phylogenetic trees, revealing clusters of highly related viruses in the wild Anopheles. Notable examples include five novel virus assemblies from Cambodian Anopheles placed near Wuhan Mosquito Virus 1 in a monophyletic group of the Phasmaviridae family (Bunyavirales) (Fig. 2). In addition, within the order Mononegavirales, 14 novel Anopheles virus assemblies (7 from Cambodia and 7 from Senegal) formed a monophyletic group that includes Xincheng Mosquito Virus and Shungao Fly Virus. Finally, 10 novel virus assemblies (9 from Cambodia, 1 from Senegal) formed a monophyletic group that includes Beaumont Virus and a virus from Culex tritaeniorhynchus within the Rhabdoviridae family (Fig. 3a).
We characterized the degree of completeness of the virus assemblies to determine whether they contain full or almost full viral genome sequences, and whether predicted peptides are likely to be functional. Many of the viral contigs are too diverged from each other and from reference viruses in the phylogenetic tree to align informatively at the nucleotide level, and reliable sequence comparisons were only possible at the peptide sequence level. We translated nucleotide sequences for the novel viral contigs and compared them to the translated nucleotide database using TBLASTX. An example of this analysis is shown for viral contigs homologous to Xincheng virus (Fig. 3b). Closely-related viral contigs (for example RdRP1.7_XinchengVirDak and RdRP1.3_XinchengVirDak), with on average 95% nucleotide identity over the full contig length, can thus be compared with the more divergent viral contigs such as RdRP1.7_XinchengVirDak, which does not align at the nucleotide level with the first two, but does align when translated to peptide sequences.
This combined nucleotide and peptide-based analysis was applied to the 115 novel viral contigs. A total of 195 open reading frames (ORFs) were annotated among the 115 viral contigs, an average of 1.7 ORFs per viral contig (Additional file 2: Table S2 and Additional file 3: Table S3). Based on TBLASTX alignments with the closest reference viral genomes, 56 of the 195 ORFs, found in 25 of the viral contigs, appear to be fragmented or frameshifted ORF sequences potentially associated with pseudogenes, as compared to the complete gene present in the homologous reference virus, indicating a possible non-functional status for these 25 contigs. Four additional viral contigs contained small ORFs as compared to the cognate gene in the closest annotated reference viral genomes. In contrast, 67 of the 195 ORFs were complete, and 68 ORFs were partial for technical reasons, because of fragmented viral assemblies that do not cover the entire viral gene (Additional file 3: Table S3).
Thus, peptide comparisons with reference viral genomes provided evidence supporting a functional status for 86 of the 115 novel viral contigs, while 29 of the contigs displayed a potential non-functional status. The source of these latter 29 viral contigs is unknown, but they display equivalent sequence representation and assembly quality as the 86 contigs. They most likely represent functional viruses that engage in programmed ribosomal frameshifting or transcriptional slippage [15, 16], which has been reported for at least flavivirus and alphavirus ISVs [17, 18]. For these otherwise high-quality viral contigs with frameshifts or short ORFs, further work would be necessary to distinguish between hypotheses of transcriptional slippage, ORFs under relaxed selection pressure, or technical error. However, overall we find high levels of collinearity and similarity among novel viruses at the protein level that are not necessarily matched by comparable levels of similarity at the nucleotide level. These comparisons revealed potential populations of closely related but diverged viruses colonizing Anopheles from widely separated geographic locations, in some cases with different degrees of divergence over the same genomic region.
Quantification of novel virus sequences in mosquito samples
In order to evaluate the prevalence of novel virus sequences across the analyzed mosquito samples, host-filtered small and long RNA reads were mapped over the 115 novel virus sequences identified by de novo sequence assembly. Based on long RNAseq reads, the abundance profiles of the 115 virus assemblies display a non-overlapping distribution across different sample pools of 5 or 10 mosquitoes per pool, and virus sequences can be localized to particular sample pools from the abundance profiles (Fig. 4, left panel). This probably indicates a patchy prevalence and abundance of the different viruses among individual mosquitoes, such that an individual mosquito highly infected with a given virus could potentially generate a strong signal for that virus in the sample pool. The sample pools from Cambodia share a higher fraction of common viruses, while there is less overlap in virus abundance distribution across sample pools from Senegal. The representation of virus distribution based on small RNA sequence reads displayed profiles broadly similar to the long RNA-based abundance distribution (Fig. 4, right panel). This observation is consistent with the expectation that small RNA representation is a signature of virus double-stranded RNA (dsRNA) processing by the mosquito RNA interference (RNAi) machinery , and was examined next.
Small RNA size profiling
The processing of virus sequences by small RNA pathways of the insect host generates diagnostic patterns of small RNA read sizes from different viruses. In order to evaluate this phenomenon we first imposed a threshold of at least 100 small RNA reads mapped to the viral contig, to assure reliable small RNA size profiling, and 82 of the 115 novel virus assemblies were retained for the analysis. Small RNA reads that mapped to each of the 82 virus assemblies were extracted, and their size distributions were normalized with a z-score transformation. This allowed comparison of the z-score profiles among virus assemblies by pairwise correlation analysis and hierarchical clustering. The relationship between the small RNA profiles of the different viruses could then be visualized as a heatmap. The results of this analysis revealed the presence of four major groups of virus sequences based on small RNA size profiles (Fig. 5). Cluster 1 consists of 7 virus assemblies generating small RNAs predominantly in the size range of 23–29 nt mapping over the positive, and to a lesser extent negative, strand. Cluster 2 includes 7 viruses, all from Senegal, and displays a similar size profile as viruses of Cluster 1 with reads in the 23–29 nt size range, but also with a higher frequency of 21 nt reads mapping over the positive and negative strands, emblematic of virus cleavage through the mosquito host RNAi pathway. Cluster 3 includes 15 viruses that exhibit the classic pattern of viral RNA processing by the host RNAi pathway, with reads predominantly of 21 nt in length mapping over virus positive and negative strands (small RNA size and coverage profiles for this Cluster shown in Additional file 4: Figure S1). Finally, Cluster 4 includes 52 viruses with small RNA size profiles dominated by reads of 23–29 nt mapping predominantly over the negative strand of virus sequences. Because of the strong strand bias of small RNAs observed, this pattern could correspond to degradation products of virus RNAs, although alternatively, there appears to be size enrichment in the 27–28 nt size peaks characteristic of PIWI-interacting RNAs (piRNAs), and we examine this possibility below using bioinformatic and functional analyses.
Viral origin of unclassified transcripts by small RNA size profiling
A major drawback of sequence similarity-based identification of novel viruses in de novo sequence assemblies is the dependence of detection upon existing records of close relatives in public databases. Aguiar et al. proposed that the small RNA size profiles of arthropod-derived viruses detected by sequence similarity could be used as signature to recruit unclassified contigs from de novo sequence assemblies of potential viral origin . We implemented this strategy in order to identify additional sequences of putative viral origin in the set of 2114 contigs left unclassified by sequence similarity searching but meeting the same quality criteria as the 115 contigs (non-redundant and > 500 nucleotides), and with at least 100 small RNA sequence reads.
Of these unclassified contigs, a likely viral origin is supported for 4 and 35 contigs that display strong association by small RNA profile with Cluster 2 and Cluster 3, respectively (Spearman correlation> 0.9, Additional file 5: Figure S2). These clusters display small RNA size profiles mapping to both genome strands, which are characteristic of classic RNAi processing of viral dsRNA replication intermediates. Thus, in addition to the 115 novel virus assemblies classified by sequence similarity to known viruses, 39 unclassified high-quality novel Anopheles virus assemblies were identified, without sequence similarity to identified viruses. Further work will be necessary to characterize the biology of these unclassified novel virus assemblies.
Of the other assemblies unclassified by sequence similarity, 1566 showed strong associations between their small RNA size profiles and the small RNA size profiles of virus contigs detected by sequence similarity (Spearman correlation> 0.9). Among these, the majority were associated with Cluster 4 virus assemblies (1219 unclassified contigs) and to less extent with Cluster 1 (309 unclassified contigs). Both clusters were characterized by a strong bias towards reads from a single strand (positive for Cluster 1 and negative for Cluster 4).
To evaluate how specific these latter profiles of 1219 and 309 contigs are for virus-related sequences, we designed a reconstruction control experiment using the same small RNA size profiling and clustering analysis as above, but instead using 669 RNA contigs known to map to the mosquito reference assembly, thus strictly of host origin. As above, contigs with at least 100 small RNA sequence reads were used. Five hundred sixty-one of these mosquito contigs could be grouped with small RNA size profiles of virus contigs (Spearman correlation> 0.9), most of them (98.21%) with Cluster 4 (78.6%) and Cluster 1 (19.6%) profiles.
Anopheles may produce piRNAs from the RNA virome
piRNAs are endogenous small noncoding RNAs that ensure genome stability by protecting it from invasive transposable elements such as retrotransposons and repetitive or selfish sequences . In addition, in Aedes mosquito cells, piRNAs can probably mediate responses to arboviruses or ISVs [20,21,22,23]. Anopheles mosquitoes express annotated piRNAs from genomic piRNA clusters [24, 25]. The small RNAs in Clusters 1 and 4 display a strand bias, and many somatic piRNAs also map to only one strand in Drosophila and other arthropods [20, 26]. Notably, many virus-related piRNAs in Aedes, which are largely ISV-derived, mainly map only to the virus strand antisense to the viral ORF .
In An. coluzzii, about half of annotated piRNAs display a strong or exclusive strand bias , which is a greater proportion of unidirectional piRNAs than Drosophila. Until the current study, Anopheles piRNAs have not previously been examined for relatedness to ISVs. Overall, these small RNA results are probably most consistent with an interpretation that RNA profile Cluster 1 and Cluster 4 detect strand-biased piRNAs derived from the natural ISV virome of wild Anopheles. On that interpretation, the above 561 contigs mapping to host that share the Cluster 1 and Cluster 4 RNA profiles are most likely also piRNAs, but instead derived from endogenous host RNA templates. Previous results showed that most An. coluzzii piRNAs are derived from long-terminal repeat retrotransposons and DNA transposable elements . Our current results add wild ISVs as a possible source of template for Anopheles piRNA production, and indicate that further work is warranted on Anopheles piRNA. Our results also suggest the possibility that piRNAs may be involved in Anopheles response to viruses, a phenomenon found for only Aedes among a wide range of arthropods , but Anopheles were not tested.
O’nyong nyong alphavirus infection influences expression of piRNAs in Anopheles coluzzii
The potential that Anopheles piRNAs could be involved in response or protection to virus infection has not been previously examined or reported to our knowledge. To examine this possibility, we challenged An. coluzzii mosquitoes with the alphavirus ONNV by feeding an infectious bloodmeal, and sequenced small RNAs expressed during the primary infection at 3 d post-bloodmeal. Mosquitoes fed a normal bloodmeal were used as the control condition.
The small RNAs were mapped to previously annotated An. coluzzii candidate piRNA genes located in 187 genomic piRNA clusters , and expression levels of the piRNA genes in response to ONNV infection were tested using Cuffdiff. The analysis detected 86 piRNA genes displaying differential abundance levels between ONNV infected mosquitoes and normal bloodmeal controls (Additional file 6: Table S4). Filtering these 86 genes on quality criteria of i) length of the contiguous region expressed in small RNA < 40 nt, characteristic of piRNA size, and ii) normalized read depth in the upper 10% for the most robust signals, highlighted just two annotated piRNA candidates, XLOC_012931 and XLOC_012762. Both candidate piRNAs displayed significantly lower abundance in small RNA after ONNV infection as compared to uninfected controls, suggesting that these two piRNAs were downregulated during ONNV infection (locus XLOC_012931, Cuffdiff test statistic = 3.23, p-value = 5e-5, adjusted p-value = 6.7e-3, reference genome sequence coordinates AgamP4:UNKN:19043685:19043716; and locus XLOC_012762, Cuffdiff test statistic = 2.39, p-value = 9.5e-4, adjusted p-value = 0.046, reference genome sequence coordinates AgamP4:UNKN:13088289:13088321).
Differential abundance was confirmed by quantifying small RNAs mapping to the two candidate piRNAs using the Integrative Genomics Viewer. By this test also, both candidate piRNAs displayed lower normalized read counts in ONNV infected samples as compared to uninfected controls (Fig. 6; locus XLOC_012931, Chi-squared = 77.36, df = 1, p-value< 2.2e-16; and locus XLOC_012762, Chi-squared = 75.78, df = 1, p-value< 2.2e-16).
The current study contributes to a growing body of work defining the deep diversity of the invertebrate virosphere [14, 27, 28]. Because mosquitoes transmit viral infections of humans and animals, there is particular interest in discovery of ISVs comprising the mosquito virome [6, 29,30,31]. Here, we sampled Anopheles mosquitoes from two zones of forest exploitation that are considered disease emergence zones with likely exposure of the human and domestic animal populations to sylvan pathogens. Using assembly quality criteria of non-redundant contigs at least 500 nt in length, we identified 115 novel RNA virus assemblies by sequence similarity to known virus families, and an additional 39 high-confidence virus assemblies that were unclassified by sequence similarity, but display characteristic products of RNAi processing of replication intermediates. Finally, 1566 unclassified contigs possessed comparable assembly quality, and lacked a strong RNAi processing signature, but displayed a signature consistent with piRNA origin. This latter group will require additional work to filter genuine virus-derived piRNA sequences, which have been previously reported in Aedes mosquitoes [20,21,22,23], from other potential sources of piRNAs such as retrotransposons and DNA transposable elements, as well as possible physical degradation.
Taken together, at least 115 novel and non-redundant virus assemblies, and possibly many more, were identified in wild Anopheles mosquitoes in the current report. Small and long RNAs were sequenced from pools of 5–10 mosquitoes. Pooled sample analysis obscures the distribution and abundance of viruses among individuals in the population. Individual mosquito analysis will become a research priority as sequencing costs drop, and is the best way to determine ISV distribution and prevalence. However, some insight on virus distribution can be gained from the comparison of sample pools collected from the same site, for example Senegal or Cambodia. The abundance heatmap shown in Fig. 4 indicates that virus diversity is high in the Anopheles population, while the distribution of particular viruses is relatively uneven when comparing across Anopheles sample pools. This suggests that the number of viruses per individual is probably also low, leading to a patchy distribution of particular viruses among individuals. This is consistent with observations in our laboratory from individual mosquito sequencing and de novo assembly, which typically identifies < 5 distinct viruses per individual. We cannot exclude the presence of contaminating environmental viruses in the sequence set, for example adhered to the adult mosquito cuticle. Nevertheless, the samples were all washed, and if present, environmental virus contaminants would likely be rare, and would have been discarded early in the assembly pipeline because they would not contribute enough sequence reads to generate long assemblies to meet the quality threshold.
The dynamics of the virome may thus be different from the bacterial microbiome, in which at least tens of taxa are typically present per individual, and microbial diversity is thought to lead to homeostasis or resilience of the microbiota as an ecosystem within the host [32, 33]. By comparison, very little is known about the function of the mosquito virome within the host. At least three important topics are worth exploring.
First, unlike the bacterial microbiota, the stability and resilience over time of the viral assemblage in an individual mosquito is unknown. Members of the virome could persist in individual host populations over time in commensal form, or the uneven and patchy viral distribution observed among sample pools could be a consequence of successive waves of epidemic infection peaks passing through local populations. The commensal or epidemic models would have distinct biological implications for the potential influence of the virome, including on host immunity and competence for transmission of pathogens.
Second, the individual and population-level effect of ISV carriage on vector competence for pathogen transmission is a key question. In the current study, the predominant host species sampled are Anopheles vectors of human malaria, and in Africa, these species are also known or likely vectors of ONNV. ISVs have not been tested for influence on Plasmodium or ONNV infection in Anopheles, to our knowledge. ISVs could affect host immunity and malaria susceptibility, or even cause temporary vector population crashes during a putative ISV epidemic. A similar concept may apply to ISV interactions with the mosquito host for arbovirus transmission . We identified relatives of Phasi Charoen-like virus (PCLV) in Anopheles from Senegal and Cambodia. PCLV relatives also infect Aedes, where they reduced the replication of ZIKV and DENV arboviruses . Palm Creek virus, an insect specific flavivirus, caused reduced replication of the West Nile virus and Murray Valley encephalitis arboviruses in Aedes cells . Clearly, ISV co-infection of mosquito vectors with Plasmodium and/or arboviruses in nature is probable, because all Anopheles sample pools in the current work were ISV-positive.
Third, characterization of the arthropod virome may shed light on the evolution of mosquito antiviral immune mechanisms, as well as the evolution of pathogenic arboviruses. ISV replication is restricted to insect cells, but the potential of most mosquito-associated viruses for transmission to humans or other vertebrates is currently unknown, because few studies of host range and transmission have been done. Some viruses may have a host range restricted to only Anopheles. For example, Anopheles cypovirus and Anopheles C virus replicate and are maintained by vertical transmission in An. coluzzii, but were not able to infect Ae. aegypti in exposure experiments, and infected Anopheles stephensi only transiently . Thus, Anopheles ISVs may display fine host restriction to genus or even to particular Anopheles species and not others.
It is likely that the main evolutionary pressure shaping mosquito antiviral mechanisms is their persistent exposure in nature to members of the natural virome, rather than the probably less frequent exposure to vertebrate-pathogenic arboviruses. Maintenance of bacterial microbiome commensals in the non-pathogenic commensal state requires active policing by basal host immunity . By analogy, the maintenance of persistent ISVs as non-pathogenic may also result from a dialog with host immunity. Presumably, the same antiviral mechanisms used in basal maintenance of ISVs are also deployed against arboviruses when encountered, which are not novel to the vector because they are often in the same families as members of the insect virome . Knowledge of the mechanisms that allow Anopheles to carry a natural RNA virome, but apparently reject arboviruses, may provide new tools to raise the barrier to arbovirus transmission by the more efficient Aedes and Culex vectors.
In addition to the canonical immune signaling pathways, piRNAs can be involved in antiviral protection, although this research is just beginning [22, 37]. One function of genomic piRNA clusters appears to be storage of a molecular archive of genomic threats such as transposable elements, linked to an effector mechanism to inactivate them. This is analogous to bacterial molecular memory mediated by the CRISPR/Cas system. We identified two candidate piRNA genes that appear to be downregulated upon ONNV infection in An. coluzzii. Involvement of piRNAs during viral infection has not been previously demonstrated in Anopheles. piRNA monitoring of the virome may be part of the normal basal management of ISVs to limit their pathogenicity if not controlled, and our current results suggest that piRNA dynamics may also be involved in host response to an arbovirus. Further work including specific piRNA silencing studies will be required to draw these connections.
The current report shows that the Anopheles virome is complex and diverse, and can be influenced by the geography of mosquito species. This is exemplified by the fact that some viruses are restricted to Anopheles in Senegal, and others in Cambodia. Similar results were seen in Ae. aegypti, where five ISVs were specific to the Australian host population, while six others were found only in the Thai host population . Differences in the Anopheles virome across geography could be explained by climate, environmental conditions, breeding sites, and mosquito bloodmeal sources, among other factors. The presence in this study of such a large number of novel and unclassified virus assemblies highlights the fact that the malaria vector virome is understudied. The same observation has been made during metagenomics surveys in Drosophila, Aedes and Culex [28, 39, 40] among other arthropods, indicating that the vast majority of insect viruses are not yet discovered.
Mosquitoes were collected in Cambodia in Kres village, Ratanakiri province (sample pools Cam5–02 and Cam10–02) and Cheav Rov village, Kampong Chhnang province (sample pools Cam5–01 and Cam10–01). The majority of inhabitants are engaged in forest-related activities (agriculture, logging and hunting) and may spend the night in forest plots during the harvest period. Vegetation varies from evergreen forest to scattered forest, and the dry season typically runs from November to May and the rainy season from June to October. In Senegal, sampling sites were located in the department of Kedougou in southeastern Senegal. Kedougou lies in a transition zone between dry tropical forest and the savanna belt, and includes the richest and most diverse fauna of Senegal. Recent arbovirus outbreaks include CHIKV in 2009–2010, yellow fever virus in 2011, Zika virus in 2010, and DENV in 2008–2009.
Permission to collect mosquitoes was obtained by Institut Pasteur Cambodia from authorities of Ratanakiri and Kampong Chhnang, and by Institut Pasteur Dakar from authorities of Kedougou. Wild mosquitoes visually identified as Anopheles spp. at the collection site (non-Anopheles were not retained) were immediately transferred into RNAlater stabilization reagent kept at 4 °C, and then returned to the laboratory and stored at − 80 °C until RNA extraction.
RNA extraction, library construction, and sequencing
Total RNA was extracted from four pools of mosquitoes from each of Senegal and Cambodia (Senegal sample pools: 5 mosquitoes, Dak5–03, Dak5–04, 10 mosquitoes, Dak10–03, Dak10–04; Cambodia sample pools: 5 mosquitoes, Cam5–01, Cam5–02, 10 mosquitoes, Cam10–01, Cam10–02) using the Nucleospin RNA kit (Macherey-Nagel) following the supplied protocol. Library preparation and sequencing steps were performed by Fasteris (Plan-les-Ouates, Switzerland, www.fasteris.com). Long RNA libraries from the eight mosquito pools were made from total RNA depleted of ribosomal RNA by treatment with RiboZero (Illumina, San Diego, CA). Libraries were multiplexed and sequenced on a single lane of the Illumina HiSeq 2500 platform (Illumina, San Diego, CA) by the paired-ends method (2 × 125 bp), generating on average 36 million high-quality read pairs per library. Small RNA libraries with insert size 18–30 nt were generated from the same eight mosquito pools as above, multiplexed and sequenced in duplicate (two technical replicates per pool) in two lanes of the Illumina HiSeq2500 platform (Illumina, San Diego, CA) by the single-end method (1 × 50 bp) generating on average 34 million reads of high-quality small RNA reads per library.
Pre-processing of long and small RNA libraries
Cutadapt 1.13  was used for quality filtering and adaptor trimming of reads from long and small RNA libraries. Low-quality 3′ ends of long RNA reads were trimmed by fixing a phred quality score of 15, and reads smaller than 50 bp after quality filtering and adaptor trimming were removed. In the case of small RNA libraries, reads shorter than 15 bp after quality filtering and adaptor trimming were removed.
In order to filter sequences originating in the mosquito host, sequences passing the above quality filter step were mapped against a custom database consisting of 24 Anopheles genomes available in Vectorbase in February 2016 . Bowtie 1.2.0  was used to map small RNA libraries with two mismatches allowed, whereas the BWA-MEM algorithm from BWA-0.7.12  with default parameters was used to map long RNA libraries. Sequence reads that did not map against Anopheles genomes, herein referred to as non-host processed reads, were retained and used for de novo assembly and subsequent binning of virus transcripts.
Estimation of Anopheles species composition of mosquito sample pools
Quality-filtered long RNA read pairs were mapped with SortMeRNA  against a custom database of Anopheles sequences of the mitochondrial cytochrome c oxidase subunit 1 gene (COI-5P database) extracted from the Barcode of Life database . 98% identity and 98% alignment coverage thresholds were fixed for the operational taxonomic unit calling step of SortMeRNA. Operational taxonomic unit counts were collapsed at species level and relative abundances of Anopheles species with at least 100 reads and 1% frequency in the sample pool were represented as pie charts using the ggplots2 R package.
De novo sequence assembly and identification of virus contigs by sequence similarity
Processed reads from each country (Cambodia and Senegal) were combined and de novo assembled using different strategies for long and small RNA libraries. Small RNA reads were assembled using the Velvet/Oases pipeline  using a range of k-mer values from 13 to 35. Long RNA reads were assembled using both the Velvet/Oases pipeline with a range of k-mer values from 11 to 67 and Trinity .
Contigs produced by parallel assembly of Cambodia and Senegal processed reads were filtered in order to remove trans-self chimeric sequences using custom shell scripts, and the resulting contigs were merged with cd-hit-est  (95% nucleotide identity over 90% alignment length) in order to generate a final set of non-redundant contig sequences. Non-redundant contigs longer than 500 nucleotides were compared against the GenBank protein sequence reference database using BLASTX  with an e-value threshold of 1e-10, and the results were imported into MEGAN6 in order to classify contigs taxonomically using the LCA algorithm . Contigs of viral origin were further subjected to manual curation by pairwise sequence alignments of nucleotide sequences using BLASTN, and of translated query sequences searched against the translated nucleotide database using TBLASTX and the Easyfig genome comparison tool  in order to remove redundancies not detected in previous steps. Sequence assemblies and annotations are available in Additional file 7: Classified Virus Sequences and Additional file 8: Unclassified Virus Sequences.
Structural and functional annotation of virus assemblies
Assembled contigs of viral origin were annotated as follows: ORFs were predicted with MetaGeneMark , and functionally annotated using Prokka  with Virus kingdom as primary core reference database for initial BLASTP searches and including also as reference Hidden Markov Models of virus protein families defined in vFam database . Also, protein sequences of predicted ORFs were processed with the Blast2GO pipeline , which generates functional annotation of proteins from BLASTP results against the virus subdivision of GenBank as well as Gene Ontology annotations from top BLASTP results. Prediction of InterPro signatures over viral proteins was also carried out with the InterProScan tool integrated in Blast2GO. The results of the different strategies of structural and functional annotation were integrated and manually curated with Artemis .
In order to place the new virus sequences characterized in the present study into an evolutionary context, the peptide sequences of RdRP ORFs detected in the annotation step were aligned with the corresponding homologs in reference positive-sense and negative-sense single-strand RNA viruses (ssRNA) and double strand RNA (dsRNA) viruses using MAFFT v7.055b with the E-INS-i algorithm . Independent alignments were generated for all ssRNA and dsRNA viruses and for different virus families (Bunya-Arenavirus, Monenegavirus, Orthomyxovivirus, Flavivirus, Reovirus). The resulting alignments were trimmed with TrimAI  in order to remove highly variable positions, keeping the most conserved domains for phylogenetic reconstruction. Phylogenetic trees were reconstructed by maximum likelihood with RAxML  with the WAG+GAMMA model of amino acid substitution and 100 bootstrap replicates. Phylogenetic trees were visualized with the R package Ape .
Prediction of unclassified contigs of viral origin by small RNA size profiling
In order to recruit contigs of potential viral origin from the pool of unclassified transcripts, we use the approach of Aguiar . This approach uses the size profiles of small RNA reads that maps over positive and negative strands of viruses detected by sequence similarity as a signature to identify unclassified transcripts by sequence similarity of potential viral origin. For this purpose, processed small RNA reads were re-mapped over virus contigs and unclassified contigs by sequence similarity using bowtie 1.2.0  allowing at most one mismatch and retaining only those contigs with at least 100 small RNA reads mapped. From the mapped small RNA reads over each contig, the small RNA size profiles were defined as the frequency of each small RNA read of size from 15 to 35 nucleotides that map over the positive and negative strand of the reference sequence. To compute these small RNA size profiles, reads mapped over positive and negative strands of each reference sequence were extracted with Samtools , and the size of small RNA reads were computed with the Infoseq program of the EMBOSS package . Custom shell scripts were used to parse Infoseq output to a matrix representing the frequency of reads of different sizes and polarity across virus/unclassified contigs. This matrix was further processed in R (version 3.3.2). In order to normalize the small RNA size profiles, a z-score transformation is applied over the read frequencies of each contig (virus/unclassified). The similarity between small RNA size profiles of virus and unclassified contigs is computed as the Pearson correlation coefficient of the corresponding z-score profiles, and the relationship between small RNA size profiles of virus/unclassified contigs was defined from this similarity values using UPGMA as linkage criterion with the R package Phangorn . These relationships were visualized as heatmaps of the z-score profiles in R with gplots package (version 3.0.1) using the UPGMA dendrogram as the clustering pattern of virus/unclassified sequences. Unclassified contigs with a Pearson correlation coefficient of at least 0.9 with virus contigs and coming from the same mosquito sample pool were regrouped into clusters.
ONNV infection and candidate piRNA gene regulation
Infection of An. coluzzii with ONNV, library preparations, and sequencing were described . Briefly, small RNA sequence reads from 2 biological replicate pools of 12 mosquitoes each fed an ONNV-infected bloodmeal (unfed mosquitoes removed), and 2 replicate control pools of 12 mosquitoes each fed an uninfected normal bloodmeal were mapped to the An. gambiae PEST AgamP4 genome assembly using STAR version 2.5 with default parameters . The resulting SAM files were analyzed using the Cuffdiff function in Cufflinks version 2.2.1 to test for differential abundance of small RNAs mapping to candidate piRNA genes, as compared between ONNV infected and control uninfected samples. This analysis yielded 86 candidate piRNA genes that were differentially represented in the small RNA sequences between the ONNV and control treatment conditions (Additional file 6: Table S4). The candidate piRNA genes used were previously described in 187 genomic piRNA clusters, and are listed in the annotation file, GOL21-bonafide-piRNAs-24-29 nt.fastq (from , publicly available from Figshare at doi https://doi.org/10.6084/m9.figshare.7308518). The piRNAs of An. coluzzii were designated in  as either novel genes (denoted XLOC loci), and as piRNAs produced from within existing genes of the PEST genome assembly (denoted AGAP loci).
Independent confirmation of the Cuffdiff analysis was obtained using BAM and BAI indices generated using Bowtie 2 version 2.3.0 from the above small RNA sequence files of ONNV infected and uninfected samples. These generated files were analyzed with the An. gambiae PEST AgamP4 genome assembly in the Integrative Genomics Viewer version 2.5 . We quantified the small RNA sequence reads mapping to the piRNA gene candidates, XLOC_012931 and XLOC_012762, identified as differentially expressed by the Cuffdiff analysis. Mapped reads to each piRNA candidate gene were normalized using the library size of each sequence. Graphpad Prism 7 was used to create graphs from normalized reads, and statistical tests were performed using R version 3.5.2 .
Availability of data and materials
All sequence files are available from the EBI European Nucleotide Archive database (http://www.ebi.ac.uk/ena/) under project accession PRJEB29993 (datasets ERR3275139-ERR3275154 for small RNA libraries; datasets ERR2939203- ERR2939210 for long RNA libraries). Sequence assemblies and annotations for the 115 novel virus contigs are available in Additional file 7: Classified Virus Sequences. Sequence assemblies of 39 unclassified viruses are available in Additional file 8: Unclassified Virus Sequences.
Cytochrome c oxidase subunit 1
Double strand RNA
Insect specific virus
Open reading frame
Phasi Charoen-like virus
RNA-dependent RNA polymerase
Reads per kilobase per million reads
Small interacting RNA
World Health Organization. World Malaria Report 2017. Geneva: World Health Organization; 2017.
Nanfack Minkeu F, Vernick KD. A systematic review of the natural virome of anopheles mosquitoes. Viruses. 2018;10(5):222.
Rezza G, Chen R, Weaver SC. O'nyong-nyong fever: a neglected mosquito-borne viral disease. Pathog Glob Health. 2017;111(6):271–5.
Carissimo G, Eiglmeier K, Reveillaud J, Holm I, Diallo M, Diallo D, Vantaux A, Kim S, Menard D, Siv S, et al. Identification and characterization of two novel RNA viruses from anopheles gambiae species complex mosquitoes. PLoS One. 2016;11(5):e0153881.
Colmant AMG, Hobson-Peters J, Bielefeldt-Ohmann H, van den Hurk AF, Hall-Mendelin S, Chow WK, Johansen CA, Fros J, Simmonds P, Watterson D, et al. A new clade of insect-specific flaviviruses from Australian anopheles mosquitoes displays species-specific host restriction. mSphere. 2017;2(4):1–19.
Fauver JR, Grubaugh ND, Krajacich BJ, Weger-Lucarelli J, Lakin SM, Fakoli LS 3rd, Bolay FK, Diclaro JW 2nd, Dabire KR, Foy BD, et al. West African Anopheles gambiae mosquitoes harbor a taxonomically diverse virome including new insect-specific flaviviruses, mononegaviruses, and totiviruses. Virology. 2016;498:288–99.
Villinger J, Mbaya MK, Ouso D, Kipanga PN, Lutomiah J, Masiga DK. Arbovirus and insect-specific virus discovery in Kenya by novel six genera multiplex high-resolution melting analysis. Mol Ecol Resour. 2017;17(3):466–80.
Tabue RN, Awono-Ambene P, Etang J, Atangana J, C AN, Toto JC, Patchoke S, Leke RG, Fondjo E, Mnzava AP, et al. Role of Anopheles (Cellia) rufipes (Gough, 1910) and other local anophelines in human malaria transmission in the northern savannah of Cameroon: a cross-sectional survey. Parasit Vectors. 2017;10(1):22.
Alam MS, Khan MG, Chaudhury N, Deloer S, Nazib F, Bangali AM, Haque R. Prevalence of anopheline species and their Plasmodium infection status in epidemic-prone border areas of Bangladesh. Malar J. 2010;9:15.
Durnez L, Mao S, Denis L, Roelants P, Sochantha T, Coosemans M. Outdoor malaria transmission in forested villages of Cambodia. Malar J. 2013;12:329.
Marasri N, Overgaard HJ, Sumarnrote A, Thanispong K, Corbel V, Chareonviriyaphap T. Abundance and distribution of Anopheles mosquitoes in a malaria endemic area along the Thai-Lao border. J Vector Ecol. 2017;42(2):325–34.
St Laurent B, Oy K, Miller B, Gasteiger EB, Lee E, Sovannaroth S, Gwadz RW, Anderson JM, Fairhurst RM. Cow-baited tents are highly effective in sampling diverse Anopheles malaria vectors in Cambodia. Malar J. 2016;15(1):440.
Service MW. Mosquito ecology: field sampling methods. 2nd ed. London: Chapman & Hall; 1993.
Li CX, Shi M, Tian JH, Lin XD, Kang YJ, Chen LJ, Qin XC, Xu J, Holmes EC, Zhang YZ. Unprecedented genomic diversity of RNA viruses in arthropods reveals the ancestry of negative-sense RNA viruses. Elife. 2015;4:1–26.
Hagiwara-Komoda Y, Choi SH, Sato M, Atsumi G, Abe J, Fukuda J, Honjo MN, Nagano AJ, Komoda K, Nakahara KS, et al. Truncated yet functional viral protein produced via RNA polymerase slippage implies underestimated coding capacity of RNA viruses. Sci Rep. 2016;6:21411.
Olspert A, Chung BY, Atkins JF, Carr JP, Firth AE. Transcriptional slippage in the positive-sense RNA virus family Potyviridae. EMBO Rep. 2015;16(8):995–1004.
Chung BY, Firth AE, Atkins JF. Frameshifting in alphaviruses: a diversity of 3′ stimulatory structures. J Mol Biol. 2010;397(2):448–56.
Firth AE, Blitvich BJ, Wills NM, Miller CL, Atkins JF. Evidence for ribosomal frameshifting and a novel overlapping gene in the genomes of insect-specific flaviviruses. Virology. 2010;399(1):153–66.
Aguiar ER, Olmo RP, Paro S, Ferreira FV, de Faria IJ, Todjro YM, Lobo FP, Kroon EG, Meignin C, Gatherer D, et al. Sequence-independent characterization of viruses based on the pattern of viral small RNAs produced by the host. Nucleic Acids Res. 2015;43(13):6191–206.
Lewis SH, Quarles KA, Yang Y, Tanguy M, Frezal L, Smith SA, Sharma PP, Cordaux R, Gilbert C, Giraud I, et al. Pan-arthropod analysis reveals somatic piRNAs as an ancestral defence against transposable elements. Nat Ecol Evol. 2018;2(1):174–81.
Leger P, Lara E, Jagla B, Sismeiro O, Mansuroglu Z, Coppee JY, Bonnefoy E, Bouloy M. Dicer-2- and Piwi-mediated RNA interference in Rift Valley fever virus-infected mosquito cells. J Virol. 2013;87(3):1631–48.
Palatini U, Miesen P, Carballar-Lejarazu R, Ometto L, Rizzo E, Tu Z, van Rij RP, Bonizzoni M. Comparative genomics shows that viral integrations are abundant and express piRNAs in the arboviral vectors Aedes aegypti and Aedes albopictus. BMC Genomics. 2017;18(1):512.
Morazzani EM, Wiley MR, Murreddu MG, Adelman ZN, Myles KM. Production of virus-derived ping-pong-dependent piRNA-like small RNAs in the mosquito soma. PLoS Pathog. 2012;8(1):e1002470.
Castellano L, Rizzi E, Krell J, Di Cristina M, Galizi R, Mori A, Tam J, De Bellis G, Stebbing J, Crisanti A, et al. The germline of the malaria mosquito produces abundant miRNAs, endo-siRNAs, piRNAs and 29-nt small RNAs. BMC Genomics. 2015;16:100.
George P, Jensen S, Pogorelcnik R, Lee J, Xing Y, Brasset E, Vaury C, Sharakhov IV. Increased production of piRNAs from euchromatic clusters and genes in Anopheles gambiae compared with Drosophila melanogaster. Epigenetics Chromatin. 2015;8:50.
Li C, Vagin VV, Lee S, Xu J, Ma S, Xi H, Seitz H, Horwich MD, Syrzycka M, Honda BM, et al. Collapse of germline piRNAs in the absence of Argonaute3 reveals somatic piRNAs in flies. Cell. 2009;137(3):509–21.
Shi M, Lin XD, Tian JH, Chen LJ, Chen X, Li CX, Qin XC, Li J, Cao JP, Eden JS, et al. Redefining the invertebrate RNA virosphere. Nature. 2016;540:539.
Webster CL, Waldron FM, Robertson S, Crowson D, Ferrari G, Quintana JF, Brouqui JM, Bayne EH, Longdon B, Buck AH, et al. The discovery, distribution, and evolution of viruses associated with Drosophila melanogaster. PLoS Biol. 2015;13(7):e1002210.
Cook S, Chung BY, Bass D, Moureau G, Tang S, McAlister E, Culverwell CL, Glucksman E, Wang H, Brown TD, et al. Novel virus discovery and genome reconstruction from field RNA samples reveals highly divergent viruses in dipteran hosts. PLoS One. 2013;8(11):e80720.
Hall RA, Bielefeldt-Ohmann H, McLean BJ, O'Brien CA, Colmant AM, Piyasena TB, Harrison JJ, Newton ND, Barnard RT, Prow NA, et al. Commensal viruses of mosquitoes: host restriction, transmission, and interaction with Arboviral pathogens. Evol Bioinformatics Online. 2016;12(Suppl 2):35–44.
Huhtamo E, Cook S, Moureau G, Uzcategui NY, Sironen T, Kuivanen S, Putkuri N, Kurkela S, Harbach RE, Firth AE, et al. Novel flaviviruses from mosquitoes: mosquito-specific evolutionary lineages within the phylogenetic group of mosquito-borne flaviviruses. Virology. 2014;464-465:320–9.
Broderick NA, Buchon N, Lemaitre B. Microbiota-induced changes in drosophila melanogaster host gene expression and gut morphology. MBio. 2014;5(3):e01117–4.
Lozupone CA, Stombaugh JI, Gordon JI, Jansson JK, Knight R. Diversity, stability and resilience of the human gut microbiota. Nature. 2012;489(7415):220–30.
Schultz MJ, Frydman HM, Connor JH. Dual insect specific virus infection limits arbovirus replication in Aedes mosquito cells. Virology. 2018;518:406–13.
Hobson-Peters J, Yam AW, Lu JW, Setoh YX, May FJ, Kurucz N, Walsh S, Prow NA, Davis SS, Weir R, et al. A new insect-specific flavivirus from northern Australia suppresses replication of West Nile virus and Murray Valley encephalitis virus in co-infected mosquito cells. PLoS One. 2013;8(2):e56534.
Buchon N, Broderick NA, Lemaitre B. Gut homeostasis in a microbial world: insights from Drosophila melanogaster. Nat Rev Microbiol. 2013;11(9):615–26.
Whitfield ZJ, Dolan PT, Kunitomi M, Tassetto M, Seetin MG, Oh S, Heiner C, Paxinos E, Andino R. The diversity, structure, and function of heritable adaptive immunity sequences in the Aedes aegypti genome. Curr Biol. 2017;27(22):3511–3519 e3517.
Zakrzewski M, Rasic G, Darbro J, Krause L, Poo YS, Filipovic I, Parry R, Asgari S, Devine G, Suhrbier A. Mapping the virome in wild-caught Aedes aegypti from Cairns and Bangkok. Sci Rep. 2018;8(1):4690.
Frey KG, Biser T, Hamilton T, Santos CJ, Pimentel G, Mokashi VP, Bishop-Lilly KA. Bioinformatic characterization of mosquito Viromes within the eastern United States and Puerto Rico: discovery of novel viruses. Evol Bioinformatics Online. 2016;12(Suppl 2):1–12.
Atoni E, Wang Y, Karungu S, Waruhiu C, Zohaib A, Obanda V, Agwanda B, Mutua M, Xia H, Yuan Z. Metagenomic virome analysis of Culex mosquitoes from Kenya and China. Viruses. 2018;10(1):30.
Martin M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnetjournal. 2011;17(1):3.
Giraldo-Calderon GI, Emrich SJ, MacCallum RM, Maslen G, Dialynas E, Topalis P, Ho N, Gesing S, VectorBase C, Madey G, et al. VectorBase: an updated bioinformatics resource for invertebrate vectors and other organisms related with human diseases. Nucleic Acids Res. 2015;43(Database issue):D707–13.
Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009;10(3):R25.
Li H: Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv preprint arXiv:13033997 2013.
Kopylova E, Noe L, Touzet H. SortMeRNA: fast and accurate filtering of ribosomal RNAs in metatranscriptomic data. Bioinformatics. 2012;28(24):3211–7.
Ratnasingham S, Hebert PD. Bold: the barcode of life data system. Mol Ecol Notes. 2007;7(3):355–64 http://www.barcodinglife.org.
Schulz MH, Zerbino DR, Vingron M, Birney E. Oases: robust de novo RNA-seq assembly across the dynamic range of expression levels. Bioinformatics. 2012;28(8):1086–92.
Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson DA, Amit I, Adiconis X, Fan L, Raychowdhury R, Zeng Q, et al. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotechnol. 2011;29(7):644–52.
Li W, Godzik A. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics. 2006;22(13):1658–9.
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25(17):3389–402.
Huson DH, Beier S, Flade I, Gorska A, El-Hadidi M, Mitra S, Ruscheweyh HJ, Tappu R. MEGAN Community edition - interactive exploration and analysis of large-scale microbiome sequencing data. PLoS Comput Biol. 2016;12(6):e1004957.
Sullivan MJ, Petty NK, Beatson SA. Easyfig: a genome comparison visualizer. Bioinformatics. 2011;27(7):1009–10.
Zhu W, Lomsadze A, Borodovsky M. Ab initio gene identification in metagenomic sequences. Nucleic Acids Res. 2010;38(12):e132.
Seemann T. Prokka: rapid prokaryotic genome annotation. Bioinformatics. 2014;30(14):2068–9.
Skewes-Cox P, Sharpton TJ, Pollard KS, DeRisi JL. Profile hidden Markov models for the detection of viruses within metagenomic sequence data. PLoS One. 2014;9(8):e105067.
Gotz S, Garcia-Gomez JM, Terol J, Williams TD, Nagaraj SH, Nueda MJ, Robles M, Talon M, Dopazo J, Conesa A. High-throughput functional annotation and data mining with the Blast2GO suite. Nucleic Acids Res. 2008;36(10):3420–35.
Rutherford K, Parkhill J, Crook J, Horsnell T, Rice P, Rajandream MA, Barrell B. Artemis: sequence visualization and annotation. Bioinformatics. 2000;16(10):944–5.
Katoh K, Standley DM. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol. 2013;30(4):772–80.
Capella-Gutierrez S, Silla-Martinez JM, Gabaldon T. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics. 2009;25(15):1972–3.
Stamatakis A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics. 2014;30(9):1312–3.
Paradis E, Claude J, Strimmer K. APE: analyses of Phylogenetics and evolution in R language. Bioinformatics. 2004;20(2):289–90.
Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R. Genome project data processing S: the sequence alignment/map format and SAMtools. Bioinformatics. 2009;25(16):2078–9.
Rice P, Longden I, Bleasby A. EMBOSS: the European molecular biology open software suite. Trends Genet. 2000;16(6):276–7.
Schliep KP. Phangorn: phylogenetic analysis in R. Bioinformatics. 2011;27(4):592–3.
Carissimo G, Pain A, Belda E, Vernick KD. Highly focused transcriptional response of Anopheles coluzzii to O'nyong nyong arbovirus during the primary midgut infection. BMC Genomics. 2018;19(1):526.
Dobin A, Gingeras TR. Mapping RNA-seq reads with STAR. Curr Protoc Bioinformatics. 2015;51:11 14 11–9.
Thorvaldsdottir H, Robinson JT, Mesirov JP. Integrative genomics viewer (IGV): high-performance genomics data visualization and exploration. Brief Bioinform. 2013;14(2):178–92.
R: A language and environment for statistical computing. http://www.R-project.org/.
We acknowledge the assistance of Allan Dickerman, Jiyoung Lee and Song Li, Virginia Polytechnic Institute and State University, and of Silke Jensen, Clermont Université, Clermont-Ferrand, France, for helpful advice and assistance on piRNA analysis.
This work received financial support to KDV from the European Commission, Horizon 2020 Infrastructures #731060 Infravec2; European Research Council, Support for frontier research, Advanced Grant #323173 AnoPath; French National Research Agency #ANR-PRC2019-ArboVec, and French Laboratoire d’Excellence “Integrative Biology of Emerging Infectious Diseases” #ANR-10-LABX-62-IBEID, and support to IVS from USDA National Institute of Food and Agriculture Hatch project #223822. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Ethics approval and consent to participate
There were no human or animal subjects. Permission to collect wild mosquitoes was obtained by Institut Pasteur Cambodia from authorities of Ratanakiri and Kampong Chhnang, Cambodia; and by Institut Pasteur Dakar from authorities of Kedougou, Senegal.
Consent for publication
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Table S1. Anopheles mosquito taxa represented in the collections from Senegal and Cambodia, as detected by comparison to Anopheles sequences from the Barcode of Life COI-5P database. Data corresponds to pie charts of Anopheles taxa by country and sample pool depicted in Fig. 1. (XLSX 29 kb)
Table S2. List of 115 assembled virus contigs displaying sequence similarity to known virus families. (XLSX 11 kb)
Table S3. Predicted open reading frames, peptide annotation and functional status of 115 assembled virus contigs displaying sequence similarity to known virus families. (XLSX 56 kb)
Figure S1. Small RNA size profiles (A) and coverage profiles (B) of 15 novel virus assemblies with classic RNAi processing pattern. Virus assemblies shown are in Fig. 5, Cluster 3, and belong to the 115 novel viruses classified by sequence similarity to known virus assemblies. Red vertical bars represent reads mapped over the positive strand of reference viral sequence, and blue bars represent reads mapped over the negative strand. (PDF 287 kb)
Figure S2. Small RNA size profiles of high-quality assemblies left unclassified by sequence similarity grouping. Thirty-nine unclassified contigs that meet the same quality criteria as the 115 classified contigs (non-redundant and > 500 nucleotides), and with at least 100 small RNA sequence reads. The 39 contigs display strong association by small RNA profile Cluster 2 and Cluster 3 (Fig. 5), which are enriched for 21 nucleotide reads mapping over both positive and negative strands, characteristic of the classical siRNA product size profile. Red bars represent reads mapped over the positive strand of reference viral sequence, and blue bars represent reads mapped over the negative strand. (PDF 29 kb)
Table S4. Anopheles coluzzii candidate piRNA genes differentially represented in small RNA sequence of ONNV infected and uninfected controls. (XLSX 18 kb)
Classified Virus Sequences. Sequence assemblies and annotations for the final 115 novel virus contigs non-redundant contigs > 500 nucleotides, classified by sequence similarity to known viruses. In EMBL flat sequence file format. (ZIP 208 kb)
Unclassified Virus Sequences. Sequence assemblies for 39 non-redundant contigs > 500 nucleotides, unclassified by sequence similarity to known viruses. The small RNA profile matched a pattern of classic RNAi processing of viral replication intermediates, suggesting they are undescribed viruses. In fasta text sequence file format. (FASTA 38 kb)