Genome sequence of Anopheles sinensis provides insight into genetics basis of mosquito competence for malaria parasites
- Dan Zhou1, 2,
- Donghui Zhang1, 2,
- Guohui Ding3,
- Linna Shi1, 2,
- Qing Hou1, 2,
- Yuting Ye1, 2,
- Yang Xu4,
- Huayun Zhou5,
- Chunrong Xiong5,
- Shengdi Li3,
- Jing Yu1, 2,
- Shanchao Hong1, 2,
- Xinyou Yu1, 2,
- Ping Zou1, 2,
- Chen Chen1, 2,
- Xuelian Chang1, 2,
- Weijie Wang1, 2,
- Yuan Lv1, 2,
- Yan Sun1, 2,
- Lei Ma1, 2,
- Bo Shen1, 2Email author and
- Changliang Zhu1, 2
© Zhou et al.; licensee BioMed Central Ltd. 2014
Received: 8 July 2013
Accepted: 16 January 2014
Published: 18 January 2014
Anopheles sinensis is an important mosquito vector of Plasmodium vivax, which is the most frequent and widely distributed cause of recurring malaria throughout Asia, and particularly in China, Korea, and Japan.
We performed 454 next-generation sequencing and obtained a draft sequence of A. sinensis assembled into scaffolds spanning 220.8 million base pairs. Analysis of this genome sequence, we observed expansion and contraction of several immune-related gene families in anopheline relative to culicine mosquito species. These differences suggest that species-specific immune responses to Plasmodium invasion underpin the biological differences in susceptibility to Plasmodium infection that characterize these two mosquito subfamilies.
The A. sinensis genome produced in this study, provides an important resource for analyzing the genetic basis of susceptibility and resistance of mosquitoes to Plasmodium parasites research which will ultimately facilitate the design of urgently needed interventions against this debilitating mosquito-borne disease.
KeywordsGenome Anopheles sinensis Malaria
Malaria is caused by infection with Plasmodium parasites, which are transmitted via the bites of infected female Anopheles mosquitoes . Malaria is prevalent and widely distributed in tropical and subtropical regions, including much of sub-Saharan Africa, Asia, and the Americas [2, 3]. Indeed, according to the latest World Malaria Report, in 2010 malaria caused an estimated 216 million clinical episodes and 655,000 deaths worldwide . Of the few available management strategies for this disease, vector control offers an important means of limiting the spread of malaria. The effective control of mosquito vectors, however, requires information on their genetic structure, because the biology and physiology of infections, the development of insecticide resistance, and the epidemiology of malaria in the human host can all be affected by genetic variation in the mosquito vector populations. To date, our understanding of the role of vector genetics in the dynamics of malaria transmission is poor. In particular, the function and evolutionary aspects of important genes, such as those associated with vector competence, remains unclear. The paucity of genetic information on Plasmodium-susceptible mosquitoes is a major obstacle to the development of appropriate diagnostic and therapeutic tools against malaria.
All malaria vectors belong to the subfamily Anophelinae. Mosquitoes of the subfamily Culicinae are not susceptible to infection by Plasmodium parasites and thus, do not transmit Plasmodium. The genomes of A. gambiae, Aedes aegypti and Culex quinquefasciatus were sequenced in 2002, 2007 and 2010, respectively. Comparative genomic studies of these three species have provided important genetic insights into this vector-disease system including the identification of conserved gene regions; the identification of highly diverged genes; recognition of gene families that have expanded or contracted; and the evolution of species-specific physiological or behavioral genetic variations. Nevertheless, information provided by these genome sequences has provided only a limited understanding of the genetic basis of species-specific susceptibility to Plasmodium.
In this study, we sequenced the genome of A. sinensis, a malaria vector within the subfamily Anophelinae. A. sinensis is an Asiatic mosquito species with a wide geographical distribution in East Asia region, ranging from the Philippines to Japan . While A. gambiae is considered to be an efficient vector of P. falciparum, A. sinensis is suspected to be the most dominant and important vector of P. vivax. In addition, A. sinensis was found to be solely responsible for the recent outbreaks of malaria in China . Contrasting the genetic composition of these two anopheline mosquitos with that of culicine mosquitos offers a means of investigating the genetic basis of their phenotypic differences to Plasmodium susceptibility, which is a critical step in developing novel ways to reduce human malaria transmission.
Traditional methods of gene detection are costly and time consuming and typically require prior knowledge of target gene regions, as they rely on specific primers. Therefore, these techniques are unsuitable for analyzing large numbers of unknown sequences. The development of next-generation sequencing (NGS) technologies provides an ideal method for rapid and reliable genomic exploration of mosquitoes.
In this study, we employed Roche/454 GS FLX sequencing technology to produce the first genome sequences of A. sinensis. A single-end 454 Jr. run combined with a paired-end 454 Jr. run (3, 8 and 20 Kb libraries) provided a cost-effective solution that produced high quality draft assemblies, and allowed us to obtain detailed gene annotations and meaningful results. Our comparative genomic analyses of the genomes of anopheline and culicine mosquitoes revealed key genetic difference that may underlie important species-specific biological functions in these two groups. This study provides critical genomic information that will pave the way for further in-depth molecular investigations into the biological and vector competency of A. sinensis.
Results and discussion
Sequencing and assembly
Summary of the raw reads of the sequencing analysis of A. sinensis
Reads used in assembly
Average read length (bp)
3 K paired end
8 K paired end
20 K paired end
Estimated genome size of A. Sinensis based on K-mer analysis
Genome size (bp)
Statistics for the assembly of the A. sinensis genome
Total Number (>100 bp)
Total Number (>2 Kb)
Characteristics of the genomes of A.sinensis , A.gambiae , Ae.aegypti , and C.quinquefasciatus
Genome size (Mbps)
Genome coverage (×)
Number of contig
Number of scaffold
Contig N50 (Kbps)
Scaffold N50 (Kbps)
Average contig size (bps)
Average scaffold size (bps)
Number of gene (protein-coding)
Percentage of gene Length (%)
Percentage of exon region length (%)
Repetitive elements analysis
Compared with published mosquito genome sequences, the TEs content of Anophelinae (A. gambiae, 11% to 16%) were far less than Culicinae (Ae. aegypti, 42% to 47%; C. quinquefasciatus, 29%) [21–23]. TEs content could be a leading factor influencing genome size in many species [24, 25]. For example, studies have shown that the genome of Ae. aegypti has doubled its size as a result of TEs . Thus, the differences in the genome size of A. sinensis and other mosquito species could in part be due to the accumulation or loss of TEs in the different species.
Although the predicted genome size of A. sinensis was smaller than that of A. gambiae, we found a greater number of predicted genes in the genome of A. sinensis (Table 4). This discrepancy was consistent with the results of the genome-wide analysis that revealed a higher percentage of exon region length and less TEs content in the genome of A. sinensis. Differences in the numbers of predicted genes in these two species may be consequence of species-specific genetic differences that have arisen from biological adaptations to the different environmental challenges faced by these two mosquitoes. However, it is also possible that gene numbers were overestimated in the genome of A. sinensis because of false-positive gene predictions. These can arise because of inaccurate annotation of the automated consensus gene set or because of putative TEs and bacterial contaminates which escaped earlier detection. Overestimation of the number of predicted genes has been reported for other mosquito species . The third possibility is genes were under-prediction in A. gambiae. It was the second insect genome to be sequenced, and like the initial D. melanogaster, the used consortia eliminated most ab initio gene models without comparative or experimental support, which may cause under-prediction. For example, a recent genomic comparative paper on orphan genes in insects does not even include the A. mellifera gene set, and notes that the A. gambiae gene count is abnormally low for orphan genes . Manual examination of the output will be required to assure the accuracy of the predicted genes of A. sinensis found in this study.
In addition to protein-coding genes, we also identified 41 microRNA (miRNA), 348 tRNA and 2017 rRNA genes in the A. sinensis genome (Additional file 1: Table S5; see the Additional file 2: “MiRNA list” for a list of all predicted A. sinensis miRNA target genes and their annotations). At present, 67 miRNAs have been described for A. gambiae, which is almost 1.5 times that found for A. sinensis. Our finding of just 41 miRNA for A. sinensis might be an underestimation as the target prediction was based on an imperfect match between known miRNA and our genomic sequence. It is also possible that some target genes were missed during the alignment due to the differences between the two genomes of these two anopheline species. Another possible factor that may contribute to lower miRNA genes in A. sinensis is that A. sinensis may have unidentified miRNAs.
Functional annotation and gene family analysis
For all predicted protein-coding genes, 93.8% had matches in the non-redundant (NR) databases, 64.6% were similar to entries in the InterPro database, 67.7% were assigned GO terms, 14.2% were mapped to known pathways, 14.0% had signal peptides and 21.4% had transmembrane regions (Additional file 1: Table S6). There were several domains (fibrinogen, protein kinase and six-bladed beta-propeller) and repeats (LDLR class B repeat and Leucine-rich repeat) overrepresented in A. sinensis compared to A. gambiae (Additional file 1: Table S7). However, no significant differences of PTMs were observed between A. sinensis and A. gambiae. Several domains (histone, F-box and Zinc finger) were down-represented in anopheline species compared with the culicine species, though no significant differences of translational modification (PTM) and repeat were observed between these two subfamilies (Additional file 1: Table S8).
Of the entire A. sinensis gene set, 2377 genes had an ortholog belonging to one of the 235 known biological pathways. There were no significant differences in the mapping of genes to pathways between A. sinensis and A. gambiae or between the anopheline and culicine subfamilies.
The frequencies of transmembrane regions in A. sinensis were relative lower than other three mosquito species (Additional file 1: Table S11). With just three exceptions (6, 7 and 10), protein numbers tended to decrease with the increasing transmembrane helices (Additional file 1: Table S12). InterPro analysis revealed that olfactory receptors (14.29%), G-protein coupled receptors (GPCRs, 34.91%) and major facilitator superfamily domain (16.25%) accounted for the largest proportion of the predicted proteins of the 6, 7 and 10 transmembrane helices, respectively.
The A. sinensis genome revealed 3,972 gene clusters containing 11,300 genes that were common to the genomes of the three previously sequenced mosquito species. There were 4,065 gene clusters containing 10,465 genes in A. gambiae, 4,064 gene clusters containing 12,608 genes in Ae. aegypti, and 4,073 gene clusters containing 14,827 genes in C. quinquefasciatus. 109 clusters found only in the four mosquito genomes, 34 clusters found specific to the Anophelinae, and 29 clusters containing 30 genes found specific to A. sinensis.
Gene orthology prediction
Analysis of InterPro in these 4727 orthologous genes revealed the most gene-enriched domain and family were peptidase (Additional file 1: Table S13), while analysis of Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway revealed that genes were most enriched in metabolic pathways (Additional file 1: Table S14), both indicating functions central to mosquito biology, such as feeding behavior. Feeding releases peptidase in the midgut and assists in the degradation of blood meal proteins into peptides and amino acids .
Microsynteny with sequenced mosquito genomes
Characteristics of microsynteny blocks between A. sinensis , A. gambiae , Ae. aegypti , C. quinquefasciatus , and D. melanogaster
Numbers of microsynteny blocks
Total length of microsynteny (Mbps)
Shared gene families
A. sinensis/A. gambiae
A. sinensis/Ae. aegypti
A. sinensis/C. quinquefasciatus
A. sinensis/D. melanogaster
Few immune-related gene sets may be associated with malaria vectorial capacity
Anophelinae are recognized as major vectors of human malaria, while culicine species are the principal etiological agents of mosquito-borne viruses. It is not surprising that genetic factors play decisive roles in determining vectorial capacity . Previous studies regarding the immune system of Anophelinae have shown that changes in certain aspects can affect the development of Plasmodium either positively or negatively . As shown in Additional file 1: Table S15, relative to Culicinae, C-type lectins (CTLs), serine protease inhibitors (serpins, SRPNs) and MD2-like gene (ML) families have contracted in the Anophelinae, whereas the thioester-containing protein (TEP) and peroxidase gene families have expanded, which may result from the differential duplication and/or loss of genes among these evolutionary lineages. Although comparative immune-related gene families in C. quinquefasciatus, Ae. aegypti, and A. gambiae have been studied, limited information is available due to limited numbers of anopheline species. With the discovery of the second anopheline mosquito, A. sinensis, we may reveal the Plasmodium-susceptible genotype, which will help to understand the details of the relationships between anopheline mosquito vectors and malarial pathogens.
Both the ML and serpin gene families have been shown obviously interfere with malarial infection. AgMDL1, an MD2-like receptor, showed specificity in regulating resistance to P. falciparum and O’nyong-nyong virus [34, 35]. These overlaps between the effects of MLs on Plasmodium spp. and other viruses suggest that MLs are a universal defense mechanism for mosquitoes against invading pathogens. Expression of the SRPN6 gene can limit the number of rodent malarial oocysts in A. stephensi. Thus, the significant contraction of MLs and Serpins can help the malarial parasite to survive in Anophelinae.
The TEP  and C-type lectins (CTLs)  gene families are both involved in pathogen recognition (PRRs) and immune response activation. One TEP family gene in particular (TEP1), can be upregulated after malarial infection and strongly inhibit the development of infection in both rodents and humans by binding to Plasmodium parasite surfaces . In contrast, two circulating CTLs from A. gambiae (C-type lectin 4 [CTL4] and CTL mannose binding 2 [CTLMA2]) have been identified as agonists of the rodent malaria species, P. berghei, which can induce massive ookinete melanization when silenced . Consequently, the downregulation of CTL members and the upregulation of TEP members in Anophelinae are likely to depend on their relative roles in promoting or inhibiting the development of malarial parasites. Putative HPX (HPX2, HPX7 and HPX8) can be induced in the mosquitoes midgut in response to Plasmodium infection, in order to potentiate nitric oxide toxicity and improve antiplasmodial effects [40, 41]. Thus, HPX enzymes have been considered as key enzymes induced in the midgut cells of A. gambiae invaded by Plasmodium ookinetes.
The observed contraction of these two immune gene families (MLs and serpins) could be explained as important genetic components in the Plasmodium-susceptible phenotype. Such gene expression changes occur earlier than the invasion of Plasmodium, whereby Anophelinae are able to transmit Plasmodium. However, Plasmodium infection may induce the activation (and subsequent expansion or contraction) of immune-related genes involved in pathogen defense in mosquitoes. The two expanded gene families (TEP and peroxidases) and one contracted gene family (CTLs) observed in this study could have formed gradually as a long-term adaptive immune response against Plasmodium infection, and been expanded or contracted under positive selection in Anophelinae. Consequently, this immune-related gene set analysis at the theoretical level can provide clues for understanding the genetic basis of a Plasmodium-susceptible phenotype. These selective genes may serve as valuable potential targets for future malarial control strategies.
Strain selection and DNA extraction
The laboratory strain of A. sinensis used in this study has been inbred within the lab since 1984 and, never been exposed to pesticides. These mosquitoes were reared at 26 ± 1°C and 75 to 85% humidity, under a 10:14 h light:dark cycle. Genomic DNA was extracted from 300 adult females and 300 males (2 to 3 days post adult emergence) according to methods described in . To prevent RNA and protein contamination, extracted DNA was treated with RNase A and proteinase K and, subsequently, precipitated with ethanol.
Whole-genome sequencing and assembly
We employed a whole-genome sequencing strategy with Roche/454 GS FLX. We constructed a total of five single-end and seven mate-pair sequencing libraries with insert sizes of about 3 Kb, 8 Kb and 20 Kb from 1 μg, 5 μg, 30 μg and 60 μg of starting DNA. In total, we generated 4.16 Gb of data of sequencing reads ranging from 40 to 1196 bp. To reduce the effect of sequencing error during assembly, we undertook a series of checking and filtering steps in assembling the reads generated. By using stringent criteria, 3.34 G of high quality data were incorporated into the final de novo genome assembly.
The Lander-Waterman algorithm  were used to estimate the genome size of A. sinensis. K-mer analysis for single-end reads  revealed a frequency distribution that conformed to the Poisson expectation when K-mer was equal to 13. The value of expected depth was calculated based on the lambda, a parameter of possion distribution. The genome size of A. sinensis was then calculated using the total K-mer number divided by the expected depth value.
Whole-genome assembly was carried out with a Celera Assembler V6.1 for the remaining 454 reads . The revised pipeline (called Celera Assembler with the Best Overlap Graph, CABOG) was robust to uncertainty in homopolymer run length, high read coverage and heterogeneous read lengths. We utilized the following modules of the Celera Assembler software for successive phases of the assembly: pairwise overlap detection; initial ungapped multiple sequence alignments, called unitigs; unitig consensus calculation; combining unitigs with mate constraints to form contigs and scaffolds that were ungapped and gapped multiple sequence alignments; and, finally, scaffold consensus determination. Because the genome used for sequencing were constructed from whole adult mosquitoes, contamination from bacteria in gut or adhering on the surface were inevitable. To check for possible microbial contamination of the assembly, we screened scaffolds against the NCBI NT database using query alignment and identity cut-off of 90% and e-value cut-off of 1e-6. When the top hit was bacterial species, this scaffold was removed.
In order to assess the assembly quality, the transcriptome was sequenced and aligned to the scaffold sequences using Blat with default parameters . Assembly quality was also assessed by mapping the 454 Single reads to the scaffolds using BWA. The mapped regions (consensus sequences) with depth over 3X were extracted for SNVs and INDEL variation analysis, which represent potential base error and short indel error rate in the genome, respectively . Additionally, presence of CEGs was evaluated for the genome assembly (http://korflab.ucdavis.edu/Datasets/cegma/submit.html) [47, 48].
Identification of repetitive elements
The identification of repetitive elements is essential for genome sequencing, as unidentified repetitive elements can affect the quality of gene predictions, annotation and annotation-dependent analyses . Two methods were adopted for masking repeat regions in A. sinensis. First, RepeatMasker V3.3.0 (http://www.repeatmasker.org/) was applied against the Repbase library (species Anopheles) based on the scaffolds. Then, RepeatScout V1.0.5  software was used (with frequency set to ≥50) to build a repeat regions database by providing scaffolds and potentially repeat sequences. These results were merged with the results of the transposable elements for mosquitoes, which were downloaded from TEfam database (http://tefam.biochem.vt.edu/tefam/). Finally, these merged results were reprocessed with RepeatMasker.
To predict genes, we used two independent approaches: a homology-based method and a de novo method. The results of these two methods were integrated by the EVidenceModeler utility and then filtered multiple times and also checked manually. The reference protein sequences for protein alignment were obtained from VectorBase (for the aforementioned three sequenced mosquito species) and the NCBI database (for Culicidae species). CD-HIT software was used to cluster these protein sequences with 100% global similarity . AAT  and Genewise  software were used to align the protein data to the masked scaffolds. By comparing the databases, we obtained the number of protein distributions.
Four ab initio gene prediction programs were run on the genome: SNAP , Augustus , GlimmerHMM , and Genezilla  with the model trained using the published mosquito gene information (A. gambiae, Ae. aegypti and C. quinquefasciatus).
Quality of protein-coding gene predictions
To estimate the accuracy of gene prediction, we undertook a consistency check for the protein length of single-copy orthologs between A. sinensis and D. melanogaster. Considering the high conservation of single-copy orthologs, the protein length should have a high coherence between two species . The protein lengths of the two species were plotted as a scatter diagram and analyzed with a regression analysis. We compared the results of this regression analysis with results from the published literature.
Identification of noncoding RNA genes
tRNA genes were predicted by tRNAscan-SE-1.23 with eukaryote parameters . The rRNA fragments were identified by aligning the rRNA template sequences from the SILVA database  and RNAmmer database, by using BlastN at E-value 1e-5 with cutoff of identity ≥95% and match length ≥50 bp. It is important to note that rRNA genes in the A. sinensis genome were combined by aligning the 5.8S, 18S, 25S and 28S regions of databases using BlastN. miRNA was predicted by BlastN against the hairpin sequences from miRBase database (RELEASE 17) with E-value 1e-3, allowing no less than 70 bp alignment length, and requiring no less than 85% overall identity and 80% coverage.
Gene functions were assigned according to the best match of the alignments using Blast and BlastP (query coverage ≥50%; E-value: 1e-10) against the NCBI NR protein database. All predicted protein-coding genes were obtained with the InterProScan analysis tool . According to features of the predicted protein sequences, the InterProScan analysis was based on the active site, the binding site, the conserved site, the domain, the family, the PTM, and the repeat. Gene Ontology (GO) IDs for each gene were obtained from the corresponding InterProScan entry. All genes were aligned against the KEGG proteins, and the pathway in which the gene might be involved was derived from the matching genes in the KEGG. SignalP 4.0 server was used to predict the presence and location of signal peptide cleavage sites in the amino acid sequences . This method incorporates a prediction of cleavage sites and a signal peptide/non-signal peptide prediction based on a combination of several artificial neural networks. TMHMM software  was used with default values to predict the transmembane region based on a hidden Markov model.
Gene orthology prediction
The gene orthology predictions were generated by the Ensemble Gene Tree method , which is based on the PHYML algorithm for multiple protein sequence alignments, and uses MUSCLE for each gene family that contains sequences from all five species (A. sinensis, A. gambiae, Ae. aegypti, C. quinquefasciatus and D. melanogaster). Gene trees were reconciled with the species trees using the RAL algorithm to call duplication events on internal nodes and to root the trees. The relations of orthology were inferred from the results of each gene tree.
Defining gene families
The PANTHER hidden Markov models V7.2, annotated to different functional gene families, were used with default parameters (i.e. E-value: 1e-3) to classify all gene models of A. sinensis. Immune-related gene sets were downloaded from ImmunoDB resource (http://cegg.unige.ch/Insecta/immunodb) and subjected to inspection, curation, and phylogenetic analysis. Based on these gene sets, we re-annotated the proteins in the A. sinensis genome by Blast search, and counted the number of A. sinensis genes in each functional gene set. The threshold E-value in the Blast search was set to 1e-3, while the similarity was set to 0.35.
Construction of microsyntenic blocks
CHSMiner V1.1  was used to construct the microsynteny map for A. sinensis and the other three previously sequenced mosquito species. Briefly, the program used the orthologs between two genomes as anchors, and merged two anchors into a block if they were located less than a specified gap size apart. We used default values for parameters and set the minimum length to 100 Kb. Each microsynteny detected was evaluated by corrected P-values; only those results with the P-values less than 1e-5 were preserved.
M-Coffee V9.0 program  was used to perform the multiple alignment of proteins in each family. A phylogeny tree was constructed based on the 3,470 single-copy families in the five species (A. sinensis, A. gambiae, Ae. aegypti, C. quinquefasciatus and D. melanogaster). We used the Phylip package V3.69  to build the maximum likelihood (ML) tree for each protein family under the JTT substitution model. Then the SuperTree software was used to get an integrated supertree. To evaluate the topology of the supertree, we performed a bootstrap resample analysis using 100 resamples from the original tree.
Malaria is caused by infection with Plasmodium parasites that are transmitted via the bites of infected female Anopheles mosquitoes. Vector control offers an important means of limiting the spread of malaria; however, the lack of genetic information on Plasmodium-susceptible anopheline mosquitoes is a major obstacle to the development of effective vector management. We generated the first draft genome sequence of Anopheles sinensis, an Asiatic mosquito species suspected to be the most important vector of P. vivax. We compared the genetic composition of this species to that of other sequenced mosquito species in the subfamily Anophelinae and the subfamily Culicinae (the latter are not susceptible to Plasmodium infection). The results of these comparisons provide important genetic insights into this vector-disease system. In particular,we observed the expansion and contraction of several important immune-related gene families known to influence aspects of Plasmodium development, in the anopheline species relative to the culicine species. These differences suggest that species-specific immune responses to Plasmodium infection underpin the biological differences in Plasmodium susceptibility that characterize these two mosquito subfamilies. This study provides critical genomic information that will pave the way for analyses investigating the genetic basis of mosquito susceptibility and resistance to Plasmodium parasites.
This work was supported by the National Institutes of Health of US (NIH)(Grant No. 2R01AI075746-05), the National Natural Science Foundation of China (Grant No. 30901244, 30972564, 81171900 and 81101279), the National S & T Major Program (Grant No. 2012ZX10004-219, 2008ZX10004-010, 2012ZX10004-220 and 2008ZX10004-011), Specialized Research Fund for the Doctoral Program of Higher Education of China (Grant No.20113234120007), Natural Science Foundation of Jiangsu Province (Grant No.81101279), and Priority Academic Program Development of Jiangsu Higher Education Institutions.
- Kiang KM, Bryant PA, Shingadia D, Ladhani S, Steer AC, Burgner D: The treatment of imported malaria in children: an update. Arch Dis Child Educ Pract Ed. 2013, 98: 7-15. 10.1136/archdischild-2011-301530.PubMedView Article
- Patarroyo MA, Calderon D, Moreno-Perez DA: Vaccines against Plasmodium vivax: a research challenge. Expert Rev Vaccines. 2012, 11 (10): 1249-1260. 10.1586/erv.12.91.PubMedView Article
- Eastman RT, Pattaradilokrat S, Raj DK, Dixit S, Deng B, Miura K, Yuan J, Tanaka TQ, Johnson RL, Jiang H, et al: A class of tricyclic compounds blocking malaria oocyst development and transmission. Antimicrob Agents Chemother. 2013, 57 (1): 425-435. 10.1128/AAC.00920-12.PubMed CentralPubMedView Article
- Organization WH: World malaria report 2011. 2011, Geneva: World Health Organization
- Jung J, Lee E, Kim W: Isolation and characterization of polymorphic microsatellite markers of Anopheles sinensis, a malaria vector mosquito in the East Asia region. Mol Ecol Notes. 2006, 6 (4): 1272-1274. 10.1111/j.1471-8286.2006.01512.x.View Article
- Welch SG, McGregor IA, Williams K: The Duffy blood group and malaria prevalence in Gambian West Africans. Trans R Soc Trop Med Hyg. 1977, 71 (4): 295-296. 10.1016/0035-9203(77)90102-X.PubMedView Article
- Lee WJ, Klein TA, Kim HC, Choi YM, Yoon SH, Chang KS, Chong ST, Lee IY, Jones JW, Jacobs JS, et al: Anopheles kleini, Anopheles pullus, and Anopheles sinensis: potential vectors of Plasmodium vivax in the Republic of Korea. J Med Entomol. 2007, 44 (6): 1086-1090. 10.1603/0022-2585(2007)44[1086:AKAPAA]2.0.CO;2.PubMed
- Zhou SS, Huang F, Wang JJ, Zhang SS, Su YP, Tang LH: Geographical, meteorological and vectorial factors related to malaria re-emergence in Huang-Huai River of central China. Malar J. 2010, 9: 337-10.1186/1475-2875-9-337.PubMed CentralPubMedView Article
- Rai KS, Black WC: Mosquito genomes: structure, organization, and evolution. Adv Genet. 1999, 41: 1-33.PubMedView Article
- Behura SK, Singh BK, Severson DW: Antagonistic relationships between intron content and codon usage bias of genes in three mosquito species: functional and evolutionary implications. Evol Appl. 2013, 6 (7): 1079-1089.PubMed CentralPubMed
- Haddrill PR, Charlesworth B, Halligan DL, Andolfatto P: Patterns of intron sequence evolution in Drosophila are dependent upon length and GC content. Genome Biol. 2005, 6 (8): R67-10.1186/gb-2005-6-8-r67.PubMed CentralPubMedView Article
- Gerton JL, DeRisi J, Shroff R, Lichten M, Brown PO, Petes TD: Global mapping of meiotic recombination hotspots and coldspots in the yeast Saccharomyces cerevisiae. Proc Natl Acad Sci U S A. 2000, 97 (21): 11383-11390. 10.1073/pnas.97.21.11383.PubMed CentralPubMedView Article
- Hurst LD, Brunton CF, Smith NG: Small introns tend to occur in GC-rich regions in some but not all vertebrates. Trends Genet. 1999, 15 (11): 437-439. 10.1016/S0168-9525(99)01832-6.PubMedView Article
- Takano-Shimizu T: Local changes in GC/AT substitution biases and in crossover frequencies on Drosophila chromosomes. Mol Biol Evol. 2001, 18 (4): 606-619. 10.1093/oxfordjournals.molbev.a003841.PubMedView Article
- Birdsell JA: Integrating genomics, bioinformatics, and classical genetics to study the effects of recombination on genome evolution. Mol Biol Evol. 2002, 19 (7): 1181-1197. 10.1093/oxfordjournals.molbev.a004176.PubMedView Article
- Berglund J, Pollard KS, Webster MT: Hotspots of biased nucleotide substitutions in human genes. PLoS Biol. 2009, 7 (1): e26-10.1371/journal.pbio.1000026.PubMedView Article
- Duret L, Arndt PF: The impact of recombination on nucleotide substitutions in the human genome. PLoS Genet. 2008, 4 (5): e1000071-10.1371/journal.pgen.1000071.PubMed CentralPubMedView Article
- Spencer CC, Deloukas P, Hunt S, Mullikin J, Myers S, Silverman B, Donnelly P, Bentley D, McVean G: The influence of recombination on human genetic diversity. PLoS Genet. 2006, 2 (9): e148-10.1371/journal.pgen.0020148.PubMed CentralPubMedView Article
- Riehle MM, Markianos K, Niare O, Xu J, Li J, Toure AM, Podiougou B, Oduol F, Diawara S, Diallo M, et al: Natural malaria infection in Anopheles gambiae is regulated by a single genomic control region. Science. 2006, 312 (5773): 577-579. 10.1126/science.1124153.PubMedView Article
- Lehmann T, Hume JC, Licht M, Burns CS, Wollenberg K, Simard F, Ribeiro JM: Molecular evolution of immune genes in the malaria mosquito Anopheles gambiae. PLoS One. 2009, 4 (2): e4549-10.1371/journal.pone.0004549.PubMed CentralPubMedView Article
- Holt RA, Subramanian GM, Halpern A, Sutton GG, Charlab R, Nusskern DR, Wincker P, Clark AG, Ribeiro JM, Wides R, et al: The genome sequence of the malaria mosquito Anopheles gambiae. Science. 2002, 298 (5591): 129-149. 10.1126/science.1076181.PubMedView Article
- Nene V, Wortman JR, Lawson D, Haas B, Kodira C, Tu ZJ, Loftus B, Xi Z, Megy K, Grabherr M, et al: Genome sequence of Aedes aegypti, a major arbovirus vector. Science. 2007, 316 (5832): 1718-1723. 10.1126/science.1138878.PubMedView Article
- Arensburger P, Megy K, Waterhouse RM, Abrudan J, Amedeo P, Antelo B, Bartholomay L, Bidwell S, Caler E, Camara F, et al: Sequencing of Culex quinquefasciatus establishes a platform for mosquito comparative genomics. Science. 2010, 330 (6000): 86-88. 10.1126/science.1191864.PubMed CentralPubMedView Article
- Comeron JM: What controls the length of noncoding DNA?. Curr Opin Genet Dev. 2001, 11 (6): 652-659. 10.1016/S0959-437X(00)00249-5.PubMedView Article
- Fernandez-Medina RD, Struchiner CJ, Ribeiro JM: Novel transposable elements from Anopheles gambiae. BMC Genomics. 2011, 12: 260-10.1186/1471-2164-12-260.PubMed CentralPubMedView Article
- Wissler L, Gadau J, Simola DF, Helmkampf M, Bornberg-Bauer E: Mechanisms and dynamics of orphan gene emergence in insect genomes. Genome Biol Evol. 2013, 5 (2): 439-455. 10.1093/gbe/evt009.PubMed CentralPubMedView Article
- Rascon AA, Gearin J, Isoe J, Miesfeld RL: In vitro activation and enzyme kinetic analysis of recombinant midgut serine proteases from the Dengue vector mosquito Aedes aegypti. BMC Biochem. 2011, 12: 43-10.1186/1471-2091-12-43.PubMed CentralPubMedView Article
- Zdobnov EM, Von Mering C, Letunic I, Torrents D, Suyama M, Copley RR, Christophides GK, Thomasova D, Holt RA, Subramanian GM, et al: Comparative genome and proteome analysis of Anopheles gambiae and Drosophila melanogaster. Science. 2002, 298 (5591): 149-159. 10.1126/science.1077061.PubMedView Article
- Gaunt MW, Miles MA: An insect molecular clock dates the origin of the insects and accords with palaeontological and biogeographic landmarks. Mol Biol Evol. 2002, 19 (5): 748-761. 10.1093/oxfordjournals.molbev.a004133.PubMedView Article
- Krzywinski J, Grushko OG, Besansky NJ: Analysis of the complete mitochondrial DNA from Anopheles funestus: an improved dipteran mitochondrial genome annotation and a temporal dimension of mosquito evolution. Mol Phylogenet Evol. 2006, 39 (2): 417-423. 10.1016/j.ympev.2006.01.006.PubMedView Article
- Garros C, Harbach RE, Manguin S: Systematics and biogeographical implications of the phylogenetic relationships between members of the funestus and minimus groups of Anopheles (Diptera: Culicidae). J Med Entomol. 2005, 42 (1): 7-18. 10.1603/0022-2585(2005)042[0007:SABIOT]2.0.CO;2.PubMedView Article
- Beerntsen BT, James AA, Christensen BM: Genetics of mosquito vector competence. Microbiol Mol Biol Rev. 2000, 64 (1): 115-137. 10.1128/MMBR.64.1.115-137.2000.PubMed CentralPubMedView Article
- Blandin S, Shiao SH, Moita LF, Janse CJ, Waters AP, Kafatos FC, Levashina EA: Complement-like protein TEP1 is a determinant of vectorial capacity in the malaria vector Anopheles gambiae. Cell. 2004, 116 (5): 661-670. 10.1016/S0092-8674(04)00173-4.PubMedView Article
- Dong Y, Aguilar R, Xi Z, Warr E, Mongin E, Dimopoulos G: Anopheles gambiae immune responses to human and rodent Plasmodium parasite species. PLoS Pathog. 2006, 2 (6): e52-10.1371/journal.ppat.0020052.PubMed CentralPubMedView Article
- Waldock J, Olson KE, Christophides GK: Anopheles gambiae antiviral immune response to systemic O’nyong-nyong infection. PLoS Negl Trop Dis. 2012, 6 (3): e1565-10.1371/journal.pntd.0001565.PubMed CentralPubMedView Article
- Abraham EG, Pinto SB, Ghosh A, Vanlandingham DL, Budd A, Higgs S, Kafatos FC, Jacobs-Lorena M, Michel K: An immune-responsive serpin, SRPN6, mediates mosquito defense against malaria parasites. Proc Natl Acad Sci USA. 2005, 102 (45): 16327-16332. 10.1073/pnas.0508335102.PubMed CentralPubMedView Article
- Obbard DJ, Callister DM, Jiggins FM, Soares DC, Yan G, Little TJ: The evolution of TEP1, an exceptionally polymorphic immunity gene in Anopheles gambiae. BMC Evol Biol. 2008, 8: 274-10.1186/1471-2148-8-274.PubMed CentralPubMedView Article
- Cambi A, Gijzen K, De Vries JM, Torensma R, Joosten B, Adema GJ, Netea MG, Kullberg BJ, Romani L, Figdor CG: The C-type lectin DC-SIGN (CD209) is an antigen-uptake receptor for Candida albicans on dendritic cells. Eur J Immunol. 2003, 33 (2): 532-538. 10.1002/immu.200310029.PubMedView Article
- Osta MA, Christophides GK, Kafatos FC: Effects of mosquito genes on plasmodium development. Science. 2004, 303 (5666): 2030-2032. 10.1126/science.1091789.PubMedView Article
- Oliveira Gde A, Lieberman J, Barillas-Mury C: Epithelial nitration by a peroxidase/NOX5 system mediates mosquito antiplasmodial immunity. Science. 2012, 335 (6070): 856-859. 10.1126/science.1209678.PubMedView Article
- Higuchi H, Grambihler A, Canbay A, Bronk SF, Gores GJ: Bile acids up-regulate death receptor 5/TRAIL-receptor 2 expression via a c-Jun N-terminal kinase-dependent pathway involving Sp1. J Biol Chem. 2004, 279 (1): 51-60.PubMedView Article
- Collins FH, Mendez MA, Rasmussen MO, Mehaffey PC, Besansky NJ, Finnerty V: A ribosomal RNA gene probe differentiates member species of the Anopheles gambiae complex. Am J Trop Med Hyg. 1987, 37 (1): 37-41.PubMed
- Lander ES, Waterman MS: Genomic mapping by fingerprinting random clones: a mathematical analysis. Genomics. 1988, 2 (3): 231-239. 10.1016/0888-7543(88)90007-9.PubMedView Article
- Marcais G, Kingsford C: A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics. 2011, 27 (6): 764-770. 10.1093/bioinformatics/btr011.PubMed CentralPubMedView Article
- Miller JR, Delcher AL, Koren S, Venter E, Walenz BP, Brownley A, Johnson J, Li K, Mobarry C, Sutton G: Aggressive assembly of pyrosequencing reads with mates. Bioinformatics. 2008, 24 (24): 2818-2824. 10.1093/bioinformatics/btn548.PubMed CentralPubMedView Article
- Cho YS, Hu L, Hou H, Lee H, Xu J, Kwon S, Oh S, Kim HM, Jho S, Kim S, et al: The tiger genome and comparative analysis with lion and snow leopard genomes. Nat Commun. 2013, 4: 2433-PubMed CentralPubMed
- Parra G, Bradnam K, Korf I: CEGMA: a pipeline to accurately annotate core genes in eukaryotic genomes. Bioinformatics. 2007, 23 (9): 1061-1067. 10.1093/bioinformatics/btm071.PubMedView Article
- Bradnam KR, Fass JN, Alexandrov A, Baranay P, Bechner M, Birol I, Boisvert S, Chapman JA, Chapuis G, Chikhi R, et al: Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species. GigaScience. 2013, 2 (1): 10-10.1186/2047-217X-2-10.PubMed CentralPubMedView Article
- Lorenzi H, Thiagarajan M, Haas B, Wortman J, Hall N, Caler E: Genome wide survey, discovery and evolution of repetitive elements in three Entamoeba species. BMC Genomics. 2008, 9: 595-10.1186/1471-2164-9-595.PubMed CentralPubMedView Article
- Price AL, Jones NC, Pevzner PA: De novo identification of repeat families in large genomes. Bioinformatics. 2005, 21 (Suppl 1): i351-i358. 10.1093/bioinformatics/bti1018.PubMedView Article
- Li W, Godzik A: Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics. 2006, 22 (13): 1658-1659. 10.1093/bioinformatics/btl158.PubMedView Article
- Huang X, Adams MD, Zhou H, Kerlavage AR: A tool for analyzing and annotating genomic sequences. Genomics. 1997, 46 (1): 37-45. 10.1006/geno.1997.4984.PubMedView Article
- Birney E, Clamp M, Durbin R: Genewise and genomewise. Genome Res. 2004, 14 (5): 988-995. 10.1101/gr.1865504.PubMed CentralPubMedView Article
- Korf I: Gene finding in novel genomes. BMC Bioinforma. 2004, 5: 59-10.1186/1471-2105-5-59.View Article
- Stanke M, Schoffmann O, Morgenstern B, Waack S: Gene prediction in eukaryotes with a generalized hidden Markov model that uses hints from external sources. BMC Bioinforma. 2006, 7: 62-10.1186/1471-2105-7-62.View Article
- Majoros WH, Pertea M, Salzberg SL: TigrScan and GlimmerHMM: two open source ab initio eukaryotic gene-finders. Bioinformatics. 2004, 20 (16): 2878-2879. 10.1093/bioinformatics/bth315.PubMedView Article
- Majoros WH, Pertea M, Delcher AL, Salzberg SL: Efficient decoding algorithms for generalized hidden Markov model gene finders. BMC Bioinforma. 2005, 6: 16-10.1186/1471-2105-6-16.View Article
- Lowe TM, Eddy SR: tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 1997, 25 (5): 955-964.PubMed CentralPubMedView Article
- Pruesse E, Quast C, Knittel K, Fuchs BM, Ludwig W, Peplies J, Glockner FO: SILVA: a comprehensive online resource for quality checked and aligned ribosomal RNA sequence data compatible with ARB. Nucleic Acids Res. 2007, 35 (21): 7188-7196. 10.1093/nar/gkm864.PubMed CentralPubMedView Article
- Zdobnov EM, Apweiler R: InterProScan–an integration platform for the signature-recognition methods in InterPro. Bioinformatics. 2001, 17 (9): 847-848. 10.1093/bioinformatics/17.9.847.PubMedView Article
- Petersen TN, Brunak S, Von Heijne G, Nielsen H: SignalP 4.0: discriminating signal peptides from transmembrane regions. Nat Methods. 2011, 8 (10): 785-786. 10.1038/nmeth.1701.PubMedView Article
- Moller S, Croning MD, Apweiler R: Evaluation of methods for the prediction of membrane spanning regions. Bioinformatics. 2001, 17 (7): 646-653. 10.1093/bioinformatics/17.7.646.PubMedView Article
- Vilella AJ, Severin J, Ureta-Vidal A, Heng L, Durbin R, Birney E: EnsemblCompara GeneTrees: Complete, duplication-aware phylogenetic trees in vertebrates. Genome Res. 2009, 19 (2): 327-335.PubMed CentralPubMedView Article
- Wang Z, Ding G, Yu Z, Liu L, Li Y: CHSMiner: a GUI tool to identify chromosomal homologous segments. Algorithms Mol Biol. 2009, 4: 2-10.1186/1748-7188-4-2.PubMed CentralPubMedView Article
- Notredame C, Higgins DG, Heringa J: T-Coffee: a novel method for fast and accurate multiple sequence alignment. J Mol Biol. 2000, 302 (1): 205-217. 10.1006/jmbi.2000.4042.PubMedView Article
- Felsenstein J: Inferring phylogenies from protein sequences by parsimony, distance, and likelihood methods. Methods Enzymol. 1996, 266: 418-427.PubMedView Article
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.