Initiation and genetic description of mapping populations from nature
We initiated founder colonies of A. gambiae M-form mosquitoes from the West African population (Burkina Faso and Mali) as a tool to reduce genetic complexity and enhance informativeness for mapping by pooled sequencing. The approach relies on unsupervised mating, with each colony initiated from the progeny of ~10 wild-captured gravid female mosquitoes. The founder mosquitoes mated assortatively in nature, thus ensuring that the progeny reproduce the genetic subdivisions of the natural population.
The genetic composition of founder colonies was determined by microsatellite typing. First, we compared genotypes for a highly polymorphic microsatellite, H603, located at 42 Mb on the left arm of chromosome 2. Each of the three founder colonies tested segregates six marker alleles, as compared to 22 alleles at this locus in the wild source population (Fig. 1a). Overall, 13 of the 22 wild alleles were detected among the three founder colonies. Thus, any individual founder colony maintains and segregates limited variation, but the sum of multiple such colonies may approximate and serve as a proxy for the genetic diversity of the natural population. Only one allele by state (no claim is made about descent), the 109 nt allele (turquoise in Fig. 1a), is shared by all three founder colonies, indicating that founder colonies capture largely distinct subsets of natural variation. Interestingly, alleles that are rare in nature not only persist but can achieve appreciable frequency within a founder colony. Thus, analysis of multiple founder colonies queries both common and rare variants from the population.
Analysis of genetic variation and sharing between colonies using five microsatellite markers further indicates that the artificial population bottleneck creates the advantages of founder populations, which have been extensively discussed in human genetics [17, 18]. The colonies are distinct but related synthetic populations, with varying levels of Fst among colonies and the wild source population (Fig. 1b). Thus, the founder colonies perform as a complexity-reduction tool for tractable genetic query of both frequent and rare natural allele frequency classes under controlled laboratory conditions and at reasonable sample sizes, but query a greater amount of variation than single-pair crosses.
We derived an estimate of the genetic mapping resolution of the founder colonies. The crossovers from a given point (e.g., the trait locus) are exponentially distributed with rate Νρ, where Ν is the number of crossovers and ρ is the crossover rate [19], about one Mb per cM in A. gambiae [11, 20]. The expectation for the interval containing flanking crossovers around the trait locus is 1/(Νρ) from each side. Therefore, when 10 individuals share the same variant under a dominant model, and ignoring the few homozygotes, the maximum resolution will be 2/(Νρ) = 2/(nkρ), where n is the number of mosquitoes and k is the number of generations since the founding event. The current founder colonies were propagated for at least 30 generations since initiation; thus, the expected resolution for a fully informative phenotype is at least 2/(30 × 10 × 1) = 2/300 ≈ 0.7 × 10^−2 Morgans, or 0.7 cM. This is a genomic average estimate of resolution, which will vary empirically according to the recombinational properties of different regions of the genome. In addition, loss of information due to incomplete penetrance of a trait would also extend the resolved mapping interval.
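The arithmetic of this estimate can be reproduced with a minimal Python sketch. The function name `expected_resolution_cM` is hypothetical, and the convention that ρ = 1 gives an interval expressed in Morgans is an assumption inferred from the numbers quoted above, not a statement of the original analysis code.

```python
def expected_resolution_cM(n_individuals, n_generations, rho=1.0):
    """Expected interval between flanking crossovers, 2/(n*k*rho), in cM.

    Assumption: with rho = 1 the interval comes out in Morgans,
    so it is multiplied by 100 to convert to centiMorgans.
    """
    big_n = n_individuals * n_generations   # N = n * k in the text
    interval_morgans = 2.0 / (big_n * rho)  # 1/(N*rho) on each side
    return 100.0 * interval_morgans

# Values quoted in the text: 10 individuals, 30 generations, rho = 1
resolution = expected_resolution_cM(10, 30)
print(f"{resolution:.2f} cM")  # 0.67 cM, rounded to 0.7 cM in the text
```

Under the same formula, further generations of propagation narrow the expected interval in proportion to 1/k.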
First-stage mapping by pooled sequencing identifies candidate loci
Mosquitoes from two founder colonies, Fd03 and Fd09, were challenged by feeding on cultured P. falciparum gametocytes, and individual mosquito infection phenotypes were determined by dissection and counts of midgut oocysts. DNA from mosquitoes with similar phenotypes was combined and Illumina-sequenced as pools. A quantitative description of the pools is given in the methods. Sequences of pools were compared across the genome to detect regions displaying reduction in haplotype diversity, in order to detect candidate intervals carrying variants that underlie the phenotype. By definition, these decreased-heterozygosity candidate intervals will be enriched for haplotypes carrying the causative variant, and other non-causative haplotypes will be simultaneously depleted from the same phenotype pool. Regions of the genome not associated with the phenotype should display random segregation of haplotypes across pools.
The first mapping stage comprised genome-wide ascertainment of candidate loci. Pooled heterozygosity (Hp) was calculated across sliding windows for each of the phenotype pools individually, as well as total heterozygosity for the whole founder colony combined. Relative diversity (HpR) was calculated as the proportion of heterozygosity in a phenotype pool relative to total heterozygosity within the whole founder colony after normalizing for overall read-depth in each pool. The standard deviation of HpR values (SHpR) was used to identify regions with over-represented haplotypes in a given phenotype pool. Because heterozygosity within a pool was measured relative to the same positions in the whole founder colony, it was normalized for local variation of heterozygosity across the genome. The analysis yielded three candidate loci in two different founder colonies, located at the following chromosome:coordinates: 3L:17409-19071 kb (colony Fd03), 2R:17385-26524 kb (colony Fd09), and 2R:47490-60531 kb (colony Fd09). These candidate loci were named 3.1, 9.1, and 9.2, respectively. Plots of candidate locus 3.1 are shown in Fig. 2 (chromosome 3L) and Additional file 1: Figure S1 (chromosomes 2 and X). Plots of candidate loci 9.1 and 9.2 are shown in Additional file 2: Figure S2 (all chromosomes). Candidate locus 9.1 is coincident with the 2Rb paracentric chromosomal inversion.
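The window statistics described above can be illustrated with a short Python sketch. The text does not give the Hp formula, so the code assumes the commonly used pooled form Hp = 2·Σn_major·Σn_minor / (Σn_major + Σn_minor)², computed from major- and minor-allele read counts summed over the SNPs in a window. All function names are hypothetical, the read counts are invented for illustration, and computing the spread as a standard deviation across pools is only one possible reading of SHpR.

```python
from statistics import pstdev

def pooled_heterozygosity(major_counts, minor_counts):
    """Hp for one window from per-SNP major/minor allele read counts (assumed formula)."""
    n_major = sum(major_counts)
    n_minor = sum(minor_counts)
    total = n_major + n_minor
    if total == 0:
        return 0.0
    # Working with proportions of reads removes the dependence on absolute depth
    return 2.0 * n_major * n_minor / total ** 2

def relative_diversity(pool_window, colony_window):
    """HpR: Hp of a phenotype pool relative to Hp of the whole colony, same window."""
    hp_colony = pooled_heterozygosity(*colony_window)
    if hp_colony == 0:
        return float("nan")  # no diversity in the colony, ratio undefined
    return pooled_heterozygosity(*pool_window) / hp_colony

def spread_across_pools(pool_windows, colony_window):
    """Standard deviation of HpR across pools for one window (assumed reading of SHpR)."""
    return pstdev([relative_diversity(w, colony_window) for w in pool_windows])

# Invented read counts for a single window: (major counts, minor counts)
colony = ([50, 50], [50, 50])   # whole colony, Hp = 0.5
depleted_pool = ([90], [10])    # one haplotype over-represented, Hp = 0.18
neutral_pool = ([50], [50])     # no enrichment, Hp = 0.5

print(relative_diversity(depleted_pool, colony))
print(spread_across_pools([depleted_pool, neutral_pool], colony))
```

A window in which one pool has lost diversity relative to the colony gives an HpR well below 1 for that pool and a large spread across pools, whereas an unassociated window gives HpR near 1 in every pool.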
To estimate significance values for mapped loci, a permutation analysis was carried out on the SHpR values by reselecting allele frequencies randomly from each of the phenotype pools. After 1000 permutations, the selected SHpR regions were found to have median SHpR values within the 99.9th percentile of those selected randomly (equivalent to a P-value of 0.001).
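The logic of an empirical P-value of this kind can be sketched in Python as follows. This is a simplified stand-in, not the procedure used in the study: instead of reselecting allele frequencies within pools, it draws random sets of window values of the same size as the candidate region and compares their medians with the observed median. The function name, the `seed` parameter, and the add-one correction are assumptions.

```python
import random
from statistics import median

def permutation_p_value(values, region_indices, n_perm=1000, seed=0):
    """Empirical P-value for the median of a region against random window sets.

    values: one statistic per window (e.g. SHpR), genome-wide.
    region_indices: indices of the windows in the candidate region.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = median([values[i] for i in region_indices])
    size = len(region_indices)
    exceed = 0
    for _ in range(n_perm):
        sample = rng.sample(values, size)  # random windows, same count as the region
        if median(sample) >= observed:
            exceed += 1
    # Add-one correction keeps the estimate away from exactly zero
    return (exceed + 1) / (n_perm + 1)

# Invented example: 1000 windows, candidate region formed by the three highest values
window_values = [float(i) for i in range(1000)]
print(permutation_p_value(window_values, [997, 998, 999]))
```

With 1000 permutations, the smallest P-value this estimator can return is 1/1001, which is close to the 0.001 resolution quoted in the text.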
Functional description of candidate loci
Coding sequences within candidate loci were analysed for enrichment of Gene Ontology (GO) predicted functional categories. The two candidate loci from Fd09 contain 609 genes for candidate locus 9.1 and 708 genes for candidate locus 9.2. While the large number of genes in the two Fd09 loci might reduce the probability of detecting significant enrichments, both Fd09 loci demonstrated enrichment for genes with potential immune functions, with highly significant enrichment (P = 7.8e-6) for monooxygenase function in candidate locus 9.1, and the presence of multiple peroxidases in candidate locus 9.2, consistent with either a detoxification or ROS-based immune response. Analysis of genes with significantly enriched GO terms in candidate locus 9.1, however, indicated that most of these genes belong to a single cytochrome P450 cluster between 17.4 and 21.1 Mb.
Due to the coincidence of candidate locus 9.1 with the 2Rb chromosomal inversion, molecular karyotyping was carried out on all Fd09 samples. There was only one 2Rb/b homozygote in the high pool, with heterozygotes occurring randomly between all three pools (giving 2/3/3 copies of the inversion in zero/low/high pools respectively; each pool comprised 20 mosquitoes, thus 40 chromosomes). Thus, there was low power to test, but also no evident support for association of the frequency of 2Rb inversion genotypes or alleles with membership in phenotypic pools. For all mapped loci, we have tested for chromosomal inversions, and the SHpR method controls internally for sequence read depth by comparing each pool to the whole founder colony. Therefore, estimates of relative diversity should be robust to potentially confounding sources of local genome variation.
The Fd03 candidate locus 3.1 contains only 74 genes, the majority of them with no functional information. Genetically based candidate gene ascertainment would require deeper pooled sequence data, or ideally sequence variation data from phenotyped individuals, resources that have not been generated. Moreover, because the haplotype contains irrelevant as well as relevant SNPs in linkage, resolution by fine mapping may not be productive without propagation to generate additional recombinations. Consequently, we performed ad hoc ascertainment of candidates based on recognizable predicted gene function and other evidence. Of those with characterized function, two encode Toll-family proteins, TOLL 10 and TOLL 11 (AGAP011187, AGAP011186).
Second-stage fine mapping of candidate loci
The second mapping stage comprised candidate locus confirmation and prioritization of candidate genes. SNPs within the candidate loci displaying the greatest difference in minor allele frequencies between any two phenotype pools were selected for genotyping in individual mosquitoes. SNPs were selected based on sequence from individual founder colonies, and were tested only within the same founder colony. A total of 44 SNPs were chosen from Fd09 across candidate loci 9.1 and 9.2, and 23 SNPs from Fd03 for candidate locus 3.1.
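The SNP selection criterion described above can be sketched in a few lines of Python. The function names, the SNP identifiers, and the minor allele frequencies below are all hypothetical and serve only to illustrate ranking by the largest frequency difference between any two phenotype pools.

```python
from itertools import combinations

def max_pairwise_maf_difference(pool_mafs):
    """Largest absolute difference in minor allele frequency between any two pools."""
    return max(abs(a - b) for a, b in combinations(pool_mafs, 2))

def select_top_snps(snp_mafs, n_top):
    """Rank SNPs by their largest between-pool MAF difference and keep the top n_top.

    snp_mafs: dict mapping SNP id -> dict of pool name -> minor allele frequency.
    """
    ranked = sorted(
        snp_mafs,
        key=lambda snp: max_pairwise_maf_difference(list(snp_mafs[snp].values())),
        reverse=True,
    )
    return ranked[:n_top]

# Invented frequencies for three pools (zero, low, high oocyst counts)
example = {
    "snp_a": {"zero": 0.10, "low": 0.15, "high": 0.50},
    "snp_b": {"zero": 0.30, "low": 0.32, "high": 0.31},
    "snp_c": {"zero": 0.05, "low": 0.25, "high": 0.20},
}
print(select_top_snps(example, 2))  # ['snp_a', 'snp_c']
```

SNPs whose frequencies barely differ between pools, such as `snp_b` in this toy input, fall to the bottom of the ranking and would not be carried forward to individual genotyping.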
For each founder colony, fine-mapping was performed by genotyping all of the unpooled individual mosquitoes from the original infections from which the initial phenotype pools were constructed. For Fd03, an independent replicate infection, which did not contribute to the sequenced phenotype pools, was also genotyped. These two replicate infections were assessed by logistic regression separately, and where the odds ratio indicated the same effect, also together as one experiment (all Fd03 logistic regression values are given in Additional file 3). The individual typing of deconvoluted pools confirms that the pooled analysis results are recapitulated by individual genotyping, and serves as technical replication, while the typing of new material from the same colony but a completely independent infection serves as biological replication.
Candidate locus 3.1 contained two SNPs with significant association after permutation, influencing both oocyst infection prevalence and intensity (Fig. 3, Additional file 4). The interval of candidate locus 3.1 was thus considered confirmed as a P. falciparum control locus, and following convention [3] is named Pfin7 (Plasmodium falciparum infection locus 7). The variant at 3L:18559884 is a C:A mutation in the intergenic region between TOLL 11 (AGAP011186) and TOLL 10 (AGAP011187) with an odds ratio of 7.79 for oocyst infection intensity (p = 0.00148, calculated across both replicates). The variant at 3L:18552220 is a T:C mutation located immediately downstream of TOLL 10, with an odds ratio of 3.15 for oocyst infection prevalence (p = 0.002594, calculated across both replicates). Individual genotyping did not confirm association with infection phenotype for either of the chromosome 2R candidate loci, 9.1 and 9.2, and they were not analysed further. The size of the locus 3.1 interval, ~2 Mb, is broadly consistent with the above theoretical prediction of genetic resolution in the founder colonies.
TOLL 11 displays protective function against P. falciparum
Bioinformatic filtering of the 73 predicted coding sequences within Pfin7 (Additional file 5: Table S3), based on interpretable predicted gene function and other evidence, prioritized two genes encoding Toll-family proteins, TOLL 10 and TOLL 11. A functional test of TOLL 11 effect by RNAi-mediated gene silencing followed by challenge with P. falciparum reveals that TOLL 11 mediates significant protection against oocyst infection (Fig. 4). Silencing of TOLL 11 caused an increase in oocyst prevalence of 16-38% across three replicates (mean 24.5%). The consistent increase in infection prevalence across three replicates was highly significant (p = 0.0008, p-values combined by the method of Fisher; individual replicates p = 0.118, 0.001, 0.086; Fig. 4a). Loss of TOLL 11 function incurred a mean risk ratio for oocyst infection of 1.71 (comparison of infected and uninfected categories; individual replicates = 1.26, 1.81, 2.08 respectively). There was no effect of TOLL 11 silencing upon a distinct phenotype, oocyst intensity (p-values = 0.850, 0.848; combined = 0.957, Fig. 4b).
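The combination of replicate p-values by the method of Fisher can be checked with a small Python sketch. The function name `fisher_combined_p` is hypothetical. The code uses the closed-form chi-square survival function for even degrees of freedom, so it needs only the standard library; the replicate p-values are those quoted in the text.

```python
import math

def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's method.

    The statistic X = -2 * sum(ln p) follows a chi-square distribution with
    2k degrees of freedom, whose survival function for even df is
    exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!.
    """
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # this is X/2
    tail_terms = sum(half_stat ** i / math.factorial(i) for i in range(k))
    return math.exp(-half_stat) * tail_terms

# TOLL 11 silencing, infection prevalence, three replicates
print(round(fisher_combined_p([0.118, 0.001, 0.086]), 4))  # 0.0008

# TOLL 11 silencing, oocyst intensity, two replicates
print(round(fisher_combined_p([0.850, 0.848]), 3))  # 0.957
```

Both outputs agree with the combined values reported above, which suggests the reported figures follow the standard form of Fisher's method.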
TOLL 10 did not show a significant effect for either infection prevalence or intensity (Fig. 4c, d). Uninfected/infected risk ratios (0.56, 0.69, mean = 0.63) did not indicate a phenotype for infection prevalence (p = 0.117, 0.188, combined = 0.106, Fig. 4c). Results for infection intensity were also negative (p = 0.538, Fig. 4d). Although not significant, the results displayed a tendency that could be suggestive of a weak phenotype in which reduction of TOLL 10 transcript levels may lead to a lower infection prevalence, opposite to the TOLL 11 phenotype. Further work would be necessary to determine whether TOLL 10 may display a significant phenotype under other conditions or genetic backgrounds.
Toll-family proteins are defined based on shared structural features [21]. Despite structural relatedness with the Toll receptor, TOLL 1, the other Toll-family members have not been well-characterized, and are not necessarily immune receptors. Even in Drosophila, functions of Toll-family proteins or their signaling pathways, besides TOLL 1, are also largely unknown [22]. In Anopheles, TOLL 1 and the Toll pathway are required for protection against rodent malarias, P. berghei and P. yoelii, while protection against the human malaria parasite P. falciparum is dominated by the IMD pathway [23, 24]. To our knowledge, the current results are the first report of a Toll-family member displaying protective function against P. falciparum. This also represents one of the rare reports of immune function for Toll-family proteins other than TOLL 1. TOLL 11 remains a candidate gene for control of Plasmodium susceptibility, and future work will be necessary to determine whether naturally occurring genetic variants in TOLL 11 are associated with differential susceptibility to P. falciparum.