Genome-based polymorphic microsatellite development and validation in the mosquito Aedes aegypti and application to population genetics in Haiti

Background Microsatellite markers have proven useful in genetic studies in many organisms, yet microsatellite-based studies of the dengue and yellow fever vector mosquito Aedes aegypti have been limited by the number of assayable and polymorphic loci available, despite multiple independent efforts to identify them. Here we present strategies for efficient identification and development of useful microsatellites with broad coverage across the Aedes aegypti genome, development of multiplex-ready PCR groups of microsatellite loci, and validation of their utility for population analysis with field collections from Haiti.
Results From 75 putative microsatellite loci representing 31 motifs identified in 42 whole genome sequence supercontig assemblies in the Aedes aegypti genome, 33 microsatellites providing genome-wide coverage amplified as single copy sequences in four lab strains, with a range of 2-6 alleles per locus. The tri-nucleotide motifs represented the majority (18 of 33, 55%) of the polymorphic single copy loci, and none of these was located within a putative open reading frame. Seven groups of 4-5 microsatellite loci each were developed for multiplex-ready PCR. Four multiplex-ready groups were used to investigate population genetics of Aedes aegypti populations sampled in Haiti. Of the 23 loci represented in these groups, 20 were polymorphic with a range of 3-24 alleles per locus (mean = 8.75). Allelic polymorphic information content varied from 0.171 to 0.867 (mean = 0.545). Most loci met Hardy-Weinberg expectations across populations and pairwise FST comparisons identified significant genetic differentiation between some populations. No evidence for genetic isolation by distance was observed.
Conclusion Despite limited success in previous reports, we demonstrate that the Aedes aegypti genome is well-populated with single copy, polymorphic microsatellite loci that can be uncovered using the strategy developed here for rapid and efficient screening of genome supercontig assemblies. These loci are suitable for genetic and population studies using multiplex-PCR.


Background
The mosquito, Aedes aegypti, is the principal global vector for the yellow fever and dengue viruses, and also one of the best genetically characterized insects [1]. Of African origin, Ae. aegypti has successfully colonized most subtropical and tropical regions of the world, largely as a consequence of human activities. This mosquito has been and remains the most commonly studied mosquito species, particularly for genetic analyses of disease vector/pathogen interactions because it breeds in small water-holding containers, its eggs are resistant to desiccation and persist in a pre-embryonated state, and it readily adapts to laboratory culture. Detailed genetic studies have emerged from linkage maps for Ae. aegypti generated from isozyme and mutant marker loci [2], RAPDs [3], RFLPs [4,5], and SSCPs [6]. Demonstration that RFLP markers based on cDNAs had inter-specific utility [7] facilitated development of comparative linkage maps for several mosquito species [8][9][10][11][12].
Microsatellites are simple sequence repeats of tandem 1-6 base motifs that are frequently distributed throughout eukaryote genomes. Because repeat number at individual loci can vary among individuals and polymorphisms can efficiently be uncovered using PCR, microsatellites have become powerful tools for genetic studies in many organisms [13][14][15]. In some organisms, including Ae. aegypti, useful microsatellite loci are not abundant or are recalcitrant to common methods of identification. In Ae. aegypti, the methods tried include microsatellite enriched genomic library construction and screening [16][17][18], examinations of expressed gene coding sequences [19,20], and oligonucleotide-based screening of select cosmid genomic clones [18]. Disappointingly, the combined efforts of these studies resulted in only 20 useful microsatellite marker loci, several of which showed reduced polymorphism. This low yield most likely reflects the close association of microsatellites with repetitive elements rather than a low frequency of microsatellites in the Ae. aegypti genome [18]. Availability of a partial Ae. aegypti genome sequence in 2005 provided the opportunity to perform genome scans for microsatellites and, indeed, an additional 13 polymorphic microsatellites were uncovered [21].
Here we present a systematic approach to efficient polymorphic microsatellite marker development in Ae. aegypti based on intensive scans of supercontig assemblies from the whole genome shotgun sequence (wgs) assembly for Ae. aegypti [22]. In addition, we identified multiplex combinations of microsatellite loci that facilitate rapid genome-wide genotyping and demonstrate the utility of these microsatellite loci in a preliminary investigation of Ae. aegypti population genetic structure in Haiti.

Microsatellite identification, assays and utility
Tandem Repeats Finder (TRF) [23] was used to systematically screen 42 wgs supercontig sequence assemblies in the Ae. aegypti genome for polymorphic single copy microsatellites (Figure 1). The supercontigs were selected on the basis of containing previously characterized genetic marker loci distributed across all three Ae. aegypti chromosomes [5]. Of 75 putative microsatellite loci tested, 44 amplified as single copy sequences in all or some of the four mosquito lab strains tested, and 33 of these were polymorphic across the four strains with a range of 2-6 alleles per locus (Additional File 1). The polymorphic loci included 18 on chromosome 1, 5 on chromosome 2, and 10 on chromosome 3. Of the remaining 31 putative loci, 28 represented multicopy sequences and three failed to amplify. In addition, four supercontigs contained no useful microsatellites based on our selection criteria. Chromosome locations for supercontigs and associated microsatellites were assigned based on the linkage map positions of the previously defined genetic loci. Thus, direct scans of Ae. aegypti supercontigs provided a rapid and efficient mechanism for developing useful microsatellite loci and also the opportunity to leverage existing information on supercontig genome positions relative to the existing genetic linkage map. When coupled with the previously described 33 microsatellite loci [16][18][19][20][21], this effort has doubled the number of available polymorphic loci.
We tested microsatellites representing 31 motifs (1-6 bp); these included one single nucleotide (n = 3 sequences), five di-nucleotide (n = 27), 18 tri-nucleotide (n = 35), six tetra-nucleotide (n = 9), and one hexa-nucleotide (n = 1) motifs (Table 1). The single copy polymorphic microsatellites comprised 22 independent motifs, of which 13 were tri-nucleotide motifs; tri-nucleotide microsatellites represented the majority (18 of 33) of the polymorphic single copy loci. Of particular note, 51% (18 of 35) of the tri-nucleotide and 67% (6 of 9) of the tetra-nucleotide microsatellites were polymorphic single copy loci, while only 33% (9 of 27) of the di-nucleotide microsatellites were polymorphic single copy loci. Although a small number of polymorphic tri-nucleotide microsatellite loci contained within coding regions have been identified in previous studies [20], BLAST analyses against the annotated Ae. aegypti genome assembly at VectorBase [24] indicated that none of our polymorphic tri-nucleotide microsatellites was within a putative coding region.
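The per-class success rates quoted above follow directly from the counts given in the text. The short Python sketch below reproduces that arithmetic. The zero counts for the single- and hexa-nucleotide classes are an inference (the di-, tri- and tetra-nucleotide counts already sum to 33), and the dictionary keys are illustrative names only.

```python
# Loci tested per motif class, as stated in the text.
tested = {"mono": 3, "di": 27, "tri": 35, "tetra": 9, "hexa": 1}

# Polymorphic single copy loci per motif class.
# di/tri/tetra are stated in the text; mono and hexa are inferred to be 0
# because 9 + 18 + 6 already accounts for all 33 polymorphic loci.
polymorphic = {"mono": 0, "di": 9, "tri": 18, "tetra": 6, "hexa": 0}


def success_rates(tested_counts, polymorphic_counts):
    """Fraction of tested loci in each class that were polymorphic and single copy."""
    return {k: polymorphic_counts[k] / tested_counts[k] for k in tested_counts}


rates = success_rates(tested, polymorphic)
total_tested = sum(tested.values())
total_polymorphic = sum(polymorphic.values())

# Share of all polymorphic single copy loci that are tri-nucleotide repeats.
tri_share = polymorphic["tri"] / total_polymorphic

for motif_class, rate in rates.items():
    print(f"{motif_class}: {polymorphic[motif_class]} of {tested[motif_class]} ({rate:.0%})")
print(f"tri-nucleotide share of polymorphic loci: {tri_share:.0%}")
```

Note the two different denominators: 18 of 35 is the success rate within the tri-nucleotide class, whereas 18 of 33 is the tri-nucleotide share of all polymorphic single copy loci.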
Figure 1 Approach to genome-based microsatellite identification, validation, and analysis in Aedes aegypti.
To improve the utility and efficiency of microsatellites for genotyping applications in Ae. aegypti, we developed seven groups of 4-5 loci each for multiplex-ready PCR [25] (Table 2). Individual loci in each group were selected to provide broad genome representation and relatively uniform amplification under the same PCR conditions when multiplexed. PCR groups 1A and 4A represent slight variants on groups 1 and 4, respectively: most of the loci are common among the respective groups with some interchange of microsatellite loci that provide for potential diversity of chromosome coverage but still amplify well as multiplex PCR groups. Primers were designed to generate amplicons of ~150-400 bp and were fluorescently labeled for analysis by capillary electrophoresis. We included four microsatellite loci described elsewhere [18,21] in some of the groups. However, in conjunction with optimizing amplicon sizes for multiplex-ready PCR, we designed at least one new primer for each of these loci (Additional File 2).

Genetic patterns of Aedes aegypti populations in Haiti
We used multiplex-ready PCR groups 1, 1B, 2, 3A and 4 to conduct investigations of seven Ae. aegypti populations sampled in Haiti during June 2008. PCR groups 1A and [...] note that each of these loci is located at or near the end of a linkage group; the associated supercontigs contain the genetic loci LF347, LF115, and AEGI8, respectively [5]. However, after excluding these three loci, most loci met Hardy-Weinberg (HW) expectations, with the exception of the Port-au-Prince population where seven of the remaining 17 loci showed significant HW deviations. The observed HW deviations across all populations were due to heterozygote deficits.
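A heterozygote deficit means that observed heterozygosity at a locus falls below the value expected from its allele frequencies. As a rough illustration, the Python sketch below computes observed and expected heterozygosity and the polymorphic information content (using the standard Botstein et al. formula) for one locus. The text does not state which software or estimators were used for these statistics, so this is an assumption-laden sketch; the function name and the example genotypes are hypothetical.

```python
from collections import Counter


def locus_summary(genotypes):
    """Summarize one locus from diploid genotypes.

    genotypes: list of (allele_a, allele_b) tuples, e.g. fragment sizes in bp.
    Returns allele count, observed and expected heterozygosity, and PIC.
    """
    allele_counts = Counter(allele for genotype in genotypes for allele in genotype)
    n_gene_copies = sum(allele_counts.values())
    freqs = [count / n_gene_copies for count in allele_counts.values()]

    # Observed heterozygosity: fraction of individuals carrying two different alleles.
    h_obs = sum(1 for a, b in genotypes if a != b) / len(genotypes)

    # Expected heterozygosity under Hardy-Weinberg: 1 - sum(p_i^2).
    h_exp = 1.0 - sum(p * p for p in freqs)

    # PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2
    cross = sum(
        2 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(len(freqs))
        for j in range(i + 1, len(freqs))
    )
    pic = h_exp - cross

    return {"n_alleles": len(freqs), "h_obs": h_obs, "h_exp": h_exp, "pic": pic}


# Hypothetical genotypes (allele sizes in bp) for illustration only.
balanced = locus_summary([(100, 100), (100, 104), (104, 104), (100, 104)])
deficit = locus_summary([(100, 100), (104, 104), (100, 100), (104, 104)])
print(balanced)
print(deficit)
```

In the second example both alleles are equally frequent but no heterozygotes are observed, which is the pattern described as a heterozygote deficit. A formal HW test would still be needed to judge significance.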
Significant population differentiation was observed with 10 of 21 (48%) pairwise FST comparisons. To test for isolation by distance, pairwise genetic differentiation was compared against the natural logarithm of the distance between sites (Additional File 4). We found no association between them (R² = 0.0355, P = 0.41). That is, while distances between sites varied from ~1.3 to 44.5 km, the observed levels of genetic differentiation between sites were sometimes high and sometimes low irrespective of distance. This result is typical for Ae. aegypti populations as adults generally travel very short distances from breeding sites in a lifetime, often ~100 m or less, with some evidence for greater but still modest dispersal (~800 m) [27][28][29][30][31].
Longer range dispersal and population differentiation are more likely to reflect the effects of mosquito transport via human activities than relative distances among breeding sites and active dispersal by individual mosquitoes.
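The isolation-by-distance check described above can be sketched as an ordinary least-squares regression of pairwise differentiation on the natural logarithm of distance. The Python example below is illustrative only: the text does not specify the exact transformation of FST or the significance test used, the function names are hypothetical, and the numerical values are invented to exercise the code rather than taken from the study.

```python
import math
from itertools import combinations

# The seven sampled sites named in the text.
SITES = [
    "Port-au-Prince", "Grand Goave", "Barriere-Jeudy",
    "Bino", "Ca-Ira", "Chawa", "La Poudriere",
]


def n_pairwise(sites):
    """Number of unordered site pairs, i.e. n * (n - 1) / 2."""
    return len(list(combinations(sites, 2)))


def regress_on_log_distance(differentiation, distances_km):
    """Least-squares fit of differentiation on ln(distance).

    Returns (slope, intercept, r_squared).
    """
    x = [math.log(d) for d in distances_km]
    y = list(differentiation)
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 0.0
    return slope, intercept, r_squared


print(n_pairwise(SITES))  # seven sites give 21 pairwise comparisons

# Invented values for illustration: differentiation unrelated to distance.
example_km = [1.3, 3.0, 5.5, 8.0, 10.0, 30.0, 44.5]
example_fst = [0.06, 0.01, 0.08, 0.02, 0.07, 0.03, 0.05]
slope, intercept, r2 = regress_on_log_distance(example_fst, example_km)
print(f"slope={slope:.4f}, R^2={r2:.4f}")
```

An R² close to zero, as reported in the text, indicates that distance explains almost none of the variation in pairwise differentiation.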

Conclusion
We demonstrate that the Ae. aegypti genome is well-populated with microsatellite loci suitable for genotyping and outline an efficient strategy for identifying and validating microsatellites from genome supercontig assemblies. While multiple repeat motifs were evident and represented as single copy sequences, tri-nucleotide microsatellites were the most common, and with tetra-nucleotide microsatellites, the most applicable to development as genetic loci. We developed several multiplex-ready PCR groups of microsatellite loci that permit rapid genotyping, and demonstrate their utility with Ae. aegypti population samples from Haiti. We observed high polymorphism with a mean of 8.75 alleles per locus, high allelic polymorphic information content (PIC), and evidence for population differentiation even across relatively short geographic distances as is often reported for Ae. aegypti.

Mosquito strains and populations
Preliminary screens of microsatellites for single copy status and polymorphism were performed on individuals from four Ae. aegypti laboratory colonies: Liverpool-IB12, MOYO-R, Trinidad, and Haiti. The laboratory strains have been maintained as colonies for an unknown number of generations and likely carry reduced polymorphism compared to field-collected individuals. The Liverpool-IB12 strain was the source for the Ae. aegypti genome project [22], details on the MOYO-R and Trinidad strains are provided elsewhere [32], and the Haiti strain was established from ovitrap samples collected in 2006.
Field samples from Haiti were collected from three localities (Port-au-Prince, Grand Goave, and Leogane) during June 2008; samples from Leogane were collected at five different sites in the city (Barriere-Jeudy, Bino, Ca-Ira, Chawa, La Poudriere). Port-au-Prince and Grand Goave were separated by the greatest distance (~44.5 km). All sites in Leogane were within ~10 km of each other, with La Poudriere and Ca-Ira being the closest (~1.3 km). At each site, samples included larval collections from containers around households, standard ovitrap collections with 10 traps at each site, or both larval and ovitrap collections. Ovitrap sampling and mosquito rearing were performed generally as reported previously [33]. Genotype data for all individuals obtained at each sample site were pooled for subsequent analysis.

In silico identification of microsatellites in the Aedes aegypti genome assembly
Bioinformatic analyses targeted the identification and development of useful microsatellite loci at ~10 cM intervals across each of the three Ae. aegypti chromosomes. Supercontig assemblies for microsatellite scans were identified by BLASTn analysis against the Ae. aegypti genome (version AaegL1, March 2006) at VectorBase [24] with sequences previously mapped as RFLP, SNP and SSCP genetic markers [5]. Supercontig assemblies containing individual marker loci were then downloaded from VectorBase and screened with the Tandem Repeats Finder (TRF) program using default parameters [23]. The TRF output was manually scanned and, in most cases, tandem repeats with a period size of 2-4 bp and repeat copy number less than 30 were arbitrarily selected for further analysis.
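The selection criteria above were applied by manual inspection, but the same filter is easy to express in code. The Python sketch below assumes TRF's whitespace-delimited data-file layout (start, end, period size, copy number, consensus size, percent matches, percent indels, score, A, C, G, T, entropy, consensus, sequence); check this against the TRF version in use. The example records, the supercontig name and the function names are hypothetical.

```python
from typing import NamedTuple


class Repeat(NamedTuple):
    start: int
    end: int
    period: int
    copies: float
    motif: str


def parse_trf_records(lines):
    """Parse repeat records from TRF data-file lines, skipping header lines.

    Assumes 15 whitespace-separated fields per record, with the consensus
    motif in the 14th field (index 13).
    """
    repeats = []
    for line in lines:
        fields = line.split()
        if len(fields) < 14 or not fields[0].isdigit():
            continue  # header, parameter, or blank line
        repeats.append(
            Repeat(int(fields[0]), int(fields[1]), int(fields[2]),
                   float(fields[3]), fields[13])
        )
    return repeats


def select_candidates(repeats, min_period=2, max_period=4, max_copies=30):
    """Keep repeats with period 2-4 bp and copy number below 30, as in the text."""
    return [r for r in repeats
            if min_period <= r.period <= max_period and r.copies < max_copies]


# Hypothetical records for illustration only.
example_lines = [
    "Sequence: supercont_example",
    "Parameters: 2 7 7 80 10 50 500",
    f"15230 15265 3 12.0 3 100 0 72 33 33 33 0 1.58 CAG {'CAG' * 12}",
    f"48010 48129 2 60.0 2 100 0 240 50 0 0 50 1.00 AT {'AT' * 60}",
    f"90500 90535 6 6.0 6 100 0 72 33 16 33 16 1.92 AAGCTG {'AAGCTG' * 6}",
]

all_repeats = parse_trf_records(example_lines)
selected = select_candidates(all_repeats)
for r in selected:
    print(r.start, r.end, r.motif, r.copies)
```

Here the dinucleotide record is rejected for having 60 copies and the hexanucleotide record for its period size, leaving only the CAG repeat. The text notes the criteria applied "in most cases", so a manual override step would still be needed to reproduce the published locus set.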

Primer Design
In preparation for primer design, a ~400-600 bp sequence containing a microsatellite of interest was extracted from the supercontig sequence and subjected to BLASTn analysis against the Ae. aegypti genome sequence at VectorBase to verify that the microsatellite flanking sequences were not highly repetitive. PCR primers were designed for those sequences showing minimal repetitive sequence using Primer3 v.0.4.0 [34], with the amplicon size target set at 150-400 bp. Individual primer pairs selected from the Primer3 output were also subjected to BLAST analysis to verify that they represented single copy sequences in the Ae. aegypti genome.
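Extracting the flanking window is a simple coordinate calculation. The following Python sketch cuts a window of a chosen length, roughly centered on the repeat and clamped to the ends of the supercontig. It is one possible way to do this, not necessarily how the extraction was performed; the function name, the 500 bp default and the toy sequence are assumptions.

```python
def flanking_window(sequence, start, end, window=500):
    """Return (offset, subsequence) for a window containing the repeat.

    start and end are 1-based inclusive repeat coordinates. The window is
    centered on the repeat where possible and shifted inward near either
    end of the sequence. offset is the 0-based start of the window.
    """
    repeat_len = end - start + 1
    if repeat_len > window:
        raise ValueError("repeat is longer than the requested window")
    left_flank = (window - repeat_len) // 2
    win_start = max(0, (start - 1) - left_flank)
    win_end = min(len(sequence), win_start + window)
    win_start = max(0, win_end - window)  # shift left if clipped at the 3' end
    return win_start, sequence[win_start:win_end]


# Toy supercontig: 1000 bp of flank, a (CAG)10 repeat, then 1000 bp of flank.
supercontig = "A" * 1000 + "CAG" * 10 + "T" * 1000
offset, window_seq = flanking_window(supercontig, start=1001, end=1030)
print(offset, len(window_seq))
```

The extracted window would then go to BLASTn and Primer3; with the 150-400 bp amplicon target stated above, a 400-600 bp window leaves room for primers on both sides of the repeat.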

PCR Amplification
DNA extractions on individual mosquitoes were performed following a rapid, simple alkaline method [35]. DNA was suspended in a final volume of 1600 μl containing 0.01 M NaOH and 0.018 M Tris-HCl, pH 8.0. Amplification was performed in 25 μl volumes in 96-well PCR plates (Dot Scientific) in a Mastercycler thermocycler (Eppendorf). Each reaction contained 1× Taq buffer (50 mM KCl, 10 mM Tris pH 9.0, 0.1% Triton X), 1.5 mM MgCl2, 200 μM dNTPs, 5 pmoles of each primer, 1 unit of Taq DNA polymerase, and 1 μl of genomic DNA as prepared above. Thermocycling conditions were: 5 minutes at 94°C, followed by 30 cycles of a 1 minute denaturation at 94°C, a 1 minute anneal at 60°C, and a 2 minute extension at 72°C, followed by a 10 minute final extension at 72°C. PCR products were size fractionated by electrophoresis in 2% agarose gels stained with ethidium bromide, and visualized under UV light.
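Two quantities implied by this protocol can be checked with simple arithmetic: the total programmed hold time of the thermocycling profile and the final concentration of each primer. The Python sketch below encodes the stated profile as data. Ramp times between temperatures are ignored, so the actual run time on an instrument will be longer; the variable and function names are illustrative.

```python
# Thermocycling profile as stated in the text (minutes per step).
INITIAL_DENATURATION_MIN = 5
CYCLES = 30
CYCLE_STEPS_MIN = {"denature_94C": 1, "anneal_60C": 1, "extend_72C": 2}
FINAL_EXTENSION_MIN = 10


def total_hold_minutes():
    """Sum of programmed hold times, excluding temperature ramps."""
    per_cycle = sum(CYCLE_STEPS_MIN.values())
    return INITIAL_DENATURATION_MIN + CYCLES * per_cycle + FINAL_EXTENSION_MIN


def primer_concentration_um(pmol, reaction_volume_ul):
    """Final primer concentration in micromolar (1 pmol/ul equals 1 uM)."""
    return pmol / reaction_volume_ul


print(total_hold_minutes())              # 5 + 30 * 4 + 10 = 135 minutes
print(primer_concentration_um(5, 25))    # 0.2 uM of each primer
```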

Polymorphism Determination and Multiplex PCR
Microsatellites with single copy amplicons based on agarose gel screens were assayed for allelic polymorphisms on 6% denaturing polyacrylamide gels using the GenePrint® STR System (Promega). Data for single copy sequences have been submitted to the GenBank STS database (Additional File 5). Select primer pairs for loci that showed polymorphism among strains were evaluated and assembled into multiplex groups of four or five loci per group. Multiplex group criteria included efforts to combine microsatellite loci that provided broad coverage across each chromosome and exhibited detectable amplicon size differences on agarose gels. Multiplex groups were tested for amplification with DNA from single mosquitoes in 25 μl PCR reactions as outlined above.
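One of the grouping criteria above, detectable amplicon size differences within a group, can be screened computationally before bench testing. The Python sketch below flags pairs of loci whose expected amplicon sizes are too close to resolve. The 20 bp minimum gap, the locus names and the sizes are hypothetical choices for illustration; the text does not state a numerical threshold.

```python
from itertools import combinations


def size_conflicts(amplicon_bp, min_gap_bp=20):
    """Return pairs of loci whose expected amplicon sizes differ by less than min_gap_bp.

    amplicon_bp: dict mapping locus name to expected amplicon size in bp.
    """
    conflicts = []
    for (locus_a, size_a), (locus_b, size_b) in combinations(sorted(amplicon_bp.items()), 2):
        if abs(size_a - size_b) < min_gap_bp:
            conflicts.append((locus_a, locus_b))
    return conflicts


def outside_design_range(amplicon_bp, low=150, high=400):
    """Return loci whose expected amplicon falls outside the 150-400 bp design target."""
    return sorted(name for name, size in amplicon_bp.items() if not low <= size <= high)


# Hypothetical candidate group for illustration only.
candidate_group = {"locusA": 160, "locusB": 210, "locusC": 215, "locusD": 330}
print(size_conflicts(candidate_group))
print(outside_design_range(candidate_group))
```

Because microsatellite alleles vary in length, a real check would compare allele size ranges rather than single expected sizes, and loci labeled with different dyes can overlap in size during capillary electrophoresis.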

Fragment Analysis and Genotyping
Fluorochrome-labeled forward primers (6-FAM®, HEX®, NED®) were synthesized by Integrated DNA Technologies and Applied Biosystems for each primer pair that successfully amplified in the multiplex group. Multiplex PCR products were diluted 1:10 in sterile water and 1 μl of this dilution was added to 9 μl of a mixture of HiDi Formamide® (ABI #4311320) and ROX 400HD® standard (ABI #402985) in 96-well PCR plates. The samples were then denatured for 2 minutes at 95°C and immediately placed on ice. Plates were kept covered during processing because the standard and dye-labeled primers are light-sensitive. Genotyping was performed using an ABI 3730 Genetic Analyzer with the GeneMapper® v.4.0 software package.