Oligonucleotide Array Comparative Genomic Hybridization (oaCGH) based characterization of genetic deficiencies as an aid to gene mapping in Caenorhabditis elegans

Background: A collection of genetic deficiencies covering over 70% of the Caenorhabditis elegans genome exists; however, the application of these valuable biological tools has been limited by the incomplete correlation between their genetic and physical characterization. Results: We have applied oligonucleotide array Comparative Genomic Hybridization (oaCGH) to the high-resolution molecular characterization of several genetic deficiency and duplication strains in a 5 Mb region of chromosome III. We incorporate these data into a physical deficiency map, which we then use to direct the positional cloning of essential genes within the region. From this analysis we were able to rapidly determine the molecular identity of several previously unidentified mutations. Conclusion: We have applied accurate, high-resolution molecular analysis to the characterization of genetic mapping tools in Caenorhabditis elegans. In doing so, we have generated a valuable physical mapping resource, which we have shown can aid the rapid molecular identification of mutations of interest.


Background
A large resource of deletion strains (also known as genetic deficiencies) covering over 70% of the Caenorhabditis elegans genome has been generated by various research groups over the past three decades [1]. These genetic deficiencies have proven useful for a variety of purposes, including characterization of mutant alleles [2], identification of specific loci affecting developmental processes [3], investigation of genome replication and stability [4,5] and, most significantly, positional cloning, by mapping uncharacterized mutations to discrete regions of the genome [1,6-8].
However, the full potential of these biological tools has been limited by the lack of high-resolution characterization on a genome-wide scale. The physical breakpoint positions within each deficiency strain must be mapped before the gene complement deleted in that strain can be precisely defined. In addition, some genetic deficiencies exhibit molecular complexity that prevents their reliable use in mapping experiments [9].
Previously, genetic deficiencies have been characterized by relatively low-resolution or labor-intensive techniques such as genetic linkage mapping, PCR analysis [7] and, more recently, snip-SNP mapping [9,10]; as a consequence, many available deficiency strains remain poorly characterized.
Oligonucleotide array Comparative Genomic Hybridization (oaCGH) is an emerging technology for high-resolution, genome-wide mapping of chromosomal copy number changes, based on comparing the relative amount of DNA in two samples from the same species [11,12]. The recent development of a C. elegans-specific oaCGH platform for identifying novel single-gene deletions [13] provides a powerful technology that can be adapted to the rapid and precise characterization of deficiency mapping strains.
In this study we demonstrate the successful application of oaCGH to the physical characterization of deficiency strains in C. elegans. We use this data to annotate a physical deficiency map within a 5 Mb region of chromosome III and demonstrate the application of this map to aid in the molecular identification of previously generated mutations known to reside within this region.

Results and Discussion
oaCGH-mapped deletions and duplications physically define 17 zones around the dpy-17 region of chromosome III
Seven deficiencies and two duplications lying in the dpy-17 region of chromosome III were chosen for oaCGH analysis (NimbleGen) because they had previously been characterized by both genetic linkage and PCR analysis and used to roughly position a large number of unidentified EMS-induced lethal mutants [7]. Once oaCGH mapping had precisely defined the gene complement of each of these deficiencies, a refined candidate gene approach was used to rapidly identify mutations in essential genes that map to this region.
Available mapping data for the deficiency strains positioned deletion and duplication breakpoints with an average resolution of 117 kb. With the incorporation of the oaCGH data, however, breakpoints were resolved to within an average of 5.6 kb, and subsequent PCR amplification and sequencing identified the precise physical breakpoints in strains containing nDf20, sDf121, sDf125, sDf128 and sDf135 (Table 1). The remaining deficiency strains, sDf127 and sDf130, both contain breakpoints that reside in relatively large intergenic regions; because the oaCGH array used for these samples has low probe density in intergenic regions [13], PCR and sequencing analysis was not feasible in these cases.

Table 1 notes. Average breakpoint resolution (kb): 117 from previous mapping data; 5.6 from oaCGH.
a Breakpoint coordinates aligned to Wormbase (release WS170).
b sDp3 structure is inferred from the background of the oaCGH data.
c The left breakpoint of sDp3 is fragmented over a large region.
d The right breakpoint of sDf127 falls within a 50 kb region of low complexity, and therefore low probe density, and could not be accurately defined.
e The second deleted region in sDf130 could not be resolved by PCR analysis.
Using the refined mapping data obtained from the oaCGH analysis, we annotated the deleted gene complement of each mapping strain, creating a physical map of zones defined by the overlap of the deficiencies and duplications within the region. The resulting map extends across almost 5 Mb of the genome and contains 17 zones with an average size of 323 kb (Figure 1). Zone 13 is the largest, covering 929 kb and containing 204 predicted genes, while zone 8 is the smallest at 22 kb and contains 5 predicted genes (Table 2). Finally, positional mapping data available for mutations known to fall within this region were incorporated into the map; in this way we have been able to assign each mutation to an accurate, precisely defined list of candidate genes (Figure 1 and Additional file 1).
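The zone construction described above can be illustrated with a short sketch. It assumes each deficiency or duplication is reduced to a single half-open interval [start, end) in base pairs (ignoring complex structures such as the second deletion in sDf130); the strain names and coordinates in the example are hypothetical, not the Table 1 values. Each zone is simply the stretch between two consecutive breakpoints, labelled with the set of strains that span it.

```python
from typing import Dict, FrozenSet, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) in bp
Zone = Tuple[int, int, FrozenSet[str]]


def build_zones(segments: Dict[str, Interval]) -> List[Zone]:
    """Split a region at every breakpoint; each zone is the stretch between
    two consecutive breakpoints, tagged with the strains that span it."""
    # Every start and end coordinate is a potential zone boundary.
    cuts = sorted({pos for start, end in segments.values() for pos in (start, end)})
    zones: List[Zone] = []
    for left, right in zip(cuts, cuts[1:]):
        covering = frozenset(
            name for name, (start, end) in segments.items()
            if start <= left and right <= end
        )
        if covering:  # skip gaps not covered by any mapping strain
            zones.append((left, right, covering))
    return zones


def genes_in_zone(zone: Zone, genes: Dict[str, Interval]) -> List[str]:
    """Genes overlapping the zone (partial overlaps count as candidates)."""
    left, right, _ = zone
    return [g for g, (s, e) in genes.items() if s < right and left < e]


if __name__ == "__main__":
    # Hypothetical coordinates for two deficiencies and one duplication.
    segments = {"DfA": (0, 300_000), "DfB": (200_000, 500_000), "DpC": (100_000, 400_000)}
    zones = build_zones(segments)
    for left, right, covering in zones:
        print(left, right, sorted(covering))
    mean_kb = sum(r - l for l, r, _ in zones) / len(zones) / 1000
    print(f"{len(zones)} zones, mean size {mean_kb:.0f} kb")
```

With these hypothetical inputs the three strains define five 100 kb zones; mean zone size is computed the same way as the 323 kb average quoted above.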

The free duplication sDp3 undergoes significant modification within deficiency strains
In our initial set of experiments, we performed oaCGH using genomic DNA from three representative deficiency strains balanced by the free duplication sDp3 (including BC4697), with wild-type Bristol N2 DNA as the reference sample. The expected ratio of reference to test DNA with this approach is 1:1 over the region of the chromosome not covered by sDp3, 2:3 over the region covered by sDp3 alone, and 2:1 in the region of the deficiency balanced by sDp3 (Figure 2A).
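These expected ratios follow from simple copy-number arithmetic. The sketch below assumes that hybridization signal is proportional to copy number, that the reference is diploid (two copies everywhere) and that the test strain carries a single copy of the free duplication sDp3; `expected_ratio` is a hypothetical helper, not part of any analysis package.

```python
import math
from fractions import Fraction


def expected_ratio(ref_copies: int, test_copies: int) -> Fraction:
    """Expected reference:test signal ratio, assuming signal scales with copy number."""
    if test_copies <= 0:
        raise ValueError("test sample has no copies here; ratio is undefined")
    return Fraction(ref_copies, test_copies)


REF = 2  # diploid N2 reference
# Test strain: deficiency homozygous on chromosome III, rescued by one copy of sDp3.
regions = {
    "not covered by sDp3": 2,              # two chromosomal copies
    "covered by sDp3 alone": 2 + 1,        # two chromosomal copies + sDp3
    "deficiency balanced by sDp3": 0 + 1,  # both homologs deleted, sDp3 only
}

for name, test_copies in regions.items():
    r = expected_ratio(REF, test_copies)
    print(f"{name}: {r.numerator}:{r.denominator} (log2 = {math.log2(float(r)):+.2f})")
```

Running it prints the 1:1, 2:3 and 2:1 ratios given above; note that the deleted region (2:1) and the sDp3-only region (2:3) differ in sign on a log2 scale, which is why an internal duplication of sDp3 can mask or mimic deficiency signal.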
This initial analysis revealed that the extent of sDp3 varies dramatically between the strains tested (Figure 2B-D). Since duplications maintained in balancer strains are known to break down both spontaneously and under the mutagenic conditions used to induce deficiencies [14-16], the observed modifications to sDp3 most likely result from these factors.
In addition to the variable extent of sDp3, discrete duplications internal to sDp3 are observed in each of the strains tested (Figure 2). This lack of uniformity makes the structure of some deficiencies difficult to determine. For example, an internal duplication in BC4697 across the region containing the left breakpoint of sDf121 increases the predicted copy number in this region (Figure 2F), while multiple variations in both of the sDp3-balanced strains containing sDf127 and sDf128 make it difficult to distinguish where DNA has been deleted and where it has been duplicated (Figure 2C-E and data not shown).

oaCGH of heterozygous deficiencies resolves copy number ambiguity and results in precise mapping of deletion breakpoints
Since heterozygous deletions can be readily resolved with oaCGH [13], we reasoned that strains carrying a single copy of each deficiency would eliminate the ambiguity introduced by the sDp3 balancer. With this approach, the expected ratio of reference to test DNA is 1:1 along the whole length of the chromosome except within the deleted region, where a ratio of 2:1 is expected (Figure 2B).
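In this heterozygous design a deleted probe is expected at log2(reference/test) = 1 and an unaffected probe at 0, so a threshold between the two separates them. The study called aberrations by visual inspection (see Methods); the sketch below is an assumed automated alternative, with a hypothetical threshold of 0.5 and a minimum run of three probes. It also shows why breakpoint resolution depends on probe spacing: each breakpoint can only be placed between the last unaffected probe and the first deleted probe.

```python
from typing import List, Optional, Sequence, Tuple

# A breakpoint lies somewhere between two probe coordinates (None = array edge).
BreakpointInterval = Tuple[Optional[int], Optional[int]]


def call_deletions(
    positions: Sequence[int],
    log2_ratios: Sequence[float],
    threshold: float = 0.5,
    min_probes: int = 3,
) -> List[Tuple[BreakpointInterval, BreakpointInterval]]:
    """Find runs of probes with log2(reference/test) >= threshold.

    Positions must be sorted. For each run of at least `min_probes` probes,
    return the intervals that bracket the left and right breakpoints.
    """
    if len(positions) != len(log2_ratios):
        raise ValueError("positions and ratios must have the same length")
    calls = []
    n = len(positions)
    i = 0
    while i < n:
        if log2_ratios[i] < threshold:
            i += 1
            continue
        j = i  # extend the run while probes stay above threshold
        while j + 1 < n and log2_ratios[j + 1] >= threshold:
            j += 1
        if j - i + 1 >= min_probes:
            left = (positions[i - 1] if i > 0 else None, positions[i])
            right = (positions[j], positions[j + 1] if j + 1 < n else None)
            calls.append((left, right))
        i = j + 1
    return calls


if __name__ == "__main__":
    # Hypothetical probe data: a heterozygous deletion covering the probes at 2,000-5,000 bp.
    pos = [0, 1_000, 2_000, 3_000, 4_000, 5_000, 6_000, 7_000]
    ratios = [0.05, -0.10, 0.95, 1.10, 0.90, 1.02, 0.00, 0.10]
    for (l_out, l_in), (r_in, r_out) in call_deletions(pos, ratios):
        print(f"left breakpoint in {l_out}-{l_in}, right breakpoint in {r_in}-{r_out}")
```

Where probes are sparse, as in the large intergenic regions around the sDf127 and sDf130 breakpoints, the bracketing intervals widen accordingly.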
To implement this strategy, all of the sDp3-balanced deficiency strains to be tested were out-crossed to JK2689, which carries the GFP-marked heterozygous balancer hT2 (qIs48), known to cover the region under investigation in this study [1]. For this analysis, JK2689 genomic DNA was used as the reference sample to eliminate any genomic variability that may have been introduced into this strain by the heavy mutagenesis used in its original construction [17].
With sDp3 eliminated from the background, the ability to resolve breakpoint positions accurately in the ambiguously mapped deficiency strains improved significantly. A comparison of oaCGH data for sDf127 balanced by sDp3 and for the same deficiency out-crossed to hT2 (qIs48) highlights this improvement (Figure 3). Because deficiency strains have been outcrossed several times as part of their construction, it is unlikely that creating the hT2-balanced strains removed any features of these deficiencies. The oaCGH data obtained for each strain using the modified strategy are therefore likely to represent the original structure of each deficiency tested.

Figure 2: oaCGH experimental overview and sDp3 balanced deficiency data.

Combining data from multiple array experiments allows for bias removal resulting in reliable deficiency characterization
oaCGH data obtained from separate strains using the same chip design are consistent enough to allow data from multiple experiments to be integrated and compared directly. Through this integration, common inconsistencies that are not specific to the deletion of interest can be identified and excluded from the annotation, resulting in a reliable characterization of the whole deficiency genome (Figure 3B and 3C). This is an important step, because small mutagenic events outside the known boundaries of a given deficiency may go undetected by traditional mapping methods and could lead to conflicting mapping data.
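One simple way to express this integration step, assuming per-probe aberration calls from strains hybridized to the same chip design and sharing a background (for example, hT2 (qIs48) out-crosses compared against the JK2689 reference), is to flag probes called aberrant in nearly every strain as background and remove them from each strain's calls. This is a hypothetical sketch rather than the procedure used in the study; because overlapping deficiencies legitimately share deleted probes, `min_fraction` should be set above the largest fraction of strains expected to share a real deletion.

```python
from typing import Dict, List


def mask_recurrent(
    calls: Dict[str, List[bool]], min_fraction: float = 1.0
) -> Dict[str, List[bool]]:
    """Remove probe calls that recur in at least `min_fraction` of strains.

    `calls[strain][i]` is True when probe i was called aberrant in that strain.
    All strains must use the same chip design (same probe order).
    """
    if not calls:
        return {}
    n_probes = len(next(iter(calls.values())))
    if any(len(v) != n_probes for v in calls.values()):
        raise ValueError("all strains must share the same probe layout")
    n_strains = len(calls)
    # Probes aberrant in (nearly) every strain are treated as background.
    background = [
        sum(v[i] for v in calls.values()) >= min_fraction * n_strains
        for i in range(n_probes)
    ]
    return {
        strain: [hit and not bg for hit, bg in zip(v, background)]
        for strain, v in calls.items()
    }


if __name__ == "__main__":
    # Hypothetical calls: probe 1 changes in every strain (a background artifact).
    calls = {
        "strain1": [True, True, False, True],
        "strain2": [False, True, False, False],
        "strain3": [False, True, True, False],
    }
    for strain, cleaned in mask_recurrent(calls).items():
        print(strain, cleaned)
```

Strain-specific calls that survive the mask are the ones worth confirming by PCR and sequencing.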

oaCGH analysis reveals deficiency complexity on a genome-wide scale
We have analysed a number of deficiencies generated with both gamma-ray (GRI) and UV irradiation (Table 1 and data not shown) and have not detected any significant mutagen-specific deletion characteristics; the majority have a simple contiguous structure (Figure 3B and 3C, and data not shown). Exceptions are the deficiencies sDf130, which contains two distinct deletions, the smaller of which defines zone 10 (Figure 1), and sDf128, which exhibits associated complexity.
As with sDp3-balanced sDf121, several duplications of various sizes extend into the deficiency region of sDf128 (Figure 2C and 2E). Unlike those seen with sDf121, however, these duplications are retained when sDf128 is out-crossed to hT2 (Figure 4), suggesting that they have been integrated into the deficiency genome. Attempts to further characterize the complexity of sDf128 by PCR amplification across the predicted deletion breakpoints failed, suggesting that a duplicated region has been integrated within the deficiency itself. Subsequent amplification of a PCR product spanning the left end of the largest duplication and the right end of the deleted region confirmed that the duplicated fragment is integrated within the deletion (Figure 4). The structure of the left end of the insertion could not be confirmed, nor could the position of the smaller duplicated region be determined, leading us to conclude that this region is physically complex.

Figure 3: Comparison of oaCGH data in the sDp3 and hT2 (qIs48) balancer backgrounds. Unambiguous deficiency mapping for each experiment is depicted as a solid black line, while ambiguous regions are represented as a dashed grey line. Genes residing at the confirmed sDf127 breakpoints are shown (grey bars). Non-deficiency-specific copy number changes introduced by the background strain can be seen (B and C, solid arrows). Note: the oaCGH array used in A is the NimbleGen design [20], while the array used in B and C is the exon-centric array described by Maydan et al. [13] (see Methods).

Accurate identification of lethal mutations positioned within the deficiency map
A more detailed analysis of zone 1B highlights the value of the physical deficiency map for identifying uncloned mutations. This zone is defined by the deletion breakpoints of the deficiency sDf128 and contains three molecularly uncharacterized lethal mutations: let-722, let-754 and let-766. The region is 78 kb in size, contains 21 predicted genes and is spanned by the cosmids C29E4, F54H12, C06G4 and F44B9 (Figure 5).
Since 39 of the 40 molecularly identified lethal mutations in Wormbase [18] exhibit a variety of lethal or sterile RNAi phenotypes (Additional file 1), genes within each zone that display these RNAi phenotypes make good candidates for the lethal mutations mapping to the same zone. Of the 21 genes within zone 1B, only 6 exhibit lethal phenotypes by RNAi (Figure 5 and Additional file 1), and these 6 genes were therefore taken as the initial candidates for the three lethal mutations in this region.
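This candidate filter amounts to intersecting the genes in a zone with the genes that have lethal or sterile RNAi phenotypes. The sketch below assumes the phenotypes have already been exported into a simple dictionary; the phenotype labels and gene names are placeholders, not real Wormbase annotations.

```python
from typing import Dict, Iterable, List, Set

# Assumed phenotype vocabulary; real Wormbase phenotype terms differ.
LETHAL_OR_STERILE = frozenset({"embryonic lethal", "larval lethal", "sterile"})


def rnai_candidates(
    zone_genes: Iterable[str],
    phenotypes: Dict[str, Set[str]],
    keep: frozenset = LETHAL_OR_STERILE,
) -> List[str]:
    """Genes in the zone (in map order) with at least one lethal/sterile RNAi phenotype."""
    return [g for g in zone_genes if phenotypes.get(g, set()) & keep]


if __name__ == "__main__":
    zone = ["geneA", "geneB", "geneC", "geneD"]  # hypothetical genes in map order
    rnai = {
        "geneA": {"embryonic lethal"},
        "geneB": set(),
        "geneC": {"sterile", "slow growth"},
        # geneD has no RNAi data and is therefore not prioritised
    }
    print(rnai_candidates(zone, rnai))
```

Genes that fail the filter are not ruled out: as the let-766 case below shows, they remain second-tier candidates when the prioritised set yields no lesion.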
To identify DNA lesions, the coding region of each candidate gene was PCR amplified and sequenced using homozygous mutant DNA from a representative lethal strain as template. In this way, let-722 and let-754 were identified as alleles of aco-2 and C29E4.8, respectively (summarized in Table 3). let-766, however, was not identified by this approach; we conclude that this mutation lies either in one of the remaining 19 genes within this region or in a gene outside the deficiency that may have been disrupted by integration of the small unmapped duplication present in this strain.
Finally, by incorporating mapping data generated by our group in ongoing efforts to clone lethal mutations, we have been able to identify let-716 and let-768 as alleles of C16A3.3 and fum-1, respectively (our unpublished data; summarized in Table 3).
Figure 4: sDf128-associated complexity.

Conclusion
We have demonstrated that oaCGH can be successfully applied to the rapid and precise characterization of existing C. elegans genetic deficiencies. This study highlights how this type of analysis can transform low resolution genetic mapping tools into a precise physical mapping resource with which to accurately position molecularly unidentified mutations.
The implementation of oaCGH technology for this purpose is straightforward, requiring only the preparation of high-quality genomic DNA from the deficiency strains of interest. Data are generated quickly and can be verified experimentally by standard molecular techniques. The initial difficulties we describe with animals balanced by a free duplication highlight some caveats to this methodology, but we have shown that these issues can be readily circumvented by careful choice of experimental design.

Figure 5: Schematic representation of zone 1B as defined by the deficiency sDf128.
We envisage that this resource will be instrumental in resolving previously ambiguous genetic mapping data. In addition, we propose that integrating a physical deficiency map into a high-throughput molecular mapping strategy such as snip-SNP mapping will improve the efficiency of positional cloning in C. elegans. In such an approach, snip-SNP mapping would place the mutation of interest within a sub-chromosomal region, physically defined deficiencies would subdivide this region further, and a candidate sequencing pipeline such as the one described here would then identify the molecular lesion.
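The proposed combination of snip-SNP and deficiency mapping can be sketched as interval arithmetic. The sketch assumes the standard reading of complementation tests (a mutation that fails to complement a deficiency lies within its deleted region; one that complements lies outside it) and treats each deficiency as a single contiguous interval; all coordinates and gene names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Interval = Tuple[int, int]  # half-open [start, end) in bp


def intersect(region: List[Interval], other: Interval) -> List[Interval]:
    """Keep only the parts of `region` that fall inside `other`."""
    lo, hi = other
    out = []
    for start, end in region:
        s, e = max(start, lo), min(end, hi)
        if s < e:
            out.append((s, e))
    return out


def subtract(region: List[Interval], other: Interval) -> List[Interval]:
    """Remove `other` from every interval in `region`."""
    lo, hi = other
    out = []
    for start, end in region:
        if hi <= start or lo >= end:  # no overlap: keep as is
            out.append((start, end))
            continue
        if start < lo:
            out.append((start, lo))
        if hi < end:
            out.append((hi, end))
    return out


def candidate_region(
    snp_interval: Interval,
    fails_to_complement: Sequence[Interval],
    complements: Sequence[Interval],
) -> List[Interval]:
    """Narrow a snip-SNP interval using complementation results."""
    region = [snp_interval]
    for df in fails_to_complement:  # mutation must lie inside each of these
        region = intersect(region, df)
    for df in complements:  # ...and outside each of these
        region = subtract(region, df)
    return region


def genes_in(region: List[Interval], genes: Dict[str, Interval]) -> List[str]:
    """Genes overlapping any part of the candidate region."""
    return [
        name for name, (gs, ge) in genes.items()
        if any(gs < e and s < ge for s, e in region)
    ]


if __name__ == "__main__":
    region = candidate_region((0, 1_000_000), [(200_000, 800_000)], [(500_000, 900_000)])
    genes = {"g1": (100_000, 150_000), "g2": (250_000, 300_000),
             "g3": (480_000, 520_000), "g4": (600_000, 700_000)}
    print(region, genes_in(region, genes))
```

The resulting gene list would then feed the RNAi-prioritised candidate sequencing step described in Results and Discussion.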

Methods
Nematode culture, harvest, and DNA preparation
Nematodes were grown as previously described [...]. DNA for oaCGH samples was prepared as previously described [13]. For mutant sequencing, ~50 homozygous lethal animals were picked from the progeny of sDp3-balanced strains into 5 µl of lysis buffer (50 mM KCl, 10 mM Tris pH 8.3, 2.5 mM MgCl2, 0.45% Tween-20, 0.01% gelatin), freeze-cracked in liquid N2 and digested (1 hour at 60°C followed by 15 min at 95°C). A 1 µl aliquot of this DNA preparation was then used as template in a 25 µl PCR reaction.

oaCGH Data analysis
For the analysis of sDp3-balanced strains, a C. elegans whole-genome array based on Wormbase release WS120, available from NimbleGen Systems Inc., was used [20]. Analysis of the hT2 (qIs48)-balanced strains was performed with a whole-genome C. elegans array designed with overlapping 50-mer probes targeting primarily annotated exons and microRNAs [13]. oaCGH sample preparation, hybridization and analysis were performed as previously described [13]. Copy number aberrations were detected by visual inspection using the SignalMap™ browser software [20]. The data discussed in this publication have been deposited in NCBI's Gene Expression Omnibus (GEO) [21] and are accessible through GEO Series accession number GSE9214 (platform accession GPL6047).
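Before copy number changes can be called, visually or otherwise, the two channels are usually converted into per-probe log2 ratios. The sketch below shows one common convention, median centering, under the assumption that most probes on a whole-genome array lie at unchanged copy number, so that the median log2 ratio reflects only dye or labelling bias; it is not a description of the NimbleGen or SignalMap processing used in this study.

```python
import math
from statistics import median
from typing import List, Sequence


def normalized_log2_ratios(ref: Sequence[float], test: Sequence[float]) -> List[float]:
    """Per-probe log2(reference/test), shifted so the genome-wide median is 0."""
    if len(ref) != len(test):
        raise ValueError("reference and test channels must have the same length")
    if any(x <= 0 for x in ref) or any(x <= 0 for x in test):
        raise ValueError("intensities must be positive (background-corrected)")
    raw = [math.log2(r / t) for r, t in zip(ref, test)]
    center = median(raw)  # assumes most probes are at normal copy number
    return [x - center for x in raw]


if __name__ == "__main__":
    # Hypothetical intensities: the test channel is globally half as bright
    # (a labelling bias), and the last probe is also heterozygously deleted.
    ref = [1000.0, 2000.0, 1500.0, 3000.0]
    test = [500.0, 1000.0, 750.0, 750.0]
    print(normalized_log2_ratios(ref, test))
```

After centering, the unaffected probes sit at 0 and the heterozygously deleted probe at 1, matching the 2:1 expectation for single-copy deficiencies.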

Molecular identification of deficiency breakpoints and mutations
PCR amplification across each breakpoint region was performed with nested primers (primer sequences and amplification conditions are available upon request), and the products were sequenced using standard molecular methods.