Large-scale mapping of mutations affecting zebrafish development

Background: Large-scale mutagenesis screens in the zebrafish employing the mutagen ENU have isolated several hundred mutant loci that represent putative developmental control genes. In order to realize the potential of such screens, systematic genetic mapping of the mutations is necessary. Here we report on a large-scale effort to map the mutations generated in mutagenesis screening at the Max Planck Institute for Developmental Biology by genome scanning with microsatellite markers.
Results: We have selected a set of microsatellite markers and developed methods and scoring criteria suitable for efficient, high-throughput genome scanning. We have used these methods to successfully obtain a rough map position for 319 mutant loci from the Tübingen I mutagenesis screen and subsequent screening of the mutant collection. For 277 of these the corresponding gene is not yet identified. Mapping was successful for 80 % of the tested loci. By comparing 21 mutation and gene positions of cloned mutations we have validated the correctness of our linkage group assignments and estimated the standard error of our map positions to be approximately 6 cM.
Conclusion: By obtaining rough map positions for over 300 zebrafish loci with developmental phenotypes, we have generated a dataset that will be useful not only for cloning of the affected genes, but also to suggest allelism of mutations with similar phenotypes that will be identified in future screens. Furthermore, this work validates the usefulness of our methodology for rapid, systematic and inexpensive microsatellite mapping of zebrafish mutations.


Background
Large-scale mutagenesis screens in the zebrafish employing the mutagen ENU have isolated several hundred mutant loci that represent putative developmental control genes [1,2]. In order to realize the potential of such screens, systematic genetic mapping of the mutations is necessary. Genome scanning by bulked segregant analysis with microsatellite markers is the method of choice for such purposes, as a rough map position can be quickly obtained [3,4]. In the zebrafish it is easy to perform mapcrosses against a polymorphic reference line, followed by brother-sister matings among the F1 generation. Linkage to a microsatellite marker can then be found by comparing the band intensities of marker alleles in a pool of mutant F2 individuals with those in a pool of their wildtype siblings. Because full sibships are analyzed, the genetic distance between the mutant locus and a microsatellite can be determined by a simple count of recombinations.
The established reference map for the zebrafish genome is the MGH map [5][6][7], which was generated by scoring 3,881 microsatellite markers (all of them CA repeats) on a panel of 48 diploid F2 fish of an India × AB reference cross. It covers 2,295 centimorgans (cM) at a resolution of 1.2 cM. Because the MGH markers do not necessarily show a usable polymorphism in reference crosses of Tü × WIK, our first task was to identify markers that could be used in such a cross.

Selection of markers for genome scanning
Two sets of microsatellite markers for scanning the genome were developed in parallel with the mutant mapping effort. The starting point was the testing of 314 markers for polymorphism in Tü × WIK crosses [8]. 72 markers (3 per chromosome) were selected that showed a polymorphism between Tü and WIK, with bands easily distinguishable on agarose gels, in at least three out of five reference crosses (zebrafish genome scan set version 1, or G1). Additional markers from the MGH map that had shown a robust polymorphism in fine-mapping experiments were subsequently added, while markers that never gave any confirmed linkage in our experiments or that were omitted from the MGH map were removed from our set, eventually resulting in the G4 set of 192 markers [9]. An alternate set of markers was generated by testing another 1,092 microsatellite markers from the MGH map in five reference crosses. 178 of these markers were polymorphic in all five reference crosses. Together with 14 additional markers these were selected for the H2 set of 192 markers (Table 1).
The average distance between markers of the G4 set is 11.6 cM, and all distances are smaller than 36 cM, except for a 71.1 cM interval on LG21 (between Z4425 and Z1497). Within this particular interval few MGH markers are available, and no suitably polymorphic marker could be identified in our reference crosses. For the H2 set the average distance is 11.5 cM, and all distances are smaller than 53.8 cM, except for an 83.3 cM interval on LG21. The more uneven chromosomal distribution of markers in the H2 set reflects the fact that frequently the best markers available were already used in the G4 set.
Our mapping methodology, as described below, can theoretically detect significant linkage over a distance of approximately 36 cM (assuming the genotyping of 48 mutant individuals). However, since the LOD score is proportional to the number of individuals scored, this range can easily be increased by adding more mutant individuals if a linkage is questionable. Our marker sets therefore cover the genome adequately to detect significant linkage with the great majority of mutant loci. All the mutant loci mapped in this work have confirmed linkage to at least one G4 or H2 marker (not shown if the closest flanking markers were selected from outside the sets).
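The detection range quoted above can be reproduced approximately with a short calculation. The following Python sketch is illustrative only and is not the authors' implementation: it assumes that each mutant F2 individual contributes two informative meioses, that recombinants occur at exactly their expected frequency, that map distance and recombination fraction are related by the Kosambi function, and that the LOD score is the standard two-point likelihood ratio against free recombination. All function names are hypothetical.

```python
import math


def kosambi_r(distance_cm):
    """Recombination fraction implied by a Kosambi map distance given in cM."""
    return 0.5 * math.tanh(2.0 * distance_cm / 100.0)


def expected_lod(distance_cm, n_mutants):
    """Expected two-point LOD if recombinants occur at exactly the expected rate.

    Assumption: every mutant F2 individual contributes two informative meioses.
    """
    r = kosambi_r(distance_cm)
    meioses = 2 * n_mutants
    per_meiosis = (math.log10(2.0)
                   + r * math.log10(r)
                   + (1.0 - r) * math.log10(1.0 - r))
    return meioses * per_meiosis


def max_detectable_distance(n_mutants, threshold=3.0, step=0.1):
    """Largest distance (cM, in multiples of `step`) whose expected LOD reaches `threshold`."""
    best = 0.0
    for i in range(1, 2000):
        d = i * step
        if expected_lod(d, n_mutants) >= threshold:
            best = d
        else:
            break  # the expected LOD only decreases with distance
    return round(best, 1)


if __name__ == "__main__":
    for n in (24, 48, 96):
        print(n, "mutants ->", max_detectable_distance(n), "cM")
```

Under these assumptions the range for 48 mutants comes out close to the 36 cM stated above, and doubling the number of mutants doubles the expected LOD at any given distance.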

Mapping of mutant loci
We report here on the mapping of 319 mutant loci identified in the ENU-based Tübingen I mutagenesis screen [1,2] and subsequent screening among the mutant collection (Additional file 1). For 42 of the loci the corresponding genes have already been identified by other researchers, as listed by the ZFIN database [10]; they are included as controls for our mapping procedure (see below). Not included are 70 successfully mapped loci for which the corresponding genes were already published by ourselves or such a publication is in preparation, or the carriers of which were lost after mapping.
Table 1. The G4 marker set [9] and the newly developed H2 marker set each consist of 192 microsatellite markers from the MGH map [5][6] which we selected for genome scanning in Tü × WIK crosses and electrophoresis on agarose gels. Up to two genome scans per mutation were performed with the G4 set (or its earlier versions). If no linkage could be confirmed and sufficient material was available, another two scans were subsequently performed with the H2 set. Additional markers from the MGH map were occasionally employed for scoring of mutant individuals.
For each mutation we crossed mutant carriers against the polymorphic reference line WIK, which was established in our lab for this purpose [8]. Brother-sister matings were performed in the F1 and the F2 progeny was sorted by phenotype. DNA was prepared on 96-well plates, and aliquots of 36-48 mutant F2 individuals and their wildtype siblings were pooled. Genome scanning was performed by PCR of the mutant and sibling pools with the markers of the G4 marker set, and the band intensities on agarose gels were quantified semi-automatically using NIH Image software as well as visually assessed to identify potential linkages. Mutant and sibling pools representing up to 24 different mutations were tested in parallel. Verification of the best potential linkages (up to six) for each mutation was then attempted by performing PCR of the respective marker with the individual mutants and siblings that had been used for pooling, and counting the recombinant genotypes (for the genotype data see Additional file 2). Siblings were always included on the same gel as a control to confirm that the marker is polymorphic and that the two polymorphic bands appear at the proper frequency. If no potential linkage could be verified for a mutation and sufficient material was available, the procedure was repeated once with the G4 marker set, and another two times with the H2 marker set. If possible, DNA was prepared from the progeny of a different F1 pair for each genome scan, since the Tübingen and WIK lines used are not isogenic and markers that show no usable polymorphism in progeny of one F1 pair are therefore sometimes usable in progeny of another one.
A potential linkage was considered confirmed if it had a two-point LOD score equal to or greater than 3. The individuals were then genotyped for all polymorphic markers from the same marker set and chromosomal region in order to identify, if possible, a pair of markers flanking the mutation, and if that was not possible, the two closest markers on one side of the mutation. Occasionally additional markers not in the chosen marker set were also included in the genotyping. Decisions on whether or not a mutation was flanked by two markers were based on whether recombinations with the markers occurred independently. For details of the mapping procedure and the calculation of map positions see the Methods section and [9].
In total, mapping was attempted for 486 mutations from the Tübingen I screen and subsequent screens of the mutant collection and was successful for 389, giving a success rate of 80 %. Twelve of these could be mapped only with the H2 set. Unsuccessful mapping experiments were due to difficulties in obtaining sufficient F2 individuals and to PCR problems, as well as to a lack of polymorphic markers in our marker sets. Among the mutations to be mapped, a group of 63 was prioritized based on interest in their phenotypes. For each of these, several additional mapcrosses were set up (data not shown). 56 mutations of this group, or 89 %, were successfully mapped, providing a lower limit for the percentage of mutations that our marker sets and methodology are capable of mapping if sufficient F2 individuals are available. The greatest distance to markers on either side at which we could confirm linkage was 31.9 cM (for the mutation spt), approaching the theoretical cutoff of 36 cM.

Chromosomal distribution of mutant loci
Between 1,400 and 2,400 zebrafish genes have been estimated to have visible mutant phenotypes in embryonic and early larval development [1,11]. Therefore the loci reported in this work represent at least one eighth and possibly as much as a quarter of all the loci that can be mutated to give a visible phenotype.
The number of mapped loci assigned to each chromosome is between 6 and 32 (on average 12.8 ± 5.8) (Figure 1). These numbers are not significantly correlated with the number of mutant loci per chromosome identified by insertional mutagenesis in the laboratory of N. Hopkins ([11,12] and unpublished data, available from ZFIN [10]) (R² = 0.02 assuming a linear regression relationship) or with the number of Ensembl genes per chromosome in the Ensembl Zv6 assembly [13] (R² = 0.19); by comparison, the values of Amsterdam et al. have a slightly stronger correlation to the number of Ensembl transcripts (R² = 0.28). Because mapping with our methodology was successful for 80 % of all mutations for which it was attempted, possible deficiencies of the mapping method cannot fully account for this low correlation. Rather, it probably reflects an uneven distribution of genes with specific, visible phenotypes in embryos or early larvae as identified in ENU mutagenesis screening, and the absence of such selectivity in the insertional mutagenesis experiment, demonstrating that both types of mutagenesis experiments complement each other in their coverage of the genome. Moreover, we cannot rule out region-specific differences in ENU mutagenesis efficiency.

Assessment of mapping quality
In order to assess the quality of our mapping data we looked at the 42 mutant loci that were cloned by other researchers. For 21 of these, independently derived map positions of the affected gene are publicly available on ZMAP (an integrated map produced by intercalating data from several mapping panels into the MGH genetic map, available from ZFIN [10]; Allen Day, Tom Conlin and John H. Postlethwait, unpublished) (Table 2).
A comparison of the linkage group assignments shows that two of the 21 genes (frs/slc25a and ovl/ift88) are assigned to a different linkage group by ZMAP, in both cases based on results from the Heat Shock (HS) panel [14][15][16]. However, several published linkages to genetic markers support our linkage group assignment of frs/slc25a [17] while our assignment of ovl/ift88 is supported by the T51 panel (as shown on the ZFIN website) and by the latest version of the HS map [18]. In conclusion, none of our linkage group assignments is conclusively contradicted by gene mapping.
Next we compared the map positions of the mutations with those of the genes on ZMAP (using the median of the ZMAP positions if a gene was placed on more than one mapping panel). If we assume the gene positions to be correct, we obtain a standard error of our mutant map positions of 6.1 cM. Further assuming a normal distribution of errors, we can predict that approximately 95 % of the genes should be within 12.2 cM (two standard errors) of the rough mapping position of the mutation. Indeed, 17 out of the 19 genes mapped on the same chromosome (90 %) are within two standard errors of the mutation, and 16 out of 19 (84 %) within one standard error. Actually both mutation and gene mapping contribute to the observed errors to an unknown degree, so that 6.1 cM merely represents an upper limit for the standard error of our mapping procedure.
Figure 1. Chromosomal distribution of mapped mutant loci, with insertional mutations ([12] and unpublished data, available from ZFIN [10]) shown for comparison. Numbers for insertional mutations were obtained by searching ZFIN for mutations with a "hi" designation assigned to each linkage group and eliminating multiple hits of the same gene as well as mutations with ambiguous chromosomal assignments. Yellow: Ensembl gene predictions for each chromosome (× 100) (Ensembl release Zv6, available from Ensembl [13]). The number of mapped mutations or genes is indicated on the vertical axis, the linkage group (LG) number on the horizontal axis.
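The error estimate used in this assessment can be illustrated with a small calculation. The Python sketch below is hedged: it assumes that the standard error is taken as the root mean square of the differences between mutation and gene positions (the text does not state the exact estimator), and the example differences are invented purely for illustration; they are not values from Table 2. The function names are hypothetical.

```python
import math


def rms_error(differences_cm):
    """Root-mean-square difference between mutation and gene positions (cM).

    Assumption: gene positions are treated as correct, so the whole
    difference is attributed to the mutation map position.
    """
    if not differences_cm:
        raise ValueError("at least one position difference is required")
    return math.sqrt(sum(d * d for d in differences_cm) / len(differences_cm))


def fraction_within(differences_cm, limit_cm):
    """Fraction of loci whose absolute position difference is at most limit_cm."""
    hits = sum(1 for d in differences_cm if abs(d) <= limit_cm)
    return hits / len(differences_cm)


if __name__ == "__main__":
    # Hypothetical differences (mutation position minus gene position), in cM.
    example = [2.0, -4.0, 6.0, -8.0, 1.0]
    se = rms_error(example)
    print("standard error:", round(se, 2), "cM")
    print("within one standard error:", fraction_within(example, se))
    print("within two standard errors:", fraction_within(example, 2 * se))
```

Comparing the observed fractions with the roughly 68 % and 95 % expected for normally distributed errors gives a quick plausibility check of the estimate.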

Conclusion
We have obtained rough map positions for over 300 zebrafish mutants with an accuracy of approximately 6 cM and thereby validated the usefulness of our methodology for rapid, systematic and inexpensive microsatellite mapping of zebrafish mutations. The dataset that we have produced is a first step towards identification of the genes affected by the 277 mutations that are not yet cloned.
In candidate gene approaches, our data can substantially narrow down the number of candidate genes: the two-standard-error confidence interval around a map position (± 12.2 cM, i.e. 24.4 cM in total) corresponds to roughly 1 % of the 2,295 cM genetic map, so that on the order of 99 % of the genome lies outside it. Positional cloning approaches in the absence of obvious candidate genes will still require fine mapping by genotyping of additional individuals and identification of more closely linked markers, using the flanking markers identified by us as starting points. Particularly thorough fine-mapping is required in centromeric regions because the genetic recombination rate is often several-fold reduced in such regions [19], an effect that can be easily observed in the zebrafish by comparing the genetic map and the radiation hybrid map [9]. Nevertheless, we expect our map positions to be useful even without knowledge of the affected genes, as they can suggest allelism of mutations with a similar phenotype identified in future screens.
We have found that a relatively small number of microsatellite markers is sufficient to scan almost the entire genome and that the experimental procedures are robust and easy to perform. Other methods that have been proposed for the mapping of mutant loci in the zebrafish include half-tetrad analysis with microsatellite markers, genome scanning with SNPs and microarray-based SNP mapping. While half-tetrad analysis requires only 25 markers to obtain a linkage group assignment [20][21][22], it has the disadvantage that gynogenetic diploid fish must be generated first, which makes this approach less convenient for high-throughput analysis. In the course of the ongoing zebrafish genome project, more than 50,000 SNPs have been identified [23], offering an enticing alternative to microsatellite markers, but SNP genotyping is far more costly than the agarose-based method employed by us. Genotyping of SNPs in a bulked segregant panel is also possible by microarray hybridization [24]. However, the SNPs identified to date are specific to the strains they were developed from and may not be informative in mapcrosses performed with different strains (such as ours). Furthermore, such a microarray experiment replaces only two steps in our mapping procedure, namely the pooled PCR and its associated gel run, which represent only a minor part of the total mapping effort, as compared to fish breeding, sorting of F2 embryos and confirmation of the bulked segregant results by genotyping of F2 individuals. Future microarray-based approaches may make it possible to dispense with the genotyping of individuals entirely, provided that a very large number of SNPs can be multiplexed in a single microarray hybridization such that it immediately provides a reliable map position. Meanwhile, genome scanning with microsatellite markers remains the method of choice as it is equally suitable for the mapping of individual mutations by laboratories with limited genomics resources, and for high-throughput projects such as ours.
Table 2 footnotes: a, assignment to LG8 supported by published marker linkages [17]; b, assignment to LG9 supported by the T51 panel (as shown on the ZFIN website) and by the latest version of the HS map [18].

Methods
Fish breeding
Mapcrosses were set up between mutant carriers and the laboratory reference line WIK [8] and brother-sister matings were performed between F1 individuals following standard laboratory procedures [25].

Calculation of map positions
Distances between mutations and markers were calculated by determining the recombination fraction in the mutant F2 individuals and applying the Kosambi mapping function. Linkages with a two-point LOD score equal to or greater than 3 were regarded as significant.
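The calculation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' FileMaker implementation: genotypes of mutant F2 individuals are assumed to be summarized as counts of non-recombinant homozygotes, heterozygotes (one recombinant chromosome) and recombinant homozygotes (two recombinant chromosomes), and the LOD score is assumed to be the standard two-point likelihood ratio against free recombination. All function names are hypothetical.

```python
import math


def recombination_fraction(n_nonrec, n_het, n_rec_hom):
    """Fraction of recombinant chromosomes among mutant F2 individuals.

    n_nonrec:  individuals with no recombinant chromosome
    n_het:     heterozygous individuals (one recombinant chromosome)
    n_rec_hom: individuals homozygous for the reference-line allele (two)
    """
    chromosomes = 2 * (n_nonrec + n_het + n_rec_hom)
    if chromosomes == 0:
        raise ValueError("no genotyped individuals")
    return (n_het + 2 * n_rec_hom) / chromosomes


def kosambi_cm(r):
    """Kosambi map distance in cM for a recombination fraction 0 <= r < 0.5."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


def lod_score(r, chromosomes):
    """Two-point LOD of linkage at r versus free recombination (r = 0.5)."""
    if r >= 0.5:
        return 0.0  # no evidence for linkage
    lod = chromosomes * (math.log10(2.0) + (1.0 - r) * math.log10(1.0 - r))
    if r > 0.0:
        lod += chromosomes * r * math.log10(r)
    return lod


if __name__ == "__main__":
    # Hypothetical example: 48 mutants, 8 of them heterozygous for the marker.
    r = recombination_fraction(40, 8, 0)
    print("r =", round(r, 4))
    print("distance =", round(kosambi_cm(r), 2), "cM")
    print("significant:", lod_score(r, 96) >= 3.0)
```

In this hypothetical example the linkage would count as significant under the LOD threshold of 3 used in the text.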
In order to place a mutation in the genetic interval between the closest marker and another linked marker, we determined whether recombinations for both of them were correlated. For this purpose we considered only single recombinants for the closest marker, i.e. heterozygotes. If the majority of these were non-recombinant for the second marker, we regarded the recombinations as uncorrelated and placed the mutation in the interval between the markers. Otherwise we placed the mutation outside the interval, in the direction opposite from the second marker.
Assuming complete meiotic interference, i.e. only a single recombination event per chromosome, all recombinants for the first marker should be either non-recombinant for the second marker if the markers flank the mutation, or heterozygous if both markers are on the same side of the mutation. In our data approximately half of the mutations gave results in between these extremes. This may be due to occasional contaminations of the PCR assays but also to less than complete meiotic interference, which would allow a second recombination in the same individual. We therefore did not eliminate any contradictory individuals from the calculation of genetic distances as they may represent a genuine second recombination.
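The reasoning in the preceding paragraph (under complete interference, single recombinants for the closest marker are non-recombinant for a second marker that lies on the other side of the mutation) suggests a simple majority rule for deciding whether two markers flank a mutation. The Python sketch below is one possible reading of that rule and not the authors' code: genotypes are assumed to be encoded as the number of recombinant chromosomes (0, 1 or 2) at each marker, ties and the absence of single recombinants are assumed to count as "not flanked", and the function name is hypothetical.

```python
def mutation_is_flanked(genotypes):
    """Decide whether the closest and the second marker flank the mutation.

    genotypes: list of (closest, second) tuples, one per mutant individual,
    each value being the number of recombinant chromosomes (0, 1 or 2).

    Only single recombinants for the closest marker (value 1) are considered.
    The markers are taken to flank the mutation if a strict majority of these
    individuals are non-recombinant (value 0) for the second marker.
    """
    second_in_singles = [second for closest, second in genotypes if closest == 1]
    if not second_in_singles:
        return False  # assumption: no informative individuals, treat as not flanked
    non_recombinant = sum(1 for s in second_in_singles if s == 0)
    return non_recombinant > len(second_in_singles) / 2


if __name__ == "__main__":
    # Hypothetical genotype pairs for five mutant individuals.
    example = [(1, 0), (1, 0), (1, 1), (0, 0), (0, 1)]
    print(mutation_is_flanked(example))
```

Individuals that contradict the majority are kept in the data, in line with the decision described above not to eliminate possible genuine second recombinations.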
If a mutation could be placed in an interval between two markers, a map position was calculated by scaling the observed distances between the mutation and the markers so as to fit into the published distance between the markers. In the remaining cases only the distance to the closest marker was used to calculate the map position. A FileMaker Pro 5 database was used to store the scoring data and perform the calculations [9]. The latest version of the MGH map, available through ZFIN [10], was used as a reference for calculating map positions.
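The two cases described above can be written down as a short Python sketch. It is an illustration under assumptions, not the calculation stored in the authors' database: the scaling is assumed to be a proportional rescaling of the two observed distances to the published interval length, marker positions are assumed to be given in cM on the reference map, and the function names are hypothetical.

```python
def position_in_interval(pos_a, dist_a, pos_b, dist_b):
    """Map position of a mutation flanked by markers A and B.

    pos_a, pos_b:   published map positions of the markers (cM)
    dist_a, dist_b: observed distances from the mutation to each marker (cM)

    The observed distances are rescaled so that together they span the
    published distance between the markers.
    """
    total = dist_a + dist_b
    if total <= 0:
        raise ValueError("observed distances must sum to a positive value")
    return pos_a + (pos_b - pos_a) * dist_a / total


def position_outside_interval(closest_pos, closest_dist, second_pos):
    """Map position of a mutation lying beyond the closest marker.

    The mutation is placed at the observed distance from the closest marker,
    in the direction opposite from the second marker.
    """
    direction = -1.0 if second_pos > closest_pos else 1.0
    return closest_pos + direction * closest_dist


if __name__ == "__main__":
    # Hypothetical markers at 10 cM and 30 cM, observed distances 5 cM and 15 cM.
    print(position_in_interval(10.0, 5.0, 30.0, 15.0))
    # Hypothetical unflanked case: closest marker at 30 cM, second marker at 40 cM.
    print(position_outside_interval(30.0, 5.0, 40.0))
```

Proportional rescaling keeps the mutation inside the published interval even when the observed distances, which are estimated from a limited number of meioses, do not add up to the published marker spacing.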

Authors' contributions
RG implemented the mapping approach, supervised the project, analysed the results and drafted the manuscript.