A new implementation of high-throughput five-dimensional clone pooling strategy for BAC library screening
© You et al. 2010
Received: 21 August 2010
Accepted: 6 December 2010
Published: 6 December 2010
Skip to main content
© You et al. 2010
Received: 21 August 2010
Accepted: 6 December 2010
Published: 6 December 2010
A five-dimensional (5-D) clone pooling strategy for screening of bacterial artificial chromosome (BAC) clones with molecular markers utilizing highly-parallel Illumina GoldenGate assays and PCR facilitates high-throughput BAC clone and BAC contig anchoring on a genetic map. However, this strategy occasionally needs manual PCR to deconvolute pools and identify truly positive clones.
A new implementation is reported here for our previously reported clone pooling strategy. Row and column pools of BAC clones are divided into sub-pools with 1~2× genome coverage. All BAC pools are screened with Illumina's GoldenGate assay and the BAC pools are deconvoluted to identify individual positive clones. Putative positive BAC clones are then further analyzed to find positive clones on the basis of them being neighbours in a contig. An exhaustive search or brute force algorithm was designed for this deconvolution and integrated into a newly developed software tool, FPCBrowser, for analyzing clone pooling data. This algorithm was used with empirical data for 55 Illumina GoldenGate SNP assays detecting SNP markers mapped on Aegilops tauschii chromosome 2D and Ae. tauschii contig maps. Clones in single contigs were successfully assigned to 48 (87%) specific SNP markers on the map with 91% precision.
A new implementation of 5-D BAC clone pooling strategy employing both GoldenGate assay screening and assembled BAC contigs is shown here to be a high-throughput, low cost, rapid, and feasible approach to screening BAC libraries and anchoring BAC clones and contigs on genetic maps. The software FPCBrowser with the integrated clone deconvolution algorithm has been developed and is downloadable at http://avena.pw.usda.gov/wheatD/fpcbrowser.shtml.
Integrated genetic and physical maps are extremely valuable resources for map-based gene cloning, comparative genome analysis, and sequencing and assembly of large and complex genomes. Screening bacterial artificial chromosome (BAC) libraries is an indispensable step for integration of genetic and physical maps, by which BAC clones and contigs can be placed and ordered on a genetic map. The use of an appropriate BAC pooling strategy [1, 2] maximizes work efficiency. Two different approaches, hybridization-based and PCR-based, are available for BAC library screening. The hybridization-based approach is based on multi-dimensional pools of molecular markers or probes hybridizing with high-density BAC library screening membranes to identify the BAC clones associated with specific nucleotide sequences or genes. Overgo probes have been used for large scale physical mapping of plant and animal genomes, such as those of soybean , maize  and human . Because a single overgo probe may hybridize with clones in multiple contigs and several probes may hybridize to the same clone, the overgo probes often fail to unequivocally associate a contig with a locus on a genetic map. This may arise from gene duplication, repeat sequences in a BAC clone or probe, or false positives.
A PCR-based six dimensional (6-D) BAC clone pooling strategy has been successfully used by other groups for BAC library screening in sorghum , maize  and soybean . This pooling strategy includes a conventional three-dimensional (3-D) stack, called plate pool, face pool and side pool, and an additional three types of pools, called row pool, column pool and diagonal pool. A positive clone in a BAC plate is located by only three types of pools; the other three types of pools are used only for verification. Hence, this strategy not only uniquely defines individual clones and efficiently eliminates false positives but also reduces the tedious task of individual clone verification. However, the six dimensions result in a large number of pools for PCR screening, and limit the size of BAC libraries to be screened. For example, a total of 184 pools were generated for 24, 576 BAC clones (~4× genome equivalent) in sorghum , 288 pools for 110,592 BAC clones (~6× genome equivalent) in maize , and 208 pools for 49,152 BAC clones (~6.6× genome equivalent) in soybean . Because the number of pools that must be screened is a function of the BAC library size, the workload for screening larger BAC libraries, such as those of the grasses in the tribe Triticeae that includes wheat, barley and rye, will be unacceptably high. For example, in the physical mapping project (http://wheatdb.ucdavis.edu) of Aegilops tauschii, the diploid ancestor of the wheat (Triticum aestivum) D genome, five BAC libraries comprise a total of 302,976 clones (in 789 384-well plates), which equal to ~8.5× Ae. tauschii genome equivalents. If the same pooling strategy as had been used in maize were used, 454 pools (131 × 48 × 48 for the basic 3-D stacks plus three additional dimensions with an equal number of pools) would be needed. This would be extremely laborious in DNA pool preparation and PCR screening. In order to reduce pool number and relieve PCR workload, Luo et al. (2009)  evaluated a high-throughput five-dimensional (5-D) clone pooling strategy based on both Illumina's GoldenGate assay and PCR screening of Ae. tauschii BAC clones. The major points of this strategy include: (1) Conventional 3-D grid design (plate, row, and column pools corresponding to plate, face, and side pools respectively in the 6-D strategy [6–8]) with the plate pools further grouped into two-dimensional (2-D) pools, referred to as super pools in Luo et al. 2009 . Because the pooling procedure involves five different DNA pool sets, this design was called a 5-D clone pooling strategy . (2) Super pools screened by Illumina's GoldenGate assay and row and column pools screened by PCR. (3) Positive plate pools at 3-D intersections further verified by PCR to find positive plate pools among the putative positives. This strategy reduces pool number and adopts the highly parallel GoldenGate assay for clone screening, and makes high-throughput clone screening possible for large genomes. With this strategy, 95% of Illumina's GoldenGate EST-based oligonucleotide markers unequivocally assigned BAC clones to loci on the genetic map .
The 5-D clone pooling strategy still requires a considerable effort to identify positive plates among the putative positive plates in super pool screening by PCR assays. In addition, clone row pools and column pools also need to be screened by PCR. Here we describe a new implementation of this strategy, in which clone row and column pools are further divided into sub-pools with 1~2× genome coverage to minimize the number of positive pools among the putative positive pools. The rationale for limiting the number of genome equivalents screened at a single time is that the number of false positives increases exponentially with the number of genome equivalents screened. If a 1× genome equivalent is screened, then there is an average of one plate row and plate column intersection and hence no false positive plate. If a 2× genome equivalent is screened, then there are on average four plate row and plate column intersections with four putative positive plates, of which two are false positive. If a 3× genome equivalent is screened, then there are on average nine plate row and plate column intersections with nine putative positive plates, of which six are false positive.
In the implementation of the screening strategy pursued here, all pools are screened with the Illumina GoldenGate assays. The assay data is then combined with BAC contig data and overlapping clones in BAC contigs are employed as additional information to discriminate between pool intersections that harbour positive clones and intersections that are false positive. A computational algorithm was designed for this implementation of the 5-D pooling strategy, which was integrated into a newly developed software tool, FPCBrowser, for analyzing pooling data.
Clone deconvolution of 55 SNP markers mapped on the genetic map of Ae. tauschii chromosome 2D with the 5-D clone pooling strategy.
No. of contigs
No. of markers
No. of TP markers
No. of F+ markers
No. of markers without solution
To evaluate the accuracy of the clone deconvolution algorithm, two performance metrics, recall and precision, were used. The recall is defined as the number of TP markers deconvoluted by the algorithm divided by the total number of markers analyzed, and the precision is defined as the number of TP markers divided by the total number of markers with solutions deconvoluted by the algorithm. The TP markers are those for which TP clones were assigned by the algorithm. Different versions of contig maps resulted in significantly different deconvolution results. In Assembly 1.1, 48 (87%) out of 55 markers were successfully associated with TP clones (0.87 recall) with a precision of 91% (Table 1). In Assembly 2, only 15% of markers were correctly assigned to TP clones. Therefore, a relatively "perfect" contig map is a prerequisite for clone deconvolution. In our example, Assembly 1.1 approximated the "perfect" assembly the best.
For clone deconvolution, FPCBrowser needs a relatively "perfect" contig map as input (*.fpc, an output file from the FPC software), a 2-D super pool matrix design file and a formatted pool hit file based on the GoldenGate genotyping assay (see details in the FPCBrowser user's guide at http://avena.pw.usda.gov/wheatD/fpcbrowser.shtml) . The clone deconvolution module generates as output a summary file, an anchored positive marker file used for convenient conversion to an ACE file to merge markers into a FPC contig map, and a deconvolution result file for each marker for further manipulation. A Java tool for Ace file conversion directly from the result file is also available in the FPCBrowser package.
Performance of this module depends on several major factors, such as the genome coverage of pools, the number of contigs and contig lengths. When the pooling data of 55 SNP markers and Assembly 1.1 of the contig map were used (Table 1), only 2.1 minutes were needed by a desktop computer (Asus P6T, Intel core i7 920, 12 GB of RAM, and an Ubuntu Linux 9.04 64 bit operating system) to execute the operation. The executable binary version of FPCBrowser is freely available at http://avena.pw.usda.gov/wheatD/fpcbrowser.shtml .
The new implementation of the 5-D clone pooling strategy employs the GoldenGate genotyping assay to screen BAC clones in 4 types of pools, RSP, CSP, RP and CP (four dimensions), which replace PCR screening and verification for positive plates in RSP and CSP and for positive clones in RP and CP in the previous implementation . The RP and CP pools are further divided into sub-pools with 1~2× genome coverage to reduce the number of false positive clones in clone deconvolution. Although the total number of screened pools increases, this does not significantly raise the cost for the high-throughput GoldenGate genotyping assay. The fifth dimension is the information about overlapping BAC clones in a contig map which is used to detect and verify truly positive clones among a pool of candidate clones at the intersections of the preceding four dimensions. A clone deconvolution algorithm and corresponding software FPCBrowser have been developed for this purpose. This new implementation provides a high-throughput and low cost approach to BAC library screening and deconvolution of clone pools, and tremendously reduces work load otherwise required for PCR screening and verification of a large number of pools [6–9]. In a test with 55 SNP markers previously associated with Ae. tauschii contigs via the GoldenGate assay and manual PCR , this implementation yielded a 87% success rate with 91% precision. Some markers resulted in F+ or had no solutions compared to the previous implementation . Improving contig assembly and increasing genome coverage of pools can reduce no-solution and F+ markers.
A "perfect" contig map and adequate pool coverage are two critical components for the clone deconvolution algorithm. Although contig assembly never results in an absolutely perfect contig map because of assembly errors, substandard fingerprinting, chimeric clones, and other reasons, relatively "perfect" contigs can be obtained by an appropriate contig assembly strategy. As long as truly positive clones associated with the marker are overlapping each other in a contig, the contig map for that marker is "perfect", irrespective of the status of the rest of the contig. In addition, if the markers used have been mapped on a genetic map, mapping data can be used for the verification of pool deconvolution. Collocation of markers in a single contig and on a genetic map can validate deconvolution. If a contig is anchored with only a single marker, additional PCR verification of deconvolution may be prudent.
BAC libraries and clones used in the 5-D clone pooling design
No. of clones
No. of plates
No. of clones in a row sub-pool
No. of clones in a column sub-pool
A new implementation of 5-D BAC clone pooling strategy employing both the GoldenGate assay to screen BAC pools and the use of previously assembled BAC contigs is suggested. The implementation is shown to be a high-throughput, low cost, rapid, and feasible approach to screening BAC libraries and anchoring of BAC clones and contigs on genetic maps. The software FPCBrowser with the integrated clone deconvolution algorithm has been developed and is downloadable at http://avena.pw.usda.gov/wheatD/fpcbrowser.shtml .
As described in , a conventional 3-D grid pooling is used as part of our algorithm. This pooling strategy includes three different types of pools: plate pools (PP), clone row pools (RP) and clone column pools (CP). For PP, DNAs of 384 clones present in a plate are pooled. To make the screening of PP more efficient, the plate pools are further pooled as a 2-D array. In this 2-D array, PPs are pooled into plate row super-pools (RSP) and plate column super-pools (CSP). For RP and CP screening, the entire stack of 384-well plates is subdivided into N smaller pools (sub-pools) with 1~2× genome coverage. For RP screening, a total of 16 clones × N row pools are generated for each row. Similarly for CP, a total of 24 × N column pools are generated for each column. The four types of pools (RSP, CSP, RP, and CP) are screened with Illumina GoldenGate genotyping techniques  in this new implementation of the pooling strategy. Details of DNA pooling methods, Illumina GoldenGate genotyping and scoring of genotyping results were described in .
Of 302,976 Ae. tauschii clones contained in BAC and BiBAC libraries , a total of 270,720 were fingerprinted and automatically edited with GenoProfiler , and 199,190 were ultimately used for assembly [18, 19] with FPC [12, 13]. Using different stringencies and end merges, eight Phase I contig maps were generated [18, 19]. These draft maps are available at http://wheatdb.ucdavis.edu .
All of the 302,976 Ae. tauschii clones (789 384-well plates) (Table 2) were pooled into 789 plate pools (each pool containing 384 clones). Plate pools were arranged into a 2-D array, consisting of 27 rows and 30 columns (See Table S1 in Additional file 1). This generated 57 super-pools (27 RSPs, each containing 11,520 clones and 30 CSPs, each containing 10,368 clones), which decreased the number of plate pools for screening to 7%. A total of 190 clone pools, consisting of 80 RPs (5 sub-pools each row, one sub-pool per library) and 110 CPs (5 sub-pools each column, one column sub-pool per library) were generated across all 789 plates. Each clone row sub-pool contained 3,264-4,800 clones and each clone column sub-pool contains 2,176-3,200 clones (Table 2). DNA samples of a total of 217 pools were generated. The pools were screened with Illumina GoldenGate genotyping techniques  for 1,384 SNP markers which have been mapped on the Ae. tauschii D genome genetic map . Among them, 55 SNP markers mapped on chromosome 2D genetic and physical maps and verified by PCR assay  were utilized as test data.
Theoretically the 5-D clone pooling is an unbalanced design because a truly positive clone (a well in a plate) for a molecular marker cannot be uniquely determined from the four types of positive pool hits. Positive plate hits are obtained as intersections of positive plate RSP and plate CSP in the 2-D super pool array (Figure 1 and Table S1 in Additional file 1). Candidates of positive clones are at intersections of the positive plates and positive clone RPs and clone CPs. Truly positive intersections must be distinguished from false positives (F+), and such F+ must be removed. For example, pools of clones equivalent to 4× genome coverage are screened with a marker. Four true positive (TP) clones are therefore expected in the pools. Assuming that these clones are in different plates, different rows, and different columns within a plate, there would be a maximum of 4 TP intersections among a total of 256 experimental intersections generating 256 candidate clones. If all clones in a specific row or column of all stack of plates are divided into 4 sub-pools with 1× genome coverage, we can get 4 RPs of 1× genome coverage in a row and 4 CPs of 1× genome coverage in a column, and 1 row pool hit in each sub-row-pool and 1 column pool hit in each sub-column-pool. The total number of candidate intersections will be only (4×4×1×1)×4 = 64. This improved clone pooling strategy for clone row and column pools can dramatically reduce the number of F+ clones. However, 60 of the 64 clones are still F+ clones. A large number of dedicated PCR  or extra pools [6–8] are required to eliminate the F+ clones and detect the TP clones. Rather than using PCR, we use information about clone overlaps in the existing contig maps to find the TP clones among the candidates.
. For each marker, find all positive candidate clones C at the intersections of the four types of positive pools (Figure 1).
. Search all contigs in the contig map and in each contig find a clone subset C 1 which belongs to C if any.
. If the size of C 1 (the number of clones) for a contig ≥ 2, find a clone subset C 2 in which clones have spanning or inclusion relation, or overlap each other.
. If the size of C 2 for a contig ≥ 2, calculate the number of F- clones (FN). F- clones are the clones which are located between left-most position and right-most position of C 2 in a contig but are not included in C.
. Calculate a discrimination score for each of all clones sets of C 2 : Score = 0.8*NC/MC + 0.2 * (1-FN/NC), where NC is the number of clones in C 2 ; MC is the maximum number of clones in sets of C 2 , i.e., the number of clones in the top set of C 2 after sorting by the number of clones.
. Sort all clone sets of C 2 by the score of C 2 in a descending order. Choose the clone set with the highest score. If the score ≤ 0, no solution is found. If there are two or more clone sets which have the same scores, non-unique solutions are obtained for further manual verification.
bacterial artificial chromosome
polymerase chain reaction
single nucleotide polymorphism
column super pool
row super pool
clone column pool
clone row pool
Authors are grateful to Alina. R. Akhunova for preparing BAC pools and Vanessa Rashbrook for performing the Illumina GoldenGate assays. This work is supported by the US National Sciences Foundation grant IOS 0701916.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.