The multi-tier reduced representation library successfully took advantage of the strengths of two next generation sequencing methods. The main advantage of the 454-FLX system is that it generates longer reads than the GA system. Maughan et al. were able to sequence a reduced representation library using only the 454-FLX system for SNP discovery in amaranth, and they estimated that their RRL represented 10 Mbp of the 466 Mbp genome. The sequencing of the first tier in the common bean RRL produced 67,340 contigs and 92,696 singleton sequences. After elimination of chloroplast and mitochondrial DNA, a total of 36 Mbp of unique sequence was obtained. The high number of singleton sequences and the lack of read depth in the contigs likely indicated that this 36 Mbp did not include all fragments in the 300 to 350 bp size range. Since our reduced representation library likely contains a larger proportion of the estimated 600 Mbp common bean genome than Maughan et al. isolated from amaranth, an additional sequence run of the 454-FLX system on a second genotype was unlikely to be sufficient for SNP discovery in common bean. The isolation of a larger proportion of the genome was expected, since three restriction enzymes were used in the first restriction digestion instead of the single enzyme used in previous studies [7, 11].
The advantage of the GA system is that it produces millions more reads than the 454-FLX system at a much lower cost, but these reads are considerably shorter. While sequencing the first tier with the 454-FLX system alone was inefficient for SNP discovery in common bean, it did produce 300 to 350 bp sequences that we were able to use as a reference sequence for aligning the shorter, but much more numerous, GA sequences. The further reduction of the 300 to 350 bp first-tier fraction to 110 to 140 bp fragments allowed the end sequencing of those fragments with the GA system. Because this smaller second-tier fraction was derived from the 300 to 350 bp fragments, many of the GA reads fell at various positions within the 454-FLX fragments, giving ample flanking sequence on either side of the predicted SNP.
This process predicted SNPs at an 86% success rate when GA reads of both BAT 93 and Jalo EEP 558 were used to predict the SNP. This is similar to the 92.5% obtained in soybean and the 91% obtained in cattle, in each case using two or more reads to predict a SNP when sequencing a reduced representation library with GA sequencing aligned to a reference genome. Barbazuk et al. obtained an 85% validation rate when using a 454 GS-20 run to sequence the transcriptome of two inbred maize lines when no reference genome sequence was available. The 86% success rate would likely be increased with additional sequencing, which could help identify paralogous sequence variants and eliminate SNPs called due to sequencing errors. The longer sequencing reads that can now be obtained with the GAIIx or the Illumina HiSeq 2000 should also allow better identification of paralogous sequences that cause false positive SNP calls. Even with the longer reads of 100 bp available for the GAIIx or HiSeq 2000, it is likely that a reference sequence of at least 200 bp in the form of a 454 sequence would still be needed to obtain enough context sequence surrounding the SNP to have a high probability of converting that SNP into a usable assay. Once the whole genome sequence of common bean is available, a reanalysis of the data should also increase the success rate of SNP prediction.
While using both BAT 93 and Jalo EEP 558 GA sequences gave a high validation rate, requiring a Jalo EEP 558 GA read to validate a SNP considerably reduced the final number of predicted SNPs. However, this step was necessary in order to eliminate paralogous sequence variants. This large decrease in predicted SNPs also indicated that, with the mtRRL constructed here, many more SNPs could potentially be confirmed with additional Jalo EEP 558 sequencing. Even without additional sequence runs, we were able to design 1,050 GoldenGate assays from the sequence data produced from the 454-FLX system and obtained working GoldenGate assays for 79% of the predicted SNPs. It has been shown in soybean that the conversion of confirmed SNPs into working GoldenGate assays is approximately 90%. For unconfirmed SNPs such as those discovered in this study, the percentage of working GoldenGate assays should be the product of the validation rate and the conversion rate of confirmed SNPs. If the 86% validation rate obtained by Sanger sequencing is used, then obtaining a 79% rate of working GoldenGate assays would suggest that for common bean the GoldenGate assay had a 92% conversion rate for real SNPs, which is slightly higher than what has been obtained with soybean.
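The arithmetic behind the 92% figure can be checked with a short sketch. It assumes, as stated above, that the rate of working assays is simply the product of the validation rate and the conversion rate of real SNPs; the function names are illustrative and are not part of the original analysis.

```python
def implied_conversion_rate(working_assay_rate, validation_rate):
    """Conversion rate of real SNPs implied by the observed rates.

    Assumes working_assay_rate = validation_rate * conversion_rate.
    """
    return working_assay_rate / validation_rate


def expected_working_rate(validation_rate, conversion_rate):
    """Expected fraction of working assays for unconfirmed SNPs."""
    return validation_rate * conversion_rate


# Values reported in the text: 79% working assays, 86% validation rate.
common_bean = implied_conversion_rate(0.79, 0.86)

# For comparison: the working rate expected if the approximately 90%
# soybean conversion rate applied to common bean.
soybean_based = expected_working_rate(0.86, 0.90)

print(f"Implied conversion rate in common bean: {common_bean:.1%}")
print(f"Working rate expected at soybean conversion rate: {soybean_based:.1%}")
```

Under these assumptions the implied conversion rate rounds to 92%, while the soybean conversion rate would have predicted roughly 77% working assays, slightly below the 79% observed.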
Not all of the SNP resources developed in the present study have been exploited in the GoldenGate assay: 1,205 SNPs are still available with predicted success rates equal to those of the 1,050 SNPs used for the GoldenGate assay development. In addition, another 540 SNPs, with predicted success rates that should be slightly lower than the 79% conversion rate that we obtained, could still be developed into assays. Each of the 3,487 SNPs has sufficient flanking sequence that a variety of other SNP detection methods could be used in place of the GoldenGate assay.
It is expected that the 3,487 SNPs are randomly distributed throughout the genome, since the enzymes were not chosen to enrich for genic sequence. Because of this random physical distribution, it is expected that when these SNPs are genetically mapped they will cluster according to the amount of heterochromatic DNA present in common bean. It has been shown in the closely related legume soybean that 57% of the genome is heterochromatic DNA, in which recombination is severely suppressed. It has been estimated that in common bean approximately 48% of the genome is heterochromatic. While this predicts that approximately half of the SNPs discovered will cluster genetically, they will still be useful in assembling the genome sequence of common bean, which is currently in progress (Scott Jackson, personal communication). It is interesting to note that in soybean, 21.6% of high-confidence genes are located in the heterochromatic DNA, so a SNP discovery method using the transcriptome would likely also show some clustering in the heterochromatic DNA. This can be observed in the recent SNP discovery project in barley, which used only cDNA for SNP discovery and still shows some clustering around the pericentromeric regions of the barley chromosomes, which are likely to be heterochromatic.
Other methods that have used next generation sequencing for SNP discovery in organisms without a whole genome sequence reduced the complexity of the genome through the creation of normalized cDNA libraries or through capture arrays that were then sequenced using a 454-FLX system [19–21]. While these methods can be very successful for SNP discovery, they still require the creation of normalized cDNA libraries or a capture array, which can be expensive and time consuming. Another drawback of SNP discovery using the transcriptome is that genes are more conserved than non-coding DNA, which will lead to the discovery of fewer SNPs. The more conserved sequence will also lead to primers or probes hybridizing both to the gene sequence that contains the SNP and to any conserved paralogous sequence, thereby decreasing the success rate of assays for such SNPs. In addition, without a genomic reference sequence, the proportion of successful SNP assays designed from cDNA sequence will also be decreased due to the presence of introns interfering with oligo hybridization.