Efficacy of SSH PCR in isolating differentially expressed genes

Background: Suppression Subtractive Hybridization PCR (SSH PCR) is a sophisticated cDNA subtraction method for enriching and isolating differentially expressed genes. Despite its popularity, the method has not been thoroughly studied for its practical efficacy and potential limitations. Results: To determine the factors that influence the efficacy of SSH PCR, we propose a theoretical model under the assumption that cDNA hybridization follows ideal second-order kinetics. The model suggests that the critical factor influencing the efficacy of SSH PCR is the concentration ratio (R) of a target gene between the two cDNA preparations. SSH PCR preferentially enriches "all or nothing" differentially expressed genes, for which R is infinite, and strongly favors genes with large R. These theoretical predictions were validated by our experiments. In addition, the experiments revealed some practical limitations that are not obvious from the theoretical model: effective enrichment of differentially expressed genes requires the fractional concentration of a target gene to be more than 0.01% and its concentration ratio between the two cDNA preparations to be more than 5-fold. Conclusion: Our research demonstrated theoretical and practical limitations of SSH PCR, which could be useful for its experimental design and interpretation.


Background
Alterations in gene expression are associated with a large spectrum of biological and pathological processes [1]. The identification of differentially expressed genes often leads to greater insight into the molecular mechanisms underlying disease progression or biological development. To facilitate the discovery of differentially expressed genes, a variety of methods have been developed in recent years, including Differential Display PCR [2], RNA fingerprinting [3], SAGE [4], Real-time Quantitative PCR (TaqMan) [5][6][7], Suppression Subtractive Hybridization PCR (SSH PCR) [8], and hybridization to gene arrays of various formats [9,10]. Although each method has advantages and drawbacks, the general methodology for identifying differentially expressed genes has progressed from labor-intensive procedures, such as polyacrylamide gel-based differential display, to automated high-throughput methods such as hybridization-based gene arrays. Commercial gene arrays, which contain probes representing many genes and ESTs bound to small glass plates or chips, provide simultaneous measurement of gene abundance and have greatly accelerated the search for differentially expressed genes. However, such arrays and the associated equipment are expensive and beyond the reach of most academic laboratories. Commercial arrays also suffer from being restricted to the gene sequences available as templates for probe design, and they generally cover only human and the most common model organisms. Thus, to identify novel genes or to study other organisms such as agricultural crops and livestock, it is still necessary to use methods beyond such gene chips and arrays.
Subtractive hybridization is an attractive method for enriching differentially expressed genes. It was first used by Bautz and Reilly to purify phage T4 mRNA in the mid-1960s [11]. Purely subtractive methodologies are of limited use because they need a large quantity of mRNA to drive hybridization to completion and because the tiny amount of cDNA remaining after hybridization is difficult to clone. The method was greatly improved when Duguid and Dinauer ligated generic linkers to cDNA [12], allowing selective PCR amplification of tester cDNA between hybridization cycles. Diatchenko et al. further introduced Suppression Subtractive Hybridization PCR (SSH PCR), in which differentially expressed genes can be normalized and enriched over 1000-fold in a single round of hybridization [8]. The recent commercialization of an SSH PCR kit by Clontech (CLONTECH Laboratories, Palo Alto, CA, USA) has led to its increasing popularity in biological research laboratories [13][14][15][16][17].
Despite the popularity of SSH PCR, this complicated method has not been thoroughly studied for its practical efficacy and potential limitations. In this work, we propose a theoretical model of SSH PCR based on the assumption that cDNA hybridization follows ideal second-order kinetics, and we test its predictions in several SSH experiments.

Theoretical model of SSH PCR
The strategy by which SSH PCR enriches differentially expressed genes is depicted in Figure 1. The procedure consists primarily of two subtractive hybridizations followed by a single PCR amplification. In the first hybridization step, tester cDNA fitted with adapter 1 or adapter 2R is mixed with a large excess of driver cDNA, and the two mixtures are denatured separately. They are then subjected to limited renaturation, also separately. Because renaturation, the random collision of complementary strands, obeys ideal second-order kinetics, the rate of the reaction can be described by Equation 1 [18,19]:

$$-\frac{dC}{dt} = kC^{2} \qquad (1)$$

where C is the molar concentration of a single-strand target gene, t is time and k is the rate constant.

Equation 1 can be integrated and solved yielding Equation 2:
$$C_t = \frac{C_0}{1 + C_0 k t} \qquad (2)$$

where C_0 is the starting concentration of the single-strand DNA and C_t is the concentration of single-strand DNA remaining at time t. When C_0kt >> 1, Equation 2 simplifies to Equation 3:

$$C_t \approx \frac{1}{kt} \qquad (3)$$

Equation 3 implies that when the hybridization time is long enough, that is, when C_0kt >> 1, the concentration of remaining single-strand DNA is determined mainly by the hybridization rate constant k and the hybridization time t, and is independent of its starting concentration C_0. This is the basis of normalization in the first hybridization reaction.
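The normalization argument can be checked numerically. The sketch below evaluates Equation 2, C_t = C_0/(1 + C_0kt), for starting concentrations spanning a 100-fold range. The value of kt and the concentration units are hypothetical, chosen only so that C_0kt >> 1; the function name is illustrative.

```python
def remaining_single_strand(c0: float, kt: float) -> float:
    """Equation 2: single-strand target left after ideal second-order reannealing."""
    return c0 / (1.0 + c0 * kt)


kt = 1.0e4  # hypothetical product of rate constant k and time t (arbitrary units)
plateau = 1.0 / kt  # Equation 3: limiting value of C_t when C_0 * k * t >> 1

# Starting concentrations covering a 100-fold range (arbitrary units)
for c0 in (1.0e-1, 1.0e-2, 1.0e-3):
    ct = remaining_single_strand(c0, kt)
    print(f"C0={c0:.0e}  C0*k*t={c0 * kt:.0e}  Ct={ct:.3e}  Ct/(1/kt)={ct / plateau:.3f}")
```

A 100-fold spread in C_0 collapses to roughly a 1.1-fold spread in C_t, and the compression weakens as C_0kt approaches 1.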
The single-strand cDNAs consist of both tester cDNAs, which are fitted with adapters, and driver cDNAs, which are not. If we further assume that DNA with and without adapters has the same hybridization kinetics (that is, the adapters do not interfere with DNA hybridization), then for every target strand contributed by the tester, the N-fold excess of driver contributes N/R, so a fraction R/(R + N) of the remaining target strands carries adapters. The concentration of the PCR-amplifiable cDNA (that carrying adapters) can therefore be calculated from Equation 4:

$$C_t' = \frac{R}{R + N}\,C_t \qquad (4)$$

where C_t' is the concentration of a target single-strand cDNA with adapter, N is the ratio of driver to tester in the first hybridization, and R is the concentration ratio of the target cDNA in tester to that in driver.
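A short sketch of Equation 4, C_t' = C_t·R/(R + N), shows how strongly the driver excess dilutes the amplifiable share of a target that is also present in driver. N = 30 is the first-hybridization ratio reported in this study; the R values and the function name are illustrative.

```python
import math


def amplifiable_fraction(R: float, N: float) -> float:
    """Equation 4 as a ratio: C_t'/C_t = R / (R + N).

    Tester supplies 1 part of the target strands and the N-fold driver excess
    supplies N/R parts, so only R/(R + N) of them carry an adapter.
    """
    if math.isinf(R):
        return 1.0  # 'all or nothing' target: no target strands come from the driver
    return R / (R + N)


N = 30  # driver:tester ratio in the first hybridization in this study
for R in (1, 5, 10, 100, math.inf):
    print(f"R={R}: C_t'/C_t = {amplifiable_fraction(R, N):.3f}")
```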
In the first hybridization none of the double-strand cDNA can be amplified by PCR because it either lacks adapter sequences for binding of PCR primer(s) or PCR is suppressed by a so-called "panhandle" structure that is formed by long complementary sequences of 5' and 3' ends of adapters [21]. Therefore, only the single-strand cDNAs containing adapters are of consequence in the second hybridization.
In the second hybridization, the single-strand cDNAs from the first hybridization are mixed with freshly denatured driver cDNA to form double-strand cDNAs. The second hybridization is carried out over a longer period to ensure that all cDNAs become double-stranded. This reaction can be described by Equation 5:

$$(A' + A'' + A) + (B' + B'' + B) \longrightarrow A'B' + A'B'' + A'B + A''B' + A''B'' + A''B + AB' + AB'' + AB \qquad (5)$$

where A and B are a single-strand cDNA and its complementary strand, respectively; A' and A" are strands fitted with adapter 1 and 2R, respectively, and B' and B" are likewise fitted with adapter 1 and 2R. In the second hybridization, only the double-strand cDNAs with a different adapter at each end (A'B" and A"B') can be amplified by PCR. If each strand anneals to a complementary strand chosen at random in proportion to its concentration, the amount of product (A'B" + A"B') available for amplification can be determined by Equation 6:

$$[A'B''] + [A''B'] = \frac{A'\,B'' + A''\,B'}{B' + B'' + B} \qquad (6)$$

Given that A = B = MC_0/R, where M is the ratio of driver to tester in the second hybridization and R is the concentration ratio of a target cDNA in tester to that in driver, and given Equation 4, it holds that A' = A" = B' = B" = C_t' = RC_t/(R + N). Thus the concentration of target double-strand cDNA with hetero-adapters can be calculated by Equation 7:

$$[A'B''] + [A''B'] = \frac{2\left(\frac{R}{R+N}\,C_t\right)^{2}}{\frac{2R}{R+N}\,C_t + \frac{MC_0}{R}} \qquad (7)$$

where C_t is the concentration of remaining single-strand cDNA after the first hybridization, N is the ratio of driver to tester in the first hybridization (30 in our experiments), M is the ratio of driver to tester in the second hybridization (5 in our experiments), and R is the concentration ratio of the target cDNA in tester to that in driver.
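The random-pairing picture behind the hetero-adapter count can be checked with a small Monte Carlo sketch: each A-type strand is paired with a randomly chosen B-type strand, and the number of A'B" + A"B' duplexes is compared with the proportional-pairing formula 2C_t'^2/(2C_t' + A), where A' = A" = B' = B" = C_t' and A = B. Strand counts stand in for concentrations; the counts, seed range and function names are hypothetical, and the sketch assumes equal annealing rates for all strands, as the model does.

```python
import random


def simulate_hetero_duplexes(n_adapted, n_driver, seed=0):
    """Pair A strands with random B strands and count A'B'' + A''B' duplexes.

    n_adapted: copies each of A', A'', B', B'' (adapter-bearing strands from hybridization 1)
    n_driver:  copies each of A and B (fresh adapter-free driver strands)
    """
    rng = random.Random(seed)
    a_strands = ["1"] * n_adapted + ["2R"] * n_adapted + [None] * n_driver
    b_strands = ["1"] * n_adapted + ["2R"] * n_adapted + [None] * n_driver
    rng.shuffle(b_strands)  # every A strand anneals to a randomly chosen B strand
    return sum(1 for a, b in zip(a_strands, b_strands) if {a, b} == {"1", "2R"})


def predicted_hetero_duplexes(n_adapted, n_driver):
    """Proportional pairing with A' = A'' = B' = B'' = n_adapted and A = B = n_driver."""
    return 2 * n_adapted**2 / (2 * n_adapted + n_driver)


n_adapted, n_driver = 200, 1000
trials = [simulate_hetero_duplexes(n_adapted, n_driver, seed=s) for s in range(200)]
mean = sum(trials) / len(trials)
print(f"simulated mean {mean:.1f} vs formula {predicted_hetero_duplexes(n_adapted, n_driver):.1f}")
```

With no fresh driver (n_driver = 0) the formula reduces to n_adapted, the R = ∞ case in which half of all adapter-bearing duplexes carry two different adapters.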
If we make two simple approximations, (a) ignoring the cDNAs that cannot be amplified by PCR, which is reasonable because exponential PCR amplification leaves unamplified cDNA as only a tiny portion of the final product, and (b) ignoring differences in PCR efficiency among amplifiable cDNAs, which is reasonable because all of them carry identical adapters, then Equation 7 gives the relative amounts of all cDNAs after SSH PCR.
Several predictions can be made directly from Equation 7. (1) When R = ∞, the target is an 'all or nothing' differentially expressed cDNA, present only in tester and absent from driver. Then C_t' = C_t (Equation 4) and A'B" + A"B' = C_t ≈ 1/kt (Equation 3), so every 'all or nothing' differentially expressed cDNA is enriched to a fixed level irrespective of its starting concentration. (2) When R is a small number (<10, for example), the target is a ratio differentially expressed cDNA, present in both tester and driver but at different concentrations. Then C_0 >> C_t and N >> R, and Equation 7 simplifies to Equation 8:

$$[A'B''] + [A''B'] \approx \frac{2C_t^{2}}{MN^{2}C_0}\,R^{3} \qquad (8)$$

Equation 8 shows that the enrichment of a ratio differentially expressed gene is proportional to the cube of R: the greater the expression ratio of a cDNA in tester vs. driver, the more likely it is to be detected by SSH PCR.
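The two predictions can be reproduced by evaluating Equation 7 directly and comparing it with the small-R form of Equation 8. The sketch uses the ratios N = 30 and M = 5 from this study; C_0 = 1 and kt = 1000 are hypothetical values chosen only so that C_0kt >> 1, and the function names are illustrative. It assumes, as the model does, that only fresh driver strands (A = B = MC_0/R) compete with the adapter-bearing strands in the second hybridization.

```python
import math


def hetero_duplex_yield(R, N, M, c0, kt):
    """Equation 7: concentration of A'B'' + A''B' duplexes after the second hybridization."""
    ct = c0 / (1.0 + c0 * kt)  # Equation 2: target left single-stranded after hybridization 1
    if math.isinf(R):
        return ct  # 'all or nothing' target: C_t' = C_t and no target strands in the driver
    ct_adapted = ct * R / (R + N)  # Equation 4: adapter-bearing share of the remaining strands
    driver = M * c0 / R  # A = B = M C_0 / R: fresh driver target strands
    return 2 * ct_adapted**2 / (2 * ct_adapted + driver)


def hetero_duplex_yield_small_R(R, N, M, c0, kt):
    """Equation 8: small-R approximation of Equation 7 (C_0 >> C_t, N >> R)."""
    ct = c0 / (1.0 + c0 * kt)
    return 2 * ct**2 * R**3 / (M * N**2 * c0)


N, M = 30, 5  # driver:tester ratios used in this study
c0, kt = 1.0, 1.0e3  # hypothetical values with C_0 * k * t >> 1

base = hetero_duplex_yield(1, N, M, c0, kt)
for R in (1, 2, 5, 10):
    exact = hetero_duplex_yield(R, N, M, c0, kt)
    approx = hetero_duplex_yield_small_R(R, N, M, c0, kt)
    print(f"R={R:>2}  Eq7/Eq7(R=1)={exact / base:8.2f}  R^3={R**3:5d}  Eq8/Eq7={approx / exact:.2f}")
print("R=inf:", hetero_duplex_yield(math.inf, N, M, c0, kt), "vs 1/kt =", 1 / kt)
```

The relative yield grows close to R^3 while R is much smaller than N; as R approaches N the R/(R + N) factor in Equation 4 makes Equation 8 increasingly overestimate the exact value.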

Experimental Test of SSH PCR
To test the two predictions of the theoretical model experimentally, we designed a series of experiments. First, we tested SSH PCR for the enrichment of 'all or nothing' differentially expressed genes. We prepared a series of tester cDNAs by artificially adding φx174 DNA to fibroblast cDNA to simulate differentially expressed genes, and subtracted these tester cDNAs using fibroblast cDNA as driver. The results (Fig 2) demonstrated that 'differentially expressed' φx174 DNA can be enriched to clearly visible bands when its fractional concentration is more than 0.01% of tester cDNA (Fig 2, lanes 2 and 3). When the starting fractional concentrations of φx174 were 1.0% and 0.1% in the tester cDNA preparations, the φx174 bands were of similar intensity after SSH PCR (Fig 2, lanes 2 and 3), indicating their enrichment to the same level. This is consistent with the theoretical prediction. Fig 2 also revealed a practical limitation of SSH PCR that is not obvious from the theoretical model. When φx174 DNA is less than 0.01% of tester cDNA, no clearly visible φx174 bands are apparent after agarose gel electrophoresis (lanes 4, 5 and 6), indicating that most of the SSH PCR cDNAs are not the 'differentially expressed' target φx174 but predominantly randomly amplified fibroblast cDNAs.
We also tested the efficacy of SSH PCR in enriching ratio differentially expressed genes. We prepared a series of tester and driver cDNAs by adding different amounts of φx174 DNA to fibroblast cDNA. In the first series, the tester cDNA contained a fixed amount (1.0%) of φx174 DNA added to fibroblast cDNA, while a series of driver cDNAs was made by adding φx174 DNA, ranging from 1% to 0%, to fibroblast cDNA. We then enriched the 'differentially expressed' φx174 DNA by SSH PCR. The results (Fig 3) demonstrated that 'differentially expressed' φx174 DNA can be enriched to clearly visible bands only when it is 5-fold or more concentrated in tester than in driver cDNAs (lanes 4, 5, 6 and 7). When the differentially expressed cDNA is less than 5-fold more concentrated in the tester, no distinguishable φx174 DNA bands were seen (lanes 2 and 3), suggesting that the 'differentially expressed' φx174 DNA was not sufficiently enriched by SSH PCR and that the resulting SSH library consists mainly of randomly amplified fibroblast cDNAs.
To further examine the roles of the concentration ratio R and of target abundance in the efficiency of SSH PCR, we made a second series of tester and driver cDNAs. The tester cDNA contained 0.1% φx174 DNA added to fibroblast cDNA, one-tenth of the amount in the previous experiment, and the φx174 content of the driver series was likewise reduced 10-fold, ranging from 0.1% to 0% φx174 DNA in fibroblast cDNA. Thus, the absolute amount of 'differentially expressed' φx174 DNA was one-tenth of that in the previous experiment, whereas the corresponding concentration ratios were identical. We again enriched the 'differentially expressed' φx174 DNA using SSH PCR; the results are shown in Fig 4. They were almost identical to those of the previous experiment: effective enrichment showed a similar dependence on the concentration ratio, requiring φx174 DNA to be more than 5-fold more concentrated in tester than in driver (lanes 3, 4, 5 and 6). Together, the results in Fig 3 and Fig 4 demonstrate that effective enrichment by SSH PCR is highly dependent on the concentration ratio of the differentially expressed gene, and that enrichment is far more effective for genes that are highly differentially expressed. These results are consistent with the theoretical prediction described in Equation 8.
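The design of the two ratio series can be summarized by computing R = (tester %)/(driver %) for each tester/driver pair. The text gives only the end points of the driver series (1% to 0% and 0.1% to 0%), so the intermediate driver values below are hypothetical; the 5-fold cut-off is the threshold observed in Figs 3 and 4. Exact fractions are used so that a ratio of exactly 5 is not misclassified by floating-point rounding.

```python
import math
from fractions import Fraction


def concentration_ratio(tester_pct, driver_pct):
    """R = target concentration in tester / in driver; infinite if absent from driver."""
    if driver_pct == 0:
        return math.inf
    return tester_pct / driver_pct


# Hypothetical driver series (only the end points are stated in the text)
driver_series_1 = [Fraction(p) for p in ("1", "0.5", "0.2", "0.1", "0.05", "0")]
driver_series_2 = [p / 10 for p in driver_series_1]  # second experiment: every value / 10

ratios_1 = [concentration_ratio(Fraction("1"), d) for d in driver_series_1]
ratios_2 = [concentration_ratio(Fraction("0.1"), d) for d in driver_series_2]

for r1, r2 in zip(ratios_1, ratios_2):
    # Observed threshold: clearly visible phiX174 bands only at 5-fold or more
    print(f"R = {r1} (1.0% tester) / {r2} (0.1% tester)  visible band expected: {r1 >= 5}")
```

Dividing every percentage by 10 leaves each R unchanged, which is why the two experiments isolate the effect of target abundance from that of the concentration ratio.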

Discussion
We presented a theoretical model of SSH PCR based on the well-established second-order kinetics of DNA hybridization [18,19]. Recent kinetic modeling and computer simulation of subtractive hybridization based [20,22]. Our mathematical calculations, described in Equations 7 and 8, reveal the relative importance of factors such as the concentration ratio (R) and target abundance for any specific cDNA to be present in an SSH PCR library. When R→∞, that is, when differentially expressed genes are 'all or nothing', they are effectively enriched to a fixed concentration of 1/kt. When R is a small number, enrichment is proportional to R^3, favoring highly differentially expressed genes. Our experiments confirmed the theoretical prediction that the primary factor influencing enrichment is the concentration ratio R and not the absolute difference in concentration. This was supported by the similar enrichment of 1.0% and 0.1% φx174 DNA shown in Figs 3 and 4. On the other hand, SSH PCR cannot exclude all non-differentially expressed genes from a library, as demonstrated by the evenly distributed DNA surrounding the φx174 DNA bands, which is evidently derived from non-differentially expressed fibroblast cDNA. Contrary to the theoretical prediction, however, our SSH PCR experiment failed to enrich φx174 DNA when it was less than 0.01% of tester cDNA (Fig 2, lanes 4, 5 and 6). A possible explanation is that a target cDNA below 0.01% is too dilute to drive hybridization to completion in the second hybridization. Because formation of double-strand cDNA is required for PCR amplification in SSH PCR, the result will be low representation of the rare target cDNA in the SSH PCR library, even if it is an 'all or nothing' differentially expressed cDNA.
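The proposed explanation follows from Equation 2 itself: the fraction of target strands that have reannealed, 1 − C_t/C_0 = C_0kt/(1 + C_0kt), falls steeply once C_0kt drops toward 1, and the time needed to reach a given completeness scales as 1/C_0. The sketch below illustrates this for the tester fractions used in Fig 2, ignoring the driver strands for simplicity. The value of kt is not measured in this work; the number chosen below is purely hypothetical, so only the trend, not the absolute percentages, carries over.

```python
def fraction_reannealed(c0_kt):
    """From Equation 2: fraction of target strands already double-stranded, 1 - C_t/C_0."""
    return c0_kt / (1.0 + c0_kt)


def time_to_fraction(f, c0, k):
    """Solve Equation 2 for t: time for ideal second-order reannealing to reach fraction f."""
    return f / ((1.0 - f) * c0 * k)


kt_per_unit = 1.0e4  # hypothetical k*t, expressed per unit of fractional concentration
for pct in (1.0, 0.1, 0.01, 0.001):
    c0 = pct / 100.0  # fractional concentration of the target in tester cDNA
    done = fraction_reannealed(c0 * kt_per_unit)
    slowdown = time_to_fraction(0.9, c0, 1.0) / time_to_fraction(0.9, 0.01, 1.0)
    print(f"{pct:>6}%  reannealed={done:.3f}  time to 90% relative to 1%: {slowdown:g}x")
```

Whatever the true kt, a target at 0.01% needs 100 times longer than one at 1% to reach the same degree of reannealing under this model.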
Practical factors, such as PCR amplification efficiency, have not been included in our theoretical treatment. As noted above, PCR amplification efficiency can differ between sequences, which may result in fortuitous over-representation or under-representation of certain sequences in an SSH PCR library. These factors may alter the outcome of SSH PCR experiments unpredictably, but they do not constitute the basis on which SSH PCR enriches differentially expressed genes. For simplicity, they are not included in our theoretical model.
Our results have significant bearing on the application of SSH PCR and on the interpretation of its results. Because SSH PCR favors highly differentially expressed genes, its primary application should be to detect dramatic alterations of gene expression, such as changes after viral infection or differences between two different tissues. In profiling gene expression differences between diseased and normal tissues, or over an experimental time course, where small changes in gene expression are more likely to be physiologically relevant, SSH PCR would be highly ineffective. In such situations, differential screening of very large SSH PCR libraries can potentially compensate, but at a high cost in time and labor. In addition, for effective enrichment by SSH PCR the target mRNA must be at least 0.1% of the total mRNA; thus low-abundance genes such as those encoding transcription factors, cytokines, and receptors, which are key regulators of many pathological processes, would not be detected by this method.
Care must also be taken in the interpretation of SSH PCR results. The presence of many non-differentially expressed genes in an SSH PCR library may not result from experimental error but may be due to the absence of significantly differentially expressed genes between the chosen driver and tester samples. The failure of an SSH PCR library to include a known differentially expressed mRNA may

Conclusions
Our theoretical model suggests that effective enrichment of a target gene by SSH PCR is determined by its concentration ratio (R) between tester and driver. Enrichment is far more efficient for differentially expressed genes with a large value of R. Our experiments validate the theoretical prediction that enrichment by SSH PCR is strongly influenced by the concentration ratio R. They also revealed practical limitations: for effective enrichment of 'all or nothing' differentially expressed genes, the fractional concentration of a target gene needs to be more than 0.01%, and for effective enrichment of ratio differentially expressed genes, the concentration ratio needs to be more than 5-fold.