A highly sensitive and specific system for large-scale gene expression profiling
BMC Genomics volume 9, Article number: 9 (2008)
Rapid progress in the field of gene expression-based molecular network integration has generated strong demand for enhancing the sensitivity and data accuracy of experimental systems. To meet this need, a high-throughput gene profiling system of high specificity and sensitivity has been developed.
By using specially designed primers, the new system amplifies sequences in neighboring exons separated by big introns so that mRNA sequences may be effectively discriminated from other highly related sequences including their genes, unprocessed transcripts, pseudogenes and pseudogene transcripts. Probes used for microarray detection consist of sequences in the two neighboring exons amplified by the primers. In conjunction with a newly developed high-throughput multiplex amplification system and highly simplified experimental procedures, the system can be used to analyze >1,000 mRNA species in a single assay. It may also be used for gene expression profiling of very few (n = 100) or single cells. Highly reproducible results were obtained from duplicate samples with the same number of cells, and from those with a small number (100) and a large number (10,000) of cells. The specificity of the system was demonstrated by comparing results from a breast cancer cell line, MCF-7, and an ovarian cancer cell line, NCI/ADR-RES, and by using genomic DNA as starting material.
Our approach may greatly facilitate the analysis of combinatorial expression of known genes in many important applications, especially when the amount of RNA is limited.
Biological processes are underlain by interactions between various genes and their products through defined pathways in the molecular network, in which molecules cross communicate in hitherto unknown ways under both healthy and disease conditions. Learning gene expression patterns on a genomic scale would substantially help deconvolute these complex processes. Exhaustive identification of human genes during the Human Genome Project has made such studies possible. By global gene expression profiling in cells and tissues under either physiological or in vitro conditions, our understanding of the correlation between gene functions and their phenotypic effects could be significantly enhanced.
The advent of the microarray-based high-throughput RNA detection system [1, 2] has made it possible to profile gene expression patterns for the entire transcriptome. However, to detect gene transcripts very specifically, one needs to discriminate them from closely related sequences including: (1) the corresponding gene sequences. Although contamination of gene sequences may not be a concern for applications using purified mRNA, gene sequences must be taken into consideration for applications directly using cell lysate without RNA extraction. This becomes especially important when the studied transcripts are present at low abundance; (2) pseudogenes and their possible transcripts. The number of pseudogenes in the human genome was estimated to be 20,000 to 33,000, which are widely expressed [3, 4]. These sequences usually share a high degree of sequence identity with the closely related genes; (3) unprocessed RNA containing the same exons as those of the corresponding mRNA. So far, no system has addressed the above issue very effectively.
Among the microarray-based platforms, GeneChip is a commonly used system that has been improved significantly since it was invented and has contributed a great deal to understanding the complex gene expression network. However, because this technology is limited by a high degree of nonspecificity and insensitivity, its application in molecular network integration has been limited. Results from a recent analysis indicated that on the Affymetrix GeneChip U95A/Av2 array, 20,696 (10.5%) probes were nonspecific and could cross-hybridize to multiple genes, and 18,363 (9.3%) probes missed the target transcript sequences. The numbers of nonspecific and mis-targeted probes on the U133A array were comparable: 29,405 (12.1%) and 19,717 (8.0%), respectively. These ~20% of problematic probes substantially compromise data accuracy, decrease the value of microarray data, and are not acceptable for studies of molecular network integration. It was also found that some probe sets representing the same genes on Affymetrix microarrays could show significant discrepancies because of non-specific hybridization [6, 7].
In most applications, gene expression profiling with microarrays, including GeneChip, requires amplification of sample RNA, regardless of how much material is available. Normally, 1 to 3 μg of RNA is required for each assay. However, high-throughput gene expression profiling with superior sensitivity is increasingly in demand and has wide applications. For example, in breast cancer research, analysis of specimens from microdissection may provide important information about genes involved in different stages of cancer development and help in understanding the underlying molecular mechanisms. Specimens from fine needle biopsy are also important in diagnostic procedures and in evaluating therapeutic effects. The ability to analyze a large number of genes in single cells may help understand the origin and clonality of cancer development and learn the molecular details involved in different stages of the cell cycle.
Current methodologies for gene expression profiling in small RNA samples, especially those from single cells, are very limited. Many of these protocols [2, 10, 11] require multiple enzymatic reactions that may seriously reduce the sensitivity and compromise the specificity. RNA preparation in most applications also involves a number of steps, which makes it lengthy and tedious and requires highly skilled personnel.
To solve the above problems, we have developed a highly specific and sensitive gene expression profiling system. With this system, primers are specially designed to amplify mRNA sequences very specifically. Probes used for microarray detection are designed to hybridize only to sequences amplified from mRNA. In conjunction with the high-throughput multiplex amplification protocol recently developed in our laboratory, a large number of mRNA species directly released from very few cells or even single cells can be amplified to a detectable amount without RNA isolation. Amplified products can then be detected by the single-base extension assay on an oligonucleotide microarray.
Experimental system used in the study
To establish a cancer gene expression array, a panel of cancer-related genes were selected based on their known functions and/or cancer-associated expression patterns from published literature [15–28]. All amplicon sequences were subjected to computational screening to ensure their uniqueness. Primers and probes were selected according to a series of criteria as specified in Materials and Methods. Most primer pairs amplify sequences in two neighboring exons separated by large introns. The intron lengths ranged from 79 bp to 90 kb with an average of 2.0 kb and 97% of the introns are longer than 200 bp. Initially 1,445 genes were used as the input for the primer and probe design program. Primers and probes were selected for 1,120 (77.5%) of these genes. The remaining 22.5% had either no introns or no suitable sequences for primers and/or probes. Fifteen of these remaining genes with important functions in cancer development were included in the panel. Primers and probes were designed based on the unique sequences in these genes, and were not required to have introns internally located within the amplified sequences. Therefore, a total of 1,135 genes were included in our multiplex assay. (Details about these genes, and their corresponding primers and probes used for the study are listed in Additional files 1 and 2.)
The microarray-based single-base extension (SBE) assay has been used to genotype single nucleotide polymorphisms (SNPs) [12, 29, 30] in our laboratory. In the present study, SBE was adapted for gene expression profiling. To simplify the analysis, all probes were designed to terminate immediately before a 'G' base in the templates. In this way, each probe was extended by a single fluorescently labeled dideoxynucleotide (ddCTP). By using one color, the bias associated with different dyes was also eliminated. The detection procedure is schematically illustrated in Fig. 1. Resulting data have been deposited in the NCBI's Gene Expression Omnibus (GEO) and are accessible through GEO Series accession number GSE5920.
Reproducibility of the high-throughput gene expression profiling system
To test the reproducibility of our system, gene expression was profiled for three duplicate 100-cell samples from an ovarian cancer cell line, NCI/ADR-RES, and two 100-cell samples from a breast cancer cell line, MCF-7. Resulting microarray data are supplied in Additional file 3. Table 1 summarizes the numbers of gene transcripts detected from different samples. As shown, 660 (58.2%), 663 (58.4%), and 662 (58.3%) gene transcripts were detected from the three 100-cell samples of NCI/ADR-RES, respectively. Of these transcripts, 650 (>98%) were detected from all three duplicates. Signal intensities for the 1,135 genes were strongly correlated between the duplicates (Pearson's r = 0.977, 0.974, and 0.949, respectively). Fig. 2A shows a scatter plot of two duplicates. Of the 650 transcripts detected in all three NCI/ADR-RES 100-cell samples, only 6 (0.9%), 17 (2.6%), and 1 (0.2%) transcripts had signal intensities differing by >2 fold between each pair of these three duplicates. Twenty-six transcripts were detected from only one or two of the three samples. The signal intensities for these transcripts were low. Only one transcript in one sample had a signal intensity >1,000, indicating that the inconsistency among the duplicates was due to the low signals of these transcripts.
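The two reproducibility measures used above, Pearson's r between duplicates and the number of transcripts differing by more than two fold, can be sketched as follows. This is a minimal illustration only: the intensity values are invented, the function names are hypothetical, and whether the published coefficients were computed on raw or log-transformed intensities is not stated in the text (raw values are assumed here).

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def count_fold_outliers(x, y, fold=2.0):
    """Count pairs whose larger/smaller ratio exceeds `fold`.

    Assumes all intensities are positive (background-corrected signals
    at or below zero would need separate handling).
    """
    return sum(1 for a, b in zip(x, y) if max(a, b) / min(a, b) > fold)

# Invented signal intensities for five transcripts in two duplicates.
duplicate_a = [1200.0, 350.0, 8000.0, 150.0, 2600.0]
duplicate_b = [1100.0, 900.0, 7600.0, 160.0, 2500.0]

r = pearson_r(duplicate_a, duplicate_b)
n_outliers = count_fold_outliers(duplicate_a, duplicate_b)
print(f"r = {r:.3f}, transcripts differing >2 fold: {n_outliers}")
```

In this toy data set only the second transcript (350 vs. 900) differs by more than two fold, yet the correlation stays high because it is dominated by the strongly expressed transcripts.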
For the two 100-cell samples from MCF-7, 615 (54.2%) and 614 (54.1%) gene transcripts were detected, respectively, with 597 (>97%) detected in both. Of these 597 transcripts, 562 (94.1%) had signal intensities differing less than two fold. Similar to the situation with NCI/ADR-RES samples, all 34 transcripts that were detected in only one sample but not the other had low signal intensities with only nine genes whose signal intensities were >1,000 in one of the two samples.
Because samples prepared from a large number of cells are usually associated with high reliability, we further compared the microarray results of the NCI/ADR-RES 100-cell samples with those from a 10,000-cell sample of the same cell line. The resulting data also show a high degree of correlation (r = 0.961, Figure 2B). As shown in Table 1, 630 (96.7%) of the 650 gene transcripts detected from all the 100-cell samples were also detected from the 10,000-cell sample. Sixty-three gene transcripts were detected in at least one of the three 100-cell samples but not in the 10,000-cell sample, or vice versa. Of these 63 gene transcripts, 61 had signal intensities below 1,000 in all three 100-cell samples. However, the change from 100 to 10,000 cells did enhance the detection of 21 gene transcripts whose signal intensities were >2 fold greater in the 10,000-cell sample than those in the 100-cell samples. Among these 21 transcripts, six had signal intensities in the 10,000-cell sample more than 15 fold greater than the average intensities of the corresponding genes in the three 100-cell samples, indicating that using 10,000 cells may have significantly increased the copy numbers of these transcripts or changed their status from absent to present. These data indicate that our system not only can produce very reliable results with as few as 100 cells but also is very sensitive to copy number changes for low-copy-number gene transcripts.
Sensitivity of the high-throughput gene expression profiling system
To further test the sensitivity of our high-throughput gene expression profiling system, single NCI/ADR-RES cell samples were prepared and used for multiplex gene expression assay of the 1,135 mRNA species. Microarray results from three of these samples are listed in Additional file 3. The numbers of gene transcripts detected from the three single-cell samples were 590, 576, and 614, respectively. Of these transcripts, 503 were detected from all single cells. Of the 503, 463 (92.0%) were also detected from all non-single-cell (100-cell and 10,000-cell) samples, indicating a prevalent expression of these genes in most, if not all, cells at relatively high levels.
On the other hand, the detection range of gene transcripts from the three single-cell samples was wider compared to the non-single-cell samples. As shown in Table 1, 449 transcripts were undetectable in all three single-cell samples, a number that is not greater than that (459) for the three 100-cell samples and is comparable to that (442) for all non-single-cell samples. The number of gene transcripts undetectable in all single- and non-single-cell samples is 357. This means that from single cells, we not only detected a comparable number of genes, but also detected a new set of 442-357 = 85 genes that could not be detected with non-single-cell samples of the same cell line; conversely, 449-357 = 92 genes were detected only with non-single-cell samples.
The robustness of gene expression profiling with single-cell samples was also demonstrated by the signal intensities. As described above, most transcripts that were detected from some but not all non-single-cell samples had low signal intensities and very few were >1,000. The scenario with single cells is very different. Of the 503 gene transcripts detected from all single cells, 40 were detected in one to three non-single-cell samples but not all four. All 40 but one have signal intensity >1,000 in at least one of the three single-cell samples. Of the 183 transcripts that were only detected from one or two single-cell samples, 108 (59.0%) had signal intensity >1,000. The strong and robust signal intensities detected from single-cell samples indicate that our system is very sensitive.
Unlike the gene transcripts detected from all non-single-cell samples, which account for more than 95% of gene transcripts detected from each of these samples, the 503 gene transcripts detected from all single cells account for only 85.3%, 87.3% and 81.9% of the transcripts detected from the individual single-cell samples, respectively. Pairwise comparison of the results from the single-cell samples yielded correlation coefficients of 0.780, 0.700, and 0.711, respectively, compared with r ≥ 0.949 for the non-single-cell samples. From all single- and non-single-cell samples, 778 gene transcripts were detected, of which 315 (40.5%) were detected from some but not all samples. This is in contrast with the scenario of non-single-cell samples, from which gene transcripts that were detected from some but not all samples were a very small portion (Table 1). Furthermore, of these 315 transcripts, 177 (56.2%) were detected either only from single cells or only from non-single-cell samples.
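The counts quoted in this section can be cross-checked by inclusion-exclusion. The sketch below uses only numbers stated in the text (1,135 genes; 449, 442 and 357 undetectable transcripts; 503 shared single-cell transcripts; 590, 576 and 614 per-cell totals); the variable names are illustrative and are not taken from Table 1.

```python
TOTAL_GENES = 1135

# Transcripts undetectable in every sample of each group (from the text).
undetected_single = 449      # all three single-cell samples
undetected_nonsingle = 442   # all 100-cell and 10,000-cell samples
undetected_anywhere = 357    # all single- and non-single-cell samples

# Transcripts detected in at least one sample of each group.
detected_single = TOTAL_GENES - undetected_single        # 686
detected_nonsingle = TOTAL_GENES - undetected_nonsingle  # 693
detected_anywhere = TOTAL_GENES - undetected_anywhere    # 778

# Inclusion-exclusion: |A and B| = |A| + |B| - |A or B|.
detected_in_both = detected_single + detected_nonsingle - detected_anywhere
only_single = detected_single - detected_in_both
only_nonsingle = detected_nonsingle - detected_in_both

# Share of each single cell's transcripts that belong to the 503 common ones.
shared = 503
shares = [round(100 * shared / n, 1) for n in (590, 576, 614)]

print(only_single, only_nonsingle, only_single + only_nonsingle)
print(shares)
```

Under this arithmetic, 85 transcripts are specific to single cells and 92 to non-single-cell samples, which together give the 177 group-specific transcripts reported above.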
The high degree of concordance among the results from the non-single-cell samples, and the significant differences among those from single cells, and between single cells and non-single-cell samples, indicate that most, if not all, of these differences are real. As mentioned above, this is further supported by the robustness of the signal intensities detected from single-cell samples for the gene transcripts that were detected differently between the single cells and non-single-cell samples. It is conceivable that heterogeneity in clonality and/or genetic alterations in the cells of a cell line could be major factors contributing to the differences. In addition, a considerable portion of the cells may be at different cell cycle stages, during which groups of genes are expressed differently. Therefore, while gene expression in single cells could differ in various aspects, 100 cells may well represent the entire cell population because, after all, the cells of a cell line are from the same tissue and the same donor. Consequently, genes that are detectable in a cell population may not be expressed, or may be expressed at very low levels, in certain single cells. Conversely, genes that are detectable in particular single-cell samples may not be expressed, or may be expressed at very low levels, in the majority of the cell population.
Differential gene expression in the two cell lines, NCI/ADR-RES and MCF-7
When the gene expression profiles of NCI/ADR-RES were compared with those of MCF-7, a considerable number of genes were shown to be expressed differentially in these two cell lines. Of the 1,135 gene products, 531 (46.8%) were detected from samples of both cell lines (not including single cell samples). Seventy-five gene transcripts were detected in all NCI/ADR-RES non-single-cell samples, but not in the MCF-7 samples, and 43 were detected in the opposite way.
Of the 118 differentially expressed genes, 69 were shown to be expressed with more than 10-fold difference (Table 2). Of the 69 genes, 37 (53.6%) were detected as strongly or relatively strongly expressed in MCF-7, but weakly or not expressed in NCI/ADR-RES, and 32 were detected in the opposite way. To validate the gene expression data, 22 of these 69 genes, and another 46 gene transcripts detected with various microarray signal intensities different between the samples of the two cell lines were randomly selected (Table 3) and subjected to RT-PCR amplification individually. The amplified products were resolved by gel electrophoresis. The signal intensities of the respective bands were quantified with a gel documentation system. Part of results from microarrays and gel assays are shown in Fig. 3.
Table 3 summarizes the results from both microarray and gel assays. Based on the results from microarray, genes in Table 3 are subdivided into four groups. Transcripts of Group I genes were detected from all samples, while no transcripts were detectable from all samples for Group IV. Transcripts of Group III genes were detected only from the NCI/ADR-RES samples but not from the MCF-7 samples, and those of Group II genes were detected in an opposite way. As shown, the signal ratios between NCI/ADR-RES and MCF-7 from microarray for the Group I genes are well in concordance with the ratios from gel assay, with a correlation coefficient of 0.679, indicating results from microarray and gel assays match very well. Because signals of genes in Groups II to IV were below the background for at least one of the two cell lines, ratio comparison may not be meaningful. However, it is clear that the microarray signals detected from MCF-7 are all greater than those from NCI/ADR-RES for the Group II genes, and vice versa for the Group III genes. Results from the gel assay are very well in concordance with this correlation. The only exception is HDAC5 whose microarray signal from MCF-7 is approximately 5 times that from NCI/ADR-RES, while its gel signal is more than 2 times that of the latter. Since the microarray signal intensity from MCF-7 for this gene is the lowest among the Group II genes, this discrepancy could be caused by wider variation of the low signal intensity. For all genes in Groups II to IV, if the microarray signals are lower than background, the corresponding gel signals are also low (<5,000) except for three genes, HDAC5 in Group II, and DAB2 and CTSZ in Group III. The fact that low or relatively low signals were detected by the gel assay for the genes whose array signals were weaker than background may be a reflection of the difference between the two assays. 
For microarray, all 1,135 transcripts were amplified in the same tube, while all transcripts analyzed by gel assay were amplified individually. It is known that during PCR, after the reaction reaches a saturation point, very little additional products may be generated. When the gene transcripts are amplified in a multiplex way, certain low-copy-number sequences may not be amplified to the detectable amounts when the reaction reaches a saturation point.
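The saturation argument above can be illustrated with a toy model. The sketch assumes perfect doubling per cycle, a fixed total product capacity at which the reaction stops, and a fixed detection threshold; all numbers and names are invented for illustration and do not describe the actual reaction kinetics.

```python
def amplify(initial_copies, capacity, max_cycles=40):
    """Double every template each cycle until the next doubling would
    push the total product above `capacity` (a crude saturation model)."""
    copies = dict(initial_copies)
    for _ in range(max_cycles):
        if 2 * sum(copies.values()) > capacity:
            break  # reaction is saturated; no further product is made
        copies = {gene: 2 * n for gene, n in copies.items()}
    return copies

def detectable(copies, threshold):
    """Names of templates whose final copy number reaches the threshold."""
    return {gene for gene, n in copies.items() if n >= threshold}

CAPACITY = 10**9    # hypothetical total product at saturation
THRESHOLD = 10**6   # hypothetical detection limit (copies)

# Multiplex: abundant and rare transcripts share one reaction.
multiplex = amplify({"high": 1000, "medium": 100, "low": 1}, CAPACITY)
# Individual reaction: the rare transcript is amplified on its own.
singleplex = amplify({"low": 1}, CAPACITY)

print(sorted(detectable(multiplex, THRESHOLD)))
print(sorted(detectable(singleplex, THRESHOLD)))
```

In this toy setting the multiplex reaction saturates after 19 doublings, leaving the single-copy transcript below the threshold, whereas the same transcript amplified alone goes through 29 doublings and becomes detectable. This mirrors the explanation offered for transcripts seen by gel assay but not by microarray.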
Specificity of the high-throughput gene expression system
The specificity of our high-throughput gene expression system was demonstrated by the results from different cell line samples and by those from different single cells as described above. To further demonstrate the specificity of our system, human genomic DNA samples were amplified with the same multiplex RT-PCR procedure and analyzed by microarray. Very few probes (<0.2%) were shown to have signals above background (data not shown), indicating that our system is very specific and can discriminate the target mRNA sequences from their genomic counterparts and, therefore, from the unprocessed transcripts. Our previous experience also showed that in the absence of specific templates, a few primer sets may amplify non-specific sequences. However, such non-specific amplification may become undetectable in the presence of specific templates because the specific sequences compete much more strongly. In addition, using specially designed probes also enhanced the specificity.
Compared with other existing gene expression profiling methods, our approach has the following advantages:
(1) Highly specific
To date, no other high-throughput system has been reported to be highly discriminative of mRNA from other related DNA and RNA sequences. Using primers amplifying sequences across intron(s) and probes consisting of sequences in adjacent exons is a critical enhancement to achieve such high specificity. Furthermore, all primer, probe and amplicon sequences were subjected to exhaustive searches against the databases of the entire human genome and transcriptome to ensure these sequences are unique. Such a step was proven very effective for enhancing the specificity. Experimentally, when genomic DNA was used as the sample, signals were detected for only 2 or 3 genes (0.2%) out of the 1,135 genes. Based on our previous studies, these signals may become undetectable in the presence of specific sequences, which may outcompete the nonspecific amplification.
(2) Highly sensitive
We showed previously that our multiplex amplification system could detect >1,000 single-copy sequences simultaneously from single haploid sperm cells. The fact that >90% of these sequences are detectable indicates that with our specially designed primers, most, if not all, sequences may be well amplified in parallel with very limited, if any, interaction among the primers. Since the primers used for gene profiling are designed in the same way, it is reasonable to believe that most gene transcripts are also amplified in parallel. However, since the copy number of different gene transcripts in the cells varies over a wide range, the outcome of amplification would be different from that using single-copy sequences. When only single-copy sequences are used in multiplex amplification, most, if not all, sequences may reach the detectable amount before the system is saturated. However, when gene transcripts are amplified, whether a transcript reaches a detectable amount before the system is saturated depends on its copy number in the sample, and not all sequences may reach a detectable amount at the end of amplification. This is probably why some sequences were undetectable by microarray but detectable by gel assay.
With our system, a total of 686 gene transcripts were detected from three single cells, which is comparable to 676 for the three 100-cell samples and 693 for all non-single-cell samples from the same cell line. The sensitivity of our system is further proved by the facts that results from 100-cell samples are very similar to each other and to those from 10,000 cells, and that specific gene expression profiles were obtained from different cell lines using as few as 100 cells.
The sensitivity of our system is further illustrated by the finding that a significant portion of transcripts could not be detected from the NCI/ADR-RES samples but were detected from the MCF-7 samples or single-cell samples, and vice versa. This also indicates that low microarray intensities for these transcripts were not false negatives; the transcripts were either not present or present in very low abundance in the respective samples.
(3) Very simple
Unlike other methods that involve multiple steps and use multiple enzymes, our method allows a large number of gene products to be amplified by a single RT-PCR step directly from cell lysates without RNA extraction. In this way, a large number of samples may be analyzed easily and cost-effectively. Our simple experimental procedure is also the basis of the high degree of sensitivity since it avoids complicated mRNA extraction and processing procedures before and during amplification, which may cause mRNA degradation or loss.
(4) Very safe for RNA samples
When working with RNA, one has to take extra precaution to prevent mRNA from degradation. Our method does not need RNA extraction. Once cells are lysed, RNA is directly released to the RT-PCR buffer and used as template immediately. There is almost no chance for RNase to degrade the mRNA templates.
(5) Highly flexible
Many studies may not need to analyze all genes in the human genome and may often need to focus on different gene groups. Therefore, flexibility of the experimental system would be highly desirable. With our computer program, a large number of gene products can be designed into a single multiplex group. Genes can be easily organized into different subgroups upon need, and can also be re-grouped at any time without altering the reaction conditions. New gene products can be added to an existing set easily.
The capacity of multiplex RT-PCR is another concern for high-throughput gene expression profiling because multiplexing not only makes the amplification of a large number of gene products affordable and cost-effective, but also eliminates challenges involved in quality control of RT-PCR for a large number of genes individually [33, 34]. However, the capacity of multiplex amplification has been limited by interaction between primers. A previous study reported a screening of 29 expressed genes using multiplex RT-PCR, but was unable to reduce the number of reaction tubes to fewer than eight. Other studies multiplexed up to nine genes with nonspecific RT primers [36, 37]. Studies using multiple sets of gene-specific primers in single reactions were also reported, but none of these generated enough products for the analysis of all expressed genes in the samples [34, 38]. In the present study, we report our success with multiplex RT-PCR for 1,135 mRNA species. This success was based on a combination of several technological developments, including computerized primer design with predicted minimal interaction, a narrow primer Tm range, small amplicon sizes, and optimization of amplification conditions based on our previous experience [12, 29, 30]. With our current protocol, it is possible to include two thousand or more gene transcripts in a single multiplex amplification group, and to analyze all human gene transcripts using several multiplex amplification groups. After pooling amplified products from the multiplex groups, all genes may be analyzed with a single microarray. With our system, large-scale gene expression profiling becomes highly affordable and cost-effective. If the primers and probes used in the high-throughput analysis are made accessible to the research community through a distribution system, large- and genome-scale gene expression profiling may be even more affordable and cost-effective.
A major limitation of our system is the requirement of presence of large introns in genes under study. When the introns are small, discrimination between mRNA and closely related DNA and RNA sequences is still possible by using probes consisting of sequences in the neighboring exons. For genes with no introns, primers and probes can be designed only to discriminate mRNA sequences from related pseudogenes and their transcripts but not the corresponding gene sequences. In this case, discrimination between mRNAs and their gene sequences is only possible when the mRNAs are present abundantly.
An extreme but possible application of our highly sensitive gene expression profiling system is the analysis of disseminated tumor cells in cancer research. Analysis of individual cells is necessary for understanding the early dissemination of tumor cells. Disseminated tumor cells remain in the patient's body even after complete resection of the primary tumor, and can be obtained from bone marrow aspirates. With our highly sensitive system, the genetic signature of these cells may be detected. The resulting information may provide a molecular basis for new therapeutic targets. For example, ERBB2 expression has been found to be a therapeutic target for metastatic breast carcinoma. Identification of mRNA like that of ERBB2 in micrometastatic cells may help develop effective therapeutic approaches to preventing further development of these cells into incurable metastasis. Using mRNA from a small number of microdissected frozen tissue sections without RNA isolation has been demonstrated with a small number of genes. Our system should be capable of using both microdissected and biopsy specimens for gene expression analysis on a much larger scale.
High-throughput gene expression profiling with single cells is also interesting for most laboratories studying molecular neurophysiology, but has been hampered by the capacity of multiplex PCR. Our approach can be used to examine the expression of many genes within individual neurons or other cells. The gene expression profiles can also be correlated to the phenotypes of these cells such as morphological, electrophysiological and pharmacological features to understand the underlying molecular mechanisms.
This report describes a high-throughput gene expression profiling technology, which is simple, highly reproducible, specific and sensitive, and may greatly facilitate gene expression profiling of a small number of or even single cells. It may also be applicable to many applications where the amount of material is limited, and to diagnostic assays that identify the onset of cancer and monitor its progression, remediation or response to treatments.
Data discussed in this publication have been deposited in the NCBI's Gene Expression Omnibus and are accessible through GEO Series accession number GSE5920.
Cell lines and single cell preparation
Human breast cancer cell line MCF-7 and ovarian cancer cell line NCI/ADR-RES were kindly provided by Drs. Jinming Yang, Hao Wu and William Hait. The cell lines were maintained in RPMI 1640 medium containing 10% fetal bovine serum, 100 units/ml penicillin, and 100 μg/ml streptomycin at 37°C in a humidified atmosphere containing 5% CO2. After counting with a hemacytometer, cells were suspended in phosphate-buffered saline (PBS) to 1000 cells/μl or other desirable densities. Two μl was dispensed into an Eppendorf tube containing cell lysis buffer (1.5 μl RNase inhibitor, 4 μl of 5× QIAGEN OneStep RT-PCR buffer, 12.5 μl H2O). Single cells were prepared from a diluted cell suspension of 2 cells/μl in 1× PBS. About 0.5 μl of the suspension was pipetted onto a small piece of glass coverslip and checked under a microscope. If the droplet contained only one cell, the piece of coverslip was transferred into an Eppendorf tube containing the cell lysis buffer. The tube was immediately frozen in an ethanol/dry ice bath and stored at -80°C until use.
Selection of genes for mRNA profiling
Genes used in the present study were selected based on previous publications [15–28], and are those involved in fundamental cell functions such as cell cycle, apoptosis, cell matrix, DNA repair, DNA replication, somatic recombination, RNA transcription and regulation, and protein translation and regulation. The borders between exons and introns for the selected genes were determined by aligning the mRNA to genomic sequences using the BLAT program maintained by the University of California, Santa Cruz.
Primer and probe design
A computer program was written for primer and probe selection. Each pair of PCR primers was designed to amplify sequences in two adjacent exons flanking a large intron, ensuring specific amplification of the desired mRNA sequences rather than the respective gene or unprocessed RNA sequences. To enhance the amplification specificity, the program always searches for candidate amplicon sequences separated by large introns in each gene. The melting temperatures (Tm's) for all selected primers ranged from 50.1°C to 61.6°C, and the GC-contents ranged from 32% to 70%. The lengths of the amplicons ranged from 72 to 150 bases.
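The selection logic described above can be sketched as follows. This is a minimal illustration, not the authors' program: the exon coordinate representation, the function names and the `min_intron` threshold of 1,000 bases are all hypothetical (the text says only that introns should be large). The GC-content and amplicon-length limits are the ranges reported above. Melting temperature is left out because the Tm formula used is not given.

```python
from typing import List, Optional, Tuple

Exon = Tuple[int, int]  # hypothetical: (start, end) genomic coordinates, end exclusive

def intron_sizes(exons: List[Exon]) -> List[int]:
    """Sizes of the introns between consecutive exons, ordered by position."""
    ordered = sorted(exons)
    return [ordered[i + 1][0] - ordered[i][1] for i in range(len(ordered) - 1)]

def best_exon_pair(exons: List[Exon], min_intron: int = 1000) -> Optional[int]:
    """Index i of the exon pair (i, i+1) flanking the largest intron.

    Returns None when no intron reaches min_intron (a hypothetical cutoff).
    """
    sizes = intron_sizes(exons)
    if not sizes:
        return None
    i = max(range(len(sizes)), key=lambda j: sizes[j])
    return i if sizes[i] >= min_intron else None

def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in a sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)

def primer_gc_ok(seq: str, low: float = 32.0, high: float = 70.0) -> bool:
    """GC-content filter using the range reported in the text."""
    return low <= gc_percent(seq) <= high

def amplicon_length_ok(length: int, low: int = 72, high: int = 150) -> bool:
    """Amplicon-length filter (spliced mRNA coordinates) using the reported range."""
    return low <= length <= high

exons = [(100, 250), (5250, 5400), (5900, 6050)]
print(intron_sizes(exons), best_exon_pair(exons))
```

Because the amplicon length is measured on the spliced transcript, the same primer pair on genomic DNA or unprocessed RNA would span the amplicon plus the whole intron, which is the basis of the specificity argument.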
Each oligonucleotide probe for microarray analysis was designed to consist of sequences from two adjacent exons so that it specifically interrogates the cDNA from the corresponding mRNA sequence, but not the corresponding gene sequences or cDNA from unprocessed RNA. To facilitate microarray analysis, the 3'-ends of all probes terminated before a "G" base in the template sequence so that they could all be labeled with the same fluorescent color by incorporating fluorescently labeled Cy5-ddCTP. The lengths of the probes ranged from 22 to 31 bases, and the GC-content of the probes ranged from 30% to 70%, with Tm's from 54.4°C to 65.2°C.
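A minimal sketch of the probe constraints described above, under stated assumptions: the probe is taken to have the same sense as the input exon sequences, so a "G" in the complementary template corresponds to a "C" immediately 3' of the probe on the input strand. The function name and the `min_flank` parameter (minimum number of probe bases on each side of the exon junction) are hypothetical; only the 22 to 31 base length range comes from the text.

```python
from typing import List

def junction_probes(exon_up: str, exon_down: str,
                    min_len: int = 22, max_len: int = 31,
                    min_flank: int = 5) -> List[str]:
    """Enumerate candidate probes that span the exon-exon junction.

    A candidate is kept only if the base immediately 3' of it is 'C' on the
    probe strand, i.e. 'G' on the template, so that single-base extension
    would incorporate ddCTP. min_flank is an illustrative parameter.
    """
    seq = (exon_up + exon_down).upper()
    junction = len(exon_up)  # index of the first base of the downstream exon
    probes = []
    for length in range(min_len, max_len + 1):
        # stop one short of the end so that a following base always exists
        for start in range(0, len(seq) - length):
            end = start + length
            if junction - start < min_flank or end - junction < min_flank:
                continue  # probe does not cover enough of both exons
            if seq[end] == "C":
                probes.append(seq[start:end])
    return probes

# Toy sequences, not real exons: the only 'C' sits 10 bases into the second exon.
candidates = junction_probes("A" * 20, "T" * 10 + "C" + "T" * 5)
print(len(candidates), candidates[0])
```

In practice the candidates would still need to pass the GC-content, Tm and BLAST uniqueness checks described in this section.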
The BLAST executable program and sequence databases were downloaded from the NCBI website and installed on a local server. All primers were subjected to BLAST searches against both the human genome and transcriptome databases to avoid amplification of nonspecific genomic or RNA sequences, including pseudogenes and their RNA products. In addition, all primers and probes were subjected to interaction analysis with a computer program developed for designing high-throughput multiplex nucleic acid detection. Probes complementary to intron regions of some genes were also designed as negative controls. All amplicon sequences were subjected to BLAST searches to ensure their uniqueness. Details about the primer and probe design for high-throughput multiplex nucleic acid detection may be found in our previous publication.
Gene-specific reverse transcription and multiplex RT-PCR
Cells in the lysis buffer described above were lysed by three cycles of alternating 1-min incubations in the ethanol/dry ice bath and a 37°C water bath before RT-PCR. One-step RT-PCR was carried out in a 50-μl reaction containing primers (20 nM each) for all the 1,135 mRNA species, 2.5 mM MgCl2, the four dNTPs (400 μM each), and 2.0 μl QIAGEN OneStep RT-PCR Enzyme Mix without degenerate primers. The samples were first incubated at 50°C for 40 min for cDNA synthesis, and then heated to 95°C for 15 min to inactivate the reverse transcriptase and activate the Taq DNA polymerase, followed by 45 PCR cycles. Each PCR cycle consisted of 40 sec at 94°C for denaturation, and 1 min at 55°C and 5 min of ramping from 55°C to 70°C for annealing and extension. A final extension step was carried out at 72°C for 3 min at the end of the PCR. All PCRs were performed with PTC100 Programmable Thermal Controllers (MJ Research). Single-stranded DNA (ssDNA) was generated under the same conditions as the multiplex PCR, except that the template was 10 μl of the multiplex RT-PCR product, only one primer for each sequence was used, and 40 thermal cycles were carried out.
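The thermal program above can be written down as data to estimate the programmed run time. This is only an arithmetic sketch: the names are illustrative, and the estimate ignores the instrument's own ramping between steps, so the real run would take somewhat longer.

```python
# Hold steps outside the cycling loop, in minutes (values from the protocol).
RT_STEP_MIN = 40          # 50 C, cDNA synthesis
ACTIVATION_MIN = 15       # 95 C, RT inactivation and Taq activation
FINAL_EXTENSION_MIN = 3   # 72 C

# Steps inside each PCR cycle, in seconds (values from the protocol).
CYCLE_STEPS_SEC = {
    "denature_94C": 40,
    "anneal_55C": 60,
    "ramp_55C_to_70C": 300,
}

def cycle_seconds() -> int:
    """Programmed duration of one PCR cycle."""
    return sum(CYCLE_STEPS_SEC.values())

def program_seconds(cycles: int) -> int:
    """Programmed duration of the full one-step RT-PCR, excluding instrument ramps."""
    holds = (RT_STEP_MIN + ACTIVATION_MIN + FINAL_EXTENSION_MIN) * 60
    return holds + cycles * cycle_seconds()

total = program_seconds(45)
print(cycle_seconds(), total, total / 3600)
```

With 45 cycles the programmed time comes to 358 min, of which 300 min is cycling; the 5-min ramp in each cycle accounts for most of it.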
RT-PCR with individual gene transcripts
RT-PCRs with individual gene transcripts were performed for a group of genes that showed different signal intensities in the two cell lines, NCI/ADR-RES and MCF-7. For each gene, an aliquot (equivalent to 100 cells) of the same cell lysate used for multiplex gene expression profiling was used. Conditions for one-step RT-PCR were similar to those for multiplex one-step RT-PCR. mRNAs transcribed from the β-actin and α-tubulin genes served as internal controls. The PCR products were assayed by gel electrophoresis. Gels were imaged using an Image Station (Model 440, Kodak, New Haven, CT, USA). Gel band intensities were digitized with the Kodak 1D 3.5 software.
Microarray design, hybridization, and probe labeling by single-base extension assay
Oligonucleotide probes were printed onto glass slides in duplicate with a spot diameter of 160 μm and a center-to-center distance of 250 μm by using the OmniGrid Accent Microarrayer (Gene Machines, CA). One hundred fourteen spots containing only microarray printing buffer without probes were used as negative controls and were distributed evenly across each array. Microarray analysis was performed according to a four-step procedure established in our laboratory. Briefly:
(1) Preparation of microarray slides: Pre-cleaned Gold Seal Micro slides (Becton Dickinson) with no scratches were chosen and soaked in 30% bleach with shaking for 1–2 hrs, followed by rinsing five times with deionized H2O and three times with MilliQ H2O. Slides were then sonicated in 15% Fisher brand Versa-Clean Liquid Concentrate with heat on for 1–2 hrs, followed by rinsing with shaking 10 times in deionized H2O and five times in MilliQ H2O. Slides were dried by centrifugation at 1,000 rpm for 5 min with a slide holder in a GS-6 Beckman centrifuge, and then were baked at 140°C in a vacuum oven (Fisher Scientific Model 280A) for 4–5 hrs.
(2) Microarray preparation: Each oligonucleotide probe was mixed with the Microarray Printing Solution (GenScript, Piscataway, NJ) at a 1:5 ratio (v/v) to a final concentration of 50 μM in a well of a 384-well plate. Probes were then arrayed onto the washed glass slides at a humidity between 50% and 55% and a temperature between 24.5°C and 26.5°C.
(3) Hybridization: Each glass slide with probe arrays was placed into a Corning slide cassette. Hybridization was performed in 30 μl of 1× hybridization solution (5× Denhardt's solution, 0.5% SDS, 3× SSC, 20 μl of ssDNA) at 56°C for 2 hrs. The cassette was briefly soaked in iced water before opening. The slide was then washed with 1× SSC and 0.1% SDS at 56°C for 10 min, rinsed twice with 0.5× SSC for 30 sec and twice with 0.2× SSC for 30 sec.
(4) Probe labeling by single-base extension: Microarrays consisting of oligonucleotide probes were covered with 25 μl of 1× labeling solution containing 20 units of Sequenase, 1× Sequenase buffer (GE Healthcare Life Sciences, Piscataway, NJ), and 750 nM Cy5-ddCTP (Applied Biosystems, Foster City, CA). The labeling reaction was performed at 70°C for 10 min. The slide was washed again under the same conditions used after hybridization.
Microarray scanning and data analysis
Microarrays were scanned with a GenePix 4000 scanner (Axon Instruments, Foster City, CA). The resultant images were digitized with the accompanying software GenePix Pro (version 4.0). The mean values of the signals from the duplicate spots of each probe were used for the analysis in Tables 1 and 2. Background signal was determined by using negative control probes that were complementary to the intron sequences of the corresponding genes or to random sequences, and was subtracted from the sample signals. For the comparative expression analysis of the cell lines MCF-7 and NCI/ADR-RES in Table 1, the array data were normalized by the Lowess smoothing method [45, 46]. After background subtraction, genes with negative signal intensities in both duplicate samples were excluded from further analysis. The log ratios of the intensities of the remaining genes in the two cell lines were used to make calls and to identify the differentially expressed genes in the samples.
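The core of this analysis (duplicate-spot averaging, background subtraction, exclusion and log ratios) can be sketched as below. This is an illustration under assumptions, not the authors' pipeline: the function names, gene labels and intensities are made up, base-2 logarithms are assumed, non-positive signals are treated like negative ones, and genes that are non-positive in only one sample are set aside because their log ratio is undefined. The Lowess normalization step is deliberately not implemented here.

```python
import math
from statistics import mean
from typing import Dict, List, Tuple

def background_corrected(spots: Dict[str, List[float]],
                         background: float) -> Dict[str, float]:
    """Mean of the duplicate spots for each gene, minus the background signal."""
    return {gene: mean(values) - background for gene, values in spots.items()}

def compare(sig_a: Dict[str, float],
            sig_b: Dict[str, float]) -> Tuple[Dict[str, float], List[str]]:
    """Log2 ratios (a over b) for genes with positive signal in both samples.

    Genes non-positive in both samples are dropped. Genes non-positive in only
    one sample are returned separately (an assumption; the text does not say
    how such genes were handled).
    """
    ratios: Dict[str, float] = {}
    one_sided: List[str] = []
    for gene in sig_a.keys() & sig_b.keys():
        a, b = sig_a[gene], sig_b[gene]
        if a <= 0 and b <= 0:
            continue
        if a <= 0 or b <= 0:
            one_sided.append(gene)
            continue
        ratios[gene] = math.log2(a / b)
    return ratios, sorted(one_sided)

# Made-up intensities for four hypothetical genes, two spots each.
sample_a = {"GENE_A": [900, 1100], "GENE_B": [150, 250],
            "GENE_C": [80, 100], "GENE_D": [50, 70]}
sample_b = {"GENE_A": [200, 250], "GENE_B": [500, 500],
            "GENE_C": [90, 70], "GENE_D": [300, 400]}

ratios, one_sided = compare(background_corrected(sample_a, 100),
                            background_corrected(sample_b, 100))
print(ratios, one_sided)
```

A threshold on the absolute log ratio could then be used to call differential expression; the cutoff is not specified in this section, so none is assumed here.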
Schena M, Shalon D, Davis RW, Brown PO: Quantitative monitoring of gene expression patterns with a cDNA microarray. Science. 1995, 270: 467-470. 10.1126/science.270.5235.467.
Lockhart DJ, Dong H, Byrne MC, Follettie MT, Gallo MV, Chee MS, Mittmann M, Wang C, Kobayashi M, Horton H: Expression monitoring by hybridization to high-density oligonucleotide arrays. Nat Biotechnol. 1996, 14 (13): 1675-1680. 10.1038/nbt1296-1675.
Goncalves I, Duret L, Mouchiroud D: Nature and structure of human genes that generate retropseudogenes. Genome Res. 2000, 10 (5): 672-678. 10.1101/gr.10.5.672.
Harrison PM, Hegyi H, Balasubramanian S, Luscombe NM, Bertone P, Echols N, Johnson T, Gerstein M: Molecular fossils in the human genome: identification and analysis of the pseudogenes in chromosomes 21 and 22. Genome Res. 2002, 12 (2): 272-280. 10.1101/gr.207102.
Zhang J, Finney RP, Clifford RJ, Derr LK, Buetow KH: Detecting false expression signals in high-density oligonucleotide arrays by an in silico approach. Genomics. 2005, 85 (3): 297-308. 10.1016/j.ygeno.2004.11.004.
Stalteri MA, Harrison AP: Interpretation of multiple probe sets mapping to the same gene in Affymetrix GeneChips. BMC Bioinformatics. 2007, 8: 13. 10.1186/1471-2105-8-13.
Elbez Y, Farkash-Amar S, Simon I: An analysis of intra array repeats: the good, the bad and the non informative. BMC Genomics. 2006, 7: 136. 10.1186/1471-2164-7-136.
Singh R, Maganti RJ, Jabba SV, Wang M, Deng G, Heath JD, Kurn N, Wangemann P: Microarray-based comparison of three amplification methods for nanogram amounts of total RNA. Am J Physiol Cell Physiol. 2005, 288 (5): C1179-1189. 10.1152/ajpcell.00258.2004.
Lander ES: Array of hope. Nat Genet. 1999, 21 (1 Suppl): 3-4. 10.1038/4427.
Symmans WF, Ayers M, Clark EA, Stec J, Hess KR, Sneige N, Buchholz TA, Krishnamurthy S, Ibrahim NK, Buzdar AU: Total RNA yield and microarray gene expression profiles from fine-needle aspiration biopsy and core-needle biopsy samples of breast carcinoma. Cancer. 2003, 97 (12): 2960-2971. 10.1002/cncr.11435.
Gustincich S, Contini M, Gariboldi M, Puopolo M, Kadota K, Bono H, LeMieux J, Walsh P, Carninci P, Hayashizaki Y: Gene discovery in genetically labeled single dopaminergic neurons of the retina. Proc Natl Acad Sci USA. 2004, 101 (14): 5069-5074. 10.1073/pnas.0400913101.
Wang HY, Luo M, Tereshchenko IV, Frikker DM, Cui X, Li JY, Hu G, Chu Y, Azaro MA, Lin Y: A genotyping system capable of simultaneously analyzing >1000 single nucleotide polymorphisms in a haploid genome. Genome Res. 2005, 15 (2): 276-283. 10.1101/gr.2885205.
Shumaker JM, Metspalu A, Caskey CT: Mutation detection by solid phase primer extension. Hum Mutat. 1996, 7 (4): 346-354. 10.1002/(SICI)1098-1004(1996)7:4<346::AID-HUMU9>3.0.CO;2-6.
Pastinen T, Raitio M, Lindroos K, Tainola P, Peltonen L, Syvanen AC: A system for specific, high-throughput genotyping by allele-specific primer extension on microarrays. Genome Res. 2000, 10 (7): 1031-1042. 10.1101/gr.10.7.1031.
Paik S, Shak S, Tang G, Kim C, Baker J, Cronin M, Baehner FL, Walker MG, Watson D, Park T: A multigene assay to predict recurrence of tamoxifen-treated, node-negative breast cancer. N Engl J Med. 2004, 351 (27): 2817-2826. 10.1056/NEJMoa041588.
Hu Y, Hines LM, Weng H, Zuo D, Rivera M, Richardson A, LaBaer J: Analysis of genomic and proteomic data using advanced literature mining. J Proteome Res. 2003, 2 (4): 405-412. 10.1021/pr0340227.
Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, Coller H, Loh ML, Downing JR, Caligiuri MA: Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science. 1999, 286 (5439): 531-537. 10.1126/science.286.5439.531.
Perou CM, Jeffrey SS, van de Rijn M, Rees CA, Eisen MB, Ross DT, Pergamenschikov A, Williams CF, Zhu SX, Lee JC: Distinctive gene expression patterns in human mammary epithelial cells and breast cancers. Proc Natl Acad Sci USA. 1999, 96 (16): 9212-9217. 10.1073/pnas.96.16.9212.
Perou CM, Sorlie T, Eisen MB, van de Rijn M, Jeffrey SS, Rees CA, Pollack JR, Ross DT, Johnsen H, Akslen LA: Molecular portraits of human breast tumours. Nature. 2000, 406 (6797): 747-752. 10.1038/35021093.
van de Vijver MJ, He YD, van't Veer LJ, Dai H, Hart AA, Voskuil DW, Schreiber GJ, Peterse JL, Roberts C, Marton MJ: A gene-expression signature as a predictor of survival in breast cancer. N Engl J Med. 2002, 347 (25): 1999-2009. 10.1056/NEJMoa021967.
Sotiriou C, Powles TJ, Dowsett M, Jazaeri AA, Feldman AL, Assersohn L, Gadisetti C, Libutti SK, Liu ET: Gene expression profiles derived from fine needle aspiration correlate with response to systemic chemotherapy in breast cancer. Breast Cancer Res. 2002, 4 (3): R3. 10.1186/bcr433.
Sotiriou C, Neo SY, McShane LM, Korn EL, Long PM, Jazaeri A, Martiat P, Fox SB, Harris AL, Liu ET: Breast cancer classification and prognosis based on gene expression profiles from a population-based study. Proc Natl Acad Sci USA. 2003, 100 (18): 10393-10398. 10.1073/pnas.1732912100.
Hedenfalk I, Ringner M, Ben-Dor A, Yakhini Z, Chen Y, Chebil G, Ach R, Loman N, Olsson H, Meltzer P: Molecular classification of familial non-BRCA1/BRCA2 breast cancer. Proc Natl Acad Sci USA. 2003, 100 (5): 2532-2537. 10.1073/pnas.0533805100.
Chang JC, Wooten EC, Tsimelzon A, Hilsenbeck SG, Gutierrez MC, Elledge R, Mohsin S, Osborne CK, Chamness GC, Allred DC: Gene expression profiling for the prediction of therapeutic response to docetaxel in patients with breast cancer. Lancet. 2003, 362 (9381): 362-369. 10.1016/S0140-6736(03)14023-8.
Coradini D, Daidone MG: Biomolecular prognostic factors in breast cancer. Curr Opin Obstet Gynecol. 2004, 16 (1): 49-55. 10.1097/00001703-200402000-00010.
Wajapeyee N, Somasundaram K: Pharmacogenomics in breast cancer: current trends and future directions. Curr Opin Mol Ther. 2004, 6 (3): 296-301.
Dowsett M: Designing the future shape of breast cancer diagnosis, prognosis and treatment. Breast Cancer Res Treat. 2004, 87 (Suppl 1): S27-29. 10.1007/s10549-004-1580-9.
Piccart MJ, Sotiriou C, Cardoso F: New data on chemotherapy in the adjuvant setting. Breast. 2003, 12 (6): 373-378. 10.1016/S0960-9776(03)00139-5.
Greenawalt DM, Cui X, Wu Y, Lin Y, Wang HY, Luo M, Tereshchenko IV, Hu G, Li JY, Chu Y: Strong correlation between meiotic crossovers and haplotype structure in a 2.5-Mb region on the long arm of chromosome 21. Genome Res. 2006, 16 (2): 208-214. 10.1101/gr.4641706.
Hu G, Wang HY, Greenawalt DM, Azaro MA, Luo M, Tereshchenko IV, Cui X, Yang Q, Gao R, Shen L: AccuTyping: new algorithms for automated analysis of data from high-throughput genotyping with oligonucleotide microarrays. Nucleic Acids Res. 2006, 34 (17): e116. 10.1093/nar/gkl601.
Gene Expression Omnibus (GEO) Main page. [http://www.ncbi.nlm.nih.gov/geo/]
Liscovitch M, Ravid D: A case study in misidentification of cancer cell lines: MCF-7/AdrR cells (re-designated NCI/ADR-RES) are derived from OVCAR-8 human ovarian carcinoma cells. Cancer Lett. 2007, 245 (1–2): 350-352. 10.1016/j.canlet.2006.01.013.
Aguilar JC, Perez-Brena MP, Garcia ML, Cruz N, Erdman DD, Echevarria JE: Detection and identification of human parainfluenza viruses 1, 2, 3, and 4 in clinical samples of pediatric patients by multiplex reverse transcription-PCR. J Clin Microbiol. 2000, 38 (3): 1191-1195.
Cerveira N, Ferreira S, Doria S, Veiga I, Ferreira F, Mariz JM, Marques M, Castedo S: Detection of prognostic significant translocations in childhood acute lymphoblastic leukaemia by one-step multiplex reverse transcription polymerase chain reaction. Br J Haematol. 2000, 109 (3): 638-640. 10.1046/j.1365-2141.2000.02051.x.
Pallisgaard N, Hokland P, Riishoj DC, Pedersen B, Jorgensen P: Multiplex reverse transcription-polymerase chain reaction for simultaneous screening of 29 translocations and chromosomal aberrations in acute leukemia. Blood. 1998, 92 (2): 574-588.
Malhotra K, Foltz L, Mahoney WC, Schueler PA: Interaction and effect of annealing temperature on primers used in differential display RT-PCR. Nucleic Acids Res. 1998, 26 (3): 854-856. 10.1093/nar/26.3.854.
Tietjen I, Rihel JM, Cao Y, Koentges G, Zakhary L, Dulac C: Single-cell transcriptional analysis of neuronal progenitors. Neuron. 2003, 38 (2): 161-175. 10.1016/S0896-6273(03)00229-0.
Clipsham RC, McCabe ER: Single-tube gene-specific expression analysis by high primer density multiplex reverse transcription. Mol Genet Metab. 2001, 74 (4): 435-448. 10.1006/mgme.2001.3261.
Pantel K, Cote RJ, Fodstad O: Detection and clinical importance of micrometastatic disease. J Natl Cancer Inst. 1999, 91 (13): 1113-1124. 10.1093/jnci/91.13.1113.
Menard S, Pupa SM, Campiglio M, Tagliabue E: Biologic and therapeutic role of HER2 in cancer. Oncogene. 2003, 22 (42): 6570-6578. 10.1038/sj.onc.1206779.
To MD, Done SJ, Redston M, Andrulis IL: Analysis of mRNA from microdissected frozen tissue sections without RNA isolation. Am J Pathol. 1998, 153 (1): 47-51.
Wu H, Hait WN, Yang JM: Small interfering RNA-induced suppression of MDR1 (P-glycoprotein) restores sensitivity to multidrug-resistant cancer cells. Cancer Res. 2003, 63 (7): 1515-1519.
Human BLAT Search. [http://www.genome.ucsc.edu/cgi-bin/hgBlat?db=hg8]
BLAST: Basic Local Alignment Search Tool. [http://www.ncbi.nlm.nih.gov/BLAST/]
Yang YH, Dudoit S, Luu P, Lin DM, Peng V, Ngai J, Speed TP: Normalization for cDNA microarray data: a robust composite method addressing single and multiple slide systematic variation. Nucleic Acids Res. 2002, 30 (4): e15. 10.1093/nar/30.4.e15.
Quackenbush J: Microarray data normalization and transformation. Nat Genet. 2002, 32 (Suppl): 496-501. 10.1038/ng1032.
The authors thank Drs. Jinming Yang, Hao Wu, and William Hait for providing MCF-7 and NCI/ADR-RES cells. This work was supported in part by Grants R01 HG02094 from the National Human Genome Research Institute, and R33 CA 96309 and R01 CA77363 from the National Cancer Institute, National Institutes of Health, USA, to H.L.
GH and QY designed and carried out the study, and contributed to the probe design concept and manuscript preparation. HL and GY contributed to the concept for designing primers. MA and HL contributed to the algorithm development in primer design for multiplex amplification. HW contributed to the establishment of the microarray system. XC contributed to the establishment of the multiplex amplification. HL directed the study and contributed to manuscript preparation. All authors read and approved the final manuscript.
Guohong Hu and Qifeng Yang contributed equally to this work.
Electronic supplementary material
Additional file 1: The 1,135 genes and probes used for expression profiling assay. The 1,135 genes included in the study and their names, chromosomal locations, probes, and their characteristics used for the high-throughput gene expression profiling of these genes. (XLS 351 KB)
Additional file 2: Primers used for expression profiling of the 1,135 genes. Primers and their characteristics used for the high-throughput gene expression profiling of the 1,135 genes. (XLS 392 KB)
Additional file 3: Signal intensities from expression profiling of the 1,135 genes. Signal intensities from expression profiling of the 1,135 genes in the two cell lines MCF-7 and NCI/ADR-RES and a single cell from NCI/ADR-RES. (XLS 674 KB)
Cite this article
Hu, G., Yang, Q., Cui, X. et al. A highly sensitive and specific system for large-scale gene expression profiling. BMC Genomics 9, 9 (2008). https://doi.org/10.1186/1471-2164-9-9
- Gene Transcript
- Ovarian Cancer Cell Line
- Multiplex Amplification
- Neighboring Exon
- Microarray Signal Intensity