Expression analysis of secreted and cell surface genes of five transformed human cell lines and derivative xenograft tumors

Background Since the early stages of tumorigenesis involve adhesion, escape from immune surveillance, vascularization and angiogenesis, we devised a strategy to study the expression profiles of all publicly known and putative secreted and cell surface genes. We designed a custom oligonucleotide microarray containing probes for 3531 secreted and cell surface genes to study 5 diverse human transformed cell lines and their derivative xenograft tumors. The origins of these human cell lines were lung (A549), breast (MDA MB-231), colon (HCT-116), ovarian (SK-OV-3) and prostate (PC3) carcinomas. Results Three different analyses were performed: (1) a PCA-based linear discriminant analysis identified a 54-gene profile characteristic of all tumors; (2) application of MANOVA (Pcorr < .05) to the tumor data revealed a larger set of 149 differentially expressed genes; and (3) after MANOVA was performed on data from individual tumors, a comparison of differential genes amongst all tumor types revealed 12 common differential genes. Seven of the 12 genes were identified by all three analytical methods. These were late angiogenic, morphogenic and extracellular matrix genes: ANGPTL4, COL1A1, GP2, GPR57, LAMB3, PCDHB9 and PTGER3. The differential expression of ANGPTL4, COL1A1 and other genes was confirmed by quantitative PCR. Conclusion Overall, a comparison of the three analyses revealed an expression pattern indicative of late angiogenic processes. These results show that a xenograft model using multiple cell lines of diverse tissue origin can identify common tumorigenic cell surface or secreted molecules that may be important for biomarker and therapeutic discovery.


Background
The process of tumorigenesis has long been recognized to depend upon complex interactions of a tumor with its non-transformed tissue environment [1]. Beyond transformation and increased proliferation, many pathways are activated both in the growing tumor and in its environment that culminate in an established solid tumor. For example, adhesive pathways are activated to enable transformed cells to aggregate and form a microtumor. Subsequently, microtumors must avoid destruction by the immune system and elicit vasculature formation for continued growth [2,3]. In support of these events, cell-matrix adhesion proteins, cell surface antigens, angiogenic factors and modulatory agents have been found differentially expressed in several experimental models of tumorigenesis [4][5][6] and in tumor biopsy samples relative to control tissues [7,8]. Experimental models with established tumorigenic human cell lines have compared the gene expression profiles of the cultured parental cells with those obtained after implantation into immune-deficient murine hosts [6]. In this study, we took an approach that was more focused with respect to the transcripts examined, yet broader with respect to tumor sources, examining multiple tumor types in order to identify differential genes common to multiple solid tumors in a xenograft model of tumorigenesis.
To recapitulate the attachment and growth of a micro- or metastatic tumor, our experimental tumorigenesis model examined human xenograft tumors in nude mice. Primary or metastatic microtumors about 1 mm³ in size are believed to be metastable: they are either (i) resolved by the immune system, (ii) held in a steady state with balanced proliferation and apoptosis, or (iii) grow aggressively, provided that a vasculature develops to supply nutrients to the growing mass [9]. Since the endpoint of the xenograft assay is the formation of a solid tumor, genes supporting vasculogenesis and angiogenesis are likely to be differentially expressed relative to the parental cell lines, which were adapted to culture in vitro. However, the extent of vascularization supporting an established tumor will vary with tumor type and tissue environment as a result of variable levels of proteases, receptors or regulators of pericyte and/or endothelial migration, proliferation and differentiation [3,10]. Additionally, some tumors, such as early-grade astrocytomas, can use existing normal brain blood vessels, without substantial vasculogenesis, as the source for subsequent angiogenic sprouting of new vessels [11]. Further, vascularization depends upon a tuned interaction in the tissue microenvironment between endothelial cells and pericytes [12,13]. Vascularization of solid tumors may also be heterogeneous, with a rapidly growing margin surrounding a hypoxic core that follows regression of the co-opted vessels that supported early tumor growth [10]. Complicating this picture is the potential for 'vascular mimicry', in which breast tumor-derived cells express endothelial markers and may serve as rudimentary channels [14].
Many angiogenesis studies have used cultured primary vascular endothelial cells and have shown the significant roles of VEGF, FGF, PDGF, chemokines and cell-matrix adhesion proteins [3,15,16]. Assays of endothelial cell migration and angiogenesis include the chorioallantoic membrane assay [17], matrigel migration assays [18] and 3D-collagen assays [19]. However, the limits of studying the angiogenic process with established endothelial cells in vitro have been recognized. Tumorigenesis involves heterophilic and homophilic cellular communication and adhesion not only among endothelial cells but also with pericytes and smooth muscle cells; hence, other cell surface proteins and secreted factors are absent from such assays [3].
A search for tumorigenic genes common to tumors of diverse origin should be as broad as possible and hence should not be limited to a single tumor type or tissue source. In order to find common tumorigenic genes regardless of tissue origin, we chose to study a panel of 5 adenocarcinoma cell lines from breast, colon, lung, ovarian and prostate tumors. These cell lines reproducibly yield solid tumors in a standard xenograft assay in immunocompromised mice [20][21][22]. While there may be individual differences in capillary branching or density between tumor types, the xenograft assay requires vascular development to support solid tumor formation at a relatively avascular subcutaneous site.
Since the early tumorigenic events largely rely upon secreted factors, cell surface receptors or integral membrane proteins, we devised a strategy employing a custom microarray focused on the expression of genes chosen on the basis of their cellular localization. Hence, we implemented an experimental microarray strategy with high replication and coverage of all possible secreted and cell surface proteins. Focusing on all known and predicted cell surface and secreted genes also allowed us to design more intra-chip replicates for improved data reliability. While the selection prioritized the 'Function' category of the Gene Ontology [23], the range of 'Biological Processes' covered by the selected genes remained broad. Despite early concerns that a sub-selection of genes might introduce a systematic bias, only a relatively small number of genes was found common to all xenograft tumors, owing to the robust experimental design and statistical analysis.

Results
We developed a custom 60-mer oligonucleotide microarray focused on an ontologically restricted set of secreted and cell surface genes, and obtained higher data reliability through a matrix design with intra-chip replicates in addition to replicate chips. Because of the limits of the Gene Ontology classification, multiple strategies had to be used to derive a relatively complete collection of secreted and cell surface genes. For example, some proteins have multiple localization sites on the basis of newer experimental evidence absent from curated databases, e.g. SORCS3 and HDGF. For proteins with multiple cellular localizations, the literature (PubMed, NCBI) was used as the annotation source to find additional secreted and cell surface proteins. In addition, other putative secreted and transmembrane-encoding genes and exons were drawn from hypothetical predictions in the UCSC Human Genome. Redundant genes were removed by a combination of blastn/blastp comparisons and manual curation, but many putative membrane-encoding exons of potential proteins were included. The final tally of 3531 genes comprised 1057 secreted genes and 1338 G-protein coupled receptor (GPCR) genes, with the remaining 1136 classified as various integral membrane and cell surface proteins. An ontological view of the custom chip's content is shown in Fig. 1. Finally, in consideration of potential global changes in a selected set of genes, numerous positive and negative controls were included in the array design, including genes characteristic of some tumors (e.g. the estrogen receptor for a subset of breast tumors) and many 'housekeeping' transcripts (e.g. β-actin) commonly used to normalize quantitative PCR studies. In addition, co-hybridizing all samples with a reference cDNA derived from a mixture of 10 human cell lines enabled 'normalization' with respect to feature, chip and dye for the MANOVA analysis.
This strategy minimizes the potential concern for a skewed normalization by a sub-selected gene population or possible differential behavior of the included 'housekeeping' genes in the xenograft tumors.
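A minimal sketch of why co-hybridization against a common reference, combined with dye-swaps, supports this kind of normalization. It assumes a multiplicative, probe-specific dye bias, and it assumes the forward array carries the specimen in Cy3 (the text states only that the dye-swapped array carries it in Cy5). Under those assumptions, averaging the forward and swapped log-ratios cancels the bias, and a tumor-versus-parental contrast reduces to a difference of log-ratios in which the reference drops out. All intensities are hypothetical.

```python
import math


def log_ratio(sample_signal: float, reference_signal: float) -> float:
    """log2(sample / reference) for a single spot."""
    return math.log2(sample_signal / reference_signal)


def dye_swap_average(fwd_sample_cy3: float, fwd_ref_cy5: float,
                     swap_sample_cy5: float, swap_ref_cy3: float) -> float:
    """Mean of the forward and dye-swapped log-ratios for one probe.

    Assumption: a multiplicative Cy5 bias b adds -log2(b) to the forward
    log-ratio and +log2(b) to the swapped one, so it cancels in the mean.
    """
    m_forward = log_ratio(fwd_sample_cy3, fwd_ref_cy5)
    m_swapped = log_ratio(swap_sample_cy5, swap_ref_cy3)
    return 0.5 * (m_forward + m_swapped)


# Hypothetical probe: Cy5 reads 1.5x brighter than Cy3 for the same amount of RNA.
CY5_BIAS = 1.5
reference = 100.0
tumor, cell_line = 400.0, 100.0   # true abundances (tumor is 4x the reference)

m_tumor = dye_swap_average(tumor, reference * CY5_BIAS, tumor * CY5_BIAS, reference)
m_cell = dye_swap_average(cell_line, reference * CY5_BIAS, cell_line * CY5_BIAS, reference)

# Both specimens were measured against the same reference, so the
# tumor-vs-parental contrast is simply the difference of their log-ratios.
tumor_vs_cell = m_tumor - m_cell
print(round(m_tumor, 6), round(tumor_vs_cell, 6))
```

The design choice being illustrated is that neither the reference abundance nor the dye bias needs to be known: both cancel, leaving the log2 fold change between specimens.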
Figure 1 Gene ontology of custom chip probes. The ontological classification of 3531 cell surface or secreted genes was extracted from the Gene Ontology at the third level. Genes lacking GO annotations at this level were classified at level 2.

Identification of characteristic tumor-specific genes from all tumor data or individual tumor types by multivariate analysis of variance
We performed several multivariate analyses of the microarray data to find characteristic tumorigenic genes. The MAANOVA tools [24] were chosen for their sensitivity and robustness in measuring differential expression, compared with t-tests and log-ratio methods that apply thresholds for induction or suppression. This was particularly important in these studies, which used a relatively complex design with on-chip and inter-chip probe replication, multiple tumor samples and tumor types, dye-swaps and a common reference RNA sample for all hybridizations. This strategy also helps avoid any systematic bias from using a chip containing probes for only secreted and cell surface genes. We developed a custom database [25] to allow dynamic re-grouping of the data to facilitate multiple analytical models, such as all tumor data or individual tumor types and their parental cell lines.
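MAANOVA fits considerably richer models than the one below (for example, with array effects and random terms), and the block is not its implementation. It is a simplified fixed-effects sketch of the core idea only: for one probe, compare a model of the reference-normalized log-ratios containing a dye term with a model that also contains a sample-type term, and judge the resulting F statistic against label permutations. The simulated data, effect sizes and permutation count are hypothetical.

```python
import numpy as np


def residual_ss(design: np.ndarray, y: np.ndarray) -> float:
    """Residual sum of squares of an ordinary least-squares fit."""
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(resid @ resid)


def group_f_statistic(y: np.ndarray, dye: np.ndarray, group: np.ndarray) -> float:
    """F statistic for a sample-type effect on one probe, adjusting for dye.

    Reduced model: intercept + dye.  Full model: intercept + dye + group.
    """
    ones = np.ones_like(y)
    reduced = np.column_stack([ones, dye])
    full = np.column_stack([ones, dye, group])
    rss0, rss1 = residual_ss(reduced, y), residual_ss(full, y)
    df_num = full.shape[1] - reduced.shape[1]
    df_den = len(y) - full.shape[1]
    return ((rss0 - rss1) / df_num) / (rss1 / df_den)


def permutation_p_value(y, dye, group, n_perm=2000, seed=0) -> float:
    """Share of group-label permutations whose F is at least the observed F."""
    rng = np.random.default_rng(seed)
    f_obs = group_f_statistic(y, dye, group)
    exceed = sum(group_f_statistic(y, dye, rng.permutation(group)) >= f_obs
                 for _ in range(n_perm))
    return (exceed + 1) / (n_perm + 1)


# Hypothetical probe: 12 replicate log-ratios per sample type, half per dye orientation.
rng = np.random.default_rng(1)
group = np.repeat([0.0, 1.0], 12)    # 0 = parental line, 1 = xenograft
dye = np.tile([0.0, 1.0], 12)        # 0 = forward, 1 = dye-swapped array
y = 0.2 * dye + 1.0 * group + rng.normal(scale=0.3, size=24)

f_obs = group_f_statistic(y, dye, group)
p_perm = permutation_p_value(y, dye, group)
print(f"F = {f_obs:.1f}, permutation P = {p_perm:.4f}")
```

Including the dye term in both models means a consistent dye effect is not mistaken for a tumor effect, which is one reason the replicated dye-swap design matters.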
Initially, we identified the genes differentially expressed in all tumors relative to all parental cells, regardless of tissue origin. Hence, we compared all the xenograft tumor data to all the parental cell line data without regard to tumor type. Similarly, both the tumor and parental cell line data were compared to all of the reference cDNA hybridization data. These data were analyzed by both principal components analysis (PCA) and multivariate analysis of variance (MANOVA).

Principal components analysis
To visualize all tumor and parental cell data and assess overall quality, we subjected the entire dataset to principal components analysis. As shown in Fig. 2, the data segregate into 3 major aggregates corresponding to xenografts (circles), parental cell lines ("X's") and the universal reference cDNA (solid dots). The third principal component, plotted on the vertical Y-axis, provided the best separation between the parental cell data and the xenograft tumor data (Fig. 2).
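A minimal sketch, on hypothetical data, of how one might locate the component that best separates sample type: PCA by singular value decomposition of the column-centred sample-by-probe matrix, followed by a correlation of each component's scores with a 0/1 sample-type label. In the paper's data that component was the third; here a simulated nuisance factor pushes the sample-type signal off the first component. The nuisance factor, effect sizes and number of components are assumptions.

```python
import numpy as np


def pca_scores(X, n_components):
    """Principal-component scores and loadings via SVD of the column-centred data."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * S[:n_components], Vt[:n_components]


def best_separating_component(scores, labels):
    """Index (0-based) and |Pearson r| of the component most correlated with labels."""
    r = [abs(np.corrcoef(scores[:, j], labels)[0, 1]) for j in range(scores.shape[1])]
    j = int(np.argmax(r))
    return j, r[j]


# Synthetic data (hypothetical): a strong nuisance factor dominates the variance,
# while a smaller tumor-vs-parental shift affects a different set of probes.
rng = np.random.default_rng(2)
n_samples, n_probes = 30, 300
labels = np.repeat([0, 1], n_samples // 2)      # 0 = parental line, 1 = xenograft
nuisance = np.tile([0, 1, 2], n_samples // 3)   # balanced across the two labels
X = rng.normal(size=(n_samples, n_probes))
X[:, :100] += 3.0 * nuisance[:, None]
X[labels == 1, 100:130] += 2.5

scores, loadings = pca_scores(X, n_components=5)
j, r = best_separating_component(scores, labels)
print(f"PC{j + 1} separates sample type best (|r| = {r:.2f})")
```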

Linear discriminant analysis
In order to identify a profile characteristic of xenograft tumors, in which the combination of multiple genes might be more predictive than any single gene, we performed a linear discriminant analysis. To do so, we iteratively 'trimmed' versions of the third principal component, since it had the highest correlation to sample type. Each trimmed list of coefficients was tested for its accuracy in assigning samples to either the tumor or the cell line category. This analysis retained the 70 largest coefficients of the third principal component, yielding a simple linear discriminant (LD) of 70 probes that corresponds to 54 genes. The profile of 70 probes distinguishes fairly accurately between the two sample types, parental cell lines and xenograft tumors (Fig. 3A). In leave-one-out testing, in which each of the 99 samples was removed in a separate analysis, this method generated a profile that was 79.8% accurate in predicting a xenograft tumor. The same method applied to 1000 label-permuted datasets never exceeded 65% accuracy, with median and minimum accuracies of 49% and 39.3%, respectively. This suggests that the gene profile generated by our analysis distinguishes between the xenograft data and the cell line data in a verifiable manner.
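One possible reading of the trimming procedure, sketched under stated assumptions: within each leave-one-out fold, compute principal components of the training samples, take the leading component whose scores correlate best with sample type, keep only its k largest-magnitude coefficients, and classify the held-out sample against the midpoint of the two class means. The midpoint rule, the synthetic data, k = 10 and the 20 permutations are our assumptions for illustration; the paper used 70 probes, 99 samples and 1000 permuted datasets.

```python
import numpy as np


def fit_trimmed_pc_discriminant(X, y, k, n_search=10):
    """Fit a trimmed principal-component discriminant on training data.

    X: samples x probes (log-ratios); y: 0 = parental line, 1 = xenograft.
    """
    mu = X.mean(axis=0)
    Xc = X - mu
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)   # rows = PC loadings
    n_search = min(n_search, Vt.shape[0])
    scores = Xc @ Vt[:n_search].T
    # Choose the leading component most correlated with sample type.
    corr = [abs(np.corrcoef(scores[:, j], y)[0, 1]) for j in range(n_search)]
    w = Vt[int(np.argmax(corr))].copy()
    w[np.argsort(np.abs(w))[:-k]] = 0.0                  # keep the k largest |coefficients|
    s = Xc @ w
    m0, m1 = s[y == 0].mean(), s[y == 1].mean()
    threshold = 0.5 * (m0 + m1)                          # assumption: midpoint rule
    sign = 1.0 if m1 > m0 else -1.0
    return mu, w, threshold, sign


def predict(model, x):
    mu, w, threshold, sign = model
    return int(sign * ((x - mu) @ w - threshold) > 0)


def loo_accuracy(X, y, k):
    """Leave-one-out accuracy: refit the trimmed discriminant without each sample."""
    idx = np.arange(len(y))
    hits = 0
    for i in idx:
        model = fit_trimmed_pc_discriminant(X[idx != i], y[idx != i], k)
        hits += predict(model, X[i]) == y[i]
    return hits / len(y)


# Synthetic data: 20 parental and 20 xenograft profiles over 200 probes,
# with 10 probes shifted in the xenografts (all values hypothetical).
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 20)
X = rng.normal(size=(40, 200))
X[y == 1, :10] += 3.0

acc = loo_accuracy(X, y, k=10)
# Null distribution: rerun the whole procedure on label-permuted data.
perm_acc = [loo_accuracy(X, rng.permutation(y), k=10) for _ in range(20)]
print(f"LOO accuracy {acc:.3f}; permuted median {np.median(perm_acc):.3f}, max {max(perm_acc):.3f}")
```

Refitting the component selection and trimming inside every fold matters: selecting probes on all samples first and only then holding one out would leak information and inflate the leave-one-out accuracy.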

Ontological classification of genes identified by a linear discriminant
The 54-gene profile derived from the linear discriminant (LD-p54) was distributed amongst numerous biological processes using the Gene Ontology classification terms (Table 1). Many genes were classified in multiple biological process categories as a result of their biological complexity; e.g. fibronectin (FN1) is classified into 8 biological processes including cell motility, response to stress, cell communication, response to external stimuli, extracellular matrix structural constituent, protein binding and glycosaminoglycan binding. Other genes are involved in cell adhesion or the extracellular matrix, cellular growth or the regulation of cellular proliferation, or are various membrane proteins with known or inferred functions, transporters or channels, and proteases or protease inhibitors. A nonredundant ontological classification of the genes identified by the linear discriminant is shown with a graphical representation of their behavior across all tumor types (Fig. 3B). Since the linear discriminant analysis uses a weighted sum, not all of the identified genes behaved consistently across all xenograft tumors, e.g. CD164 or COL4A1 (Fig. 3B).

Analysis of variance of all xenograft data
The expression data were also subjected to ANOVA using all xenograft and parental cell line data, ignoring the type of tumor or parental line. This analysis identified 156 probes representing 149 differentially regulated genes at the 99.9% confidence level (Table 2). The induction or suppression of this set of genes (ANOVA-p149) ranged from 6-fold induction to 5-fold suppression. Twenty-nine of the 54 genes found by the above linear discriminant analysis were also in the list of 149 ANOVA-qualified genes. An ontological clustering of the ANOVA-p149 genes revealed groups of proteases and protease inhibitors, cell-matrix adhesion genes, receptors, ion channels, various ligands including chemokines and interleukins, additional angiogenic genes, and several genes of unknown function. Tables 3 and 4 show the major ontological groups.
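This section does not name the multiple-testing correction behind the reported corrected P values (Pcorr). Purely for illustration, the sketch below assumes a Bonferroni adjustment across all tested probes, reads the 99.9% confidence level as alpha = 0.001, and collapses significant probes to distinct genes, mirroring how 156 probes mapped to 149 genes. Probe IDs, P values and the probe-to-gene map are hypothetical.

```python
from typing import Dict, List, Tuple


def bonferroni(p_values: List[float]) -> List[float]:
    """Family-wise adjusted P values: p * m, capped at 1 (assumed correction)."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def differential_genes(probe_p: Dict[str, float], probe_gene: Dict[str, str],
                       alpha: float = 0.001) -> Tuple[List[str], List[str]]:
    """Return (significant probes, distinct genes they map to)."""
    probes = list(probe_p)
    adjusted = bonferroni([probe_p[probe] for probe in probes])
    hits = [probe for probe, q in zip(probes, adjusted) if q < alpha]
    genes = sorted({probe_gene[probe] for probe in hits})
    return hits, genes


# Hypothetical input: two independent probes target one gene.
probe_p = {"probe_1": 1e-6, "probe_2": 5e-5, "probe_3": 1e-2, "probe_4": 2e-7}
probe_gene = {"probe_1": "ANGPTL4", "probe_2": "ANGPTL4",
              "probe_3": "CD44", "probe_4": "COL1A1"}
hits, genes = differential_genes(probe_p, probe_gene)
print(hits, genes)
```

With a full chip, m would be the number of probes tested rather than four, which makes the per-probe threshold far stricter.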

Verification of selected genes by quantitative PCR analysis
The differential expression of selected genes was confirmed by quantitative real-time PCR using the same RNA samples subjected to microarray hybridization. The vast majority of the genes tested by PCR validated the array analysis (Fig. 5). In some instances, discrepancies in fold-induction can be explained by methodological differences: the array data were all normalized to the co-hybridized universal-RNA sample, while the PCR data were normalized to a β-actin probe (data not shown). Differential expression of ANGPTL4, GP2, GNAO1, CCR4, FGF23, SPP1 and COL1A1 was qualitatively consistent in the PCR and array analyses. However, two of the down-regulated genes identified by the array analysis, both G-protein coupled receptors, were found by PCR to be elevated, albeit with large variability: GPR10 was induced 281-fold (SD = 469) and GPR110 50-fold (SD = 105). Of two other down-regulated genes examined by quantitative PCR, CD81 was consistent in both assays, while CD44 was measured by PCR as unchanged or minimally induced even though the array analysis indicated that CD44 was suppressed. However, the aggregate 2-fold CD44 induction measured by quantitative PCR lies at the threshold of what is considered significantly distinguishable from unchanged. Finally, while we did not perform PCR with species-specific probes for every gene in the ANOVA-p149 list, we were able to distinguish differential expression of several human genes from that of their mouse counterparts, such as the osteopontin genes (Fig. 5). While this analysis does not rule out partial contamination of the array results by weak cross-hybridization, we guarded against this possibility by designing probes to be species-specific under the stringent hybridization conditions used in this study.
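The text states only that the PCR data were normalized to β-actin. Assuming the widely used comparative Ct (2^-ΔΔCt) method with near-100% amplification efficiency, which is our assumption rather than the authors' stated procedure, a tumor-versus-parental fold change and the 2-fold significance threshold from the Fig. 5 legend could be computed as follows. All Ct values and fold changes are hypothetical.

```python
import statistics


def ddct_fold_change(ct_target_tumor: float, ct_actin_tumor: float,
                     ct_target_cell: float, ct_actin_cell: float) -> float:
    """Tumor vs parental-line expression by the 2^-ddCt method.

    Assumption: roughly 100% amplification efficiency for target and beta-actin.
    """
    dct_tumor = ct_target_tumor - ct_actin_tumor   # normalize to beta-actin
    dct_cell = ct_target_cell - ct_actin_cell
    return 2.0 ** -(dct_tumor - dct_cell)


def call(fold: float, threshold: float = 2.0) -> str:
    """Treat changes smaller than `threshold`-fold in either direction as unchanged."""
    if fold >= threshold:
        return "up"
    if fold <= 1.0 / threshold:
        return "down"
    return "unchanged"


# Hypothetical Ct values: after beta-actin normalization the target crosses the
# threshold 4 cycles earlier in the tumor, i.e. a 16-fold induction.
fold = ddct_fold_change(20.0, 15.0, 24.0, 15.0)
print(fold, call(fold))

# Summary across tumors (hypothetical folds), in the mean/SD form used in the text.
folds = [16.0, 4.0, 30.0, 2.5]
print(statistics.mean(folds), statistics.stdev(folds))
```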

ANOVA analysis of individual tumor types
To accommodate the possibility that tumor type was an important contributor to differential gene behavior, we performed a third analysis by examining the intersection of the differential genes from each individual tumor type. For this restrictive analysis, we compared each tumor type with its parental cell line by ANOVA. Between 91 and 312 genes were differentially expressed at 99.9% confidence for each cell line: SK-OV-3, 125 differential genes; MDA MB-231, 312; HCT-116, 124; A549, 159; and PC3, 91 (data not shown). Twelve genes were common to all of these separately analyzed tumor types: ANGPTL4, COL1A1, epithelial membrane protein 3 (EMP3), GNAO1, glycoprotein 2 (GP2), GPR57, HAS1, HLA-A, laminin beta 3 (LAMB3), PCDHB9, protease inhibitor 3 (PI3) and PTGER3 (Table 2).
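The consensus step is a set intersection over the per-tumor-type lists. A minimal sketch follows; the gene memberships below are invented for illustration (the real lists held 91 to 312 genes each), and the support count is an optional extra, not part of the described analysis.

```python
from collections import Counter

# Hypothetical per-tumor-type lists of differential genes (memberships invented).
per_tumor = {
    "SK-OV-3": {"ANGPTL4", "COL1A1", "FGF7", "CD44"},
    "MDA MB-231": {"ANGPTL4", "COL1A1", "SPP1"},
    "HCT-116": {"ANGPTL4", "COL1A1", "FGF7"},
    "A549": {"ANGPTL4", "COL1A1", "CD81"},
    "PC3": {"ANGPTL4", "COL1A1", "FGF7", "SPP1"},
}

# Genes called in every tumor type: the consensus list.
common = set.intersection(*per_tumor.values())

# How many tumor types call each gene; useful for spotting near-consensus genes.
support = Counter(gene for genes in per_tumor.values() for gene in genes)

print(sorted(common))
print(support.most_common())
```

Requiring every tumor type to agree is deliberately restrictive, which is why this analysis yields far fewer genes than the pooled ANOVA.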

Comparison of multiple analyses
In a typical analysis of multivariate data, a particular method is often chosen as a filter for subsequent analyses. In this study, because the high replicate probe count (n = 18 to 30) enabled by the custom array design imparted high statistical reliability, we instead compared the results of 3 different approaches applied to the intact dataset, modeled either as all data or as individual tumor types. An estimate of the statistical significance of the overlap in differentially expressed genes common to the three analytical methods gave P < 1 × 10⁻⁶, as described in the legend to Fig. 6. As shown in Fig. 6, seven of the twelve differential genes found by the individual tumor ANOVA analyses were also common to the linear discriminant gene profile (LD-p54) and ANOVA-p149: ANGPTL4, COL1A1, GP2, GPR57, LAMB3, PCDHB9 and PTGER3.

A level 3 annotation of the biological process Gene Ontology terms was applied to the list. Due to biological complexity, a gene can occur in more than one category.

Table 2 Differentially expressed genes from three analyses. ANOVA of xenograft data vs parental cell lines found 149 differential genes (designated 'Ap'), linear discriminant analysis found 54 genes (designated 'LD') and ANOVA of individual xenograft tumors yielded a consensus of 12 genes (designated 'Ai'). For each gene, its presence is denoted by '1' and its absence by '0'. The maximum MANOVA P value is reported along with the aggregate ratio (designated 'R'). For genes with multiple independent probes, the probe reporting the maximum P value is shown. The seven genes common to all three lists are in bold text.
Real-time PCR analysis generally confirmed either induction or suppression in multiple tumor samples, but with higher fold changes. For example (Fig. 5), ANGPTL4 was induced 19- to 453-fold by PCR, with an average induction of 185-fold (SD = 170) for 10 tumors (2 of each type), whereas the aggregate induction of ANGPTL4 in the array analysis was 2.09-fold (Pcorr < 2e-9). Similarly, COL1A1 was induced in most tumors by PCR, with an average of 9.8-fold (SD = 9.1), versus a 3.64-fold induction found by microarray analysis. Finally, in ovarian and prostate tumors, angiopoietin 2 (ANGPT2) measured by PCR was elevated 6-fold (data not shown) versus the 2.2-fold induction found by microarray analysis.

Discussion
Overall, the pathways represented by the differential genes in xenograft tumors support a model of late angiogenic expression patterns. Given that the xenografts were collected 28-29 days post-implantation, it is not surprising to find patterns of differential gene expression that reflect a portion of the tumorigenic process rather than a preponderance of early transforming events. This premise is largely supported by the abundance of extracellular matrix, cell adhesion and angiogenic genes common to the three analyses.
Ten of the 12 induced genes identified by the ANOVA of individual xenograft tumor types have well-characterized functions or biological roles, particularly in angiogenesis (ANGPTL4), morphogenesis (LAMB3, COL1A1, PCDHB9), or cellular mobility and communication (HAS1, PTGER3, PCDHB9 and LAMB3). The role of extracellular matrix genes in tumor growth has been noted previously [7,8]. Interestingly, five of the extracellular matrix genes from the linear discriminant analysis were collagens (COL1A1, COL4A1, COL5A1, COL5A2 and COL12A1), and four of these (COL1A1, COL4A1, COL5A1 and COL5A2) have previously been found induced in primary renal cell carcinomas (4.8-, 5.0-, 3.25- and 3.6-fold, respectively [26]). COL1A1 has also been found induced in most breast carcinomas [27,28] and in a subset of ovarian and colon carcinomas [28].
Consistent with an overall pattern of late-stage angiogenesis in xenograft tumors, ANGPTL4 was found consistently induced relative to the parental cell lines by all analyses. ANGPTL4 was originally described as an induced target of peroxisome proliferator-activated receptor gamma that is involved in glucose homeostasis and the differentiation of adipose tissue [29]. Subsequently, ANGPTL4 was shown to possess angiogenic activity in the chick chorioallantoic membrane assay [30]. More recently, ANGPTL4 was shown to bind and inhibit lipoprotein lipase [31], a function consistent with the cachexia induced by tumors, in which reduced incorporation of fatty acids into fat cells serves the energy needs of the tumor rather than the host. ANGPTL4's angiogenic action has been reported to be independent of VEGF in a renal carcinoma model [30]. Similar to previous observations of induced angiopoietins in primary renal cell carcinomas (ANGPT2 8.18-fold and ANGPTL4 18- to 32-fold induced [26]), we found both ANGPTL4 induction (2.09-fold, Pcorr < 2e-9) and ANGPT2 induction (2.23-fold, Pcorr < .005).

Other post-VEGF angiogenic pathways
The role of other elevated angiogenic genes downstream of VEGF bears discussion. The induction of prostaglandin E receptor 3 (PTGER3; 6.4-fold, Pcorr < .001) is of interest since prostaglandins can induce VEGFA production [32,33] via a hypoxia-induced pathway [34]. Coincident with these observations, IGFBP7, which was differential by both ANOVA and linear discriminant analysis, modulates IGF mitogenic activity [35]. IGFBP7 also stimulates prostacyclin synthesis [36], perhaps to take advantage of the 6-fold increase in PTGER3 expression that we observed. Similarly, a human-specific probe for TEM5, a marker of tumor endothelial angiogenesis [37], detected a mild increase (1.37-fold, Pcorr < .001), possibly as a result of vasculogenic mimicry [14,38]. Other factors such as FGF can also play an angiogenic role. One FGF isoform was significantly differential in some tumor types: FGF7 was elevated in colon and prostate xenograft tumors (1.5-fold, Pcorr < 8.7e-6, and 3.7-fold, Pcorr < 7.5e-7, respectively) but 2-fold suppressed in ovarian tumors (Pcorr < .006) (Fig. 4). FGF7 was previously shown to stimulate the growth of endothelial cells of small but not large vessels in the rat cornea [39], which supports the notion of vascular remodeling rather than vasculogenesis. That this gene was differentially expressed in only some tumor types is consistent with the concept that each tumor type displays individual differences in the balance of angiogenic activators and inhibitors, while the physiological end result, increased tumor vascularization, is the same [3]. Finally, as noted above, genes that help destabilize or remodel vessels, such as ANGPT2 and ANGPTL4, were induced, consistent with an overall pattern of late-stage angiogenesis.

Linking angiogenic pathways to neuropeptide signaling pathways
Additional support for a late, post-VEGF angiogenic pattern of gene expression in xenografts comes from the 5-fold induction of NPY1R observed by both ANOVA and linear discriminant analyses. NPY1R has been reported to play a role downstream of VEGF in vasoconstriction [40] and in capillary sprouting and differentiation [41]. Consistent with NPY1R induction, its ligand neuropeptide Y (NPY) has a potent effect on angiogenesis, yielding branching, vasodilated structures distinct from those generated by VEGF [17]. Similarly, neuropeptide Y has been reported to trigger angiogenesis via the NPY2 receptor in ischemic muscle of mice [41]. Interestingly, neuropilin 1 (NRP1), which can act as a co-receptor with VEGFR2 [42], was found suppressed (1.31-fold, Pcorr < .006), while the levels of other VEGF receptors were not significantly altered. Finally, previous expression profiling studies have found NPY1R to be substantially induced in many breast, prostate and pancreatic carcinomas [28].
Additionally, two other differential genes involved in neuropeptide signaling were observed by ANOVA: the melanocortin-2 receptor (MC2R) and the SORCS3/neurotensin receptor gene. MC2R is a GPCR that binds the ACTH peptide, while SORCS3 is a homolog of the rat sortilin gene with VPS10 domains characteristic of neuropeptide-binding proteins [43][44][45]. ACTH has been found to increase angiogenesis of cultured endothelial cells in a 3D-collagen assay [19], and other neuropeptides have been implicated in stimulating VEGF in prostate cancer cells [46].

Conclusion
In this study we compared the expression profiles of secreted and cell surface genes from five different tissue sources. Multiple tumors were derived from each parental cell line to examine the potential for tumor heterogeneity arising from the primary isolate, but we found relatively consistent behavior within any tumor group. However, we also found tumor-specific genes for each tumor type while identifying a profile of genes shared amongst all tumor types by multiple analytical approaches. Overall, our results comprise a foundation of commonly regulated tumorigenic genes across tissues such as fundamental angiogenic inducers and regulators. Given the diverse and complex expression behavior of primary human tumors from any single tissue source [27,28], in the future it will be necessary to examine several established lines from many histologically similar primary tumors as well as different tumor types from the same tissue. Similarly, it will be important to compare the effect of orthotopic implantation sites to the subcutaneous injection site in these preliminary studies. To resolve xenograft microheterogeneity, microarray analysis of micro-dissected xenograft or primary tumors can be used. Micro-dissection will also allow the assessment of potential vasculogenic mimicry by aggressive tumor cells that can express endothelial genes [38]. Additionally, the xenograft model can be more readily extended to monitor time-dependent expression profile changes in the development of tumors.
Such results can be used in combination with, or as a filter for, other biomarker technologies such as tissue arrays [47] or mass spectrometry [48] to fully characterize clinical specimens for diagnostic or prognostic purposes. By identifying genes known to participate in angiogenesis and tumorigenesis, our work establishes a baseline to evaluate and compare the full spectrum of gene profile changes in xenografts and clinical specimens. Hence, time- and tissue-specific gene and protein profiles may be useful for the discovery of both biomarkers and new therapeutic strategies.
Figure 5 Quantitative PCR analysis of selected genes. Two tumors of each tumor type were analyzed by quantitative PCR, and the fold change relative to the parental cell line was determined, with RNA amounts per well normalized to the β-actin signal. In general, changes of less than 2-fold are not significant; hence a call of 1.5-fold down may not actually differ from one of 1.5-fold up. Specific tumor types are indicated by the first initial followed by the tumor number, i.e. C1 = colon tumor #1, O1 = ovary tumor #1, L1 = lung tumor #1, B1 = breast tumor #1, P1 = prostate tumor #1.
Figure 6 Overlap of differentially expressed genes identified by three analyses: ANOVA-p149 = 149 genes derived from the ANOVA analysis of all data, LD-p54 = linear discriminant list of 54 genes from all data, and ANOVA-i12 = twelve genes resulting from a comparison of differentially expressed genes from the ANOVA analysis of individual tumors compared to parental cell lines. The statistical significance of the overlap of differentially expressed genes by the 3 analytical methods was estimated from the product of the individual probabilities for the results of each analytical method applied to 3531 genes. The null hypothesis in this case is that each method's "call" as to a given gene's differential expression is independent of the calls made by the other two methods. Thus, if p1, p2 and p3 represent the chance that each method calls a given gene differentially expressed (easily estimated as number of genes called / total number of genes), the chance that all three methods do so is simply pAll = p1*p2*p3 = (54/3531)*(149/3531)*(12/3531) = 2.193e-6. Under our null hypothesis, the number k of genes called by all three methods follows a binomial distribution, k ~ Bin(n = 3531, p = pAll). Standard calculation techniques then give the p-value P(k ≥ K), the chance under the null hypothesis of seeing at least as much overlap (K genes) as was actually observed. For our data, P(k ≥ 7) < 1E-6. Thus, if the methods had identified random noise as differential expression, they would have been very unlikely to produce the observed overlap, supporting the statistical significance of the results. The heat maps indicate relative fold-induction or suppression on a linear color-encoded scale shown at the bottom. Mean ratios are indicated by X; C = colon, B = breast, L = lung, P = prostate, O = ovary.
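The overlap test in the Fig. 6 legend can be reproduced with the Python standard library. A minimal sketch: it evaluates pAll exactly as the legend defines it and sums the binomial upper tail in log space, because exact binomial coefficients for n = 3531 become too large to convert to floating point near the middle of the distribution. Evaluating the product gives pAll ≈ 2.19 × 10⁻⁶, an expected chance overlap of about 0.008 genes.

```python
import math


def binom_upper_tail(k_obs: int, n: int, p: float) -> float:
    """P(X >= k_obs) for X ~ Binomial(n, p).

    Each term is formed in log space via lgamma so that huge binomial
    coefficients and tiny probabilities do not overflow; terms far in the
    tail underflow harmlessly to 0.0.
    """
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for k in range(k_obs, n + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                    + k * log_p + (n - k) * log_q)
        total += math.exp(log_term)
    return total


n_genes = 3531
# Chance that all three independent methods call a given gene (legend's null model).
p_all = (54 / n_genes) * (149 / n_genes) * (12 / n_genes)
p_value = binom_upper_tail(7, n_genes, p_all)
print(f"pAll = {p_all:.4g}, expected overlap = {n_genes * p_all:.3g}, P(k >= 7) = {p_value:.2g}")
```

Summing the upper tail directly, rather than computing 1 minus the lower tail, matters here: the tail probability is far below double-precision resolution near 1, so the subtraction would return 0.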

Custom array design
A two-stage strategy was employed to design the custom oligonucleotide microarray chip. First, for the known secreted and cell surface proteins, we performed keyword filtering of the gene descriptions and annotations of curated public databases such as SwissProt/Trembl [49], the Gene Ontology tables [23], the UCSC Human Genome assembly (hg13, NCBI Build 31 [50]), the GPCR database [51] and public gene tables from technical supply vendors (Affymetrix, Agilent and Illumina). Some of the keywords used were "secreted", "trans-membrane", "glycosylated" and "olfactory". Redundancies and false positives were removed by manual curation.
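The keyword step can be pictured as a case-insensitive scan of annotation text that only nominates candidates for the manual curation described above. A minimal sketch: the four keywords are those named in the text, while the gene records, the substring-matching rule and the helper name `keyword_hits` are hypothetical.

```python
KEYWORDS = ("secreted", "trans-membrane", "glycosylated", "olfactory")


def keyword_hits(annotation: str) -> list:
    """Keywords found (case-insensitively) in a gene's description text."""
    text = annotation.lower()
    return [kw for kw in KEYWORDS if kw in text]


# Hypothetical annotation records; real input came from curated databases.
records = {
    "GENE_A": "Secreted glycoprotein; extracellular space",
    "GENE_B": "Olfactory receptor, family 1",
    "GENE_C": "Cytoplasmic serine/threonine kinase",
    "GENE_D": "Single-pass trans-membrane protein",
}

# Candidates still go to manual curation to remove redundancies and false positives.
candidates = {}
for gene, annotation in records.items():
    hits = keyword_hits(annotation)
    if hits:
        candidates[gene] = hits
print(candidates)
```

Plain substring matching is intentionally permissive (it would miss "transmembrane" without a hyphen and flag incidental mentions), which is why a curation pass remains necessary.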
In order to accommodate continued optimization of the custom chip design, we chose a chip platform that met several criteria: it had to allow rapid changes to the master template even for small production batches, possess relatively high density, exhibit strong signal-to-noise properties and have high reproducibility (CV < 10%). Hence, a custom oligonucleotide microarray chip (Agilent, Palo Alto, CA) was designed using the curated collection of secreted and cell surface proteins, with human-specific 60-mer probes derived from the 3'-most 1500 nt of each mRNA sequence. The custom chip was designed with a matrix of technical probe replicates and multiple probes for some genes; e.g. some genes were represented by 2 or 3 probes, each present in 1, 3 or 5 copies per array. All probes were curated by eliminating sequences with unfavorable Tm properties, predicted secondary structure or homopolymer regions. Finally, Blastn [52] analysis was used to confirm human specificity by comparison to mouse sequences.

… labeled cDNA made from universal RNA. For the cognate dye-swap experiment on the right array, a Cy5-labeled biological specimen was co-hybridized with Cy3-labeled cDNA made from universal RNA. No tumor samples were mixed with any other tumors.
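The probe-curation criteria described for the array design (60-mers from the 3'-most 1500 nt, with Tm and homopolymer filters) can be sketched as a tiling-and-filter pass. This is an illustration under assumptions, not the vendor's design pipeline: the Tm uses the rough basic formula 64.9 + 41 × (nGC − 16.4) / N without salt correction, and the Tm window, maximum run length and tiling step are hypothetical. Secondary-structure prediction and the cross-species Blastn check are not modeled.

```python
def tm_basic(probe: str) -> float:
    """Rough Tm (deg C) for oligos longer than ~13 nt, ignoring salt:
    64.9 + 41 * (nG + nC - 16.4) / N."""
    n_gc = probe.count("G") + probe.count("C")
    return 64.9 + 41.0 * (n_gc - 16.4) / len(probe)


def longest_homopolymer(seq: str) -> int:
    """Length of the longest single-base run in seq."""
    if not seq:
        return 0
    longest = run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    return longest


def candidate_probes(mrna: str, probe_len: int = 60, window: int = 1500,
                     tm_range=(70.0, 80.0), max_run: int = 5, step: int = 20):
    """Tile probe_len-mers over the 3'-most `window` nt and keep those passing the
    Tm and homopolymer filters. Offsets are relative to the start of that window.
    Hypothetical defaults: tm_range, max_run and step are illustrative only."""
    region = mrna.upper()[-window:]
    kept = []
    for start in range(0, len(region) - probe_len + 1, step):
        probe = region[start:start + probe_len]
        if tm_range[0] <= tm_basic(probe) <= tm_range[1] and longest_homopolymer(probe) <= max_run:
            kept.append((start, probe))
    return kept


# Hypothetical transcript: a 1600-nt repeat, so every tile has 50% GC.
probes = candidate_probes("ACGT" * 400)
print(len(probes), round(tm_basic(probes[0][1]), 1))
```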

Cell lines and mice
To enable identification of differentially expressed genes with higher statistical reliability, we performed a matrix of hybridizations, as follows. For each of SK-OV-3, A549 and PC3 (5 tumor specimens each), 3 of the tumor samples were hybridized to 2 chips each (hence 4 arrays per tumor sample), while 2 tumor samples were hybridized to 1 chip each (hence 2 arrays for each of these tumors). For the 4 MDA MB-231 tumor specimens, 3 of the tumor samples were hybridized to 2 chips each and 1 tumor was hybridized to a single chip of 2 arrays. For the 3 HCT-116 tumor specimens, all 3 tumors were hybridized to 2 chips each (4 arrays each). For the parental cell lines, HCT-116 cells were hybridized to 6 chips (12 arrays), while the other cell lines were hybridized to 2 chips each (4 arrays). Since most probes were present at least in triplicate on each array, whenever a tumor sample was hybridized to 2 chips, n = 3 × 4 = 12 per probe. However, since dye-swap hybridizations were routinely performed, n = 6 for each of the Cy3 and Cy5 signals.
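The replicate arithmetic of this hybridization matrix can be checked in a few lines. The per-specimen chip counts are transcribed from the paragraph above; two arrays per chip follows from its "2 chips (4 arrays)" wording, and pairing those two arrays as forward and dye-swapped is our reading of the dye-swap description. The array totals are simply what these counts imply; they are not separately reported in the text.

```python
ARRAYS_PER_CHIP = 2        # two arrays per chip (read here as a forward/dye-swap pair)
SPOTS_PER_ARRAY = 3        # "most probes were present ... in triplicate"

# Chips per tumor specimen, transcribed from the hybridization matrix.
tumor_chips = {
    "SK-OV-3": [2, 2, 2, 1, 1],
    "A549": [2, 2, 2, 1, 1],
    "PC3": [2, 2, 2, 1, 1],
    "MDA MB-231": [2, 2, 2, 1],
    "HCT-116": [2, 2, 2],
}
parental_chips = {"HCT-116": 6, "SK-OV-3": 2, "A549": 2, "PC3": 2, "MDA MB-231": 2}

tumor_arrays = {line: ARRAYS_PER_CHIP * sum(chips) for line, chips in tumor_chips.items()}
parental_arrays = {line: ARRAYS_PER_CHIP * chips for line, chips in parental_chips.items()}

# Replicates per probe for a specimen hybridized to two chips, overall and per dye.
n_per_probe = SPOTS_PER_ARRAY * ARRAYS_PER_CHIP * 2
n_per_dye = n_per_probe // 2

print(tumor_arrays, sum(tumor_arrays.values()), sum(parental_arrays.values()))
print(n_per_probe, n_per_dye)
```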

Ontology annotation
UniGene gene names were classified using the consistent terms of the Gene Ontology™ consortium [23] and the FatiGO interface to the Gene Ontology [56].
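The Fig. 1 legend describes classifying genes at GO level 3 and falling back to level 2 when no level-3 annotation exists. A minimal sketch of that rule, assuming "level" means the shortest is_a distance from the root with the root at level 1 (level conventions differ between tools, FatiGO included). The hierarchy below is a toy, simplified graph and NOT the real Gene Ontology structure.

```python
from collections import deque

# Toy, simplified is_a hierarchy (child -> parents); NOT the real GO graph.
PARENTS = {
    "biological_process": [],
    "cellular process": ["biological_process"],
    "developmental process": ["biological_process"],
    "cell communication": ["cellular process"],
    "cell adhesion": ["cellular process"],
    "anatomical structure morphogenesis": ["developmental process"],
    "cell-matrix adhesion": ["cell adhesion"],
    "angiogenesis": ["anatomical structure morphogenesis"],
}
ROOT = "biological_process"


def term_levels(parents, root):
    """Level of each term = shortest is_a distance from the root (root = 1)."""
    children = {term: [] for term in parents}
    for child, ps in parents.items():
        for p in ps:
            children[p].append(child)
    level = {root: 1}
    queue = deque([root])
    while queue:
        term = queue.popleft()
        for child in children[term]:
            if child not in level:
                level[child] = level[term] + 1
                queue.append(child)
    return level


def ancestors(term, parents):
    """The term itself plus every term reachable through its parents."""
    seen, stack = {term}, [term]
    while stack:
        for p in parents[stack.pop()]:
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def classify(gene_terms, parents, level, target=3, fallback=2):
    """Map a gene's annotations to level-`target` categories, falling back to
    level `fallback` for annotations with no ancestor at the target level."""
    categories = set()
    for term in gene_terms:
        anc = ancestors(term, parents)
        hits = {a for a in anc if level.get(a) == target}
        if not hits:
            hits = {a for a in anc if level.get(a) == fallback}
        categories |= hits
    return categories


LEVEL = term_levels(PARENTS, ROOT)
# A gene annotated to two specific terms lands in two level-3 categories.
print(sorted(classify(["angiogenesis", "cell-matrix adhesion"], PARENTS, LEVEL)))
# A gene annotated only at level 2 keeps its level-2 category.
print(sorted(classify(["cellular process"], PARENTS, LEVEL)))
```

Because a gene contributes one category per distinct level-3 ancestor, a multiply annotated gene such as FN1 naturally appears in several categories, as the text notes.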