Molecular signature of clinical severity in recovering patients with severe acute respiratory syndrome coronavirus (SARS-CoV)

Background Severe acute respiratory syndrome (SARS), a recent epidemic human disease, is caused by a novel coronavirus (SARS-CoV). First reported in Asia, SARS quickly spread worldwide through international travelling. As of July 2003, the World Health Organization reported a total of 8,437 people afflicted with SARS with a 9.6% mortality rate. Although immunopathological damages may account for the severity of respiratory distress, little is known about how the genome-wide gene expression of the host changes under the attack of SARS-CoV. Results Based on changes in gene expression of peripheral blood, we identified 52 signature genes that accurately discriminated acute SARS patients from non-SARS controls. While a general suppression of gene expression predominated in SARS-infected blood, several genes including those involved in innate immunity, such as defensins and eosinophil-derived neurotoxin, were upregulated. Instead of employing clustering methods, we ranked the severity of recovering SARS patients by generalized associate plots (GAP) according to the expression profiles of 52 signature genes. Through this method, we discovered a smooth transition pattern of severity from normal controls to acute SARS patients. The rank of SARS severity was significantly correlated with the recovery period (in days) and with the clinical pulmonary infection score. Conclusion The use of the GAP approach has proved useful in analyzing the complexity and continuity of biological systems. The severity rank derived from the global expression profile of significantly regulated genes in patients may be useful for further elucidating the pathophysiology of their disease.

controls to acute SARS patients. The rank of SARS severity was significantly correlated with the recovery period (in days) and with the clinical pulmonary infection score.

Conclusion:
The use of the GAP approach has proved useful in analyzing the complexity and continuity of biological systems. The severity rank derived from the global expression profile of significantly regulated genes in patients may be useful for further elucidating the pathophysiology of their disease.
SARS-CoV appears to be new to humans, as supported by the finding that human sera collected before the SARS outbreak did not contain antibodies against this virus [3,9]. After an incubation period from 2 to 10 days, SARS patients might develop fever (>38°C), headache, dry cough, and pneumonia [3,5,[9][10][11][12][13][14]. Most patients gradually recovered while some progressed to respiratory distress syndrome with ~10% mortality rate. The genomewide changes in human gene expression when challenged by this novel pathogen are essentially unknown.
Profiles of gene expression patterns help define the complex biological processes associated with both health and disease in vivo. Investigation of host responses to infection with in vitro models have offered insights into mechanisms of pathogenesis, and have highlighted the potential for applications of microarray technology to diagnose infection in vivo [15]. Whitney et al. observed that the variation in gene expression patterns in the blood of healthy subjects was strikingly smaller than the significant changes induced by diseases either in patients with cancer or with bacterial infections [15]. It was conceivable that microarray profiling of gene expression in whole bloods exhibits the potential in monitoring the patients' responses to a disease, especially a novel infection such as SARS.
Many discriminative methods have been developed for analysis of microarray gene expression data in cancer patients and the resulting classifications have been correlated closely with clinical parameters [16][17][18][19]. For instance, the discovery of signature genes for breast cancers through microarray analysis of gene expression has provided us with a more precise clinical staging that will improve the outcome of treatment [20,21]. However, clinical parameters are not always in a discrete pattern but more likely in a continuous fashion, where an absolute classification may not be achievable. Herein we present the use of cDNA microarray analysis of gene expression in whole blood from a cohort of recovering SARS patients, of whom the disease severity appeared to be a continuum. After we had identified the molecular signature of 52 genes that accurately discriminated acute SARS patients from non-SARS controls, we ranked the disease severity of these patients using a generalized association plot (GAP) elliptical seriation algorithm [22] based on the expression profiles of the 52 genes. The derived severity rank of the patients proved to be closely correlated with their clinical parameters, namely, the recovery period (in days) and the clinical pulmonary infection score.

Patient information
Using the cDNA microarrays spotted with duplicated 7,334 cDNA clone, we analyzed RNA specimens successfully amplified in 44 peripheral blood collected from 25 confirmed SARS patients (age ranged from 23 to 80 years old, mean = 41.8, SD = 17.2, median = 34), of whom 24 survived. Except for one patient who died on the 4th day, duration of hospitalization in this cohort ranged from 12 to 51 days (n = 24, mean = 24.5, SD = 10.1, median = 21) (Additional file 1). We defined 11 specimens as acute SARS (AS) using the following criteria: (i) the whole blood RNA from a hospitalized patient was PCR positive for SARS-CoV, or (ii) the specimen was collected within 10 days after the disease onset in patients whose blood was later diagnosed ELISA-positive for anti-SARS IgG. The rest of 33 RNA specimens from SARS patients were labelled as recovering SARS (RS). Our study included 11 normal control (NC) volunteers and 11 patients with bacterial infections (IN) as healthy and non-SARS infection controls, respectively (Additional files 2 and 3).

cDNA microarray analysis
When we compared the gene expression profiles among acute SARS (AS), recovering SARS (RS), bacterial infection (IN), and normal control (NC) groups, we observed the variances of gene expression in both SARS (AS, AS+RS) and bacterial (IN) groups to be equally higher than that in healthy controls (NC) (Fig. 1a). This result indicates that gene expression profiles of either SARS or bacterial groups differed significantly from that of normal controls.
A probe set of 885 genes with standard deviations greater than 0.5 across 66 arrays was selected for further analyses. An average linkage hierarchical clustering tree with Pearson correlation proximity was built on the 33 arrays (11 NC, 11 IN, and 11AS) using these 885 genes (Fig. 1b). The AS and NC groups were well separated into two opposite coherent clusters. Singular value decomposition (SVD) analysis, a dimension reduction method to project gene expression profiles to fewer representative eigenvectors [23], also successfully separated AS, IN, and NC specimens into three clusters with first two eigenvectors (Fig.  1c). Interestingly, the recovering SARS (RS) samples are interspersed among the AS and IN samples.

Significant differences in gene expression profiles in patients with SARS or bacterial infection
To identify which genes were specifically regulated by SARS-CoV, we performed two sets of two-sample Student t-test for means with an unequal-variance assumption. In the first set, we contrasted 11 AS versus 22 non-AS (NC and IN) specimens on all 885 genes. The genes with significant testing results were considered to be specifically induced by SARS-CoV ( Fig. 2a,b). For the second set of ttests, we compared 11 NC with 22 non-NC (IN and AS) specimens. We considered that the change in significant genes identified by the second t-test was induced by both bacterial and viral infections (Fig. 2c,d). Genes identified from these two sets of test were then ranked separately according to the corresponding sets of P-values. Gene expression profiles for the top 20 and the bottom 20 genes from both sets are displayed as Figure 2.

Signature genes and GAP algorithms
A simple k-nearest-neighbour method was used to obtain a near optimal number of 30 genes from the 885 filtered gene set for discriminating specimens between acute SARS (AS) and non-SARS (NC and IN) (Additional file 4). The selected top 30 upregulated (P < 6 × 10 -6 ) and the top 30 downregulated genes (P < 4 × 10 -7 ) from the AS versus non-AS (IN and NC) Student's t-test were used as the specific probe set to assess the status of SARS infection. Eight genes that were also significant in the NC versus non-NC (AS and IN) t-test were excluded, resulting in a specific AS probe set of 52 genes. For the GAP analysis, we calculated pair-wise Euclidean distances among 55 samples (11 AS, 33 RS, 11 NC) using these 52 genes, aiming to identify a one-dimensional order that could reflect the severity structure of the disease (Fig. 3a). Using this GAP elliptical arrangement of 55 specimens (columns), we observed a transition of gene expression patterns of 52 genes (rows) from the left side where NC clustered to the right side where AS accumulated (Fig. 3b). Hierarchical clustering trees guided by self-organized map (SOM) and other clustering methods were also performed to sort the SARS patient samples using the same 52 genes. The Robinson criterion [22,28,29] is often employed to assess the performances of different seriation algorithms. Table 1 (and Additional file 5) shows that the GAP algorithm derived a smoother transition pattern than other methods in the Robinson sense. Thereby, we have derived the SARS severity rank according to the expression profile of 52 signature genes as a whole in each patient, as demonstrated by the smooth transition of expression levels in each (row) of these genes from NC to RS to AS (Fig. 3b).
For validation purposes, we further tested the stability of the rank (order) derived from GAP analysis on the 52 genes for the 55 specimens. The same GAP procedure was repeatedly applied to the top 20 to 200 genes (among the filtered 885 genes) with significant p-values (Student's ttest) between the AS versus non-AS (IN and NC). While the ranks for the 55 specimens obtained from the most significant 20 to 200 genes are highly correlated to each other, they are significantly different from the ranks derived from the 52-gene sets that were randomly selected from the 885 genes (data not shown).
We scrutinized the clinical courses of patients who donated the 10 RS specimens that were scattered among AS (Fig. 3a) and found evidence of underlying severity of the disease in the majority of patients. For example, sample RS43 from a patient who had been discharged from hospital for 2 weeks was still PCR-positive for SARS-CoV; RS54, a PCR-positive sample was not grouped as AS because of the negative ELISA result. RS38, RS40, and RS42 still represented acute SARS infections because they were collected only 1, 2, and 3 days after AS37, AS39, and AS41, respectively. Patients with RS78 and RS91 who had severe SARS courses were hospitalized for 41 and 51 days, respectively. The patient for RS8 was in the second week of disease. The only two unexplained specimens, RS18 and RS71 from the same patient, may represent a unique biological variability, accounting for the misclassification using this 52-gene molecular signature.

Molecular signature for severity and clinical correlations
To test the efficacy of using these 52 genes as the molecular signature for the severity of SARS patients, we identified a significant correlation (P < 1 × 10 -6 ) between the derived rank of SARS severity and the number of days after the onset of disease (Fig. 4a). We further used this rank of SARS severity to examine the recovery trend in 17 recovering patients who had donated multiple specimens (Fig.  4b). Except for the one patient (5.3 % = 1/19, shown as the red line in Fig. 4b  temperature, white blood cell count, volume and appearance of tracheal secretions, oxygenation, chest X-ray, and tracheal aspirate cultures into a clinical pulmonary infection score (CPIS) as a diagnostic tool for pneumonia [30]. We observed that the rank of SARS severity was also significantly (P < 0.001) correlated with the CPIS (Fig. 4c). Col-lectively, these results demonstrate a correlation between the molecular severity rank and clinical factors, suggesting the usefulness of the molecular signature as a genomewide parameter for gauging the severity of SARS patients.

Discussion
Diverse infections can induce a shared core gene expression involving the human innate immune system; each infection may also trigger a pathogen-specific immune response of the host. The innate immune genes were upregulated in both acute SARS (AS) and bacteria infection (IN) patients (Fig. 2c). SARS was a novel viral infection that had not been encountered by the humans in the history before 2003. Intriguingly, most of the genes specifically upreguated in SARS patients were ESTs (13/20 genes) (Fig. 2a), suggesting that the first human encounter with SARS-CoV might provoke a set of human genes that were poorly annotated due to disuse. Annotation of these ESTs may lead to the discovery of novel genes.
Given the high cost of microarray analyses, the detection of a comprehensive gene expression profile may not be cost-effective for clinical diagnosis and evaluation of patients with infectious diseases. However, in a complex system such as the human body where genes interplay through intricate circuitries, it is inadequate to examine only a few routine parameters in biochemistry and blood cell counts for the global physiochemical status of a patient at the time of blood collection. In this report, we applied the GAP method to derive a smooth transition pattern among samples based on the molecular signature consisting of 52 genes, which in turn were used to monitor the severity of clinical courses of SARS patients. Instead of clustering samples into discrete groups in a method similar to commonly-used microarray classifications [31], GAP focuses more on a global orientation of the sampleto-sample relationship. For instance, the AS and RS samples were seriation ranked (Fig. 3), and the rank order proved to correlate well with clinical parameters (Fig. 4).
The GAP-derived rank of severity also provided us with a unique way, where expression of most relevant genes were all considered, to decipher the meaning of the changes in other genes obtained from the same microarray experiment. For instance, we have identified the correlative change in matrix metalloproteinase MMP-7 and MMP-9 (Additional file 6): both can stimulate α-defensin [32]. Importantly, these correlations could not be revealed with other parameters alone, such as number of days after disease onset or clinical score CPIS (data not shown).
In this study, however, there might be technical limitations during RNA isolation from some clinical specimens as well as an unavoidable sample-collecting bias. First, both RNA isolation from SARS specimens and RNA Generalized associated plots (GAP) analysis of SARS patients samples  amplification were performed in the Biosafety Level 3 laboratory, where the instrument for RNA quantitation was not available. This limitation resulted in the failed generation of aRNA from 10 out of 54 SARS specimens (Additional file 1). Unfortunately, these 10 specimens contained 7 specimens from patients at an early (i.e. first 2 weeks) stage [14]. Secondly, 25 SARS patients who donated blood specimens for this study may belong to the milder subgroup of a total of 44 SARS patients in Kaohsiung Medical Center of Chang Gung Memorial Hospital. According to a paper describing the complete cohort of SARS patients [33], intubation and mechanical ventilation were required in 20 out of these 44 patients. However, only two in our 25 patients needed intubation (Additional file 1). The aforementioned two potential limitations may account for why our microarray results could not detect a correlation with a possible worsening clinical course before recovering, which was described by Peiris et al [14].
In conclusion, we propose the use of a molecular signature reflecting the severity of SARS in order to interpret the trends of expression changes in groups of genes within particular functional categories. The use of GAP methodology proved to be instrumental in determining the severity of SARS. The derived severity ranking of SARS patients in turn formed a gradual basis for the analysis of the interaction patterns, providing us with a useful tool for understanding the molecular pathogenesis of this novel viral infection.

Conclusion
We illuminate the human gene expression profiles, in terms of gene expression in peripheral blood, to the unprecedented infection of SARS-CoV. We also discovered a smooth transition pattern of severity from normal controls to acute SARS patients based on the gene expression profiles by generalized associate plots (GAP). The rank of SARS severity was significantly correlated with other clinical parameters.

Anti-SARS-CoV IgG ELISA and real-time quantitative PCR analysis
The antigen used for the SARS detection ELISA was the detergent-extracted and gamma irradiated Vero E6 cells infected with SARS-CoV. Identical preparations from uninfected Vero E6 cells were used as the control. Patients' sera were 1:10 diluted and added to the ELISA plates, and goat anti-human IgG antibody conjugated with horseradish peroxidase (DAKO, Cambridgeshire, UK) was added

Microarray procedures
In this study, we used the GMRCL Human 7K set, Version 2 chips as previously described [35]. Twelve amplified RNA samples from healthy donors (Additional file 3) were pooled as the common reference for every array in this study. A total of 66 aRNA samples including 11 acute SARS (AS), 33 recovering SARS (RS), 11 non-SARS infection (IN), and 11 normal controls (NC) were analyzed with cDNA microarrays as tests against the pooled aRNA (the common reference). Among 66 aRNA preparations, 28 were analyzed with the dye-swapping microarray design. We averaged the log ratios of the duplicated spots on each slide. In the dye swapping experiments, we further averaged the log ratios derived from two slides. We used 400 ng of aRNA for labeling and hybridization using a 3DNA Array 350RP Detection kit (Genisphere, PA, USA), and scanned slides with a confocal scanner ChipReader (Virtek, Canada). We acquired the spot and background intensities with GenePix Pro 4.1 software (Axon Instruments, Inc., CA, USA), and carried out within-slide normalization using programs written with MATLAB 6.0 software (The MathWorks, Inc., MA, USA).
To assure the reproducibility of our microarray system, we got the similar gene expression profiles from replicated samples (RS88) using the hierarchical clustering analysis and also got the highly correlated results (r 2 = 0.84) from two specimens (AS37 and RS38) that were collected from the same patient at a time interval of only one day. We consistently obtained identical results in each of 28 pairs dye-swapping experiments. The complete microarray data is available in Additional file 7.

Hierarchical clustering and singular value decomposition
We performed hierarchical clustering using Cluster and TreeView software [36] with the following parameters: (i) a standard deviation > 0.5 as the filtering cutoff point (885 genes with marked changes selected among 66 arrays), (ii) mean-centered genes and normalized genes, (iii) cluster analysis carried out with uncentered correlation of arrays. We also performed a singular value decomposition (SVD) [23] analysis of the correlation matrix for all 66 samples. The first two eigenvectors weighted by the Correlations between the GAP-derived rank for SARS sever-ity and clinical parameters Figure 4 Correlations between the GAP-derived rank for SARS severity and clinical parameters. (A) The scatter plot of all SARS specimens with the order obtained from the GAP method and the days after the onset of disease showed a significant correlation (P < 5 × 10 -7 ). (B) Sixteen out of 17 SARS patients who submitted multiple blood specimens showed a similar trend of changes in the GAP-derived severity rank along with the recovery from the disease. Patients with 2 (n = 15) and 3 specimens (n = 2) were labeled with blue and green lines, respectively. (C) The scatter plot of all AS and RS specimens with the order obtained from the GAP method and clinical pulmonary infection score (CPIS) showed a significant correlation (P < 0.001).

Euclidean distance matrix by generalized association plots
Robinson criterion [22,28] is frequently used to assess the performances of sorting algorithms on symmetric proximity matrices. A Robinson Matrix, R = [r ij ], is a symmetric matrix such that r ij ≤ r ik if j<k<i and r ij ≥ r ik if i<j<k. The GAP elliptical seriation [22] utilizing the ellipse structure from a singular value decomposition of a converged correlation coefficient matrix usually identifies permuted matrix with a near Robinson form. A brief review on GAP and some details of its applications are available [37].