A Marfan syndrome gene expression phenotype in cultured skin fibroblasts

Background Marfan syndrome (MFS) is a heritable connective tissue disorder caused by mutations in the fibrillin-1 gene. This syndrome constitutes a significant identifiable subtype of aortic aneurysmal disease, accounting for over 5% of ascending and thoracic aortic aneurysms. Results We used spotted membrane DNA macroarrays to identify genes whose altered expression levels may contribute to the phenotype of the disease. Our analysis of 4132 genes identified a subset with significant expression differences between skin fibroblast cultures from unaffected controls versus cultures from affected individuals with known fibrillin-1 mutations. Subsequently, 10 genes were chosen for validation by quantitative RT-PCR. Conclusion Differential expression of many of the validated genes was associated with MFS samples when an additional group of unaffected and MFS affected subjects were analyzed (p-value < 3 × 10-6 under the null hypothesis that expression levels in cultured fibroblasts are unaffected by MFS status). An unexpected observation was the range of individual gene expression. In unaffected control subjects, expression ranges exceeding 10 fold were seen in many of the genes selected for qRT-PCR validation. The variation in expression in the MFS affected subjects was even greater.


Background
Aneurysm and dissection are major diseases of the aorta and are often asymptomatic until a life-threatening event like ischemic organ damage or rupture occurs. Marfan Syndrome (MFS) is a diverse yet clinically recognized subgroup of people at risk for aneurysm, including dissecting aneurysm, and constitutes a significant fraction (estimated at 5-7.5% [1,2]) of all individuals with ascending and thoracic aortic aneurysmal disease. MFS incidence is estimated to be 1 in 5-10,000 [3]. Our long-term goal is to develop an assay that will identify people at risk for aneurysm before the disease process has reached an advanced state. This report is a small step in that direction.
In this study, we focus on individuals diagnosed with Marfan syndrome. The prevalence of MFS combined with its clinical recognition makes it an excellent model system for studies on aneurysmal disease. MFS is an autosomal dominant heritable disorder caused by mutations in the fibrillin-1 (FBN1) gene [4,5], with more than 500 unique mutations identified [6]. FBN1 mutations show a high degree of penetrance but considerable inter-and intrafamilial variability in their phenotype [3]. The variable penetrance suggests that environmental factors and/or disease modifying genes also contribute to the phenotype. Neonatal MFS correlates to mutations within exons 24-32 and MFS defined by mutations in exons 59-65 carry a reduced risk of aortic pathology. Large-scale comparisons between MFS individuals with premature termination mutations and cysteine substitutions in FBN1 revealed significant differences in ocular, skeletal and hypermobility features but no difference in the frequency of ascending aortic aneurysm [7,8]. Apart from these observations, determining the nature of the mutation (a time and labor intensive process) does not improve prediction of the severity of the disease, the risk of aneurysm development or of its progression [7,9]. These limited genotype-phenotype correlations suggest that genes other than FBN1 may significantly influence the phenotype, and their identification may lead to a more informative test of risk.
Fibroblasts are not smooth muscle cells. However, in culture they display a stable phenotype with stress fibers composed of cytoplasmic actins and a splice variant of cellular fibronectin [10]. The increased mechanical stress on dermal fibroblasts seeded at low density produces a cell culture population consisting of 70-80% myofibroblasts. The term "myofibroblast" was proposed over 30 years ago to describe the fibroblasts that appeared in granulation tissue at the sight of open wounds [11]. Recently, it has been recognized that Thy-1 surface expression defines a subpopulation of fibroblasts capable of differentiating into myofibroblasts [12]. We can detect Thy-1 expression in both affected and unaffected skin fibroblasts by array and have confirmed that observation by quantitative real time polymerase chain reaction (qRT-PCR, data not shown). Thus, the skin cultures we used were "myofibroblast" like.
In the last several years, use of DNA microarrays to analyze gene expression has emerged as a promising technology for disease classification and prognosis and for identification of genes that could be potential causes, biomarkers or drug targets [13,14,[14][15][16][17][18]. However, there are limits to the sensitivity of microarrays for detecting genes expressed at low levels as well as additional confounding problems associated with arrays [19][20][21][22][23]. Consequently, in common with most recent studies [20,24], we go beyond mere classification by independently validating expression levels using quantitative qRT-PCR and validating the results in a second population.
In the present study, we used total RNA in oligo dT primed cDNA reactions to identify an expression phenotype associated with the MFS genotype in cultured skin fibroblasts. Our results show a clearly recognizable expression phenotype in cultured fibroblasts. We of course do not expect exactly the same expression phenotype in aortic smooth muscle cells, but we do expect some overlap in the perturbed pathways, as they share the same root cause. Some of the identified genes, including elastin and several collagens, are obvious targets for roles in the development and maintenance of the extracellular matrix environment and cell-matrix contacts. Additional genes validated by qRT-PCR analysis, including the vitamin D receptor, programmed cell death-10 and the LIM domain only 7, may represent genes that will provide new insight into the disease process. Our ability to detect a MFS expression phenotype in cultured fibroblasts provides both a simple method for large-scale screening and a basis for mechanistic studies of the genes identified as differentially expressed.

Results
An overview of our experimental design is shown in Figure 1. In brief, we analyzed total RNA from 36 subjects by spotted membrane DNA macroarray. We used Research Genetics spotted cDNA membrane arrays to analyze gene expression in fibroblast cultures, 17 with characterized mutations in the FBN1 gene (7 missense, all cysteine substitutions, 9 nonsense and 1 multi-exon deletion; 14 of 17 had an aortic phenotype) and 19 unaffected controls (details in Table 1). We selected differentially expressed genes between MFS and unaffected samples based on estimated False Discovery Rate (FDR) [25]. We calculated the 4132 p-values for the two-sample t-tests associated with the expression differences between affected and control for each gene on the array. At a q-value (estimated FDR) threshold of 0.001, 283 genes (265 with identifiable unique Unigene ID numbers) out of a total of 4132 are selected as differentially expressed in MFS vs. unaffected. The top ranking genes based on ratio are listed in Table 2 [see Additional File 1 for the entire list]. At the same qvalue threshold, we found 175 differentially expressed genes (165 with unique Unigene ID) between missense and unaffected samples, and 111 differentially expressed genes (106 with unique Unigene ID) between nonsense and unaffected samples. The q-values are only approximate, since their validity depends on various assumptions such as normality of the expression levels within groups. To globally assess the reliability of these test results while making fewer statistical assumptions, we performed a permutation test (see Methods) to determine if the large number of significant genes we found could easily have arisen by chance. Of 50,000 random permutations for MFS and unaffected samples, no permutations have more than 283 differentially expressed genes, making the array results highly statistically significant, with an empirical pvalue smaller than 0.00002. Furthermore, the only permutations exhibiting a substantial number of significant genes were those permutations that happened to separate most of the MFS subjects from most of the unaffected controls. Over forty five thousand permutations showed no significant genes, and the average number called significant was only 0.45, making it very likely that our estimated false discovery rate of 0.001 is conservative. [See Additional File 1 for the complete 4132-gene dataset.]

Quantitative RT-PCR
We performed duplicate quantitative RT-PCR assays on total RNA from 74 subjects (Table 1) to measure the concentration of 12 mRNAs [see Additional file 2]. We chose 10 genes based on the degree of ratio change, small qvalue and the availability of Applied Biosystems predetermined assay reagents (PDARs). We included two additional genes, GUSB and TBP as internal references. Behaviors of these genes in the array are summarized in Table 3. The qRT-PCR data for a given subject are generally positively correlated with the array results for the same 10 genes from the same RNA sample, with a mean Spearman (rank) correlation of 0.46 (0.35 among unaffected subjects, 0.52 among MFS subjects). Figure 2 summarizes qRT-PCR results for all 74 subjects, and Table 4 additionally summarizes qRT-PCR results for the array and new subjects separately.
It is noteworthy that the expression values in the qRT-PCR assays do not appear to be normally distributed. E.g., boxand-whisker plots of normally distributed data analogous to Figure 2 would be expected to show less than 1% of the samples as "outliers" (the triangles plotted outside the "whisker" ranges, which encompass ± 2.7 times the estimated standard deviation), whereas Figure 2 has 3.7 times as many outliers among the UC samples and 9.4 times as many among MFS samples (also see Table 5). In response to this apparent non-normality, we used the non-parametric Wilcoxon rank sum test to evaluate the significance of gene expression differences between groups, a more conservative choice than the usual t-test in such a circumstance. Under the null hypothesis that data for the two groups are sampled from the same (unspecified) continuous distribution, the test is sensitive to a shift in the location (e.g. mean and median) of the distributions (but not necessarily sensitive to a shift in variance, as seen in most genes here). Table 4 shows that when all of the samples are combined, 6 of 10 genes were validated. Under the null hypothesis that the genes are independent and not influenced by MFS status, the chance that the qRT-PCR results would confirm 6 of 10 predictions as being statistically significant (Wilcoxon p value < 0.05) is less than 2.8 × 10 -6 . Fold changes and Wilcoxon p values for all 10 genes are shown in Table  4 based on all 74 subjects and separately, on 30 of the original 36 array subjects and 32 new subjects (Table 1). Among 10 genes selected based on the array experiments, 8 show consistent direction of difference by qRT-PCR on the same array subjects, and 5 of these 8 genes are statistically significant. Similarly, 6 of these 10 genes show the same direction, and are significant, when tested on new subjects, as are 6 of 10 when tested on all subjects. Four of the 10 genes are significant in all three sets of subjects showing that the array experiments were effective in detecting MFS associated genes.
The two genes marked N/A show marginally statistically significant changes in Array Subjects but very highly significant changes in the opposite direction in New Subjects. One of these genes is Fibrillin-1, the key gene implicated in MFS. In our array experiments, FBN1 was significantly repressed (1.89 fold, q = 2.52 × 10 -10 ; Table  4) and qRT-PCR confirmed that its (geometric) mean level was 1.47 fold lower in MFS based on (predominantly) the same subjects (p = 0.052, Table 5 "Array Subjects"). However, qRT-PCR shows that in "New Subjects" this gene is significantly more highly expressed in MFS than controls (2.71 fold higher, p = 3.3 × 10 -9 ). Viewed another way, over all subjects, the difference in mean FBN1 levels between affected and unaffected subjects is not statistically significant, but 22 of 42 MFS subjects have values more extreme than the most extreme value in the unaffected samples (9 lower, 13 higher). It seems likely that MFS subjects exhibit considerable heterogeneity in FBN1 mRNA levels and our original and new subject populations may not be equivalent samples. As was previously shown, nonsense mutations lead to nonsensemediated decay of the mutant mRNA and reduced fibrillin protein synthesis [26][27][28]. On the other hand, all the known FBN1 missense mutations used in this study are cysteine substitutions in calcium-binding EGF domains.
FBN1 transcript levels and fibrillin protein synthesis were normal in these cells [8,26,27]. As the distribution of mutations types in the second set of clinically identified MFS subjects is unknown, they may not replicate the distribution of the two mutation classes in the original MFS subjects. An additional factor is a difference in cell culture passage number. Nearly all the UC and all new MFS subjects were used at passage 2 while the original MFS cultures were used between passage 3 and 6. The second gene marked N/A in Table 4 is procollagen C-endopeptidase and quantitative RT-PCR (T). The thirty-two additional subjects ("new subjects") were added for the validation phase of the study.
(PCOLCE), another gene with a clear mechanistic role in maintenance of the extracellular matrix [28]. It is plausibly co-regulated with FBN1, showing a moderately strong correlation to FBN1 in our qRT-PCR data. Given these uncertainties as to the homogeneity of our subject population with respect to these two genes, we considered that overall ratios and p-values were potentially misleading; hence we chose to omit them.
A surprising result of the qRT-PCR measurements, evident in Figure 2 and analyzed in more detail in Table 5, are the variability of expression of these genes, both in unaffected and especially in MFS subjects. We looked at both the full range of the data and at the interquartile range (IQR; the range between the 25 th and 75 th percentiles, quantifying the spread in the central majority of the population, a statistic that is very robust to the influence of outliers  relevant phenotypic predictors, since small samples may be non-representative. For the purposes of the present study, variability complicates interpretation of the results for similar reasons. For example, in our previous discussion of FBN1 we remarked that it did not meet our criterion for a statistically significant shift in expression level in the "All Subjects" grouping, although it remains likely that FBN1 level is influenced by MFS status and specific types of FBN1 mutations. Similar remarks apply to other genes, where increased variability across MFS subjects suggests these genes are involved in disease processes, but not by a mechanism that consistently elevates or consistently represses expression. These large variations in expression may limit the extent of our interpretations and will require more focused studies and/or additional statistical power before an association with the MFS phenotype can be established firmly. Alternatively the expression distribution might be influenced by other factors such as the genotypes of FBN1 or disease modifying genes that we have not yet identified.

Discussion
We conducted a small-scale gene expression analysis in MFS affected subjects. We hypothesized that subsets of the expression phenotype might indicate genetic and epigenetic factors that are causal or predictive of the MFS diseased state, and more generally, of the aneurysmal state.
To conduct this test, we hypothesized that primary cul-   Table 1 for subject details.) Bold: array validation or significant p value (p < 0.05). Italic: significant p-value but difference opposite to the expected direction. N/A: see text. The bottom section of the table provides a summative assessment: the (binomial) p-value of the observed number of significant qRT-PCR validations (p < 0.05) under the null hypothesis of no between-group differences. We selected 10 genes for validation based on their performance on the array. Column headings are as in Table 2, except PDAR: Applied Biosystems predetermined assay reagent identifier.
tures of skin fibroblasts derived from punch biopsies could be a simple, robust model. An important advantage to using this system is the ability to collect samples from large numbers of both affected and unaffected subjects. Fibroblasts express FBN1, and distinct fibrillin protein phenotypes have been identified in fibroblast cultures from MFS individuals [26,27,29]. The protein profiles have been linked to distinct classes of FBN1 mutations and clinical phenotypes [8,30].
We used spotted membrane DNA array screening to identify gene expression changes associated with FBN1 mutations. The most important outcome of this study is the identification of a small number of differentially expressed genes that distinguished skin fibroblasts of MFS affected individuals from the fibroblasts of unaffected controls. Our DNA array experiments identified a subset of genes that were validated in a new group of subjects. Due to the size of our study we chose to focus on the expression pattern of affected vs. unaffected and chose 10 genes for validation.
We can speculate on how a number of the validated genes could contribute to the MFS phenotype in a mechanistic way. MFS is a heritable fibrillinopathy where defects in the synthesis and/or assembly of fibrillin-1 microfibrils lead to impairment in the elastic fibrils that confer resilience and recoil in elastic tissues [31,32]. For over a decade it was thought that the microfibrillar component of elastic fibers provided a three-dimensional scaffold for the assembly of elastin in the process of elastogenesis [33]. The histopathological abnormalities of aneurysmal vessels include abnormal extracellular matrix protein accumulation, fragmentation and disorganization of the elastic fibers in the medial layer of the vascular wall, and a generalized loss of elastin content [31,32]. Analysis of mouse models of fibrillin-1 deficiency suggested that the primary defect is not in elastogenesis but rather in elastic fiber homeostasis [34]. Characterization of the mouse model led to a model of acquired elastolysis [35] with evidence suggesting that the altered elastic fiber structure could change the expression phenotype of the underlying vascular smooth muscle cells, resulting in the increased expression of several proteases. A number of investigators have identified the association of specific types of FBN1 mutations with an increased susceptibility to protease digestion [9] and with the ability of FBN1 fragments to up regulate protease expression in culture [36]. In contrast, our study identifies elastin as one of the most significantly regulated genes in the affected cell lines. Surprisingly, elastin mRNA is markedly decreased (Table 4; (Table 4; 2.5 -1 , p = 7.3 × 10 -3 ). Vitamin D is known to decrease elastin mRNA by decreasing message stability in cultured cells [37]. Vitamin D treatment also is known to decrease both elastin content and the number of elastic lamellae in the aortas of animals treated pre and postnatally [38].
Recent publications have also described an increasing role for transforming growth factor beta (TGF-beta) in the expression of the MFS phenotype. Mutations in both TGFbeta receptors have been identified in humans with heritable forms of aneurysm [39][40][41]. In a mouse model of MFS the mutant phenotype includes developmental abnormalities of the distal alveolae of the lung, associated with increased TGF-beta protein and activity [42,43]. Our arrays did not detect any significant difference in TGF beta expression level, while a related family member, INHBA, was not validated by the qRT-PCR analysis. More recently, TGF-beta antagonists, including a neutralizing antibody to TGF-beta and losartan (an angiotensin II type receptor ATI blocker) were able to prevent and reverse aneurysmal progression in the same mouse model [44]. We found significant differences in the expression of a number of genes in this pathway including VDR (Table 4; 2.53 -1 , p = 7.3 × 10 -3 ) and TSC2 (Table 2; 1.64, q = 1.3 × 10 -4 ). The vitamin D receptor (VDR) is a negative regulator of TGF-beta transcriptional activation [45] and as mentioned above is significantly decreased in MFS skin fibroblasts. In contrast, the tuberous sclerosis complex 2 (TSC2), a potent activator of TGF-beta transcription, is expressed more highly in MFS fibroblasts (Table 2; 1.64, q = 1.3 × 10 -4 ). Another gene, LMO7 (a protein that connects the nectin-afadin and E-cadherin-catenin systems through alpha-actinin and therefore regulates cell adhesion) [46] is induced by TGF-beta and is significantly elevated in MFS fibroblasts (Table 4; 2.77, p = 1.7 × 10 -6 ). The behavior of all three genes is consistent with enhanced TGF-beta activity that could contribute to the aneurysmal phenotype if smooth muscle cell loss or differentiated phenotype contributes to the pathological process.
PDCD10 (programmed cell death gene 10) is increased in the MFS samples (Table 4; 1.41, p = 5.0 × 10 -5 ). PDCD10 was recently identified as one of three genes responsible for a heritable form of cerebral cavernous malformations (CCM3) characterized by abnormally enlarged capillary  Figure 2). If the data were normally distributed, approximately 1% of the points would so identified, i.e., on average 0.32 per gene for UC and 0.42 per gene for MFS samples.
cavities causing seizures and hemorrhages [49]. These vascular malformations appear to result from a failure in vascular morphogenesis and/or remodeling. In CCM, the endothelial tubes continue to expand and stabilizing pericytes are not recruited. The result is a vessel with an enormously dilated lumen and increased fragility [50].
When we initiated these studies we had two important concerns. First, a priori, it is not obvious that cultured skin fibroblasts would exhibit significant expression differences for other genes in MFS subjects. However, the skin is one of the connective tissues affected in MFS, and our experiments confirm a significant MFS expression phenotype in cultured fibroblasts. The second concern is that these differences may not be relevant to aneurysmal disease, believed to be largely a disease of smooth muscle cells. Obviously, we cannot be sure that the gene changes discovered in fibroblasts are the same gene changes found in aortic smooth muscle cells, but, given that the two cell types share commonalities in their morphology, extracellular matrix environment and the common fibrillin defect, it is reasonable to suspect that they also share some relevant core changes in fibrillin-related pathways. While our fibroblast study cannot answer this question, it did identify some differentially expressed genes whose known roles are suggestive of involvement in SMC tissue failure. A third concern arose during the course of the studynamely, the large sample-to-sample variation in gene expression levels. This variability needs to be recognized as an important confounding factor in expression analysis studies. It also supports our decision to use skin fibroblasts rather than aortic SMC as our sample source.

Conclusion
Our ultimate goal is to use genome-wide expression analysis to identify and classify people at risk for developing aneurysms of the ascending thoracic aorta. Our preliminary data support the idea that there are common mechanisms triggered by mutations in FBN1 that lead to identifiable differences in gene expression between unaffected control and MFS affected cultured fibroblasts. Some of these changes appear to be downstream from TGF-beta activation. While the limited phenotype described here is only a partial description of the potential "mutation associated" expression profile, the results suggest that a complete genomic screen of several hundred MFS vs. unaffected cell lines, using FBN1 mutation detection, microarray expression analysis and qRT-PCR validation of possible biomarkers, could lead to the identification of genes that contribute to the mechanistic events that initiate vessel wall destruction.

Cell culture, RNA isolation, and array hybridization
Study participants were recruited under an institutional review board approved protocol and informed consent. A skin punch biopsy was used to establish a fibroblast cell culture. Unaffected control subjects were selected from patients visiting dermatology clinics. All of the cell strains (in the array group, between passage 2 and 6 and in the New Subject group, passage 2) were grown to confluence in SmGM2 +10% FBS + bullet kit containing recombinant EGF, FGF and insulin (BioWhittaker). After reaching confluence, the medium was modified to include ascorbic acid at 50-μg/ml [51]. Fresh medium was added after 24 hours. At 48 h, total RNA was isolated using a guanidinium iso-thiocyanate-phenol-chloroform extraction protocol [52]. Each sample was analyzed for quality and quantity by UV spectroscopy and gel electrophoresis. For hybridization to individual arrays, 1 μg of total RNA was used to synthesize 33 P-dCTP labeled first strand cDNA with Invitrogen Superscript II. Generally, each sample was labeled twice and hybridized to duplicate Research Genetics GF211 arrays (4 arrays in all). Each array was hybridized with 30-60 million cpm of probe for 18 h then washed extensively at 50°C in 0.5 × SSC +0.1% SDS. Multiple exposures were collected on a Storm phosphorimager. Images were imported using Research Genetics Pathways 3 software, and processed as described in "Array Data Normalization."

Quantitative RT-PCR
Validation of the array results was done using Applied Biosystems pre-designed qRT-PCR primer and probe sets (PDAR's, see ABI P/N 4333458_a.pdf for the complete protocol). These assays use a standard PCR format where all the assays are performed with the same PCR cycling conditions. Quantification is based on calculating a ratio to an included calibrator sample. Our calibrator was 50% Stratagene Universal Target total RNA and 50% total RNA pooled from the 36 array subjects listed in Table 1. We synthesized first strand cDNA using random primers and the ABI High Capacity cDNA Archive Kit (P/N 4322171). Duplicate PCR reactions were set up in a final volume of 25 ul using 10 ng equivalents of input total RNA. We used the ABI 7900 machine with the standard cycling protocol, increasing the number of cycles to 40. GUSB and TBP were used as an internal reference in all of the samples and calibrator. The delta delta Ct method [54] was used to calculate ratios of each gene in each sample against the calibrator RNA. Both references gave similar results and we present the GUSB quantification.

Array data normalization
Normalization refers to the process of attempting to correct for the many sources of systematic variation in cDNA array. In outline, we do the following. The median of log 10 -transformed expression values for each gene across all exposures of all arrays is calculated, and serves as a common baseline for comparison between experiments. Then, for each exposure, we compute a smooth "local" (loess) regression line between its log intensities and the median levels. The resulting regression function transforms the expression values of each exposure to the scale of the common baseline while capturing nonlinear effects. Finally, we combine data from multiple exposures of an array by taking their medians. This technique builds on earlier work [54,55] and more details are reported in [56]. One innovation introduced here is the following: to reduce the impact of outliers and differentially expressed genes on the inferred levels of other genes, the loess regression is based on only a "stable" subset of genes uniformly chosen across all intensity levels. Specifically, for each gene we calculate the rank of its measured intensity in each exposure, and the median absolute deviation (MAD) of its rank across exposures. We sort all n genes by increasing median intensity and partition them into n/5 groups of consecutive genes. From each group of 5, the gene with the minimum MAD statistic is chosen. This selection of n/5 genes comprises the stable subset of genes on which our regression is based. Other approaches to normalization based on stable genes have been proposed [57,58] but those techniques seem more complex, and choose stable genes in ways that potentially yield sparse coverage of some regions of the intensity spectrum, consequently increasing the influence of anomalous or differentially expressed genes on the normalization in those regions. As intensity-dependent non-linearity is a key concern in our normalization, uniform representation of intensities in the stable subset is valuable.
Correlation analyses and ANOVA tests identified possible technical biases in a subset of hybridizations. As a consequence, hybridization data from seven Marfan and two unaffected samples were eliminated from further study (indicated by blanks in the platform columns in Table 1). Most of these subjects were included in the qRT-PCR analysis and appear in the "All Subjects" category of Table 4.

Differential expression and permutation analysis
We used standard t-test p-values to rank genes for evidence of differential expression, coupled with FDR analysis, and selected a conservative q = .001 cutoff threshold. We chose the classical t-test for its familiarity and simplicity. The non-parametric Wilcoxon rank sum test yielded very similar gene rankings. We decided against the commonly used SAM-statistic [59] because of an interest in genes with low expression levels. SAM is biased somewhat against such genes [60].) To avoid dependence on parametric assumptions about the array data, we performed a permutation test to deter-mine if the large number of significant genes we found could easily have arisen by chance. For a comparison of subject groups A and B, we combined A and B and then randomly split the combined group into two groups A' and B', such that groups A and A' are the same size, groups B and B' are the same size, and the technical replicates of any one sample are either all in A' or all in B'. Hence, the disease status labels were randomly permuted to determine whether the true disease was responsible for the observed result of 283 genes with q < 0.001. We then computed the number of differentially expressed genes at the same cutoff threshold (FDR < 0.001) in each such random partition.

qRT-PCR data analysis
As part of the statistical analysis of the qRT-PCR data, we performed per-gene regression tests for age, sex and/or group effects on expression levels (in addition to MFS status). There were no significant sex effects in any of the tests. Age effects were marginally significant for two genes, but not after Bonferroni correction, and furthermore would not have altered the significance of MFS status even if included. For some genes, inclusion of subject in "new" vs. "original" subject groups (Table 1) was significant, but again correction for this effect did not alter the significance of MFS status. Hence, we have chosen to present results only for the simplest model, which does not attempt to adjust for any of these covariates.
Significance of differential expression was quantified by Wilcoxon rank sum test; see Figure 2, Table 4. "Overall significance" results reported in the last line of Table 4 follow from a simple binomial model.