Plant materials and pathogen isolate
Three soybean genotypes with varying degrees of resistance to Phytophthora sojae were used for this study. V71-370 is a resistant genotype, VPRIL9 is moderately susceptible and Sloan and is highly susceptible[16, 17]. P. sojae isolate, PT2004C2.S1, isolated from Ohio in 2004, and virulent to soybean differentials with Rps1a, Rps1b, Rps1k, Rps2, Rps3a, Rps3c, Rps4, Rps5, Rps6, or Rps7, was used in this study.
Inoculation assays were performed using the slant board technique . Briefly, 7-day-old soybean seedlings, grown in vermiculite, were thoroughly rinsed under tap water and placed in inoculation trays (20–30 plants per sample). The plants were wounded at 2 cm below the beginning of the root zone by scraping the epidermis with a scalpel and then were inoculated with a mycelial agar slurry from a 7-day-old culture or agar alone. Root tissue samples 7.5 mm long were collected at 5 days post inoculation (dpi) from immediately below (the treatment "Lower") and above (the treatment "Upper") the upper lesion margin from each seedling with characteristic lesions. In each case, the root tissue samples from like treatments and harvests were pooled. Thus each set of 30 pathogen-inoculated plants yielded two pooled samples, one pooled from 30 sections from the lower infection courts and one pooled from 30 sections from the upper infection courts. For the mock-inoculated plants (the treatment "Mock"), 15 mm tissue sections were taken spanning from 7.5 mm below to 7.5 mm above the position corresponding to the average lesion length measured from the inoculated samples. Thus each set of 20 mock-inoculated plants yielded one pooled mock sample. All the plants for each block were grown in the same growth chamber (Environmental Growth Chambers, Chagrin Falls, Ohio; Model M-48 with TC2 microcontroller unit) with day and night temperatures settings of 27°C and 21°C, respectively and relative humidity averaging 75 to 90%.
Overall experimental design
The data for this study comes from control inoculations that formed part of a large study of a recombinant inbred line population that was organized into 29 blocks. All plants for a block were raised, inoculated and incubated together in the same growth chamber. Each block included the same three control genotypes (V71-370, Sloan and VPRIL9). Three replicates of the three genotypes were inoculated with P. sojae (generally 30 plants) or mock-inoculated (generally 20 plants), then harvested 5 days later at three different times during the morning of harvest (9 am, 10:30 am, and 12 pm). Each block yielded 27 sets of samples, which were the complete combinations of the 3 genotypes, 3 treatments and 3 time windows. Thus the experiment was balanced with respect to the factors, soybean genotypes, treatment ("Upper", "Lower", and "Mock") and inoculation time window. Since the differences between the samples harvested at 9 am, 10:30 am and 12 pm were small, in this study we refer to the 3 × 29 sets of samples obtained from the 29 blocks as biological replicates. The 24 blocks in which all the control samples passed quality control were used for statistical analyses.
RNA extraction and microarray assay
Total RNA was isolated from each of the pooled plant samples using the QIAGEN RNeasy® Plant Mini Kit, following the manufacturer's protocol for total RNA extraction from plant and filamentous fungal tissue with the following minor modifications. 250 mg of fine tissue powder was used with appropriate scaled-up buffer volumes, and the Buffer RLT-suspended samples were thoroughly vortexed for 2 minutes and then incubated in 56°C water-bath for 3 min and kept in ice for 1 min before the first spin. The quantity and quality of total RNA were checked with both a NanoDrop ND-1000 Spectrophotometer and an Agilent 2100 Bioanalyzer.
Microarray procedures were performed at the Core Laboratory Facility (CLF) of the Virginia Bioinformatics Institute. The standard eukaryotic gene expression assay protocols were followed as described in the Affymetrix GeneChip® Expression Analysis Technical Manual. Briefly, as the first step, biotin-labeled cRNA was generated using the One-Cycle Target Labeling and Control Reagents (Affymetrix) and 1 μg of total RNA. Second, 20 μg of biotin-labeled cRNA was fragmented in Fragmentation Buffer. Third, fragmented cRNA was hybridized to a soybean GeneChip at 45°C for 16 h in an Affymetrix hybridization oven (model 640). Fourth, the GeneChips were washed and stained with streptavidin-phycoerythrin using the fluidics protocol EukGE-WS2v5-450 in the Affymetrix® GeneChip® Fluidics Station 450. Finally, the stained chips were scanned with an Affymetrix GCS3000 7G Scanner. The Affymetrix GeneChip® Operating Software (GCOS, v1.4.0.036) was used to provide instrument control, first-level data analysis, and data management for the entire GeneChip System. Samples within experimental blocks were randomized before being provided to the CLF, to reduce possible block effects arising from processing samples together during the microarray analysis work flow.
Low level data pre-processing
Quality control was performed using a variety of tools from the MAS5 and AffyPLM quality control toolsets. These included analysis of percentage of probes called present, the scaling factor and average background of each chip. In addition, ratios of 3' to 5' probes for β-Actin were used to check for RNA degradation. Calculation of the interquantile ranges of the normalized unscaled standard error and of the relative log expression was used to remove outliers. All chips in the 24 blocks selected for this study passed these QC tests. Low-level analysis of the raw GeneChip data began with filtering the non-expressed genes based on the Affymetrix Microarray Suite version 5 (MAS5) algorithm for calls (as implemented in the Bioconductor package affy, http://bioconductor.org/packages/2.0/bioc/html/affy.html) taking into account the experimental design. The default parameter τ (0.015) in the MAS5 present call was used, and a probe set (which approximately represents a gene, as 95% of the soybean probe sets are unique) was declared to be detectable (or present) if it had a present call in 40% of the chips.
The principal data pre-processing method used in our data analyses was GC-RMA, which included a GC-RMA background correction, quantile normalization, and computation of gene summary values from the corrected probe-level data. Background correction was performed with the model-based procedure  using sequence information as implemented in the Bioconductor package gcrma
http://bioconductor.org/packages/2.0/bioc/html/gcrma.html. Quantile normalization  and Tukey's median-polish algorithm based gene summary values were computed using a modified C program capable of processing several thousand GeneChips simultaneously (L. Bao, unpublished results).
As an alternative method for data pre-processing, we used the Affymetrix MAS5 default algorithms available in the affy package. Global scaling was implemented to assure that all the chips had the same trimmed mean signal intensity using all soybean probe sets.
Our finding that most detectable transcripts show significant changes raises important questions about the assumptions underlying normalization procedures. MAS5 assumes that the sum of the expression levels of all transcripts remains constant. Quantile normalization assumes this also, and assumes in addition that the distribution of probe signals remains constant across all arrays. Invariant set approaches assume that a pre-determined set of genes show no changes. The assumption that the sum of the expression levels of all transcripts remains constant is a highly robust one because the amount of labeled cRNA added to a microarray hybridization is always normalized. Thus even if the amount of mRNA per cell in the source tissue varied (which should be reflected in variation in the sum of the expression levels of all transcripts) this variation is cancelled by the experimental procedure. Accurately capturing changes in the amount of mRNA per cell in source tissue is currently technically infeasible.
To estimate whether normalization had systematically biased the measured transcriptional changes, we determined the mode of the changes. Under the assumption that the most common transcriptional changes will be the smallest in magnitude, the mode should lie close to 1.0 (i.e. no change). In fact the mode in each contrast studied ranged between 0.99 and 1.01.
Linear mixed model analysis of gene expression levels
A total of 648 soybean GeneChips from 24 complete experimental blocks were selected for the high-level data analyses. The high-level analyses were performed separately for each soybean probe set, using Linear Mixed Model Analysis (LMMA) in the PROC MIXED procedure of SAS (SAS 9.1 for Windows, SAS Institute Inc., Cary, NC, USA) . The fixed factors of the linear mixed model included genotype, treatment (Upper, Lower and Mock), and time (9 am, 10:30 am, and 12 pm), and the genotype by treatment interaction. Random factors included in the model were: block, the interactions block by genotype, block by treatment, block by time and block by genotype by treatment, and residual. Variance components were estimated by the method of Residual Maximum Likelihood, and F-tests were performed for all fixed factors.
All F-tests across all genes and all the fixed factors were considered as one family of tests, and the resultant collection of p-values was used to compute adjusted p-values based on two False Discovery Rate (FDR) controlling methods, the linear step-up method of Benjamini and Hochberg  and the two-stage linear step-up method (TST) of Benjamini et al . In addition, the p-values were used to compute the q-values for the positive FDR criterion as described by Storey and Tibshirani . Because the differences between the FDR methods were small, we mainly report results obtained with the TST method.
We also estimated and tested individual contrasts of interest, most of which were linear functions of the cultivar by treatment interactions. Because the F-test for the genotype by treatment interaction was found to be significant for the majority of probe sets (24,669 out of 28,351), we employed a one-step approach for testing the significance of individual contrasts, i.e. we tested all contrasts of interest for all genes rather than for only those genes with significant F-test for the genotype by treatment interaction. We were interested in identifying the differentially expressed genes for the contrast "infection response", which compares gene expression between pathogen and mock inoculation in a given cultivar and infection court, and these are used to determine which genes are up- or down-regulated within a given infection court as a result of the pathogen attack in a given cultivar. The infection responses were split into two subtypes, i.e., Upper vs. Mock, and Lower vs. Mock. The collection of the t-tests for all contrasts across all genes was considered as a family of tests, and significant contrasts were determined with the same FDR methods as those used for the F-tests.
Wilcoxon signed rank tests of differential gene expressions
As an alternative high-level analysis to LMMA, we used the non-parametric Wilcoxon Signed Rank Test using the wilcox.test function (R package stats version 2.6.0). For each gene and within each cultivar, we formed two pairs of samples, Upper and Mock, and Lower and Mock, respectively. Each consisted of the GC-RMA log2 scale signal intensity data for the paired treatments (e.g. Upper and Mock) (sample size = 3 × 24 = 72). The collection of p-values across all genes was considered as a family of tests, and significant contrasts were determined with the same FDR methods as those used earlier. The same procedure was used to perform the Wilcoxon signed rank test and FDR control with the preprocessed data generated by MAS5.
Kolmogorov-Smirnov test for two distribution patterns
We used the Kolmogorov-Smirnov (KS) test to test for significant differences in the distributions of gene expression changes for specific categories of genes. The expression changes in response to pathogen inoculation were considered for genes in which the change was significant by LMMA contrast analysis of infection responses using SAS Proc Mixed. The distributions of expression changes were examined for genes in six functional categories related to infection: "defense and disease", "signal transduction", "transcription", "intracellular trafficking", "cell structure" and "metabolism". For this purpose we used the functional category of each gene drawn from annotation of the Affymetrix soybean GeneChip by the Goldberg group at the University of California, Los Angeles http://estdb.biology.ucla.edu/seed; we have reviewed this annotation extensively and found it very reliable. In each case, the distribution within the functional category was compared to the distribution for the total gene set, including the genes within the functional category, to test the null hypothesis that the observed distribution of expression changes affected all genes irrespective of functional category. This is a more conservative test than testing the distribution of a subset of genes against the remainder of the genes. To test for distribution differences within specific ranges of gene expression changes, three ranges were chosen: all genes with less than two-fold regulation (up or down); all genes with less than 1.5-fold regulation (up or down); and all genes with less than 1.2-fold regulation (up or down). Then the distribution of changes was compared between a functional category and all genes within the chosen expression range. For all these tests we used the ks.test function in the R package stats (version 2.7.0). Because we compared the distribution of the genes in a category to that of all genes, and because the gene expression profiles have complex correlation structures and thus the values for individual genes may not be independent , we computed an empirical p value for each test by the following procedure. For any gene category with n genes, we randomly sampled n genes from the list of all significant genes and recomputed the KS test statistic. This process was repeated 10,000 times, and the collection of ordered KS statistics was then used to compute an empirical p-value for the category. Finally, FDR control was performed on the set of p-values from all KS tests using the TST method .
Quantitative real time RT-PCR (qRT-PCR) assay
To provide a comparison between the microarray assays and another commonly used measure of gene expression differences, qRT-PCR, gene expression differences (infected versus mock) for a set of selected genes were measured by qRT-PCR during a pilot experiment. The pilot experiment included four biological replicates and was conducted using the same methods as described above, with the following modifications: for each of the two cultivars (V71-370 and Sloan) four replicates of 30 plants (grown together in the same growth chamber) were harvested 5 days after inoculation, and a single segment of infected tissue (treatment "Inoculated") comprising the upper and lower infection courts was excised from each plant. RNA was extracted from each pool of 30, then equal amounts of the RNAs were pooled from four biological replicates for microarray analysis. In parallel, an equal number of mock-inoculated plants were harvested and RNA extracted pooled in the same way.
For qRT-PCR assays, 22 soybean genes of interest (including 4 housekeeping (HK) genes) with varied levels of gene expression were selected (Table 5). Equal amounts of RNA from each of the four biological replicates were pooled for the qRT-PCR assays. The primers [see Additional file 9] were designed using the Beacon Designer 4.0 (Premier Biosoft International, Palo Alto, CA) and synthesized by Integrated DNA Technologies, Inc.(Coralville, IA). The amplicons for development of standard curves were prepared first using the purified total RNA as a template and the SuperScript™ III First-Strand Synthesis System for RT-PCR (Invitrogen™ Life Technologies) and then using conventional PCR with the synthesized cDNA as template. The amplicons were purified using a QIAquick PCR Purification Kit (Qiagen) and checked for quality in the Bioanalyser.
qRT-PCR assays were carried out by the Virginia Bioinformatics Institute Core Laboratory Facility. Briefly, total RNA (1 μg) was transcribed to cDNA using first strand cDNA synthesis reagents (Invitrogen Corp., Carlsbad, CA) in a total volume of 20 μl. Standard curves were produced with serial 10-fold dilutions of cDNA products starting from 10 pg/μl. Each 25 μl qRT-PCR reaction consisted of 300 nM sense and anti-sense primers, 1 μl 10× diluted cDNA, and 12.5 μl SYBR Green I PCR Mastermix (Applied Biosystems, Foster City, CA). Each reaction was run in triplicate for both the standard and samples. PCR reactions were performed on a Bio-Rad i-cycler (BioRad, Hercules, CA) under the following conditions: 95°C for 3 min, 40 cycles of 95°C for 10 s, 56°C for 45 s to calculate cycle threshold (CT) values, followed by 95°C for 1 min, 55°C for 1 min, and 80 times of 55°C for 10 s, increasing temperature by 0.5°C each cycle to obtain melt curves. The BioRad iCycler IQ 3.1 Optical System Software was used and PCR efficiency (E) was estimated using the equation (1+E) = 10(-1/slope)
]. Pathogen-inoculated vs Mock expression ratio of a gene of interest was calculated from the equation:
Where E denotes PCR efficiency, CT denotes cycle threshold value, ΔCT is the cycle threshold difference between a mock sample and its corresponding pathogen-inoculated sample, gene denotes the gene of interest, NF denotes the normalization factor calculated from the geometric mean of the raw ratios (i.e., (1 + E
, here h denotes a house keeping gene) of the three selected housekeeping genes . Four housekeeping genes (the first four rows of Table 5, including actin) were initially evaluated, based on the microarray data from the pilot experiment. The best three (marked [HK] in Table 5) were selected using the gene-stability measure and ranking method .