A multi-treatment experimental system to examine photosynthetic differentiation in the maize leaf

Background The establishment of C4 photosynthesis in maize is associated with differential accumulation of gene transcripts and proteins between bundle sheath and mesophyll photosynthetic cell types. We have physically separated photosynthetic cell types in the leaf blade to characterize differences in gene expression by microarray analysis. Additional control treatments were used to account for transcriptional changes induced by cell preparation treatments. To analyse these data, we have developed a statistical model to compare gene expression values derived from multiple, partially confounded, treatment groups. Results Differential gene expression in the leaves of wild-type maize seedlings was characterized using the latest release of a maize long-oligonucleotide microarray produced by the Maize Array Project consortium. The complete data set is available through the project web site. Data is also available at the NCBI GEO website, series record GSE3890. Data was analysed with and without consideration of cell preparation associated stress. Conclusion Empirical comparison of the two analyses suggested that consideration of stress helped to reduce the false identification of stress responsive transcripts as cell-type enriched. Using our model including a stress term, we identified 8% of features as differentially expressed between bundle sheath and mesophyll cell types under control of false discovery rate of 5%. An estimate of the overall proportion of differentially accumulating transcripts (1-π0) suggested that as many as 18% of the genes may be differentially expressed between B and M. The analytical model presented here is generally applicable to gene expression data and demonstrates the use of statistical elimination of confounding effects such as stress in the context of microarray analysis. We discuss the implications of the high degree of differential transcript accumulation observed with regard to both the establishment and engineering of the C4 syndrome.


Background
Photosynthesis in the majority of plants occurs in a single photosynthetic cell type (C 3 photosynthesis) [1]. Within the chloroplasts, the enzyme ribulose-1, 5-bisphosphate carboxylase/oxygenase (Rubisco) fixes atmospheric carbon by addition of CO 2 and water to the five-carbon sugar ribulose-1, 5-bisphosphate (RuBP). Rubisco will also catalyze the oxidation of RuBP in a process known as photorespiration that does not fix carbon [2]. The reduction in efficiency associated with photorespiration and the energetic costs of recycling its products has been estimated to limit the performance of C 3 photosynthesis by as much as 30% in hot arid conditions [3]. A number of taxa utilize a two-step carbon fixation process, known as C 4 photosynthesis, to limit the impact of photorespiration upon photosynthetic performance [4]. Plants that utilize C4 photosynthesis appear to be at a particular fitness advantage under conditions of limited water availability, high temperature and high irradiance light [5]. Interestingly, some of the most promising grasses for biofuel production are C4 grasses, including Miscanthus × giganteus (Giant Miscanthus), Panicum virgatum (switchgrass), Zea mays (maize), Sorghum bicolor (sorghum) and Saccharum officinarum (sugarcane).
In C 4 plants, Rubisco accumulation is spatially restricted to CO 2 -rich sites within the leaf so that the carboxylase reaction is favoured over photorespiration. In maize, Rubisco accumulation is restricted to thick-walled bundle sheath (B) cells that surround the leaf veins ( Figure 1A). Carbon is initially fixed in adjacent mesophyll (M) cells and subsequently transported, by a multi-enzyme carbon shuttle, into the B, where decarboxylation elevates local CO 2 levels and generates an environment for efficient Rubisco function ( Figure 1B).
Cell-type specific differences in morphology and physiology are fundamental to C 4 photosynthesis [1,6]. Detailed analysis of B and M differentiation in maize has shown that Rubisco, enzymes of a C 4 carbon shuttle and components of the light-harvesting machinery accumulate to different levels in B and M cells [7]. B cell chloroplasts are predominately agranal and do not accumulate key components of the water oxidizing complex of photosystem II (PSII) [8,9]. Consequently, a number of processes requiring chemical reduction, including portions of Calvin cycle [10,11], synthesis of antioxidants [12] and nitrogen assimilation [13] are localized to the M cells. Despite detailed understanding of certain metabolic pathways utilized in C 4 photosynthesis, the molecular mechanisms governing cell differentiation and the full extent of metabolic partitioning are still to be fully characterized. Promoter fusion, methylation assays and transient expression studies have identified a number of cis acting elements in the promoter sequences of C 4 -related genes [14][15][16][17].
Much less is known about trans acting factors that may drive the C 4 differentiation process [18,19]. Genetic approaches have resulted in the isolation of maize mutants characterized by B cell-specific defects, but these mutants have not directly identified regulators of cell-specific development [20][21][22].
Many biochemical and molecular studies of C 4 photosynthetic cell types have made use of techniques for isolation of separated cells. Typically, B cells have been isolated as vascular strands by mechanical disruption and M cells isolated as protoplasts by enzymatic digestion [23,24].  Therefore, different isolation protocols complicate the identification of differences between the two cell types. This is especially true when comparing the accumulation of RNA transcripts because changes can occur rapidly in response to the stresses of protoplast preparation [25]. When small numbers of genes have been analyzed, additional treatments have been used to control for the effect of these stresses (e.g. [22]). In contrast, previous multigene profiling studies of C 4 cell types have not accounted for such effects in the initial analysis [26][27][28]. Following preliminary two-sample microarray experiments, we were especially concerned with the problem of mistakenly identifying stress-induced transcripts as M enriched (Sawers et al., unpublished observations).
In order to control for a stress effect associated with M cell isolation, total leaf and stressed total leaf samples were included in analysis. In general, microarray experiments are based on paired comparisons [29]. Multiple paired comparisons may be linked, as in a time course, or may be cross-referenced, as in a clustering analysis [29]. In the case of the isolated C 4 cell types, the situation is somewhat different in that cell type and effects of the separation protocol are partially confounded between treatments. To formally describe this relationship, we have developed an analytical model to describe C 4 gene expression and to allow elimination of the stress effect resulting from protoplast isolation. Using this approach, we have identified 1,280 features in the Maize Array Project oligonucleotide array that are predicted to be B or M enriched. We also present an analysis of the same data without consideration of the stress effect. Comparison of these two analyses demonstrates the importance of considering the stress effect and the application of a statistical modelling approach to control confounding factors in microarray experiments.

Results and discussion
Experimental design and data collection B strands (T B ) and M protoplasts (T M ) were isolated from 10 day-old W22 inbred maize seedlings by mechanical disruption and enzymatic digestion, as previously described [30] (Figure 2A). Isolation of M protoplasts required a 3 h enzymatic incubation that was not performed during isolation of B strands. To control for transcriptional differences arising from these different treatments, additional total leaf (T T ) and total leaf stress (T S ) samples were isolated. Both T T and T S samples contain a combination of B and M cells. Leaves for the T T sample were harvested and cut into thin strips as for the T B sample and then frozen directly in liquid nitrogen rather than subjected to mechanical disruption. Leaves for the T S sample were harvested and cut as for the T M sample, subjected to a 3 h mock enzymatic digestion and then frozen. An interwoven loop design [31] was used to directly com-pare all combinations of the four treatment groups (T B , T M , T T , T S ) on two-label microarrays ( Figure 2B). To maximize biological replication, a unique set of seedlings was used for each T B , T M , T T and T S replicate. The resulting design used 24 independent RNA samples analysed over 12 array sets.
Samples were analysed by hybridisation to a long-oligonucleotide microarray produced by the Maize Array Project consortium [32]. Maize sequences for design of this chip were predominantly selected from the The Institute for Genomic Research (TIGR) maize gene index [33]. To manufacture this microarray, approximately 50,000 sequences were positively orientated and identified as suitable for oligonucleotide design. This initial sequence set was supplemented with further EST sequences, organellar sequences, repeat sequences and community requests to provide a final design data set of 57,441 features. Single 70-mer oligonucleotides were designed for each sequence in the design set and printed over two glass slides.
Array detection ( Figure 2C) and image analysis were performed as described in Methods. Images and intensity data may be accessed in a MIAME compliant form at the Maize Oligonucleotide Array Project website [34]. Data are also available in the NCBI GEO database, series record GSE3890. The data are presented in full on both the Maize Array Project and NCBI websites.
Feature intensity values were log-transformed and corrected for local background signal and a LOWESS procedure [35] was used to normalize between channels for each slide (see Methods). On the basis of the T T treatment (which contains both B and M), features with either low or saturating signal intensity were discarded (see Methods). A stringent filtering of low expression values was used to reduce the dimensionality of the data set in light of the complexity of the experimental design. High expression filtering was less stringent to avoid elimination of previously characterized, high abundance, C 4 'marker' transcripts. Following filtering, 15,988 unique features were considered for subsequent analysis (Additional file 1). We considered this reduced data set appropriate for model development while maintaining sufficient and meaningful biological information to allow general conclusions on the extent of differentiation between B and M cell types.

Construction of a model to include a stress term in the analysis of the B and M data set
The experimental design required the analysis of four, partially confounded treatment groups; i.e. the T T treatment contains both B and M cell types found in T B and T M treatments, while the T M treatment combines the effects of cell-specificity and the stress effect that is also seen in T S . In the following discussion, the term 'stress' is used to refer specifically to the effect of protoplast isolation on transcript levels. Importantly, although the aim of the experiment is to compare expression levels between B and M cell types, the level of gene expression within the intact M could never be directly measured because of the stress effect of M cell isolation. A model was constructed in order to formally describe this situation and to allow statistical elimination of the stress effect.
Let V 1f and V 2f represent the log 2 transformed expression level of a given feature (f) in B and M cell types, respectively. The aim is to estimate (V 1f -V 2f ) on a feature-by-feature basis. We use c to label the leaf sample, whether B cell prep (c = 0), M cell prep (c = 1) or total leaf sample (c = 2). The parameter j indicates the presence (j = 1) or absence (j = 0) of stress. For any given feature f, we fit the normalized signal Y fcjr with the model, where a c is the proportion of M cells in the sample, S f represents the effect of stress on gene expression, r represents the replicate number from 1 through 6 and f represents the feature number from 1 to F, the number of features in the array. In this model, the effect of preparative stress is assumed to act additively and uniformly in both cell types. Below, the performance of the model will be assessed with respect to analysis in which the stress effect is not considered.
If we assume there is no contamination of the other cell type for the single cell treatments (T B or T M ), the values of a 0 and a 1 (the proportion of M in the sample) are set to be 0 and 1, respectively. In practice, two types of cellular contamination might be recognized. First, a proportion of contaminating M cells will be present in the B prep and of B cells in the M prep. The level of this cross-contamination was estimated at below 5% as determined by semi-quantitative PCR using known markers for B (RbcS and ChlMe) and M cell identity (Pepc and Mdh1) (data not shown). The level of cross-contamination was considered to be sufficiently low as to be ignored, thereby simplifying the model by elimination of V 1f and V 2f terms in expressions describing M and B, respectively. The values of a 0 and a 1 could be adjusted to allow consideration of such contamination if desired. The presence of additional leaf cell types constitutes a second source of cellular contamination, perhaps most notably the inclusion of epidermal and vascular cells. For simplicity and economy, we do not consider this second source of contamination in our model. We estimated the value of a 2 (the proportion of M cells in T T and T S preparations), by examination of leaf sections and by marker gene expression, at between 0.7 and Experimental design Figure 2 Experimental design. A. B stands and M protoplasts were isolated by mechanical disruption and enzymatic digestion, respectively. Representative chloroplasts are marked with an arrow. B. Interwoven loop design of microarray labelling. Four treatments (B, M, S, T) were compared using two-dye microarrays. Each arrow represents hybridization. RNA isolated from the treatment at the tail of the arrow was used to synthesize Cy3-labelled cDNA, and RNA isolated from the treatment at the head of the arrow was used to synthesize Cy5-labelled cDNA. Independent biological replicates were used for each labelling, giving a total of six per treatment. Numbers on arrows refer to 1. C. Pseudo-colour overlay of Cy3 and Cy5 images from a representative hybridization. A single 26 × 26 features sub-grid is shown. The complete University of Arizona oligonucleotide array consists of 96 such grids arrayed over two slides. 0.5 (data not shown). For the analysis presented here, the value of a 2 was set at 0.5. The model (1) can thus be simplified as, where μ TBf , μ TMf , μ TTf and μ TSf represent expected expression levels indexed as appropriate to the four treatments T B , T M , T T and T S for feature f. Since not all treatments are observed within the same array, the spot-specific array effect is corrected as a random effect to the model (1) and estimated by REML [36]. The normalized values (see Methods) for each feature, corresponding to the 12 hybridizations, are listed in Additional file 1.
Two versions of the analysis were performed. First, an analysis as described above was used to generate estimates of (V 1f -V 2f ) and S f for each feature (referred to as the stress model). Second, we repeated the analysis with the stress effect ignored (referred to as the simple model). A second set of estimates of (V 1f -V 2f ) were calculated and compared with those obtained from the stress model. For both models, the deviation of each (V 1f -V 2f ) estimate from 0 was investigated using a t-statistic. The q-value procedure of Storey et al. [37] was used to control the false discovery rate (FDR). The estimates of (V 1f -V 2f ) and S f for the stress model are listed in Additional file 2 together with their test statistics. The estimates of (V 1f -V 2f ) obtained from the simple model are listed in Additional file 3.

Comparison of analyses using the stress and simple models
Lists of differentially accumulating features identified by the stress and simple models were compared. Under FDR control at the 5% level with q-values ≤ 0.05, the stress model identified 1,280 differentially accumulating unique features. At the same level of control, the simple model identified 4,384 unique features. 1,043 features were common to the gene lists obtained from the two models. Therefore, the simple model identified the majority of features identified by the stress model (approximately 80%). This is shown graphically in Figure 3A. For the features identified by the stress model, the q value obtained from the stress model (q stress ) is plotted against the q value obtained from the simple model (q simple ). Features are coloured red and blue for predicted M and B enrichment, respectively. The threshold values of q = 0.05 are shown by dashed lines. Among the features identified by the stress model, but not the simple model, were three annotated as components of PSII (MZ00023434, MZ00044083, MZ00040590) (shown in yellow in Figure  3A). Previous analyses have shown that PSII components predominantly accumulate in the M cells of the maize leaf [8,9]. The failure of the simple model to identify these PSII transcripts is consistent with reduction in RNA levels associated with M cell preparation. Under the stress model, accumulation of all three of these RNAs was predicted to be reduced by stress.
Although the simple model identified the majority of features identified by the stress model, it also identified many more features in total than the stress model ( Figure  3B). Preliminary microarray analysis of B and M cell types using a simple paired comparison had suggested that one confounding effect of preparation stress might be the mis- Empirical comparison of simple and stress models demonstrated the requirement for the control of the stress effect and the applicability of our stress model to this problem and is, therefore, the model we have chosen in the analysis of differential gene expression between B and M cells. Below, we describe the use of the stress model to identify potentially differentially accumulating transcripts and briefly discuss the resulting gene list.

Statistical identification of B and M enriched transcripts using the stress model
The estimate (V 1f -V 2f ) was calculated for all 15,988 features passing data filtering. The mean estimate of (V 1f -V 2f ) was 0.02 suggesting that the estimates were distributed around a value close to 0, i.e. equivalent expression in B and M cell types. Analysis of the distribution of resulting p-values allowed estimation of the proportion of nondifferentially expressed genes (π 0 ). Using the method of Storey et al., [37], π 0 was calculated to be 0.823, corresponding to 2,830 differentially expressed features from the 15,988 in the analysis. Under FDR control at the 5% level, the model identified 1,280 candidate features.

Annotation of features identified as differentially expressed under FDR control
The majority of features present in the Maize Array Project are designed to EST contigs present in the TIGR maize sequence database [40]. An annotation of the features, based on annotation of the TIGR sequences at the time of design, is available at the Maize Array Project website [41]. It is important to note that a complete maize genome sequence is not yet available and that current gene models may well change as additional data become available. Annotation of oligonucleotide probes is further complicated by difficulty in predicting cross-hybridization between related genes [42,43]. This is especially relevant to a highly polymorphic species such as maize that possesses a highly duplicated genome [44]. Given these caveats, approximately 50% of the identified features were fully or partially annotated in the Maize Array Project database. Of the remainder, approximately 15% were annotated as encoding genes related to hypothetical or unknown Arabidopsis or rice proteins, 10% were annotated by similarity to Arabidopsis or rice genomic regions and 25% were not annotated. To further investigate the number of unique genes represented by the 1,280 features, a BLAST search was used to identify maize sequences in the NCBI non-redundant database homologous to the oligonucleotides. For each feature, the bestmatched sequence was recorded. Under the search criteria used 1,173 matches were recovered. Of these, 899 unique sequences were represented. In addition, TIGR rice gene model matches were obtained for 792 of the features from the Maize Array Project database. These 792 rice matches represent 730 independent gene models. This estimate is in line with the original design criteria of the microarray used [33]. Annotations are included in Additional file 2.

Validation of candidate gene list using prior knowledge
Differential gene expression in B and M cells has been the subject of extensive prior investigation [18]. The wealth of previous studies provides a collection of well-characterized marker transcripts that can be used in the preliminary validation of the candidate gene list. In total, we identified The enzymes of the C 4 carbon shuttle are abundant and cell-type specific ( Figure 1B). Previous studies have characterised accumulation of their transcripts by RNA gelblot analysis [45], in situ hybridization [46], real-time PCR [27] and differential screening [26]. Features corresponding to genes encoding the carbon shuttle enzymes, phosphoenolpyruvate carboxylase (PEPC), malate dehydrogenase (MDH), NADP-dependent malic enzyme (NADP-ME) and pyruvate orthophosphate dikinase (PPDK) activities are shown in Figure 4. Although considered markers of C 4 cell identity, the carbon shuttle enzymes are encoded by members of small gene families that contain both C 3 and C 4 isoforms. For example, C 4specific malic enzyme is hypothesized to have arisen following the acquisition of a plastid transit peptide sequence by a gene encoding a cytosolic isoform, followed by duplication and divergence of C 3 and C 4 forms [47]. Consequently, maize contains at least three NADP-ME loci [47]. The gene ZmChlMe1 encodes a leaf-specific, plastid-targeted isoform required for decarboxylation of malate in B cells [47,48] (Figure 1B, reaction 3) while two, nearly identical, Me2 genes (ZmChlMe2a and ZmChlMe2b) have been identified that encode cytosolic isoforms [47]. At the nucleotide level, ZmChlMe2a and ZmChlMe2b are 99% identical to each other and 87% identical to ZmChlMe1 [47]. A further NADP-ME activity has been characterized from roots and found to be 99% and 98% identical to ZmChlMe2a and ZmChlMe2b, respectively [49]. Gene duplications of this type could pose a serious difficulty in oligonucleotide analysis if features do not discriminate between paralogous gene copies. In the case of the carbon shuttle enzymes, the C 4 isoforms are typically more abundant than C 3 isoforms [50] and the accumulation patterns we observed suggest that the signal obtained in our experiment corresponds to C 4 cell-specific transcripts.

Comparison of analysis with and without consideration of the stress effect
A somewhat different situation is illustrated by the multisubunit Rubisco holoenzyme. Here, an enzyme that is abundant in all photosynthetic cell types in the C 3 plant is restricted to the B in maize, although the biochemical role of the protein remains unchanged. Rubisco consists of a number of large (LSU) and small (SSU) subunits. LSU is encoded by the chloroplast gene rbcL [51], while SSU is encoded by a family of nuclear RbcS genes [52][53][54][55]. In maize, both LSU and SSU are restricted to the B cells of mature leaf tissue by regulation of transcript accumulation [54][55][56][57]. In this experiment, three features were identified corresponding to RbcS and each showed the expected B enrichment. Two additional Calvin cycle enzymes, carbonic anhydrase (CA) and phosphoribulokinase (PRK) were also represented by multiple features on the array and displayed the predicted B-enriched patterns of expression (Figure 4).
Previous gene profiling of leaf cell-types in maize [26,58] and sorghum [28] have identified a number of metallothionein (MT) genes that are expressed preferentially in B cells. MTs are a family of small, metal-ion binding proteins that are found in many taxa and are hypothesized to play important roles in metal tolerance and homeostasis [59]. Consistent with these studies, we identified a number of B-enriched features corresponding to MT-like proteins (Figure 3). There are three annotated MT genes in maize, designated ZmMtl1 [60], ZmMtl2 [61] and ZmMtl3 [62]. Although the features we identified showed homology to these genes, the majority were most similar to a non-characterized EST sequence (Gen Bank: CF023010) that we tentatively annotate as ZmMtl4. The role of MT proteins in the B is not immediately apparent. However, the analysis of Nakazono and colleagues [58] suggests an involvement in the functioning of the vasculature.
Proteomic analysis of maize B and M chloroplasts has identified β-glucosidase as among the most strongly Benriched proteins [63]. Immunocytochemical studies have also demonstrated B enrichment of the major plastidic isoform of β-glucosidase in maize leaves [64,65]. We found six B-enriched features corresponding to β-glucosidase in our candidate gene list. β-glucosidase is proposed to function in plant defence by conversion of hydroxamic acid glucosides to toxic benzoxazolinones [66]. In addition, β-glucosidase activity has been implicated in cytokinin signalling [67]. Although providing a further marker for B identity, the significance of the localization of the enzyme in maize leaves remains unresolved.
In summary, approximately 8% (1,280 of 15,988) of the features analysed were identified as accumulating differentially between B and M when FDR is controlled at 5%. Approximately 50% of these features were fully or partially annotated by the maize array project database. Searching of maize sequences databases suggested that that these features represent at least 899 unique genes. An estimate of the overall predicted proportion of differentially accumulating features (1-π 0 ) suggests that as many as 18% of features may be differentially expressed. Approximately 80 features identified in our gene list correspond to previously characterized marker genes and provide convincing evidence of the validity of our data set and analysis. Below we consider the significance of differential expression in the establishment and potential engineering of the C 4 syndrome.

General comments on the differentiation of B and M cell types
It is estimated that the C 4 syndrome has been derived at least 45 times in 19 families of angiosperms [5] and that a small number of regulatory changes are sufficient to establish a functional C 4 type [68]. By contrast, our observations suggest that differentiation of B and M cell-types in maize involves the cell-specific regulation of many thousands of genes. A number of these differences likely pre-date the acquisition C 4 photosynthesis. Ancestrally, the M would have been the major site of photosynthesis while the B would have had, and presumably retains, functions associated with proximity to the vascular tissue [69]. Indeed, C 3 isoforms of certain C 4 -related enzymes show specialised patterns of accumulation in the leaves of C 3 plants [70]. Additionally, a number of C 4 cell-specific promoters have been shown to be functional when introduced into C 3 species [71][72][73][74][75][76]. These observations suggest that spatial information distinguishing B and M also predates the shift to C 4 .
Although it is likely there were ancestral differences between B and M, our data and previous studies suggest that the establishment of the C 4 syndrome resulted in many additional changes to the accumulation of transcripts within these cell types. We distinguish two modes of regulation that might establish these differences (shown graphically in Figure 5). First, modification of cisacting elements and the recruitment of transcription factors may directly change patterns of gene expression (genes A and B in Figure 5). Studies of C 4 gene regulation in maize have successfully identified such elements associated with a number of genes [16,17,19,77,47]. Although it has been assumed that regulatory changes of this type drive the establishment of the C 4 state, it appears unlikely that novel regulatory elements could be recruited on the scale required to explain the number of differentially expressed genes we have observed. As a second mechanism, we suggest that pre-existing regulatory mechanisms established in the C 3 state respond in a cell-specific manner to the creation of novel environments in the C 4 leaf (genes C and D in Figure 5). The B and M cells of a C 4 leaf differ in their complement of protein complexes, concentration of sugars, the redox poise of the photosynthetic electron transport chain and the availability of reducing equivalents [78]. More broadly, we suggest that a small number of changes in gene expression, when superim-posed on ancestral cellular differences, would further induce secondary changes in transcript accumulation and thereby generate the complex pattern we have observed. It should be noted that such a model does not suggest the presence of regulatory 'master-switches' but rather the rebalancing of a complex system in response to key alterations. We have recently initiated transcriptional profiling of maize mutants with defects in phytochrome signalling [79], Calvin cycle function [22] and tetrapyrrole biosynthesis [80,81] to assess the effect of disrupting cellular conditions on cell-specific patterns of gene expression.

Engineering C 4 photosynthesis
The ability to express C 4 -associated transcripts in C 3 plants has stimulated interest in the molecular engineering of a C 4 state in C 3 plants to achieve high photosynthetic performance and water and nitrogen use efficiencies [82,83]. Unfortunately, the physiological effects of over-expressing individual carbon shuttle proteins in C 3 plants have been limited and difficult to interpret [82]. The next steps in C 4 engineering will require that multiple enzymes be expressed together. To achieve this, knowledge of the mechanisms that coordinate the distribution of activities in the C 4 leaf will be essential. In addition, an appreciation of the extent of de novo regulation required for the establishment of the C 4 state will aid the selection of targets for manipulation. Many of the most promising attempts to transfer individual C 4 activities to a C 3 plant have been achieved by transfer of maize genes to rice [71][72][73][74]. Similarly, the applicability of information regarding genome wide regulatory events will likely be greatest between closely related species. In this regard, comparative genomic and proteomic studies of maize and rice offer perhaps the greatest potential for understanding the establishment of the C 4 syndrome.

Conclusion
Differential gene expression was examined in separated B and M cell types from maize leaf blade tissue. To control for stress effects generated during the isolation process we developed a model that includes a stress term and compare the results of the analysis with a model lacking the stress term. These results suggest that gene expression changes are induced during the M cell isolation process and that this confounding effect can be reduced using the stress model. Our analysis indicates that 8% of features detected on the maize long-oligonucleotide microarray produced by the Maize Array Project consortium and up to 18% of genes expressed in the leaf transcriptome are differentially expressed between B and M cell types.

Plant material and growth conditions
Wild-type W22 inbred maize seedlings were grown in a growth chamber at 28°C, 500 μmol m -2 s -1 light, 16 h days, 8 h nights. Light was provided by a combination of 400-W metal halide and 100-W halogen lamps. Ten-day old seedlings were harvested 2 h after the start of the light period for bundle sheath and mesophyll preparation.

Preparation of bundle sheath strands and mesophyll protoplasts
Bundle sheath strands and mesophyll protoplasts were prepared as previously described [22,84]. Approximately 5 g of tissue were harvested from the second and third leaves of 10-day-old maize seedlings for each mesophyll preparation. Approximately 4 g of tissue were harvested from the second and third leaves for each bundle sheath preparation.

Preparation of RNA
RNA was prepared as previously described [85]. DNAse treatment was performed using amplification grade DNAse I (Invitrogen, Carlsbad, CA) in the presence of RNase OUT RNase Inhibitor (Invitrogen). Following treatment, RNA was extracted first with phenol:chloroform:IAA (24:1:1) and then with chloroform:IAA (24:1). Following extraction, RNA was ethanol precipitated, washed twice in 70% ethanol and re-suspended in DEPCtreated dH 2 O.

Microarray detection
Microarray detection was performed using the Genisphere 3DNA 900 MPX two-stage labelling kit (Genisphere, Hatfield, PA). 8 μg of DNAse-treated total RNA were used per labelling reaction and resulting cDNA products split between a two-slide set covering a total footprint area of 3000 mm 2 . cDNA synthesis was primed using oligo dT and random primers according to the manufacturer's protocol. Maize oligonucleotide microarrays were obtained from the Maize Array Project (University of Arizona) as described [32]. Arrays were imaged using the Scan Array 5000 system (Perkin Elmer, Wellesley, MA). Intermediate laser gain (60-70%) was used to detect a majority of features while minimizing the problem of signal saturation from highly expressed genes.

Preliminary data processing and background correction
Preliminary segmentation and data extraction were performed using Imagene software (Biodiscovery, El Segundo, CA). For each feature, median signal (SMD) and background (BMD) values were extracted. To correct for background noise, the difference between the log values of SMD and BMD was calculated (Corrected intensity = log 2 (SMD) -log 2 (BMD)) [86]. Corrected data sets were examined graphically by plotting the difference between Cy5 and Cy3 (M = corrected intensity Cy5 -corrected intensity Cy3) against the average intensity of Cy5 and Cy3 signals [A = (corrected intensity Cy5 + corrected intensity Cy3)/2] for each slide. Figure 5 Mechanisms of differential regulation. The C 4 -specific regulation of four hypothetical genes and their protein products is illustrated. During the establishment of the C 4 syndrome, novel regulatory mechanisms that were absent in the C 3 state control the accumulation of proteins A and B. In the case of A, a protein that accumulates in all photosynthetic cell types in the C 3 state is restricted to a specific cell type, by a trans acting factor X, without any change to biochemical function. Gene B, expressed at low levels, accumulates in a specific cell type. Proteins C and D also accumulate differentially in the C 4 state, but, in these instances, differential accumulation is the result of a novel cellular environment. In the case of C, protein accumulation is regulated post-translationally at the level of assembly or stability through interaction with A. In the C 4 state and the absence of A, product C fails to accumulate. Such a scenario does not require differential accumulation of C transcripts, but could affect transcription of other genes. Transcription of gene D is linked, either directly of indirectly, to the presence of a metabolic signal Y. The accumulation of Y is governed by the activities of proteins A and B. In the C 4 state, changes in the accumulation of A and B affect the levels of Y and subsequently alter the accumulation of D. Boxes represent genes and spheres represent gene products.

LOWESS normalization and data filtering
A LOWESS procedure [35] was used to normalize signal intensity between channels for every slide. The difference (M) and average intensity (A) values were calculated as described above. A LOWESS regression was applied to M against A and the resulting trend-line was used to centralise M values around 0 and to correct for any dependence of M on A. Unreliable or uninformative data were discarded to reduce the dimensionality of the analysis. Data were discarded if intensity measurements were considered either too low (i.e. detection failed) or too high (i.e. signal saturation). Filtering criteria were applied to data obtained from T T hybridizations to retain features showing extreme expression profiles only under certain conditions. The background corrected intensities were averaged across the six T T hybridization data sets. A feature was discarded if this average was less than 1 (i.e. geometric average SMD < 2xBMD). Additionally, a feature was discarded as saturating if 3 or more out of the 6 T T signal intensities read at the maximum. Following data filtering, 47,591 features were discarded from an original 64,896 because of low expression. Of the remainder, a further 178 features were removed because of saturation. Excluding the control spots, we have a total of 15,988 unique features (25% of the original set) for subsequent analysis. The normalized M values for these 15,988 features are provided in Additional file 1.