MADS-box gene family in rice: genome-wide identification, organization and expression profiling during reproductive development and stress

Background MADS-box transcription factors, besides being involved in floral organ specification, have also been implicated in several aspects of plant growth and development. In recent years, there have been reports on genomic localization, protein motif structure, phylogenetic relationships, gene structure and expression of the entire MADS-box family in the model plant system, Arabidopsis. Though there have been some studies in rice as well, an analysis of the complete MADS-box family along with a comprehensive expression profiling was still awaited after the completion of rice genome sequencing. Furthermore, owing to the role of MADS-box family in flower development, an analysis involving structure, expression and functional aspects of MADS-box genes in rice and Arabidopsis was required to understand the role of this gene family in reproductive development. Results A genome-wide molecular characterization and microarray-based expression profiling of the genes encoding MADS-box transcription factor family in rice is presented. Using a thorough annotation exercise, 75 MADS-box genes have been identified in rice and categorized into MIKCc, MIKC*, Mα, Mβ and Mγ groups based on phylogeny. Chromosomal localization of these genes reveals that 16 MADS-box genes, mostly MIKCc-type, are located within the duplicated segments of the rice genome, whereas most of the M-type genes, 20 in all, seem to have resulted from tandem duplications. Nine members belonging to the Mβ group, which was considered absent in monocots, have also been identified. The expression profiles of all the MADS-box genes have been analyzed under 11 temporal stages of panicle and seed development, three abiotic stress conditions, along with three stages of vegetative development. Transcripts for 31 genes accumulate preferentially in the reproductive phase, of which, 12 genes are specifically expressed in seeds, and six genes show expression specific to panicle development. 
Differential expression of seven genes under stress conditions is also evident. An attempt has been made to gain insight into plausible functions of rice MADS-box genes by collating the expression data of functionally validated genes in rice and Arabidopsis. Conclusion Only a limited number of MADS genes have been functionally validated in rice. A comprehensive annotation and transcriptome profiling undertaken in this investigation adds to our understanding of the involvement of MADS-box family genes during reproductive development and stress in rice and also provides the basis for selection of candidate genes for functional validation studies.


Background
The MADS-box family members, identified initially as floral homeotic genes, are among the most extensively studied transcription factor genes in plants [1][2][3][4][5][6][7][8]. The word MADS is derived from the first letters of its founding members, Mini Chromosome Maintenance 1 (MCM1) of yeast (Saccharomyces cerevisiae) [9], Agamous (AG) of Arabidopsis (Arabidopsis thaliana) [10], Deficiens (DEF) of snapdragon (Antirrhinum majus) [11] and Serum Response Factor (SRF) of humans (Homo sapiens) [12]. MADS-box transcription factors are characterized by the presence of a DNA-binding domain of approximately 60 amino acids, known as the MADS-box domain, located in the N-terminal region of the protein. The plant-specific MIKC-type MADS-box proteins contain three additional domains following the MADS domain, viz. a less-conserved Intervening region of ~30 amino acids, a moderately conserved Keratin-like domain of ~70 amino acids mainly involved in heterodimerization, and a highly variable C-terminal region implicated in transcriptional activation and higher-order complex formation [13][14][15].
The MADS-box family has been divided into two main groups. Type I consists of ARG80/SRF-like genes of animals and fungi, also designated as M-type genes in plants, and type II contains MEF2-like genes of animals and yeast as well as MIKC-type genes of plants. It is proposed that an ancestral duplication before the divergence of plants and animals gave rise to these groups [16]. The MIKC-type genes are also characterized by the presence of the K domain, which could have evolved after the divergence of these lineages. The type II genes have been categorized into MIKCc- and MIKC*-type based on structural features [17,18]. The MIKCc genes have been further classified into 14 clades based on phylogeny [19,20]. Type I genes have also been categorized into M- and N-type based on the protein motifs identified using the MEME search tool [21] and also as Mα, Mβ, Mγ and Mδ, based on the phylogenetic relationships between MADS-box regions [6]. The Mδ group, however, corresponds to the MIKC* class described in this report and elsewhere [22].
The most striking feature of the MADS-box gene family is the diverse functions taken up by its members in different aspects of plant growth and development. These include flowering time control, meristem identity, floral organ identity, formation of dehiscence zone, fruit ripening, embryo development as well as development of vegetative organs such as root and leaf [23][24][25][26][27]. Genome-wide identification and phylogenetic analyses of MADS-box genes have revealed 107 and 71 (only 65 of these are listed in The Institute for Genomic Research (TIGR) Rice Pseudomolecule release 4 database) genes in Arabidopsis and rice, respectively [6,28].
Though a large amount of expression data based on SAGE, microarrays and other high-throughput transcriptome analysis techniques is available in public databases, the studies involving expression of the entire MADS-box family have so far been restricted to northern blot analysis or reverse transcriptase PCR at limited stages of development [6]. Recently, the comparison of expression profiles resulting from a 22 k rice cDNA microarray-based transcriptome analysis of early panicle development in rice was used to implicate three MADS-box genes, OsMADS1, 14 and 15, in panicle branching [29]. The use of high-throughput genome-wide transcriptome analysis provides an insight into changes in the entire transcriptome across a variety of biological conditions. In combination with the whole genome sequence data and comparative expression analysis with genes of known functions, the transcriptomic data can become an initiation point for systematic investigations into structure-function relationships.
With the overall objective of understanding the regulation of reproductive organ development in indica rice, we have initiated a program on microarray-based expression profiling of transcription factors and signal transduction components. Here, we report a comprehensive account of the identification and phylogenetic analysis of 75 members of the MADS-box gene family in rice and their expression profiling during 11 stages of panicle and seed development, along with three abiotic stress conditions and three stages of vegetative development. This analysis is based on TIGR Rice Pseudomolecule release 4 and the KOME (Knowledge-based Oryza Molecular biological Encyclopedia) rice full-length cDNA database. We have identified 10 new members belonging to this gene family besides confirming 65 previously identified genes. Of the 71 genes previously identified by Nam and coworkers [28], six were not found in release 4 of TIGR. Our analysis also suggests the existence of Mβ-type genes in rice, which were earlier thought to be absent in monocots [6]. The results of expression profiling are discussed in light of the phylogenetic relatedness of the genes and their known functions in rice as well as other systems.

Identification, organization and structure of MADS-box genes
HMM analysis and name search resulted in the identification of 75 MADS-box genes in the rice genome. Since gene names from OsMADS1 to OsMADS58, representing 34 genes, already existed in the literature (though not in continuous sequence), newly identified genes were named from OsMADS59 to OsMADS99 (see Table 1). The individual genes were localized on chromosomes based on the 5' and 3' coordinates for the respective gene models in the TIGR database (Figure 1). The maximum number of genes (16; 21%) was found on chromosome 1, whereas chromosomes 10 and 11 had only one MADS-box gene each. Of the five types of MADS-box genes, the Mγ genes were confined to chromosomes 1, 3 and 4, while Mβ genes were present only on chromosome 1. No chromosomal bias was observed in the distribution of MIKCc, MIKC* and Mα genes (Figure 1). Analysis of the TIGR rice segmental duplication database revealed 30 MADS-box genes within the duplicated segments of rice chromosomes. Only 16 genes, however, were found to have their counterparts on duplicated segments (Figure 1). Most of the duplicated genes belonged to the MIKCc group. Several MADS-box genes, especially M-type, were also found juxtaposed on chromosomes 1, 4, 5, 6, 8 and 12. Twenty such genes showed significant sequence identity.
Similar to that reported in Arabidopsis, the distribution of introns in rice MADS-box family genes was found to be bimodal, with MIKCc and MIKC* genes containing multiple introns and the Mα, Mβ and Mγ genes usually having none or occasionally up to 4 introns (see Table 1; [6]). The length of MADS-box proteins varied from 150 to 300 amino acids, with a few exceptionally longer or shorter proteins (Table 1). For details on other parameters of nucleic acid and protein sequences, refer to Table 1.

Evolutionary relationships between rice and Arabidopsis MADS-box family genes
To examine the evolutionary relationships of MADS-box genes in rice (including the 10 new genes identified in this study) and Arabidopsis, a tree was constructed using only the conserved MADS-box domain. Five groups, as described by Parenicova and coworkers [6], were identified containing representative genes of both rice and Arabidopsis. All the Arabidopsis proteins were found to lie in groups similar to those identified previously [6], except AGL47 and AGL82, which, instead of forming a basal branch of the Mβ group, grouped with Mγ proteins in our analysis as shown in supplementary figure S1 [see Additional file 1]. OsMADS64 grouped separately with AGL33 of Arabidopsis, which does not cluster with any of the MADS groups described above [see Additional file 1].
A separate phylogenetic tree was also generated from complete protein sequences of all the MADS-box genes in rice and Arabidopsis (Figure 2). Of the 75 rice MADS-box genes, 38 grouped with MIKCc, five with MIKC*, nine with Mβ, 13 with Mα and 10 grouped with Mγ-type Arabidopsis genes. In the case of M-type genes, barring OsMADS90, 91 and 96, all other genes exhibited similar groupings as in the MADS-domain-specific phylogenetic analysis (Figure 2). MIKCc proteins were further divided into 14 clades. Representatives of both rice and Arabidopsis could be identified in all the clades except the OsMADS32 and FLC clades, which are exclusive to rice and Arabidopsis, respectively (Figure 2).

Distribution of conserved motifs
The MEME motif search tool was employed to identify the conserved motifs present in MADS-box proteins (Figure 3).
[Figure caption, beginning lost] ... (75) and Arabidopsis (98), showing similar groups in both plant species as given by Parenicova and coworkers [6]. A total of 14 clades formed by MIKCc-type genes are also marked. Scale bar represents 0.1 amino acid substitution per site.

Expression profiling of MADS-box genes during vegetative and reproductive development and stress
For expression analysis, the rice panicle and seed development stages were divided into six and five broad categories, respectively, based on landmark developmental events as described by Itoh and coworkers [30], information available at Oryzabase [31] and our preliminary histochemical analysis (data not shown; Table 2). Seedlings subjected to three stress conditions, viz. desiccation, cold and salt stress, were also included in this analysis. Transcriptome profiling of these stages along with mature leaf, root and 7-day-old seedlings was carried out using GeneChip® Rice Genome Arrays. The raw data from 51 chips, representing three biological replicates each from 17 samples, were normalized by the Gene Chip Robust Multiarray Analysis (GCRMA) algorithm [32]. Since five tandemly duplicated Mγ genes, OsMADS81, 82, 83, 85 and 99, showed very high sequence identity (94.6 to 98.6%), unique probe sets for these genes were not available on the GeneChip®. Therefore, a cumulative expression profile for these genes is presented (Figure 4). Expression profiles for OsMADS78 and 79, which were also not represented on the chip, were studied using QPCR (quantitative real-time PCR, Figure 4). The primers used in this study are listed in supplementary table S2 [see Additional file 5]. The initial analysis revealed that all genes but one (OsMADS80) were expressed in at least one of the experimental stages analyzed (Figure 4).
Based on expression profiles during panicle, seed and vegetative development, MADS-box genes were classified into eight groups. Figure 5 shows mean expression profiles for each of the groups. Group I consisted of 12 genes (OsMADS2, 14, 15, 16, 18, 22, 30, 56, 60, 64, 65 and 77), most of which showed high transcript accumulation in all the stages analyzed. In contrast, the eight genes of group II (OsMADS1, 4, 5, 6, 7, 8, 17 and 58) showed high expression preferentially in panicle and seed, with more than 100- to 500-fold higher expression in the majority of the reproductive tissues than in mature leaf. OsMADS7 and 8 were the most highly expressed genes in panicle and seed stages, with more than 1000-fold transcript accumulation in the S1 stage of seed development. Group III comprised the genes OsMADS20, 32, 34 and 72, which showed high expression during early stages of panicle development. The expression declined gradually as the panicles matured. In contrast, the expression of group IV genes (OsMADS3 and 13) increased with the development of the panicles and continued to increase during stages of seed development, the highest expression being 10- and 100-fold for OsMADS3 and 13, respectively. Five genes, OsMADS21, 29, 71, 78 and 79 (group V), were expressed preferentially during seed development. The expression of OsMADS21 and 29 was more than 100-fold in S1-S3 stages, whereas the expression of OsMADS71 increased up to 15-fold in S3 to S5 stages. OsMADS78 and 79 transcripts showed S3-S4 stage-specific accumulation (Figure 4). Group VI comprised five genes, OsMADS26, 27, 47, 55 and 57, that were expressed predominantly in vegetative tissues. The maximum expression of OsMADS57 was observed in mature leaves, whereas OsMADS27 showed higher expression in roots as well. Three genes (OsMADS31, 63 and 66) that constituted group VII showed low levels of expression in most stages of panicle and seed development. The peak expression for these genes was observed in P6 (OsMADS63), S1/S2 (OsMADS31) or S3 (OsMADS66) stages. The remaining genes constituted group VIII.
[Figure: Expression analysis of MADS-box genes in rice.]
Eight genes showing discrete expression patterns were selected for validation by QPCR analysis. Figure 6 shows a comparison of the QPCR and microarray analyses. The expression patterns obtained for all eight genes using QPCR were similar to those derived from the microarrays. Moreover, most of the characterized genes showed expression patterns similar to those already described in the literature, which strengthens the reliability of the data obtained by using microarrays [33].
Expression levels of four MADS-box genes (OsMADS18, 22, 26 and 27) were up-regulated more than two-fold in response to cold and dehydration stress treatments (Figure 7). Three genes (OsMADS2, 30 and 55) showed more than 2-fold down-regulation in response to dehydration and salt stress. The fold change values with respect to seedlings are given in supplementary table S5 [see Additional file 8].
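The two-fold criterion used above can be illustrated with a minimal sketch. It assumes that fold change is computed from the mean of the biological replicates relative to the seedling control; the function names and the replicate values are hypothetical and are not data from this study, and the significance test mentioned in the Figure 7 legend is not reproduced here.

```python
from statistics import mean

def fold_change(control, treated):
    """Signed fold change of mean treated versus mean control expression.

    Returns +x for x-fold up-regulation and -x for x-fold down-regulation.
    """
    c, t = mean(control), mean(treated)
    if c <= 0 or t <= 0:
        raise ValueError("expression values must be positive")
    return t / c if t >= c else -(c / t)

def classify(control, treated, cutoff=2.0):
    """Label a gene as up, down or unchanged using a fold-change cutoff."""
    fc = fold_change(control, treated)
    if fc > cutoff:
        return "up"
    if fc < -cutoff:
        return "down"
    return "unchanged"

# Hypothetical replicate values, for illustration only.
seedling = [100.0, 110.0, 90.0]
cold = [260.0, 240.0, 250.0]
salt = [40.0, 45.0, 35.0]

print(classify(seedling, cold))  # up
print(classify(seedling, salt))  # down
```

A signed value is used so that up- and down-regulation can be read on the same scale; a real analysis would combine this cutoff with a statistical test across replicates.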

Expression profiles of putative orthologs of Arabidopsis MIKCc-type genes in rice
Comparative expression profiles of phylogenetically related MIKCc class genes in rice and Arabidopsis are shown in Figure 8. These genes have been considered orthologs in Arabidopsis and rice [34][35][36]. In the GLO-like clade, the expression of PI and OsMADS4 was found to be restricted to reproductive tissues, whereas OsMADS2 showed significant expression in seedlings as well. The expression of another B-class gene, AP3, was also found to be specific to reproductive tissues in Arabidopsis; however, its rice ortholog, OsMADS16, was also expressed in vegetative tissues besides showing peak expression in reproductive tissues. The expression of AG was found to be restricted to the stages of floral development and the initial stages of seed development in Arabidopsis. In rice, its putative orthologs OsMADS3 and OsMADS58 were found to have expression profiles comparable to that of AG, with OsMADS3 showing relatively low-level transcript accumulation. In the AG clade, the expression profiles of an Arabidopsis D-class gene, AGL11, and its rice counterpart OsMADS13, which shows 53% identity at the amino acid level, were also comparable. SUPPRESSOR OF OVEREXPRESSION OF CONSTANS1 (SOC1) and its putative ortholog OsMADS50 in rice were found to exhibit low-level ubiquitous expression in the stages analyzed. The members of the AGL2-like clade, OsMADS7/45 and OsMADS8/24, are duplicated genes with a high level of sequence homology to AGL2/SEP2 and AGL14/SEP1, respectively. These genes, along with OsMADS1, 5 and 34, were found to exhibit expression patterns similar to those of the SEP genes. In the SQUA-like clade, the AP1 gene of Arabidopsis and its putative rice orthologs OsMADS14 (RAP1B), OsMADS15 (RAP1A) and OsMADS18 show high sequence similarity. In reproductive tissues, the expression profiles of OsMADS14, 15 and 18 were found to be very similar to that of AP1, but unlike AP1, the rice genes were also expressed in mature leaves.
[Figure: Expression patterns of MADS-box genes in rice in vegetative as well as panicle and seed development.]
Figure 6 QPCR results for selected eight genes and their correlation with microarray data. Two and three biological replicates were taken for QPCR and microarray, respectively. Standard error bars are shown for data obtained using both techniques. Y-axis represents raw expression values obtained using microarrays; QPCR data have been normalized to ease profile matching with that of microarrays. X-axis depicts developmental stages as explained in Table 2.
In the AGL6-like clade, the duplicated genes OsMADS6 and OsMADS17 exhibit expression patterns similar to that of AGL6. Both genes show more than 50% identity with AGL6 at the amino acid level, suggesting that these could be putative orthologs of AGL6 in rice.
Figure 7 Differential expression shown by seven MADS-box genes in response to various abiotic stress conditions. The left panel shows four up-regulated genes and the right panel shows down-regulated genes, each changed more than 2-fold with a p value less than 0.05 in response to three abiotic stress conditions. X-axis represents seedling followed by stress samples (CS, cold stress; DS, dehydration stress; SS, salt stress). Y-axis represents average expression values obtained using microarrays. Error bars represent standard error for data obtained in three biological replicates.

Expression profiles of duplicated genes
Analysis of the TIGR rice segmental duplication database revealed 30 MADS-box genes localized on duplicated segments of the rice genome, of which 16 had counterparts on the corresponding duplicated segments. A comparison of the expression profiles of the duplicated gene pairs, as obtained from microarray data, revealed that, except for three gene pairs, viz. OsMADS7:8, OsMADS3:58 and OsMADS2:4, the expression patterns of the duplicated genes had diverged significantly (Figure 9). Chromosome 1 was found to have four tandemly duplicated genes within a 12 kb region. Two groups of tandemly duplicated genes with three genes each were localized on chromosome 4. Incidentally, this region overlapped with an intra-chromosomal duplicated segment, suggesting that these six genes probably evolved from a single ancestral gene by a combination of segmental and tandem duplication events. All four tandemly duplicated MIKCc-type genes showed varied expression patterns. OsMADS13 and 33 as well as OsMADS14 and 34, although fulfilling our selection criteria, were not considered tandemly duplicated because of the high level of sequence divergence, which was evident from their placement in different clades of MIKCc-type genes.
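One simple way to quantify how similar the expression profiles of a duplicated gene pair are is a correlation coefficient computed across the developmental stages. The sketch below uses the Pearson coefficient purely as an illustration; this measure is an assumption and is not stated to be the comparison method used in this study, and the gene labels and values are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two expression profiles of equal length."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant profile has no defined correlation")
    return sxy / sqrt(sxx * syy)

# Hypothetical profiles over four stages, for illustration only.
gene_a = [1.0, 2.0, 4.0, 8.0]
gene_b = [2.0, 4.0, 8.0, 16.0]  # same shape as gene_a, twice the level
gene_c = [8.0, 4.0, 2.0, 1.0]   # opposite trend

print(round(pearson(gene_a, gene_b), 3))  # close to 1: conserved pattern
print(pearson(gene_a, gene_c) < 0)        # True: diverged pattern
```

Because the coefficient ignores absolute expression level, a pair with the same temporal pattern but different abundance still scores close to 1.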

Involvement of MADS-box genes in panicle and seed development in rice
For over a decade, investigations leading to the understanding of genetic and molecular basis of floral development in model eudicots, Arabidopsis and Antirrhinum, have revealed involvement of a number of MADS-box genes in specifying floral organ identity [37,38]. Attempts have been made to predict the function of MADS-box genes in diverse species based on sequence similarities [24,39]. However, identification of additional paralogs with very similar sequences and existence of duplicated genes with different expression patterns made it difficult to predict the function based on sequence data alone. Similarity in temporal and spatial expression patterns in combination with the sequence comparisons, however, was found to be a better criterion for establishing orthologous relationships. In this paper, we have presented a comprehensive expression profiling for all the MADS-box genes in rice along with an account of their phylogenetic relationships with the Arabidopsis genes.
Of the 75 genes analyzed in this study, more than 20 were found to exhibit either specific or preferential transcript accumulation during stages of panicle and seed development. Some of these genes have already been characterized as orthologs of Arabidopsis ABCDE class genes, viz. OsMADS14 and 15 are APETALA1 orthologs; OsMADS2 and 4 are PISTILLATA orthologs; OsMADS16 is an APETALA3 ortholog; OsMADS3 and 58 are AGAMOUS orthologs; the D class gene OsMADS13 is a putative ortholog of AGL11; OsMADS7 and 8 are orthologous to SEPALLATA2 and 1, respectively [40][41][42][43][44][45][46][47][48]. Most of the putative orthologous genes in rice and Arabidopsis exhibit similar expression patterns (Figure 8). It was, however, observed that the rice counterparts had a general tendency to be expressed in vegetative organs as well, whereas the expression of Arabidopsis genes was restricted to reproductive tissues. From the expression data and percent identity, it seems that the duplicated genes OsMADS6 and OsMADS17 (AGL6-like clade) are orthologous to AGL6. Further experimentation would be required to verify whether these genes have similar functions as well.
In addition to some of the well-characterized genes described above, there are several others, e.g. OsMADS34, 32, 20, 72, 63, 98, 89, 92 and 86, that show specific up-regulation in panicles but have not yet been functionally validated in rice. This list also includes MIKC*- and M-type genes along with MIKCc genes. Most of the functionally validated MADS-box family genes belong to the MIKCc class, while functions of most of the M-type genes are not yet known in any system. Therefore, this study provides a solid base to select genes for functional validation.
In 2003, Parenicova and coworkers showed that 64 of the 107 Arabidopsis MADS-box genes were expressed in siliques [6]. Later, by using high-density transcription factor filter arrays, almost all the MADS-box genes were found to be expressed during silique development [49]. These results suggested that besides being involved in the development of flowers, the MADS-box gene family could be involved in the process of seed development as well. In rice, we have also found that transcripts for almost all the MADS-box genes (with the exception of OsMADS80) are expressed in at least one of the seed development stages analyzed. Interestingly, the highest expression values for OsMADS13, 21, 23, 29, 71, 75, 78 and 79 were observed for seed stages, suggesting that these genes could be involved in the development of seeds. Four of these genes belong to type II (MIKCc group), whereas the remainder are type I (Mα-type). Since two of the type I (Mγ) genes, AGL80 and PHERES, have previously been implicated in seed development, it might be interesting to investigate the role of other type I genes showing up-regulation during development of seeds [50,51].

Rice MIKCc genes
The MIKCc genes have been sub-grouped into 13 clades in Arabidopsis. Representatives of all but the FLC clade were also found in rice. Six genes belonging to the FLC clade have been implicated in control of flowering by vernalization and autonomous pathways in Arabidopsis. Since rice does not require vernalization for flowering, this clade has been suggested to be lost in rice [19]. Recently, Zhao and coworkers (2006) have reported a new monocot-specific clade, the OsMADS32-like clade, consisting of OsMADS32 of rice and TaAGL14 and 15 of wheat [20]. The expression of TaAGL14 and 15 was detected in most vegetative stages along with inflorescence and seeds. In contrast, the OsMADS32 transcripts were found to be restricted to early stages of panicle and late seed development, suggesting that the OsMADS32-like clade might have evolved to cater for diverse monocot-specific functions. A comparison of phylogenetic relationships and expression profiles between rice and Arabidopsis MADS-box genes suggests that although most of the basic ABCDE functions have been retained in rice, acquisition of new functions and subfunctionalization of existing gene functions is also apparent.

Mβ-like genes are represented in rice
In earlier studies, no gene of rice could be assigned to the Mβ group of M-type MADS-box genes, hence it was suggested that probably Mβ genes have not been retained in the rice genome [6,28]. In this study, we have identified nine new genes that grouped with Arabidopsis Mβ type genes. Although bootstrap values are low, separation of this clade from the rest of M-type genes of rice and the presence of conserved motifs in Arabidopsis Mβ protein and the newly identified group is suggestive of the existence of Mβ group in rice as well.

Duplication seems to have played major role in diversification of MADS-box family of genes
Arabidopsis has been reported to have 107 MADS-box genes [6]. However, in rice, which has a genome almost three times the size of that of Arabidopsis [52,53], the number of MADS-box genes was found to be only 75. The reason for this could be the variable status of whole genome duplications in Arabidopsis and rice [54,55]. Surprisingly, however, the number of MIKCc-type genes in Arabidopsis and rice was found to be similar, at 39 and 38, respectively. Therefore, the difference in the total number is mainly due to the variation in the number of M-type genes, which are 37 in rice and 68 in Arabidopsis. It seems that duplication events have contributed significantly towards the evolution of M-type genes. Our analysis revealed 16 M-type genes that could have originated through tandem duplications (Figure 1). Phylogenetic analysis suggests that rice and Arabidopsis Mγ genes probably had a common ancestor and that the expansion occurred independently after the divergence of monocots and dicots.
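The counts quoted in this paragraph can be checked with a few lines of arithmetic. The sketch below only re-derives the figures given in the text (total genes minus MIKCc-type genes for each species); the variable names are illustrative.

```python
# Totals and MIKCc-type counts as stated in the text.
rice_total, rice_mikcc = 75, 38
arabidopsis_total, arabidopsis_mikcc = 107, 39

# Genes outside the MIKCc group, quoted in the text as 37 and 68.
rice_other = rice_total - rice_mikcc
arabidopsis_other = arabidopsis_total - arabidopsis_mikcc

print(rice_other, arabidopsis_other)              # 37 68
print(arabidopsis_total - rice_total)             # 32: difference in family size
print(arabidopsis_mikcc - rice_mikcc)             # 1: difference in MIKCc genes
print(arabidopsis_other - rice_other)             # 31: difference outside MIKCc
```

Of the 32-gene difference between the two families, 31 genes lie outside the MIKCc group, which is the basis for attributing the difference mainly to M-type genes.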
MADS-box genes seem to have evolved mainly through gene duplication events followed by neofunctionalization, subfunctionalization or, in some cases, pseudogenization of the duplicated gene [56]. However, redundancy, being one of the fates of duplication, is also common in the MADS-box family. We found 30 MADS-box genes lying on segmentally duplicated regions of rice chromosomes, while only 16 were found to have been retained, suggesting that considerable changes may have taken place following segmental duplication, leading to the loss of some of the genes. Except for one, all paralogous gene pairs belong to the MIKCc-type of the MADS-box family. Expression data show that most of these duplicated genes have divergent expression patterns, possibly because they have undergone neofunctionalization or subfunctionalization, though further experimentation is required to test this hypothesis. Interestingly, three genes, viz. OsMADS6, 17 and 56, lying on duplicated segments of chromosomes 2, 4 and 10, respectively, show collinearity in gene order. On the other hand, OsMADS50, lying on chromosome 3, shows synteny with only one of these genes, i.e. OsMADS17. They may all have resulted from duplication of a segment on chromosome 4, but thereafter, evolution of these genes may have been quite independent, resulting in loss of micro-collinearity between the duplicated regions.

Stress responsive MADS-box genes in rice
MADS-box genes have been shown to be affected by low temperature stress in tomato [57] and by application of hormones like cytokinins, gibberellins [58], ethylene [59] and auxins [60] in other plants. Seven MADS-box genes exhibited differential expression in response to cold, salt and/or desiccation stress in rice. So far, none of these genes has been implicated in stress response. Amongst stress-induced genes, OsMADS18 is a member of the AP1/SQUA group that has been shown to be expressed widely during development, with its transcripts accumulating at high levels specifically in meristematic tissues [61]. It has been shown to interact with OsMADS6, 8/24, 7/45, and 47 [45,61], and in our analysis its expression pattern was found to overlap with those of OsMADS6, 8 and 7 in reproductive tissues and with OsMADS47 during vegetative development, suggesting that it might be interacting with different partners during reproductive development and stress. Recently, Tardif and coworkers showed that a large number of genes involved in flower development are associated with abiotic stress responses in wheat [62]. Our preliminary analysis involving transcript profiling during reproductive development and abiotic stress conditions has also revealed approximately 400 genes that are up-regulated during panicle/seed development and three stress conditions, viz. cold, salt, and dehydration (unpublished data). It would, therefore, be interesting to undertake specific investigations that could establish the interactions of biochemical pathways that are activated during reproductive development and stress response.

Conclusion
Contribution of MADS-box gene family in flower organ specification is well documented in eudicots; however, functions of many gene members of this class have not been elucidated in rice. A comparison of phylogenetic relationships and expression profiles between rice and Arabidopsis MADS-box genes suggests that although most of the basic ABCDE functions have been retained in rice, acquisition of new functions and subfunctionalization of existing gene functions is also not uncommon. Furthermore, the role of MADS-box transcription factors in seed development and during stress response also needs to be explored. The new information generated is expected to help in selection of appropriate candidate genes for further functional characterization.

Identification of genes, nomenclature and mapping on chromosomes
Name search and Hidden Markov Model (HMM) analysis were employed to identify the MADS-box genes from the rice genome. MADS-box sequences available for all land plants were downloaded from SWISSPROT and TrEMBL [63] and their HMM profile was generated using the HMMER 2.1.1 software package [64,65]. This profile was used to search the complete proteome of rice available in TIGR [66] and KOME [67,68] databases using Basic Local Alignment Search Tool (BLAST; [69]) with filter off. Name search using MADS, SRF, AGAMOUS and AP1 as keywords in these databases helped in the identification of more genes, which could not be identified using the HMM profile due to the presence of an incomplete MADS-box. Redundant sequences were removed by aligning the protein sequences using ClustalX 1.83 [70] and checking their genomic locus in TIGR. Motif scan was performed using SMART [71,72] or National Center for Biotechnology Information Conserved Domain Database (NCBI-CD; [73]) searches with filter off. According to the already available nomenclature, 34 MADS-box genes have been named from OsMADS1 to 58. Thus, the newly identified genes were named from OsMADS59 to 99. MADS-box genes were mapped on chromosomes by identifying their chromosomal position given in the TIGR rice database. Information regarding ORF length, number of amino acids, molecular weight and isoelectric point of the protein was downloaded from TIGR release 4. For OsMADS3 and 20, Gene Runner program version 3.04 was employed to find the molecular weight and pI of the protein, as these were not available in TIGR.

Phylogenetic analysis
To identify the number of groups formed by rice MADS-box genes in comparison to Arabidopsis, the MADS-box domains, comprising 60 amino acids and identified by SMART in all the MADS-box sequences of Arabidopsis and rice, were aligned using ClustalX (version 1.83). An unrooted neighbor-joining (NJ; [74]) phylogenetic tree was constructed in ClustalX with default parameters. Separate phylogenetic trees were constructed using the complete protein sequences and the coding sequences of rice and Arabidopsis MADS-box genes. Bootstrap analysis was performed using 1000 replicates. The trees thus obtained were viewed using TREEVIEW software [75].

Sequence and duplication analysis
To identify conserved motifs, MEME version 2.2 [76] was employed with the following parameters: number of repetitions, any; maximum number of motifs, 20; optimum motif width, ≥ 6 and ≤ 200. The motifs obtained were annotated using the SMART and NCBI-CD search programs.
Further, MADS-box gene duplications were mapped against the segmental duplication database of TIGR, with both 100 kb and 500 kb allowed as the maximum distance between collinear genes [66]. To find tandemly duplicated candidates, genes separated by an intergenic distance of not more than 20 kb and sharing a fair degree of overall homology were selected. Identity among duplicated genes was calculated using the DNASTAR MegAlign 4.03 package.
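The distance criterion for tandem duplicates can be illustrated with a minimal sketch. It assumes a simple list of (name, chromosome, start, end) records with hypothetical gene names and coordinates, compares only genes within the supplied list, and leaves out the homology check, which in the study was a separate step.

```python
from collections import defaultdict

MAX_GAP_BP = 20_000  # intergenic distance cutoff (20 kb) stated in the text


def tandem_candidates(genes, max_gap=MAX_GAP_BP):
    """Return (gene1, gene2, gap) for neighbouring genes on the same
    chromosome whose intergenic distance is at most max_gap base pairs.

    genes: iterable of (name, chromosome, start, end) tuples.
    Homology between the paired genes is NOT checked here.
    """
    by_chrom = defaultdict(list)
    for name, chrom, start, end in genes:
        by_chrom[chrom].append((start, end, name))

    pairs = []
    for chrom in sorted(by_chrom):
        ordered = sorted(by_chrom[chrom])  # sort by start coordinate
        for (s1, e1, n1), (s2, e2, n2) in zip(ordered, ordered[1:]):
            gap = max(0, s2 - e1)  # overlapping genes count as gap 0
            if gap <= max_gap:
                pairs.append((n1, n2, gap))
    return pairs


# Hypothetical coordinates, for illustration only
example = [
    ("geneA", 1, 100_000, 102_000),
    ("geneB", 1, 110_000, 111_500),
    ("geneC", 1, 400_000, 403_000),
    ("geneD", 2, 50_000, 52_000),
]

print(tandem_candidates(example))
```

In this toy input only geneA and geneB lie within 20 kb of each other on the same chromosome, so they would be the only pair passed on to the identity comparison.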

Collection of plant material
Oryza sativa indica var. IR64 tillers spanning all stages of panicle and seed development were collected from field-grown rice. Mature leaves were also harvested from the same plants. For all the stages, three biological replicates were harvested from independent populations of plants. After harvesting, panicle and seed samples were frozen in liquid nitrogen and stored at -70°C. For stress treatment, rice seeds were surface-sterilized with 0.1% HgCl2 and soaked in RO (reverse osmosis) water overnight in the dark. The next day, the seeds were spread on a meshed float and grown hydroponically at 28 ± 1°C in culture room conditions. After 7 days of growth, the seedlings were transferred to 100 ml beakers for treatment. For salt stress, sodium chloride was used at a final concentration of 200 mM for 3 hours. For cold stress, seedlings were kept at 4°C for 3 hours. Desiccation stress was simulated by drying the plants on tissue paper and spreading them on Whatman 3MM sheets for 3 hours. Seedlings with their roots kept in water for 3 hours were used as the control.

Microarray experiments
Affymetrix GeneChip® Rice Genome Arrays representing 49,824 transcripts (48,564 of japonica and 1,260 of indica) were employed to study the transcriptome profiles of MADS-box genes during reproductive organ development and stress response in rice. Total RNA was isolated from all the tissues, except seeds, using the TRIzol method (Invitrogen Inc., USA; [77]). Due to their high carbohydrate content, RNA from seed samples was isolated using the method described earlier [78]. After checking the quality on agarose formaldehyde or TAE gels, the RNA samples were quantified using a NanoDrop ND-1000 Spectrophotometer. Five micrograms of RNA with 260:280 ratios of 1.9-2.0 and 260:230 ratios of more than 2.0 was used for cDNA synthesis. Labeling and hybridizations were carried out according to the Affymetrix manual for one-cycle target labeling (Affymetrix, Santa Clara, CA). Hybridization was performed in a GeneChip® Hybridization Oven 640 for 16 hours at 45°C and 60 rpm. GeneChips were washed and stained with streptavidin-phycoerythrin using the fluidics protocol EukGE_WS2V5_450 in an Affymetrix Fluidics Station 450. Finally, chips were scanned using the GeneChip® Scanner 3000.
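The RNA purity criteria given above can be expressed as a small check. This is only an illustrative sketch: the function name and the absorbance readings are hypothetical, and the thresholds are the 260:280 and 260:230 ratios stated in the text.

```python
def passes_rna_qc(a260, a280, a230):
    """Return True if the absorbance ratios meet the criteria in the text:
    260:280 between 1.9 and 2.0 (inclusive) and 260:230 above 2.0."""
    ratio_260_280 = a260 / a280
    ratio_260_230 = a260 / a230
    return 1.9 <= ratio_260_280 <= 2.0 and ratio_260_230 > 2.0


# Hypothetical spectrophotometer readings: (A260, A280, A230)
samples = {
    "panicle_rep1": (1.00, 0.51, 0.45),  # ratios about 1.96 and 2.22
    "seed_rep1": (1.00, 0.60, 0.45),     # 260:280 about 1.67, too low
}

for name, readings in samples.items():
    print(name, passes_rna_qc(*readings))
```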

Digital expression analysis
The expression data for Arabidopsis, generated using the Affymetrix GeneChip® ATH1 Genome Array from stages comparable to those used for rice, were obtained from the Gene Expression Omnibus (GEO) database at the NCBI under the series accession numbers GSE5620, GSE5621, GSE5623, GSE5624, GSE5629, GSE5630, GSE5631, GSE5632 and GSE5634. A total of 55 CEL files representing 21 stages of development as well as stress treatments were downloaded from [79] and analyzed using avadis™ microarray data analysis software version 4.2 [80]. The data were normalized using GCRMA, followed by log transformation and averaging. A heat map was generated for selected genes.

Microarray data analysis
CEL files generated in GeneChip Operating Software (GCOS) were further analyzed using avadis™. Data were normalized using the GCRMA algorithm and log transformed. To obtain the expression values, averages of three biological replicates were used. The expression data for MADS-box genes were extracted by name search and by using the gene IDs listed in Table 1. Wherever more than one probe set was available for a gene, the probe set designed from the 3' end was given preference. Cluster analysis on rows was performed using the Euclidean distance metric and Ward's linkage rule of hierarchical clustering. Differential expression analysis was performed taking mature leaf as the reference to identify genes expressed more than two-fold higher in panicle and seed, with p values < 0.005. Similarly, to identify stress-induced genes, differential expression analysis was performed with no correction applied and p values less than 0.05. Further, K-means clustering was performed to identify the expression patterns shown by genes expressed in panicle and seed. Since only 73 genes (69 probe sets) are represented on the chip, the expression profiles of OsMADS78 and 79 were studied using QPCR. Raw microarray data have been deposited in the Gene Expression Omnibus database at the National Center for Biotechnology Information under the series accession numbers GSE6893 and GSE6901.
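The fold-change part of this comparison can be sketched as follows. The example assumes that the log-transformed values are on a log2 scale (the base is not stated in the text), uses hypothetical stage names and expression values, and omits the p-value filter, which was applied in the analysis software.

```python
from math import log2
from statistics import mean


def log2_fold_changes(expr, reference="mature_leaf"):
    """expr maps stage -> list of log2 expression values (biological replicates)
    for one gene. Returns stage -> mean(stage) - mean(reference), i.e. the
    log2 fold change of each stage relative to the reference tissue."""
    ref_mean = mean(expr[reference])
    return {
        stage: mean(values) - ref_mean
        for stage, values in expr.items()
        if stage != reference
    }


def upregulated_stages(expr, reference="mature_leaf", min_fold=2.0):
    """Stages in which the gene exceeds min_fold relative to the reference."""
    threshold = log2(min_fold)  # two-fold corresponds to 1.0 on a log2 scale
    changes = log2_fold_changes(expr, reference)
    return sorted(stage for stage, lfc in changes.items() if lfc > threshold)


# Hypothetical log2 values for one gene, three replicates per stage
example = {
    "mature_leaf": [5.0, 5.2, 4.8],
    "panicle_P1": [7.1, 6.9, 7.0],
    "seed_S1": [5.5, 5.4, 5.6],
}

print(upregulated_stages(example))
```

Working on the log scale turns the two-fold criterion into a simple difference of means, which is why the replicate averages are taken after log transformation.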

QPCR
Real-time PCR reactions were carried out using the same RNA samples that were used for microarrays, as described earlier [81]. In brief, primers were designed for all the genes, preferentially from the 3' end of the gene, using PRIMER EXPRESS version 2.0 (PE Applied Biosystems, USA) with default parameters. Each primer was checked for specificity to its respective gene using the BLAST tool of the NCBI database with the filter off, and specificity was further confirmed by dissociation curve analysis after the PCR reaction. First-strand cDNA was synthesized by reverse transcription using 4 μg of total RNA in a 100 μl reaction volume using the High-Capacity cDNA Archive kit (Applied Biosystems, USA). Diluted cDNA samples were used for real-time PCR analysis with 200 nM of each primer mixed with SYBR Green PCR Master Mix as per the manufacturer's instructions. The reaction was carried out in 96-well optical reaction plates (Applied Biosystems, USA), using the ABI Prism 7000 Sequence Detection System and software (PE Applied Biosystems, USA). To normalize the variance among samples, Actin was used as the endogenous control. Relative expression values were calculated after normalizing against the maximum expression value. These data were further normalized to ease profile matching with the data obtained from microarrays.
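The two normalization steps described above, against Actin and against the maximum expression value, can be illustrated with a short sketch. The text does not state the quantification formula, so the 2^(-ΔCt) calculation used here, with its assumption of roughly 100% amplification efficiency, is only an assumption, and the Ct values are hypothetical.

```python
def relative_expression(ct_target, ct_actin):
    """ct_target, ct_actin: dicts mapping sample -> Ct value.

    Step 1 (assumed formula): expression relative to Actin = 2 ** -(Ct_target - Ct_actin).
    Step 2: divide by the maximum so the highest-expressing sample equals 1.0.
    """
    raw = {
        sample: 2.0 ** -(ct_target[sample] - ct_actin[sample])
        for sample in ct_target
    }
    peak = max(raw.values())
    return {sample: value / peak for sample, value in raw.items()}


# Hypothetical Ct values for one gene and for Actin
example_target = {"leaf": 28.0, "panicle": 24.0, "seed": 25.0}
example_actin = {"leaf": 20.0, "panicle": 20.0, "seed": 20.0}

print(relative_expression(example_target, example_actin))
```

Scaling to the maximum puts every gene on a 0 to 1 range, which makes the shape of a QPCR profile easier to compare with the corresponding microarray profile.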

Authors' contributions
RA carried out the computational analysis and the microarray experiments for mature leaf and panicle stages, performed real-time PCR, analyzed the data and drafted the manuscript. PA performed the microarray experiments for root and seed stages. SR performed the microarray experiments for seedling and stress stages. AKS and VPS grew the rice plants, provided tissue material, and helped in the identification of biological stages for microarray experiments. AKT was involved in the planning of experiments, revised the final version of the manuscript and headed the project. SK designed and participated in all the experiments and revised the final text of the manuscript. All authors have read and approved the final manuscript.

Additional material
Additional file 1 Figure S1.