Genetic variation and expression diversity between grain and sweet sorghum lines
© Jiang et al.; licensee BioMed Central Ltd. 2013
Received: 10 September 2012
Accepted: 9 January 2013
Published: 16 January 2013
Biologists have long sought to understand how genes and their structural and functional changes contribute to morphological diversity. Although the grain (BT×623) and sweet (Keller) sorghum lines both originated from the same species, Sorghum bicolor L., they exhibit obvious phenotypic variation. However, genome re-sequencing data revealed that their protein-coding genes show limited functional diversity at the genome-wide level. This result raises the question of how the obvious morphological variation between grain and sweet sorghum arose over a relatively short period of evolution or domestication.
We used an integrative approach combining computational and experimental analyses to provide detailed insight into the phenotypic variation, genetic variation and expression diversity between the BT×623 and Keller lines. We investigated genome-wide expression divergence between BT×623 and Keller under normal conditions and sucrose treatment, and detected more than 3,000 differentially expressed genes between these two varieties. This expression divergence was partially attributable to differences in cis-regulatory elements or DNA methylation, which were genetically determined by functionally divergent genes between the two varieties. Both tandem and segmental duplication played important roles in genome evolution and expression divergence.
Substantial differences in gene expression patterns were observed between these two varieties. This expression divergence is genetically determined by divergence at the genome level.
Grain sorghum is the fifth most important cereal crop, providing food, feed and fiber for the world. Sweet sorghum has been evaluated as a viable feedstock for bio-ethanol production because of its high biomass yield and sugar content. Both grain and sweet sorghum lines originated from the same species, Sorghum bicolor L. These lines were found among sorghum landraces, but modern sorghum cultivars were domesticated through breeding programs. In addition, they exhibit considerable phenotypic differences. The grain sorghum genome AT×623 was first sequenced by methylation filtration technology. The BT×623 genome was subsequently completely sequenced by the shotgun sequencing technique, and a total of 36,338 loci were annotated with protein-coding transcripts. Recently, Zheng et al. (2011) re-sequenced two sweet sorghum genomes and one additional grain sorghum genome and identified large numbers of SNPs (single nucleotide polymorphisms), Indels (insertions and deletions), PAVs (presence/absence variations) and CNVs (copy number variations). However, our detailed analysis revealed that differentiation in gene function between the grain (BT×623) and sweet (Keller) sorghum lines might not contribute directly to their phenotypic divergence (see below), which raises the question of how the obvious morphological variation between these sorghum lines arose over a relatively short period of evolution or domestication.
Accumulated data have demonstrated that expression divergence correlates with phenotypic variation and that manipulating the expression of appropriate genes is sufficient to recreate phenotypic differences [4, 5]. Expression profiling is one of the most important tools for dissecting the biological functions of genes. However, no commercial microarray chips are available for analyzing expression profiles in sorghum. The first sorghum cDNA microarray chips were developed to identify differentially expressed genes under various treatments [6, 7], but they were not commercially available and detected the expression of only 12,982 unique genes. Calviño et al. (2008; 2009) identified 154 differentially expressed genes between grain and sweet sorghum lines using an Affymetrix sugarcane GeneChip [8, 9]. However, only conserved sorghum genes can be detected with the sugarcane chip. To our knowledge, no other microarray-based genome-wide expression analysis has been carried out in sorghum. In this study, we first designed a custom sorghum microarray chip based on the genome annotation and available expressed sequence tag (EST) datasets. The chip comprises 41,905 probes, representing 35,465 annotated loci and 6,440 sorghum ESTs that have not been mapped to annotated loci. We then analyzed and compared the transcription profiles of the grain and sweet sorghum lines. This analysis revealed around 30,000 expressed genes in both lines. The two lines differed in their transcriptomes, with considerable numbers of variety-specific or differentially expressed genes, and they also differed in their expression response to sucrose treatment.
A further question concerns the molecular basis of expression divergence. As in other genomes, gene duplication and expansion have been observed in the sorghum genome. Expression divergence of duplicated or expanded genes is of great interest to geneticists and evolutionary biologists because it may contribute to the retention and functional divergence of these genes. Meagher (2010) proposed the hypothesis that epitypes and associated phenotypes evolved by gene duplication, divergence and subfunctionalization. To investigate the contribution of gene duplication and expansion to expression divergence, we identified tandemly and segmentally duplicated genes, as well as genes expanded by mobile elements, at the genome-wide level and then investigated their expression divergence under normal growth conditions and sucrose treatment. Segmentally duplicated genes showed higher expression divergence than tandemly duplicated genes. Duplicated genes in the grain sorghum showed a higher ratio of expression divergence than those in the sweet sorghum. Genes expanded by mobile elements showed limited expression divergence between BT×623 and Keller.
Finally, although expression divergence has been observed in several closely related species [12–14], little is known about the mechanisms underlying this divergence. To figure out the mechanisms underlying these expression variations among orthologous or paralogous genes, we further analyzed the regulatory motifs of their promoter regions. Our data showed that SNPs or structural variations (SVs) have significantly contributed to the expression divergence. In addition, DNA methylation may also play an important role in the species divergence through expression regulation of genes.
Since there are obvious phenotypic variations between these two lines, we expected a significant difference in their genotypes. The re-sequencing results revealed up to 85,041 SNPs, 16,781 Indels and 1,847 SVs in all annotated gene regions, accounting for 20%, 34% and 27% of total SNPs, Indels and SVs, respectively. These variations cover 14,782 genes for SNPs, 7,977 genes for Indels and 2,071 genes for SVs. However, most of these variations may not affect the functions of these genes. We detected only 254, 251 and 79 genes affected by SNPs, Indels and SVs, respectively, that may encode truncated proteins because of premature stop codons in the Keller genes (Figure 1d). We subjected the remaining genes to Ka/Ks analysis (where Ka = nonsynonymous substitutions per site and Ks = synonymous substitutions per site) and the C-value test (see Methods). We detected 563, 287 and 69 genes with functional divergence arising from SNPs, Indels and SVs, respectively (Figure 1d). For structural variations, however, up to 237 genes were not examined for substitution rates because they resulted from deletion or copy number variation. In total, 817, 538 and 385 genes were detected with functional or protein divergence between the BT×623 and Keller varieties due to SNPs, Indels and SVs, respectively. Some of these genes have undergone more than one type of variation. For example, 58 genes were detected with variations from SNPs, Indels and SVs (Figure 1e). Thus, a total of 1,332 genes were identified with functional or protein divergence between BT×623 and Keller. Further analysis showed that a higher percentage of genes with Ka/Ks > 1 (24%) were annotated with low confidence, whereas a total of 5,197 genes (20%) were annotated in this class at the genome-wide level (Figure 1f). Furthermore, only 61% and 56% of these genes showed expression in our microarray analysis in BT×623 and Keller, respectively (Figure 1g).
Among the expressed genes, up to 57% in BT×623 and 58% in Keller had an expression abundance of less than 50 (Figure 1g). Together, these data suggest that some of the genes with functional or protein divergence might have evolved into pseudogenes. Therefore, despite the significant difference in visible phenotype between grain and sweet sorghum, relatively little divergence was observed at the level of their encoded genes.
To explore why BT×623 and Keller differ so obviously in Brix degree, we carried out a comparative expression analysis under sucrose treatment. In BT×623, only 580 down-regulated genes were identified, whereas up to 1,173 genes were up-regulated (Figure 3e). In Keller, by contrast, a total of 820 genes were down-regulated and only 344 genes were up-regulated (Figure 3e). Thus, in BT×623 the up-regulated genes were at least twice as numerous as the down-regulated genes, whereas in Keller the opposite was observed: the up-regulated genes numbered fewer than half of the down-regulated genes. For both down- and up-regulated genes, considerable numbers were regulated by sucrose treatment in both the grain and sweet sorghum (Figure 3f). Besides the commonly down-regulated genes, up to 447 genes were down-regulated only in Keller and 207 genes only in BT×623. In contrast, apart from the commonly up-regulated genes, the majority of up-regulated genes were detected only in BT×623, and only 58 genes were up-regulated only in Keller. These data reveal an obvious difference between the grain and sweet sorghum varieties in their sucrose regulation pathways.
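The counts quoted above can be cross-checked with simple arithmetic. The sketch below assumes that every sucrose-regulated gene falls into exactly one of two classes (regulated in both lines, or regulated in one line only); the derived values are implied by the reported totals and are not stated in the text.

```python
# Counts reported in the text (Figure 3e and f).
down_total = {"BTx623": 580, "Keller": 820}
up_total = {"BTx623": 1173, "Keller": 344}
down_specific = {"BTx623": 207, "Keller": 447}
up_specific_keller = 58

# Assumption: total = common + line-specific, so the commonly
# down-regulated count can be derived from either line.
common_down_bt = down_total["BTx623"] - down_specific["BTx623"]
common_down_keller = down_total["Keller"] - down_specific["Keller"]

# The commonly up-regulated count follows from the Keller figures,
# and the BTx623-specific up-regulated count from the remainder.
common_up = up_total["Keller"] - up_specific_keller
up_specific_bt = up_total["BTx623"] - common_up

print(common_down_bt, common_down_keller, common_up, up_specific_bt)
```

Under this assumption both lines give the same number of commonly down-regulated genes, which suggests the reported figures are internally consistent, and most of the genes up-regulated in BT×623 are specific to that line.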
Among the genes differentially expressed under sucrose treatment, we were particularly interested in those encoding transcription factors (TFs) and those related to carbohydrate metabolism. Among around 2,000 genes encoding transcription factors, we identified 88 down-regulated and 38 up-regulated genes (Additional file 1a and b). They belong to different TF families, and the grain and sweet sorghum differed in the sucrose regulation of their TFs. Among around 300 carbohydrate metabolism-related genes, only 10 were down-regulated and 11 were up-regulated (Additional file 1c and d). These results show that most of these genes were not regulated by sucrose treatment, even though they are involved in sucrose metabolism.
Since genes with either functional/protein divergence (Figure 1d and e) or expression divergence (Figure 4a) may contribute to phenotypic variation (Figure 1a to c), we were interested in how they differ in functional annotation. We therefore investigated Gene Ontology (GO) terms and identified overrepresented GO terms (Figure 4c-e). For each term, we identified GO-slim terms in three categories: molecular function (F), biological process (B) and cellular component (C). Our primary motivation was to evaluate whether these genes are biased toward particular functions. Overrepresented BT×623-specific genes mainly functioned in apoptosis/cell death-related biological processes or, in terms of molecular function, as helicases (pink column in Figure 4c). Keller-specific genes might play roles in reproductive cellular process and post-embryonic morphogenesis, or have monooxygenase activity and heme and tetrapyrrole binding functions (blue column in Figure 4c). Interestingly, genes expressed at a level at least twice as high in BT×623 also functioned in apoptosis/cell death-related biological processes (red column in Figure 4d). By contrast, genes expressed at a level twice as high in Keller showed overrepresented biological functions in multiple GO terms (black column in Figure 4d). Similarly, multiple GO terms with overrepresented biological or molecular functions were identified for genes differentially expressed under sucrose treatment (Figure 4e). Some GO categories overlapped between sucrose-regulated genes and genes expressed only in Keller (Figure 4c and e). These categories include reproductive cellular process (item No. 6) for biological function and monooxygenase activity, heme binding and tetrapyrrole binding (Nos. 8, 9 and 10) for molecular function (blue numbers in Figure 4c and e). In addition, we analyzed the genes with functional or protein divergence indicated in Figure 1d.
Similarly, multiple GO categories were detected with over-representation in both biological and molecular functions (Figure 4f). Among them, interestingly, genes with apoptosis/cell death related biological functions (as shown in Figure 4c and d) were also detected with over-representation (red numbers in Figure 4f).
Since up to 69.6% of promoters showed polymorphism between BT×623 and Keller, we also investigated how these variations contributed to expression divergence. Promoter sequences of all differentially expressed genes were retrieved from both the BT×623 and Keller genomes and submitted to motif searches (see Methods). Over-represented motifs were identified according to their frequency in BT×623 and Keller (Additional files 2 and 3). For the genes expressed only in BT×623, one of the over-represented motifs is TATABOX3, which is critical for accurate transcription initiation. In Keller, some of these motifs have been mutated and, as a result, no expression was detected. The motif HDZIP2ATATHB2 was over-represented in genes expressed specifically or at a higher level in BT×623. Matrix attachment regions (MARs) usually result in higher expression. The over-represented MARTBOX motif may help to explain why the expression level of the corresponding genes is higher in BT×623 than in Keller. The motifs SURE1STPAT21 and SURE2STPAT21 are both sucrose-responsive elements (SURE), and they are over-represented in genes regulated by sucrose only in BT×623. For Keller-specific genes, a xylem-specific expression element was over-represented, suggesting that Keller might differ from BT×623 in xylem development. Among the genes regulated by sucrose only in Keller, the motif ABREZMRAB28, which functions in ABA and water-stress responses, was over-represented. This may imply an interaction between sucrose metabolism and abiotic stress signaling.
To further investigate how these variations in promoter motifs affect gene expression patterns, two promoters whose motif structures differ between BT×623 and Keller were selected randomly from the differentially expressed genes. Promoter-GFP cassettes were constructed and transferred into sorghum shoots for transient expression. One example is the gene at locus Sb01g001893. This gene showed a significant difference in expression abundance between BT×623 and Keller in both microarray (Figure 5e) and quantitative real-time reverse transcription PCR (qRT-PCR, Figure 5f) analyses. Promoter sequence analysis showed that the motif SEF4MOTIFGM7S was absent in BT×623 but present in Keller (Figure 5g). The motif lies within an enhancer and has been implicated in the regulation of expression abundance [17, 18]. We isolated the SEF4MOTIFGM7S-containing promoter from Keller and generated a second version in which the motif sequence was mutated from GTTTTTA to GTTTATA. Both promoters were used to drive GFP expression. Transient expression analysis showed an obviously stronger GFP signal in shoots where GFP was driven by the SEF4MOTIFGM7S-containing promoter (Figure 5h and i). These data suggest that SNP-mediated modification of promoter motifs might result in expression divergence.
In addition to segmental and tandem duplication, we investigated the contribution of LTR-retrotransposons and CACTA elements to expression divergence within BT×623 and Keller. More than 50% of the genes expanded by these mobile elements showed expression divergence (Figure 6f and g), similar to the genes expanded by segmental/tandem duplication. However, only 11 and 4 pairs of genes expanded by LTR-retrotransposons and CACTA elements, respectively, showed expression divergence under sucrose treatment (Figure 6f and g). These data suggest that mobile elements contribute little to sucrose-related expression divergence within either BT×623 or Keller.
Among the 3,436 annotated genes differentially expressed between these two lines (Figure 4a), the majority showed no functional divergence based on Ka/Ks analysis (Figure 4b). They were classified into six types of differentially expressed genes (Figures 7 and 8). Interestingly, over-represented genes expressed only in BT×623 or at a level at least twice as high in BT×623 shared a similar Gene Ontology: programmed cell death (Figure 4c and d). This result might imply that not only genome variation but also expression divergence contributes to the difference in the development of tracheary elements. Among the differentially expressed genes with expression at least twice as high in BT×623, genes were much more enriched in functions related to flavonoid biosynthesis (Figures 4d and 8). In sorghum and other plants, flavonoids play important roles in disease resistance [21, 22]. BT×623 and Keller differ obviously in disease resistance. For example, anthracnose is one of the main diseases of sorghum. Keller is resistant to the disease, but BT×623 is susceptible. Resistance to the disease is flavonoid phytoalexin-dependent. Among a total of 36,338 annotated genes, Liu et al. (2010) identified 6 flavonoid structural genes encoding flavanone 3-hydroxylase, dihydroflavonol 4-reductase or anthocyanidin synthase. Four of them, Sb03g028880, Sb04g000260, Sb06g031790 and Sb09g003710, showed expression abundance at least twice as high in Keller as in BT×623. This result may help to explain why Keller shows improved resistance to anthracnose. Thus, our data suggest that genes expressed at a level at least twice as high in Keller might play a role in the divergence of disease resistance between BT×623 and Keller (Figure 8). On the other hand, the over-represented molecular and biological functions of genes regulated by sucrose only in BT×623 involve multiple biological and metabolic processes (Figure 4e).
Further study should be carried out to understand their roles in species divergence.
The remaining two sets of differentially expressed genes are those expressed only in Keller and those regulated by sucrose only in Keller. The over-represented molecular functions of genes expressed only in Keller are monooxygenase activity, heme binding and tetrapyrrole binding (Figure 4c), which are required for tetrapyrrole biosynthesis. This biosynthesis pathway supplies important molecules for photosynthesis. Thus, these two sets of genes mainly function in photosynthesis and hormone metabolism (Figures 4c, 4e, 8). More genes are involved in this process in Keller than in BT×623, suggesting differentiation of the photosynthesis system between these two varieties. This result might help to explain the greater sugar accumulation in sweet sorghum. However, the differentiation of sugar accumulation between sweet and grain sorghum lines is complex, and more components might be involved. Recently, Calviño et al. (2011) characterized the small RNA component of the transcriptome in grain (BT×623) and sweet (Rio) sorghum stems. Their data revealed that expression divergence of known miRNAs between BT×623 and Rio correlated with sugar content in their F2 population, suggesting a potential role for microRNAs in stem sugar accumulation. The genes involved in sugar accumulation are not well characterized in sorghum because of the low heritability of the trait and its quantitative inheritance. Our comparative analysis of the re-sequencing data showed that functionally divergent genes between BT×623 and Keller might not be directly involved in sugar accumulation. Interestingly, screening of sorghum genes linked to high sugar content indicated that 80% of the genes differentially expressed between sweet and grain sorghum had orthologs in rice, suggesting that gene content contributes little to the differentiation of sugar accumulation in sorghum [8, 9].
Thus, our data and those of others [8, 9, 28] suggest that expression divergence plays an important role in the divergence of sugar accumulation between sweet and grain sorghum lines. In fact, changes in gene expression in eukaryotes often give rise to new phenotypes, and such changes have frequently been used as a proxy indicator of functional divergence of genes.
Because BT×623 and Keller have very few genes with functional or protein divergence but show obvious phenotypic diversity, they provide excellent comparative material for studying the relationship between genotype and phenotype. Our re-sequencing data showed that the majority of tandem and segmental duplications, as well as transposition events of mobile elements, occurred before the divergence between BT×623 and Keller. For duplication-related gene expansion, we analyzed both segmental and tandem duplications, since BT×623 and Keller showed similar whole-genome duplication events. For transposition-related gene expansion, we investigated only LTR retrotransposons and the CACTA superfamily, since these two classes account for the majority of mobile elements in sorghum. We investigated the functional divergence of the coding regions of expanded genes. Our data showed that less than 10% of expanded genes had evolved new functions within a line (Figure 2c-f) or between BT×623 and Keller. We wondered how these expanded genes could be retained with only 10% functional divergence, and therefore investigated expression divergence among them. Interestingly, around 50% of expanded genes showed expression divergence within BT×623 or Keller (Figure 6), suggesting that expression divergence is a major mechanism driving the retention of genes expanded by segmental/tandem duplication or by transposition of mobile elements. In addition, tandemly or segmentally duplicated genes were over-represented among all differentially expressed genes (Figure 7). These data suggest that expression divergence of expanded genes also contributed significantly to the divergence of the two sorghum lines. Given the significant contribution of expression divergence to phenotypic variation, we further analyzed the mechanism by which these expanded genes were retained through expression divergence.
As expected, our data showed that motif variation in promoter regions contributed to expression divergence (Figure 5). However, expression divergence was also observed in genes with the same promoter sequence. In this case, DNA methylation appears to play an important role in expression divergence between BT×623 and Keller. DNA methylation has previously been observed in sorghum, where it plays a role in tissue-specific expression. Our work suggests that differential DNA methylation might play a role in expression divergence not only within a genotype but also between two closely related genotypes. Since we analyzed expression patterns in only a few samples under limited stress conditions, more expression divergence is likely to be revealed within BT×623 or Keller, or between these two varieties, as more conditions are examined. Nevertheless, such expression divergence is genetically determined by divergence at the genome level.
Substantial differences in gene expression patterns have been observed between closely related species [12–14]. The mechanisms underlying these differences are not yet fully understood. Generally, gene expression is regulated by various transcription factors. However, sequence divergence at transcription factor-binding sites accounts for only a small fraction of observed expression differences [13, 31]. Our data also showed limited expression divergence in genes encoding various transcription factors between BT×623 and Keller (Additional file 1a and b). Studies have also suggested that chromatin regulators have a key role in generating expression diversity. However, the majority of studies on gene expression divergence were carried out between closely related species, and little is known about expression divergence within a species. Does a similar mechanism control expression divergence in that case? Interestingly, our data indicate that DNA methylation plays an important role in gene expression divergence within a species (Figure 5a-d), suggesting that a similar mechanism controls intra-species expression divergence.
Another interesting issue is how expression divergence leads to differential sugar accumulation between grain and sweet sorghum. Comparative expression analyses have been carried out in multiple species to investigate genes differentially expressed under certain stress conditions between a pair of genotypes that differ obviously in a specific phenotypic trait. For example, between drought-tolerant and drought-sensitive rice cultivars, more drought up-regulated genes were detected in the sensitive than in the tolerant cultivars. More salinity up-regulated genes were also identified in salinity-sensitive lines than in tolerant lines. Our data show a similar trend: more genes were up-regulated by sucrose treatment in BT×623 than in Keller (Figure 3e and f). Based on these data, we propose the hypothesis that the higher sugar accumulation in Keller might be due to constitutive over-expression of genes that function in sugar accumulation and were up-regulated in BT×623.
Sweet and grain sorghum lines originated from the same subspecies, and the former is a natural variant of the latter. How sweet sorghum differs genetically from grain sorghum is not well characterized. Modern sweet sorghum cultivars were developed from naturally variant sweet sorghum by sexual crossing and subsequent selection. What is the mechanism behind this breeding selection? Plant genomes contain a higher percentage of duplicated genes than most other eukaryotes. Segmental duplication, tandem duplication and transposition by mobile elements have contributed significantly to this [2, 36]. Both duplication and transposition provide a basis for genetic variation. Expanded genes might give rise to new genes with divergent functions that improve adaptability and competitiveness for species survival. In addition, duplicate genes in Arabidopsis exhibit higher percentages of expression divergence between or within species [37, 38]. Similar results were observed in sorghum in this study. During the artificial domestication of sweet sorghum, genetic variation was pyramided by breeding programs to increase sugar accumulation. Previous studies showed that tandem duplication plays a role in the adaptive response to environmental stimuli. Our data showed that tandem/segmental duplication played a role in the divergence of certain specific traits, such as photosynthesis and disease resistance. As a result, tandem/segmental duplication might contribute to intra-species divergence and cultivar domestication under natural variation and artificial selection.
The application of “next-generation sequencing” techniques has greatly increased the speed and output of sequencing at reduced cost. Explaining the phenotypic differences among closely related species is becoming increasingly interesting as their genome sequences become more readily available. Our data showed that although the grain and sweet sorghum genomes exhibit considerable sequence differences, only limited divergence has occurred in their functional genes. Thus, the majority of phenotypic differences between these two varieties might not be directly due to divergence of these functional genes. However, we detected more than 3,000 differentially expressed genes between these two varieties. This expression divergence resulted from mutations in expression regulatory sequences and from DNA methylation, which were genetically determined by functionally divergent genes between the two genomes. Further investigation showed that both tandem and segmental duplication played important roles in genome evolution and expression divergence. Recently, Hollister et al. (2011) reported the contribution of transposable elements to gene expression divergence between Arabidopsis thaliana and Arabidopsis lyrata. In sorghum, however, we observed a limited contribution of mobile elements to expression divergence, suggesting a difference in expression regulation between Arabidopsis and sorghum.
Both grain and sweet sorghum (Sorghum bicolor L. Moench) cultivars, BT×623 and Keller, were used for all experiments. They were planted in a greenhouse and grown under natural light and temperature conditions in Singapore. The detailed light conditions are listed in Additional file 4. Other climate conditions, including temperature profiles, are given in Additional file 5. We carried out 4 independent biological repeats, planting the sorghum lines on 5th January 2010, 12th January 2010, 5th January 2011 and 12th January 2011, respectively. They were planted in pots (18 cm depth, 25 cm top and 17 cm bottom diameters), with only one individual per pot. The soil was from the Singapore Far East Flora company (http://www.fareastflora.com). Brix degree was measured using a portable refractometer (N1 model, ATAGO, Japan). Since Brix degrees differ among internodes, juice from the whole stalk was used for the test. In each biological repeat, a total of 20 stalks were harvested at physiological maturity and immediately used for juice extraction followed by the Brix degree test. A cryostat-microtome (CM3050S, Leica) was used to prepare slides for observing the vascular system. Two-week-old seedlings were collected and washed with fresh water with minimum root damage. They were then treated with 5% sucrose solution. Samples were collected at 0, 2 and 6 h, respectively, for microarray analysis as described below.
Custom 60-mer oligonucleotide probes were designed using the publicly available eArray software (Agilent Technologies). In sorghum, a total of 36,338 coding regions were annotated (http://www.phytozome.net/sorghum), from which 35,577 probes were designed; the remaining around 1,000 genes were not suitable for probe design because of sequence homology to other genes. We also collected a total of 209,835 ESTs from the NCBI EST database (http://www.ncbi.nlm.nih.gov/dbEST/). We used these sequences for BLAST searches against all annotated genes and found 6,400 non-redundant ESTs with no homology to the annotated genes. All of these ESTs were also used for probe design. In total, we therefore designed 41,977 probes for microarray analysis (Additional file 6). The probe sequences were then submitted for manufacturing on chips in the 4×44k format. The expression analysis was carried out using 14-day-old whole seedlings from both grain and sweet sorghum lines under sucrose treatment (Additional file 7a). Two biological replicates were carried out for both control and sucrose treatments, resulting in a dataset of 12 microarrays. Total RNA quality was analyzed by NanoDrop reading and on an Agilent Bioanalyzer (Additional file 7b). Data quality was assessed by measuring correlation coefficients (Additional file 7c). These analyses indicated that the data obtained in this study were of high quality. We also validated our data by quantitative real-time reverse transcription polymerase chain reaction (qRT-PCR) (Additional file 7d).
GO assignments for sorghum genes were obtained from the DOE Joint Genome Institute database (http://www.jgi.doe.gov/). The three top-level GO categories (biological process, molecular function and cellular component) were analyzed. Gene Set Enrichment Analysis (GSEA) was used to determine over-represented genes and their GO categories. To test whether tandem duplication, segmental duplication or transposition by mobile elements contributes significantly to expression divergence between BT×623 and Keller, or within each line, the observed percentage of duplicates of each type showing expression divergence (y1) among all duplicates of that type (n1) was compared statistically with the expected percentage. The expected percentage was estimated from the number of differentially expressed genes of each type (y2) among all annotated genes (n2). A u value was calculated from these two percentages and converted to a P value, which was used to assess statistical significance.
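The exact u formula is not reproduced in this text. As a minimal sketch, assuming the comparison is a standard pooled-variance two-proportion u (z) test with a two-sided P value, the calculation from y1, n1, y2 and n2 could look as follows. The function name and the example counts are hypothetical and are not taken from the study.

```python
import math


def two_proportion_u_test(y1, n1, y2, n2):
    """Compare the observed proportion y1/n1 with the expected proportion y2/n2.

    Assumption: pooled-variance two-proportion test; the original formula is
    not shown in the text, so this is an illustration, not the authors' code.
    Returns (u, two-sided P value).
    """
    p1 = y1 / n1
    p2 = y2 / n2
    pooled = (y1 + y2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("pooled proportion is 0 or 1; u is undefined")
    u = (p1 - p2) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(u) / math.sqrt(2))
    return u, p_value


# Hypothetical counts: 60 of 100 duplicates diverged versus 3,000 of 36,338 genes.
u, p = two_proportion_u_test(60, 100, 3000, 36338)
print(f"u = {u:.2f}, P = {p:.3g}")
```

A positive u indicates that the duplicate class is enriched for expression divergence relative to the genome-wide background; a negative u indicates depletion.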
Microarray expression data were confirmed by qRT-PCR. A total of 48 genes were randomly selected for the analysis. All gene-specific primers were designed using the Applied Biosystems Primer Express® software. Amplification of the SbUBQ5 gene was used as an internal control to normalize the data. All primer sequences are listed in Additional file 8. The qRT-PCR analyses were carried out using the Power SYBR Green PCR Master Mix kit (Applied Biosystems) according to the manufacturer’s protocol. The threshold cycle (CT) value was calculated automatically by the ABI 7900 system software from the change in SYBR Green I fluorescence monitored in every cycle. The relative mRNA amount, calculated as 2^(−ΔΔCT), was used to evaluate gene expression level. Here, ΔCT = CT(target) − CT(reference) and ΔΔCT = ΔCT(test sample) − ΔCT(calibrator sample). The ΔCT values were used for the t-test, which yields an estimate of ΔΔCT.
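The relative quantification defined above can be sketched in a few lines. The function and argument names are hypothetical, and the CT values in the usage example are invented for illustration. The calculation assumes, as the 2^(−ΔΔCT) method does, that target and reference amplify with approximately equal efficiency.

```python
def relative_expression(ct_target_test, ct_reference_test,
                        ct_target_calibrator, ct_reference_calibrator):
    """Return the relative mRNA amount as 2 ** (-delta_delta_CT).

    delta_CT       = CT(target) - CT(reference)
    delta_delta_CT = delta_CT(test sample) - delta_CT(calibrator sample)
    """
    delta_ct_test = ct_target_test - ct_reference_test
    delta_ct_calibrator = ct_target_calibrator - ct_reference_calibrator
    delta_delta_ct = delta_ct_test - delta_ct_calibrator
    return 2.0 ** (-delta_delta_ct)


# Invented CT values: the target crosses threshold two cycles earlier
# (relative to the reference) in the test sample than in the calibrator.
fold_change = relative_expression(22.0, 18.0, 24.0, 18.0)
print(fold_change)
```

With these invented values, ΔCT is 4 in the test sample and 6 in the calibrator, so ΔΔCT is −2 and the function returns a four-fold higher relative amount in the test sample.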
Promoter sequences 1 kb upstream of the start codon of each gene were retrieved from both the BT×623 and Keller genomes. Promoter motifs were detected by submitting these sequences to the PLACE database. A small program was written to determine the frequency of each promoter motif. Over-represented motifs were detected by comparing percentages with the u test. For detection of DNA methylation, genomic DNA samples from BT×623 and Keller were subjected to sodium bisulphite treatment using the EpiXplore Methyl Detection Kit (Clontech) following the procedures recommended by the manufacturer. The Methyl Primer Express® Software v1.0 (Applied Biosystems, USA) was used to design the bisulphite primers listed in Additional file 8. PCR products were cloned into the pGEM-T Easy vector for sequencing.
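The program used to count motif frequencies is not described further. The following minimal sketch shows one way such a count could be made, assuming motifs are written with IUPAC nucleotide codes and that only the given strand of each promoter is scanned. All names and the example sequences are hypothetical.

```python
import re

# IUPAC nucleotide codes expressed as regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def motif_to_regex(motif):
    """Compile an IUPAC motif such as 'CANNTG' into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in motif.upper()))


def motif_frequency(promoters, motif):
    """Return (number, fraction) of promoters containing the motif.

    promoters maps a gene identifier to its promoter sequence.
    Assumption: only the strand as given is scanned.
    """
    if not promoters:
        raise ValueError("no promoter sequences supplied")
    pattern = motif_to_regex(motif)
    hits = sum(1 for seq in promoters.values() if pattern.search(seq.upper()))
    return hits, hits / len(promoters)


# Invented toy promoters, far shorter than the 1 kb sequences used in the study.
toy = {"g1": "AAACGTGTT", "g2": "TTTTTTTT", "g3": "ccacgtgg", "g4": "GGGG"}
print(motif_frequency(toy, "ACGT"))
print(motif_frequency(toy, "CANNTG"))
```

The fractions obtained for the two genomes, or for differentially expressed genes versus all genes, are the percentages that would then be compared with the u test.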
A sorghum promoter fragment of approximately 1 kb upstream of the ATG start codon was amplified from Keller genomic DNA by PCR using the primers listed in Additional file 8. After verification by sequencing, the fragment was cloned in front of the GFP reporter gene and then subcloned into the pCAMBIA1300 Ti-derived binary vector. The resulting binary vector was named Keller_motif_GFP. Site-directed mutagenesis PCR was carried out using another set of primers, listed in Additional file 8, to produce the motif present in the BT×623 genome. The mutated vector was named BT×_motif_GFP. Both vectors were then introduced into BT×623 and Keller shoot cells using the Biolistic PDS-1000/He particle delivery system (Bio-Rad). GFP activity and intensity were visualized and analyzed under a confocal microscope (Zeiss, Germany).
The Keller genome re-sequencing data were downloaded from the sweet and grain sorghums database (http://gigadb.org/sweet-and-grain-sorghums/). The sequences and annotation of the BT×623 sorghum genome were obtained from the sorghum genome database (http://www.phytozome.net/sorghum).
A total of 84 families of transcription factors were selected for genome-wide identification. The GRASSIUS database was also used for the identification of sorghum transcription factors. To identify genes encoding carbohydrate metabolism-related enzymes, metabolic pathways were identified according to their description in the KEGG database (http://www.genome.jp/kegg/pathway.html). For each family of transcription factors or carbohydrate enzymes, conserved motifs or domains were retrieved using the GenomeNet Database Resources (http://www.genome.jp/). The motif or domain sequences of seed members were obtained from the Pfam database (http://pfam.sanger.ac.uk/). These seed amino acid sequences were used to construct Hidden Markov Model (HMM) profiles using HMMER 2.3.2 (http://hmmer.janelia.org/). Using the profile HMMs, we scanned the annotated sorghum protein database to search for all putative transcription factors and carbohydrate metabolism-related enzymes. These members were confirmed by further motif and domain analysis. All identified genes encoding transcription factors and carbohydrate metabolism-related enzymes are listed in Additional files 9 and 10, respectively.
Tandemly duplicated sorghum genes were determined using annotated sorghum proteins downloaded from the Phytozome sorghum genome database (http://www.phytozome.net/sorghum). All proteins were screened in an all-versus-all BLAST search using the BLOSUM62 matrix with an E value of less than 0.01. Pairs of matching peptides were retained when they showed 70% or higher sequence identity over a minimum of 30% of the query length. Pairs of matching proteins were clustered into families using a transitive closure algorithm (if A = B and B = C, then A = C). Genes were scored as tandemly duplicated when two genes from the same family were located on the same chromosome and were separated by a maximum of 10 unrelated genes. Information on segmental duplication of sorghum genes was obtained from the plant genome duplication database (http://chibba.pgml.uga.edu/duplication/). Tandemly and segmentally duplicated genes are listed in Additional files 11 and 12, respectively.
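The clustering and tandem-scoring steps can be illustrated with a short sketch. It assumes that the BLAST filtering has already produced a list of matching gene pairs and that each gene has a chromosome and an ordinal position along that chromosome. The union-find implementation of transitive closure, the function names and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict


def cluster_families(pairs):
    """Group genes into families by transitive closure over matching pairs."""
    parent = {}

    def find(gene):
        parent.setdefault(gene, gene)
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]  # path compression
            gene = parent[gene]
        return gene

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    families = defaultdict(set)
    for gene in list(parent):
        families[find(gene)].add(gene)
    return list(families.values())


def tandem_pairs(families, positions, max_intervening=10):
    """Return neighbouring family members separated by at most
    max_intervening unrelated genes on the same chromosome.

    positions maps gene -> (chromosome, ordinal index along the chromosome).
    """
    tandems = []
    for family in families:
        by_chrom = defaultdict(list)
        for gene in family:
            if gene in positions:
                chrom, index = positions[gene]
                by_chrom[chrom].append((index, gene))
        for members in by_chrom.values():
            members.sort()
            # Genes lying between consecutive family members are, by
            # construction, not in the family, so they count as unrelated.
            for (i, gene_i), (j, gene_j) in zip(members, members[1:]):
                if j - i - 1 <= max_intervening:
                    tandems.append((gene_i, gene_j))
    return sorted(tandems)


# Toy data: g1, g2 and g3 form one family through shared matches.
pairs = [("g1", "g2"), ("g2", "g3"), ("g7", "g8")]
positions = {"g1": ("chr1", 0), "g2": ("chr1", 5), "g3": ("chr1", 30),
             "g7": ("chr2", 1), "g8": ("chr3", 2)}
print(tandem_pairs(cluster_families(pairs), positions))
```

In the toy data only g1 and g2 qualify: g3 belongs to the same family but lies 24 genes beyond g2, and g7 and g8 are on different chromosomes.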
We ran the LTR_Finder program to identify full-length sorghum LTR retrotransposons genome-wide. The results are presented in Additional file 13. To identify CACTA DNA transposon elements, we first carried out BLASTN searches using both the terminal inverted repeats (TIRs) and the subterminal repeats (TRs) of the elements. The TIRs and TRs were from multiple species [47–49]. These searches generated two sets of BLAST hits (E ≤ 1e-5), corresponding to the 5’ and 3’ terminal regions. Full-length CACTA elements were identified by matching two terminal regions that together spanned less than 30,000 bp. We also built an HMM profile using HMMER 2.3.2 with default values, with seed TIRs and TRs selected from multiple species. Using the profile HMMs, we scanned the sorghum genome sequences and looked for hits in the proper orientation that were separated by at least 200 bp and by no more than 30,000 bp. We then manually inspected the putative CACTA elements and removed any remaining artifacts by comparing their target site duplication and TIR sequences. The identified CACTA elements are presented in Additional file 14. Genes located fully or partially within a mobile element were regarded as LTR-related or CACTA-related genes and are listed in Additional files 13 and 14, respectively.
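The pairing of 5’ and 3’ terminal hits into candidate full-length elements can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually used: the 200 bp minimum is applied to the gap between the two hits, the 30,000 bp maximum to the whole span, and "proper orientation" is taken to mean that both hits lie on the same strand with the 5’ terminus upstream. The Hit type, function name and coordinates are hypothetical.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    chrom: str
    start: int   # 1-based coordinate, start <= end
    end: int
    strand: str  # "+" or "-"


def pair_terminal_hits(five_prime, three_prime, min_gap=200, max_span=30000):
    """Pair 5' and 3' terminal hits into candidate full-length elements.

    Returns sorted (chrom, start, end, strand) tuples for each accepted pair.
    """
    candidates = []
    for h5 in five_prime:
        for h3 in three_prime:
            if h5.chrom != h3.chrom or h5.strand != h3.strand:
                continue
            # On the plus strand the 5' terminus has the lower coordinate;
            # on the minus strand the order is mirrored.
            left, right = (h5, h3) if h5.strand == "+" else (h3, h5)
            gap = right.start - left.end - 1
            span = right.end - left.start + 1
            if gap >= min_gap and span <= max_span:
                candidates.append((left.chrom, left.start, right.end, left.strand))
    return sorted(candidates)


# Invented coordinates for illustration only.
five = [Hit("chr1", 1000, 1030, "+"), Hit("chr2", 9000, 9030, "-")]
three = [Hit("chr1", 9000, 9030, "+"), Hit("chr1", 50000, 50030, "+"),
         Hit("chr2", 1000, 1030, "-")]
print(pair_terminal_hits(five, three))
```

Candidates produced in this way would still need the manual inspection of target site duplications and TIR sequences described above.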
For Ka and Ks estimation, amino acid sequences were aligned and the alignments were then transferred to the original cDNA sequences using the PAL2NAL program. Pairs were retained when at least 70% of the query protein-coding region was aligned, with an E value threshold of 1e-8. Ka and Ks values were then estimated using the yn00 program of the PAML4b package. The Ka/Ks ratios were also used to evaluate functional divergence by the C value test.
All expression data have been deposited in the NCBI GEO database (http://www.ncbi.nlm.nih.gov/geo) under the accession number GSE36689.
EST: Expressed sequence tag
HMM: Hidden Markov Model
Ka: Nonsynonymous substitutions per site
Ks: Synonymous substitutions per site
LTR: Long terminal repeat
MAR: Matrix attachment region
MRCA: Most recent common ancestor
qRT-PCR: Quantitative real-time reverse transcription PCR
SNP: Single nucleotide polymorphism
SURE: Sucrose responsive element
We would like to thank Kunde Ramamoorthy Govindarajan and Natascha May for their kind help with bioinformatics-related work.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.