Dynamics of the chili pepper transcriptome during fruit development
© Martínez-López et al.; licensee BioMed Central Ltd. 2014
Received: 18 October 2013
Accepted: 18 February 2014
Published: 21 February 2014
The set of all mRNA molecules present in a cell constitute the transcriptome. The transcriptome varies depending on cell type as well as in response to internal and external stimuli during development. Here we present a study of the changes that occur in the transcriptome of chili pepper fruit during development and ripening.
RNA-Seq was used to obtain transcriptomes of whole Serrano-type chili pepper fruits (Capsicum annuum L.; ‘Tampiqueño 74’) collected at 10, 20, 40 and 60 days after anthesis (DAA). 15,550,468 Illumina MiSeq reads were assembled de novo into 34,066 chili genes. We classified the expression patterns of individual genes as well as genes grouped into Biological Process ontologies and Metabolic Pathway categories using statistical criteria. For the analyses of gene groups we added the weighted expression of individual genes. This method was effective in interpreting general patterns of expression changes and increased the statistical power of the analyses. We also estimated the variation in diversity and specialization of the transcriptome during chili pepper development. Approximately 17% of genes exhibited a significant change of expression in at least one of the intervals sampled. In contrast, significant differences in approximately 63% of the Biological Processes and 80% of the Metabolic Pathways studied were detected in at least one interval. Confirming previous reports, genes related to capsaicinoid and ascorbic acid biosynthesis were significantly upregulated at 20 DAA while those related to carotenoid biosynthesis were highly expressed in the last period of fruit maturation (40–60 DAA). Our RNA-Seq data was validated by examining the expression of nine genes involved in carotenoid biosynthesis by qRT-PCR.
In general, more profound changes in the chili fruit transcriptome were observed in the intervals between 10 to 20 and 40 to 60 DAA. The last interval, between 40 to 60 DAA, included 49% of all significant changes detected, and was characterized predominantly by a global decrease in gene expression. This period signals the end of maturation and the beginning of senescence of chili pepper fruit. The transcriptome at 60 DAA was the most specialized and least diverse of the four states sampled.
The set of all RNA molecules transcribed in an organ or tissue at a particular point of time under a given set of environmental conditions constitute the transcriptome. In contrast to the genome, which remains largely constant during the life of an individual, the transcriptome is highly dynamic. Global patterns of gene expression vary greatly in space (within different cell or tissue types) as well as in time (during development or due to changing environmental conditions). The transcriptome is characterized by both qualitative features, such as a description of which genes are transcribed, as well as quantitative features including the level of expression of each gene. Evaluating the transcriptome, i.e. estimating the level of expression of all genes under particular conditions, is a key step to understand the complex processes that occur during development.
RNA-Seq  is a robust technology to obtain genome-wide estimates of relative gene expression. RNA is purified from a sample of interest and converted to a cDNA library which is sequenced using one of a variety of high-throughput sequencing methods. Sequence fragments are then mapped to a reference genome and the frequency of these alignments is used to estimate the expression of each gene . When a published reference genome is not available, the cDNA reads can be assembled de novo in order to obtain a high quality reference to which the reads can be re-mapped . RNA-Seq experiments demand a careful design including replicates that permit the estimation of statistical error or variation that is not explained by the experimental treatment .
One of the challenges of high-throughput experiments is the analysis of very large datasets in order to extract biologically pertinent knowledge . Conclusions drawn from the expression of a single gene or a small set of related genes may lead to an incomplete understanding of particular phenomena. A current trend in developmental biology research is to interpret changes in gene expression in the context of simultaneous changes in the resulting proteome and the set of metabolites present during each state of development . As mentioned in , “simply comparing genes to themselves have the pitfall of taking molecular information out of context. Numerous scientists have emphasized the need for better context. This can be achieved through holistic measurements of differential connectivity in addition to, or in replacement, of differential expression”. Here we propose to achieve a better understanding of the dynamic changes of the transcriptome by examining the weighted expression of groups of genes related with specific biological processes or metabolic pathways. In addition, we examine the changes in the diversity and specialization of the transcriptome. These analyses create an interpretation framework for the results, which makes it easier to appreciate their meaning.
Chili pepper (Capsicum spp.) is one of the most important horticultural vegetable crops worldwide as well as a good model for the study of secondary metabolism during fruit development. Capsicum species (approximately 30) are members of the Solanaceae family, which also includes other important crops such as tomato (Solanum lycopersicum), potato (Solanum tuberosum), tobacco (Nicotiana tabacum), eggplant (Solanum melongena), and ornamentals like petunia (Petunia spp.). Chili pepper fruits synthesize and accumulate a number of valuable compounds including capsaicinoids (responsible for the characteristic “heat” of chilis), pigments such as anthocyanins and carotenoids as well as vitamins A, B and C [8–10]. Because of the agricultural importance of chili peppers, efforts have been made to study the transcriptome of Capsicum species as a source of basic and applied knowledge. For example, a chili pepper (Capsicum annuum L., cv. Bukang) EST database built from 122,582 sequenced ESTs and 116,412 refined ESTs from 21 cDNA libraries representing 11 different tissues, developmental stages or plants subjected to different stress conditions has been reported  and specific efforts have been made to identify genes involved in the biosynthesis of capsaicinoids [12, 13]. A transcriptome profile of red chili pepper fruits (Capsicum annuum L., TF68) was obtained using 454 GS-FLX pyrosequencing and 33,530 total unigenes were assembled. In this assembly, 30% of aligned reads were assigned to a locus with a specific function annotation, 24% of alignments matched to genes of unknown or unclassified function and 46% could not be aligned to an individual gene. Furthermore, 1,536 single nucleotide polymorphisms (SNPs) and 758 simple sequence repeats (SSRs) were detected that will be useful as molecular markers for linkage mapping and association mapping . Assemblies of two chili pepper transcriptomes from sequences generated by Sanger sequencing (>125,000 ESTs) or the Illumina NGS platform (200 million reads) were carried out to identify SNPs and SSRs as molecular markers useful for breeding or single position polymorphisms (SPPs) for genotyping . Recently, we described a Capsicum transcriptome database generated from a hybrid assembly of a collection of ESTs derived from five Capsicum annuum organs (root, stem, leaf, flower and fruit) sequenced by the Sanger method and from multiple leaf transcriptomes obtained by pyrosequencing. This project yielded almost 60,000 singletons and 32,314 high quality contigs (75.5% contigs with significant sequence similarity to entries in nucleic acid and protein databases, and 23% not previously described for C. annuum) . Transcriptomic studies of chili pepper plants subjected to different stresses have also been reported. For example, 8,525 ESTs were generated and a cDNA microarray analysis identified 613 chili pepper genes responsive to the non-host soybean pustule pathogen Xanthomonas axonopodis pv. glycines. cDNA microarrays were used to study the expression of ozone stress-regulated genes in a sensitive (Capsicum annuum cv. Dabotop) and a resistant chili pepper cultivar (C. annuum cv. Buchon) . Transcriptomic analysis of leaves following infection of chili pepper plants with a geminivirus revealed a total of 309 differentially expressed genes between healthy (non-infected) and symptomatic or recovered tissues . More recently, a global transcriptomic analysis of chili pepper plants treated with different biotic/abiotic stresses was carried out to investigate the participation of signaling components (regulons) in both types of stresses. This study targeted the involvement of salicylic acid in the activation of abiotic stress-responsive genes, methyl jasmonate and ethylene in regulating biotic stress-responsive genes, and abscisic acid in regulating both biotic and abiotic stress-responsive genes .
Here we present an analysis of the changes in the transcriptome during the development and ripening of chili pepper fruit. We quantified the expression of 34,066 chili genes at each of four time points sampled. Considering these transcriptomes at a more global level, we also evaluated their diversity and specialization, as well as the specificity of the genes expressed during chili fruit development. In addition, by categorizing genes into Biological Process ontologies as well as KEGG metabolic pathways in which they participate, we analyzed the behavior of groups of genes involved in these categories. Finally, our RNA-Seq expression data was validated by analyzing the expression of genes involved in carotenoid biosynthesis by qRT-PCR.
Results and discussion
The transcriptome of chili pepper fruit
Estimation of the number of expressed genes and scaled number of mRNA molecules
Number of reads, estimated number of genes and scaled number of mRNA molecules ( ) per sampling point
Estimated number of genes
The number of cDNA reads obtained in our sampling of chili transcriptomes at each stage of development was approximately two million for each sampling point (10, 20, 40 and 60 DAA), with a minimum of approximately 1.8 million and a maximum of approximately 2.3 million (Table 1). These figures are the sum of the number of reads derived from the two biological replicates at each stage. Table 1 also indicates the number of genes showing evidence of expression at each stage of development.
The number of genes for which we obtained evidence of expression at each stage of development ranged from a minimum of 23,987 genes at 60 DAA to 26,346 genes at 20 DAA. The total number of genes whose expression was detected in at least one of the samples was 34,066. This figure likely overestimates the true number of expressed genes because our de novo transcriptome assembly strategy results in isoforms or alleles of transcripts expressed from the same locus that are assembled into distinct contigs. Derived estimates of the true number of genes expressed in each one of the stages of development using Chao’s estimator (column “Chao” in Table 1) had a mean of 30,912 with standard deviation of 1,219. From this we concluded that our sampling depth was insufficient to detect the expression of approximately 6,000 genes, which likely have frequencies of expression lower than 0.5 transcripts per million (TPM), and thus were not detected in our data. Estimates of the number of expressed genes obtained with the estimator proposed by Changjiang et al. resulted in a larger number of genes whose expression was not detected (around 11,700). It is not clear which of the two estimators is more accurate, but our own simulation studies suggest that the Changjiang et al. estimator tends to overestimate the number of undetected genes (unpublished results). The true number of expressed genes is therefore likely to be closer to the number obtained by using Chao’s estimator. Considering that the Arabidopsis genome encodes approximately 27,000 protein coding genes , and tomato and potato (Solanaceae) have 30,855 and 32,988 protein-coding genes supported by RNA sequencing respectively , we can infer that in this study we detected a large proportion of the genes expressed in the chili fruit transcriptome. The last column of Table 1, , indicates estimates of the scaled number of mRNA molecules present at each stage of development, obtained by using Good’s estimator . The estimated values of are very close to the number of reads obtained in the corresponding stages of development (column “Reads”). In all cases the number of cDNA reads obtained represents more than 99% of the estimated , indicating that the sample sizes employed in this study are close to the scaled number of mRNA molecules at each stage of development. Therefore, only genes transcribed at very low frequencies, likely less than 0.5 TPM, were possibly missed by the sampling.
Genes detected per subset
Changes in gene expression during fruit development and ripening
Patterns of expression for genes, biological processes (BP) and metabolic pathways (MP)
The most common pattern of expression was represented by 28,392 genes (83.34%) that exhibited a consistent steady state mode of expression. In other words, these genes did not exhibit a significant change in expression between any of the consecutive time points (pattern 14−S S S). The remaining 5,674 genes (16.66%) exhibited a significant change in expression during at least one transition. The second- and third-most common patterns correspond to 13−S S D, represented by 1,290 genes (3.79%) and 15−S S I by 882 (2.59%) genes, respectively. Of particular interest, these two patterns represented by a total of 2,172 (6.38%) genes, were populated by genes showing significant changes in expression only in the third transition. This result suggests that the developmental period between 40 and 60 DAA was the most dynamic with respect to changes in the transcriptome. The fourth- and fifth-most common patterns were 5−D S S, represented by 537 genes (1.58%) and 23−I S S, represented by 307 (0.90%) genes, respectively. Patterns 5−D S S and 23−I S S were also characterized by genes showing significant changes of expression in only one transition, but in these cases, the relevant developmental period was between 10 and 20 DAA. Taken together, our results indicate that relatively large groups of chili genes showed the tendency to change their level of expression at only a single developmental stage and remain at this level until fruit maturity. The number of genes participating in patterns 13−S S D,15−S S I,5−D S S and 23−I S S includes a total of 3,016 genes (8.85%), which is more than half of the number of genes that exhibited a significant change in expression during at least one transition (5,674). In contrast, consider the patterns populated by genes whose expression changed significantly only in the middle transition (between 20 and 40 DAA). Examples of these patterns include 11−S D S, represented by 224 genes (0.66%) and 17−S I S, represented by 254 genes (0.75%). These patterns were relatively infrequent and populated by a total of 478 genes (1.41%). Patterns of consistent gene expression decrease (1−D D D) or increase (27−I I I) in each subsequent developmental interval were characterized by relatively few genes, 51 (0.15%) and 44 (0.13%), respectively. The least common pattern of gene expression observed was characterized by genes whose expression decreased in the first two transitions but increased in the third (3−D D I, represented by 11 (0.03%) genes). Additional file 2 includes gene identifiers, description, expression level, pattern and Q values for all genes detected in this study.
We also analyzed the global changes in gene expression during chili development by calculating the transition probabilities from one interval to the next. We begin by estimating the frequency of each transition during the first interval from 10 to 20 DAA (Table AF1-4 in Additional file 1). Then we calculated the conditional probabilities of change of state (D, S or I) from one transition to the next, assuming that a gene was selected at random. This allows the interpretation of the dynamic changes occurring during the transitions. For example, if a gene decreased its expression (D) during the interval 10 to 20 DAA, the most likely event is that its expression would remain at steady state S during the interval 20 to 40 DAA, and this probability was estimated to be 0.7339. All transition probabilities for genes whose expression were detected at a steady state level and then changed to a more active state, (from S to I or S to D) were larger for the transitions from 20 - 40 DAA to 40 - 60 DAA than from 10 - 20 DAA to 40 - 60 DAA. For example, the probability of changing from S to D was 0.0129 during the transition from 10 - 20 DAA to 20 - 40 DAA, but was four times more likely (0.0561) during the transition from 20 - 40 DAA to 40 - 60 DAA. This result suggests that the most active period for changes in expression frequencies occurred during the last period sampled, from 40 to 60 DAA.
We examined the expression patterns of genes encoding members of the xyloglucan transglucosylase/hydrolases (XTHs) family as examples of genes known to participate in fruit development and ripening. Genes belonging to the XTHs family where identified in our dataset using the results of the BLAST analysis. This family of enzymes is involved in xyloglucan endotransglucosylation and endohydrolysis and their respective genes are differentially expressed in mature-green and ripe tomato fruits [30, 34]. The expression of the 11 members of the XTHs family exhibited the 7−D I D pattern and a peak of expression at 40 DAA. However, the expression levels of individual members of this family was heterogeneous (see Figure AF1-1 and Table AF-5 in Additional file 1).
Classifying the patterns of expression of genes during chili pepper fruit development and ripening in the discrete space of 27 possible patterns allows for a more detailed analysis of the complex changes in the transcriptome. By considering only the significant differences in expression between sampling points, we can filter some of the noise produced by random differences in expression between biological replicates as well as excluding genes with very low expression in all states of development. On the other hand, pattern classification offers a complementary method to peruse the data, by focusing on genes that share the same patterns of expression. For example, within the subset of genes showing a continuous decrease in expression during development (pattern 1−D D D) are a number of orthologs of the Argonaute family. Argonaute proteins form an evolutionarily conserved family whose members silence gene expression in pathways such as RNA interference (RNAi) . This suggests that the activity of the RNA-induced silencing complex (RISC) decreases during fruit development. Within the subset of genes whose expression increased in every subsequent developmental interval (pattern 27−I I I), we identified a senescence-associated gene, the chili pepper ortholog to Arabidopsis, “AT4G02380, senescence-associated gene 21”. The previous example of the XTH family also shows that examining the expression of related genes can lead to a better understanding of the underlying genetic and metabolic processes. An additional interesting avenue is to study the promoters of genes that share the same pattern of expression, with the aim of identifying specific sequence motifs involved in regulating genes according to these specific patterns . This goal may be attainable when a chili pepper genome sequence becomes available.
Changes in diversity and specialization of the transcriptome
To further examine the global changes in gene expression during chili fruit development, we estimated the diversity and specialization of our transcriptomes as well as the specificity of the genes detected . The diversity of the transcriptome, H, is measured by the application of Shannon’s entropy formula to the set of estimated relative frequencies of expression of the genes. H is sensitive to both the number of expressed genes as well as to their frequencies, yielding higher values when the distribution of the frequencies is flatter. This scenario indicates that a larger content of information is passed from the nucleus to the protein synthesis machinery. We also estimated the specificity of each one of the genes. The specificity is zero if the corresponding gene is equally expressed in all the transcriptomes studied, and reach a maximum value when it is expressed exclusively at a single transcriptome (see  for details). Having the estimates of specificity for the genes we can estimate the specialization of the transcriptome, δ, as the weighted average of the specificity of the genes corresponding to that transcriptome. A larger value of δ indicates that, on average, more genes specific to the evaluated transcriptome are being expressed, i.e., the transcriptome is more “specialized” given that it express more specific genes. Figure 3 presents a plot for the diversity (H) and specialization (δ) of the transcriptomes we obtained during the development and ripening of chili pepper fruit. This figure also includes the number of significant changes in gene expression observed in each interval, discussed above. The diversity of the transcriptome increased from 10 DAA (H=11.95) to 40 DAA where it reaches its maximum value of H=12.14. Diversity then decreased abruptly in the interval from 40 to 60 DAA to its minimum value of H=11.69. Interestingly, there was no appreciable change of slope in the diversity increases between 10 to 20 and 20 to 40 DAA, indicating that the rate of increase in transcriptome diversity was constant from 10 to 40 DAA. This indicates that the point where genes were expressed at the most even frequencies was reached around 40 DAA, and, from that point of development, the transcriptome changed to a distribution in which genes were expressed at more variable frequencies (a less informative transcriptome). Another way to interpret the diversity of the transcriptome is to calculate the number of effective genes, . This parameter represents the number of genes equally expressed required to produce a given diversity value of H. In contrast to H, which is given in logarithmic scale, is given in additive genetic units and is thus easier to interpret. The estimated values of were 3,954, 4,140, 4,507 and 3,295 for 10, 20, 40 and 60 DAA, respectively. If the maximum value of at 40 DAA is considered as 100%, the percentage-transformed values of were 88, 92, 100 and 73% for 10, 20, 40 and 60 DAA respectively. This calculation makes it apparent that the value diminished by 27% from 40 to 60 DAA. This abrupt change in the number of effective expressed genes indicates that the last developmental period examined (from 40 to 60 DAA) was the one characterized by the most profound changes in the transcriptome. This claim is equally supported by considering the number of genes differentially expressed during each transition (presented as arrows in Figure 3), while in intervals 10 to 20 and 20 to 40 DAA we see a more even ratio of around 50% in significant increases over the total of changes, (1,081+1,035)/(2,400+1,862)=2,116/4,262≈0.5, this ratio descends to 1,520/(1,520+2,546)≈0.37 in the last interval from 40 to 60 DAA, indicating a stronger down-regulation of the transcription machinery which results in a less diverse transcriptome at 60 DAA.
The specialization of the transcriptome, (δ), only slightly decreased at 20 DAA compared to 10 DAA, but at later time points it increased until reaching its maximum value at 60 DAA. This indicates that the transcriptome at 60 DAA was, on average, expressing more genes specific to that stage of development than at the other developmental time points. In contrast with the values for the number of genes detected at each time point (Figure 2), the specialization of the transcriptome gives a quantitative measure which responds to slight changes in the relative expression of the genes, allowing a better understanding of the global change experienced by these parameters.
Considering together the results for H and δ presented in Figure 3, we conclude that the chili transcriptome was most specialized and least diverse at 60 DAA and that the largest changes in the fruit transcriptome occurred in the period from 40 to 60 DAA. To provide some context for these observations, the human organs producing the least diverse and most specialized transcriptomes are the pancreas, salivary glands and stomach, whose functional role is to produce a relatively small number of metabolites in large quantities . According to our analyses, mature chili fruits at 60 DAA are similar in that the transcriptome at this stage is less diverse and more specialized than the preceding developmental stages. This implies that more specific transcripts were produced in larger quantities than at the other stages sampled.
Figure AF1-2 in Additional file 1 presents the distribution of the gene specificity values  estimated in this study. In general, we observed a large proportion (25%) of generalist genes which were expressed in all four stages at approximately the same level, and a large proportion (18%) of highly specific genes that were almost exclusively expressed in only one of the stages of development. Combining the gene specificity parameters with information pertaining to differential expression of genes during each of the three periods of development represents a powerful tool to further investigate the biological roles of genes of interest.
Grouping genes to gain biological knowledge
Grouping genes into categories has significant advantages to summarize gene expression information in extremely large datasets . This approach reduces the dimensionality of the problem; instead of interpreting the expression patterns of thousands of genes, discrete groups of genes can be categorized in a variety of ways. This approach also increases statistical power, given that larger numbers of RNA-Seq reads can be taken into account in each comparison . Models that explain transcriptional regulatory networks are an achievable goal, and the one developed for E. coli is a good example. Eventually, we aim to develop models for eukaryotes that can explain and predict gene expression in different tissues and organs as a function of external and internal signals. By classifying patterns of gene expression into a smaller number of quantitative categories, it is possible to examine the tendencies of related groups of genes to coordinately change their levels of expression. There are currently a plethora of methods to analyze the expression of groups of genes (see for example [38, 40]). In our study, we made use of a relatively simple measurement. By adding together all RNA-Seq reads mapping to genes participating in a particular category of interest, we obtained a quantitative parameter describing the expression of the category as a whole. This approach implies the reductionistic assumption that a category pattern of expression can be accurately represented as the sum of its parts. This is clearly not the case in categories characterized by genes in which some must be up-regulated and some must be down-regulated in order to participate in a particular biological activity. More sophisticated alternatives of analyzing genes in categories imply a better understanding of the function of each gene participating in a category. For example, certain transcription factors may be activated and a distinct set repressed. Thus, our approach of evaluating the sum of expression of genes participating in a category of interest is a crude, but necessary, first step in interpreting our expression results. This approach needs to be complemented with additional studies of the expression levels and functions associated with individual genes participating in each category.
In our study, we categorized chili pepper genes with sufficient annotation information into the Biological Process (BP) (from Gene Ontology ) terms and also into the Metabolic Pathways (MP) in which they participate. The latter classification scheme made use of the Kyoto Encyclopedia of Genes and Genomes (KEGG) . For every chili gene that could be identified as an Arabidopsis ortholog, its corresponding BP term or MP was recorded. These classification schemes are somewhat redundant in that a single gene may participate in more than one BP or MP. To address this issue, we divided the estimated expression values (number of cDNA reads mapping to the gene) between the number of categories in which the gene has been annotated to participate and used this normalization for the grouping analysis. Thus, the total number of RNA-Seq reads considered for all genes was not altered, preserving the statistical power of the analysis (see Methods for details). Genes lacking sufficient annotation to participate in any of the BP or MP categories formed an “offset” category which was not tested but was included in the dataset. Even when the literature for the analysis of gene groups is very abundant (see for example [38, 40]), to the best of our knowledge this is the first time that our particular approach to take into account the redundancy of the categories is reported.
Genes grouped by Biological Process (BP)
In the BP category analysis, a total of 8,628 chili pepper genes were classified into 875 BP categories using the “slim” GO term (see Methods for details). The number and percentage of BP categories participating in each one of the 27 previously described global patterns of expression is presented in Table 2. Interestingly, the number of genes exhibiting a particular pattern of expression was highly correlated with the number of BPs showing the same pattern (); this suggest that not much relevant information was lost as a result of the grouping process and that the summarizing grouping procedure reflects the global behavior of gene expression. The most common pattern of expression for BP categories as well as for individual genes was 14−S S S. This pattern denotes genes whose expression did not change significantly over the sampling periods. The 14−S S S pattern includes 328 BP categories, which was 37.49% of the total. Expression patterns showing change only from 40 to 60 DAA (13−S S D and 15−S S I) comprised 10.51 and 11.43% of all BP categories, respectively. Examples of BP categories showing the 13−S S D pattern (a decrease in expression only during the third interval) were Histone methylation, Determination of bilateral symmetry, Secondary metabolic process, Meristem initiation, Pyrimidine ribonucleotide biosynthetic process and Meiosis. The 15−S S I pattern (a significant increase in expression only during the last interval) included BP categories such as Riboflavin biosynthesis, Defense response to fungus incompatible interaction, Cellular lipid metabolic process and Circadian rhythm. Translational elongation was the sole BP category characterized by the 1−D D D pattern (a decrease in expression during each subsequent interval). Nine BP categories exhibited the 27−I I I pattern, which is an increase in expression during each subsequent interval. These nine BP ontologies were Intracellular signal transduction, Amino acid transport, DNA-dependent negative regulation of transcription, Cellular membrane fusion, Response to abscisic acid stimulus, Response to ethylene stimulus, Negative regulation of endopeptidase activity, Regulation of proteolysis and Translational initiation. A table including results for each of the BP ontologies involved in our study is found in Additional file 3.
Figure AF1-3 in Additional file 1 shows the expression of genes grouped into Fruit maturation BP as well as the expression of the grouped genes. Even when the analysis of the grouped expression shows a pattern of non significant changes (14−S S S), the pattern of all four genes forming this group shows a significant change in some of the intervals. In particular, the most influencing gene of the four (the one with largest expression and changes), the chili pepper orthologous of Arabidopsis AT1G47530, a MATE efflux family protein with functions in ripening, increases its expression significantly in the last two intervals, from 20 to 40 and 40 to 60 DAA. The fact that the grouped expression presents a pattern of non significant changes in this case is due to the changes in opposite directions from the genes forming the group. This illustrates the fact that it is important to examine not only the expression pattern of the group, but also of its individual components.
Figure AF1-4 in Additional file 1 shows the expression of 32 genes grouped into Developmental growth BP as well as the expression of the group as a whole. The pattern of change of this BP was significant at the three intervals, increasing in the first and decreasing in the following two intervals (pattern 19−I D D; green dashed line in Additional file 1: Figure AF1-4). The pattern 19−I D D of Developmental growth is consistent with what is known about the development of the chili pepper fruit, where the most active period of development occurs between 10 and 20 DAA to then decrease up to the fully mature state at 60 DAA (see Figure 1). In this case the graph of the individual behavior of the 32 individual genes grouped into Developmental growth BP is too complex to be directly interpreted. This shows the advantage of summarizing the expression of genes as groups.
Genes grouped by Metabolic Pathway (MP)
Chili pepper genes (1,794) were grouped into 152 metabolic pathways (MPs) according to the classification of their corresponding Arabidopsis orthologs (see Methods). As for the BP ontologies, net changes in the expression of MPs were evaluated by summing the weighted expression of genes participating in each MP. The 27 possible patterns of change for the net expression of MP categories are shown in Table 2. The correlation between the patterns presented by genes considered individually and MP was estimated to be () whereas the correlation between the patterns observed for BP and MP was estimated to be (). These measures indicate that there was greater concordance between the patterns presented by genes grouped into MP categories with genes grouped into BP categories than between genes grouped into MP categories with individual genes. As was previously observed for individual genes and genes grouped by BP, the most common pattern exhibited by genes grouped by MP was 14−S S S. 31 (20.39%) MP categories showed a steady-state level of net expression during chili development. As was the case for individual genes and BP categories, the next-most frequent patterns of change in the expression of MP categories were those exhibiting significant differences only in the third interval: 13−S S D with 26 (17.11%) MPs and 15−S S I with 16 (10.53%) MPs. These two patterns were populated by a total of 42 (27.64%) MPs, reaffirming the fact that the third interval (between 40 and 60 DAA) had the most profound transcriptional changes in our samples of chili pepper development. Full results pertaining to our pattern analyses of MP categories during chili development are shown in Additional file 3. A total of 121 (80%) of the MPs included in our study showed a significant change in net expression in at least one of the three intervals studied (Table 2). Only 2 MPs (Amino sugar and nucleotide sugar metabolism as well as Steroid biosynthesis) exhibited the pattern of significant decrease in each subsequent interval (pattern 1−D D D), while three MPs (Phenylpropanoid biosynthesis, Tryptophan metabolism and Tyrosine metabolism) exhibited the pattern of significant increase in net expression in each subsequent interval (pattern 27−I I I). This result suggests a crucial role for these metabolic processes during chili development and ripening, however, a recent metabolomic study of Capsicum reports non significant changes in tyrosine during ripening while tryptophan was not detected, possibly by methodological shortcomings. On the other hand, the continuous increase in the expression of genes related with the phenylpropanoid biosynthesis during development could be related to the lignification of the fruit and also to the synthesis of capsaicinoids, as has been reported .
The expression patterns of genes associated with the capsaicinoid and ascorbic acid biosynthesis MP categories are shown in Additional file 1: Figures AF1-5 and AF1-6, respectively. Gene expression estimates for each gene participating in these categories are shown in Additional file 1: Tables AF1-6 and AF1-7, respectively. For the capsaicinoid biosynthesis pathway, the net expression pattern of the grouped genes participating in this pathway was 19−I D D, with a peak at 20 DAA. Six of the 13 individual genes grouped into this pathway also exhibited this pattern. This result is in agreement with previous studies of capsaicinoid accumulation in ‘Tampiqueño 74’ chili pepper fruits . These pungent compounds begin to accumulate at 10 DAA, reach their maximum levels at 50 DAA and then decay . With the aim of identifying genes involved in the biosynthesis of capsaicinoids Liu et al. performed an RNA-Seq study contrasting the placenta and pericarp of a highly pungent cultivar of C. frutescens L. . Fruits were collected in the period of 15 to 25 DAA and hundreds of genes potentially regulating capsaicinoid biosynthesis were identified, which were predicted to be involved in microbody, peroxisome, fatty acid synthase activity, CoA-ligase activity, acyltransferase activity, transaminase activity, and phenylalanine metabolism, among other processes. In our study we varied the time period (DAA) studying whole fruits, and we identified groups of genes differentially expressed in time with peroxisome localization (23 genes) as well as genes involved in protein import into the peroxisome matrix (74 genes). In general, these genes tend to increase their expression as a function of time. In addition, genes involved in phenylalanine metabolism and CoA metabolism as well as acyltransferase or transaminase activities were found to be differentially expressed during chili development (see Additional files 2, 3, and 4 for details). Two genes recently shown to be involved in the capsaicinoids biosynthetic pathway, DHAD (dihydroxyacid dehydratase) and TD (thr deaminase) , were identified in our study. DHAD presented the expression pattern 13−S S D, while TD presented the pattern 1−D D D. A comparison of the results by Liu et al. for different parts of the fruit with the data reported here for changes in transcription activity trough time in the whole fruit provides a better understanding of the dynamics of the transcriptome during fruit development.
Genes categorized into the ascorbic acid biosynthesis pathway also exhibited the 19−I D D pattern. In this case, the pattern characterizing the whole group was driven by the expression of the GGP gene, which codify for a GDP-galactose phosphorylase  (see Figure AF1-6 and comments in Additional file 1). The results presented in Alós et al.  showing the relative expression of genes involved in the ascorbic acid pathway (Figure 3 of that reference) are highly correlated with the expression levels obtained here and presented in Additional file 1: Figure AF1-6 and Table AF1-7, even though the cultivars and methods employed are different. This suggests that the pattern for the expression of genes involved in the ascorbic acid pathway is consistent between cultivars when comparing equivalent states of development.
Carotenoids are visual markers of chili pepper maturation. Color of the fruits may change from green to yellow, orange or red, depending on the type of carotenoids synthesized and accumulated by the fruits . Nine genes grouped into the carotenoid biosynthesis MP exhibited the expression pattern 6−D S I, characterized by a significant decrease in net expression from 10 to 20 DAA, a steady state from 20 to 40 DAA and a significant increase between 40 to 60 DAA.
The expression patterns and levels for genes grouped into the carotenoid biosynthesis pathway are shown in Additional file 1: Table AF1-8 and Figure AF1-7, respectively. For eight of the nine genes grouped into this pathway, the expression pattern was characterized by a significant up-regulation between 40 and 60 DAA, where the color change from green to red in chili pepper fruits usually occurs (see Figure 1 and Additional file 1: Table AF1-8). Of these genes, the one with the largest change in expression was a gene encoding a capsanthin/capsorubin synthase. Expression of this gene increased markedly, from 6 transcripts per million (TPM) at 10 DAA to 15,206 TPM at 60 DAA (row 3 of Additional file 1: Table AF1-8, Panel A of Additional file 1: Figure AF1-7). Another gene with a large influence on the net expression behavior of the carotenoid biosynthetic pathway was a β-carotene hydroxylase (row 6 of Additional file 1: Table AF1-8, Panel A of Additional file 1: Figure AF1-7), which showed an expression change from 42 TPM at 10 DAA to 1,709 TPM at 60 DAA. Notably, none of these carotenoid biosynthesis genes were up-regulated during the first period from 10 to 20 DAA when chili fruits are still green. Our results confirm the findings of Romar et al.  showing that genes involved in the carotenoid biosynthetic pathway are not coregulated during chili fruit ripening, which is consistent with the hypothesis regarding differences in the expression of these genes. Ha et al. studied carotenoid accumulation and expression of genes involved in the carotenoid pathway in chili varieties with different levels of fully ripe color . In that study the authors concluded that the expression levels of the phytoene synthase, phytoene desaturase, and capsanthin/capsorubin synthase genes are high in peppers with high levels of total carotenoids. These results are consistent with our findings, given that ‘Tampiqueño 74’ is a cultivar with strong red color at maturity (see Figure 1). Moreover, the results presented here add a more precise time frame for the expression of these genes (see Additional file 1: Table AF1-8 and Figure AF1-7).
Comparison of qRT-PCR vs. RNA-Seq results for genes related to carotenoid biosynthesis
Our qRT-PCR analysis also confirmed the finding that genes pertaining to carotenoid biosynthesis were most abundantly expressed during the last period of fruit maturation, reaching their maximum levels at 60 DAA.
This study presents a detailed analysis of gene expression in chili pepper fruit at four stages of development. We estimated that our RNA-Seq transcriptomes may be missing approximately 6,000 genes expressed at very low levels and that the scaled number of mRNA molecules in the samples were about the same sizes as the number of cDNA reads obtained. We analyzed the expression patterns of individual genes, as well as groups of genes categorized into Biological Processes (BP) ontologies and Metabolic Pathways (MP). The most prevalent pattern was for individual genes as well as the BP and MP groupings to change their behavior only during the third interval, from 40 to 60 DAA. In this interval, down-regulated genes were more prevalent, marking the end of the fruit maturation and the starting point of senescence. We also concluded that transcriptome diversity was at its maximum at 40 DAA and that the transcriptome of the mature fruit at 60 DAA was the most specialized and least diverse of the transcriptomes studied.
We demonstrated that by grouping genes and studying the pattern of expression of the groups as well as the individual components, it was possible to gain insight into the behavior of the whole system. By using this approach we inferred, for example, that the most abundant expression of genes related to capsaicinoid and ascorbic acid biosynthesis occurred at 20 DAA, while for genes related to carotenoid biosynthesis the maximum was estimated at 60 DAA.
Biological material and RNA extraction
Capsicum annuum Serrano ‘Tampiqueño 74’ seeds were germinated as previously described  and the seedlings were cultivated until maturity under greenhouse conditions in a completely randomized experimental design at Cinvestav-Unidad Irapuato (Guanajuato, México). The plants were grown during the spring and summer, and individual flowers were tagged at anthesis. Chili pepper fruits were randomly collected from different plants at 10, 20, 40 and 60 DAA. After sampling, the fruits were cleaned with ethanol immediately frozen in liquid nitrogen and stored at -80°C until further use. For total RNA extractions, 10 fruits at the 10 DAA developmental stage and 5 fruits from each of the 20, 40 and 60 DAA stages were randomly selected from the pool of all harvested fruits. Whole fruits (pericarp, placenta and seeds) were ground in liquid nitrogen with a mortar and pestle to form a fine uniform powder. Samples were mixed vigorously and 100 mg aliquots were measured for each RNA extraction. The process was repeated with different sets of fruits in order to obtain two independent biological replicates at each developmental stage. A NucleoSpin RNA Plant kit (Macherey-Nagel) was used for total RNA extraction and contaminating genomic DNA was removed by DNase I (Macherey-Nagel) treatment following the manufacturers’ protocols.
Total RNA concentration was quantified using a NanoDrop ND-1000 spectrophotometer (Nano-Drop, Wilmington, DE, USA) and RNA quality was evaluated by gel electrophoresis on 1.0% denaturing agarose gels. In addition, aliquots of RNA were run on an Agilent 2100 Bioanalyzer using RNA 6000 chips (Agilent, Santa Clara, CA, USA) to test the RNA integrity number (RIN). All eight samples had RIN values higher than 8.7. Thirty μ g of total RNA from each of the eight samples (two biological replicates of four fruit developmental stages) was used for cDNA library preparation.
Library construction and sequencing
The eight total RNA samples (two biological replicates from chili pepper fruits at 10, 20, 40 and 60 DAA) were prepared for RNA-Seq using the Illumina TruSeq RNA Sample Preparation v2 Guide following the manufacturer’s instructions. Briefly, mRNA was purified from 20 μ g of total RNA using poly-T oligo-attached magnetic beads using two rounds of purification. During the second elution of the poly-A RNA, mRNA was fragmented using divalent cations under elevated temperature and primed for cDNA synthesis. The cleaved RNA fragments were primed with random hexamers and reverse transcribed into single-stranded cDNA using reverse transcriptase. In the next step, the RNA template was removed and the complementary cDNA strand was synthesized using RNAse H and DNA polymerase I, respectively.
The overhangs that resulted from fragmentation were polished into blunt ends using an End Repair Mix (consisting of T4 DNA polymerase, Klenow fragment and T4 polynucleotide kinase). A single T nucleotide was added on the 3’ end of the adapter for ligating the adapter to the cDNA fragments. Indexing adapters were ligated to the ends of the cDNAs using T4 DNA ligase, preparing them for hybridization onto a flow cell. Finally, the DNA fragments with adapter molecules at both ends were amplified by PCR to enrich the amount of DNA in the library. PCR was performed with a PCR primer cocktail that anneals to the ends of the adapters. Quantity and quality of the DNA libraries was assessed using Agilent DNA-1000 chips on an Agilent 2100 Bioanalyzer.
The 8 cDNA libraries were pooled and simultaneously sequenced from both 5’ and 3’ ends using the Illumina MiSeq ®; System platform according to the manufacturer’s instructions. We performed three sequencing runs (technical replicates) with the aim of increase the sequencing depth. 150 bp paired-end reads were obtained in each sequencing run (see Table AF1-1 in Additional file 1). Fluorescent image processing, base-calling and quality value calculations for each of the three runs were performed using Illumina MiSeq Control Software. Data were deposited at the NCBI (GEO database under series record GSE54123).
Quality filtering, de novo assembly and remapping
Before assembly, the raw reads were filtered using PRINSEQ 0.20.3 software to obtain high-quality reads lacking adaptor sequences or ambiguous nucleotides. De novo assembly of the trimmed reads was performed using Trinity  (release 20121005) using DIAG (Data Intensive Academic Grid facilities)  (See Additional file 1 for details). The total number of sequences obtained and their characteristics are presented in Table AF1-2 in Additional file 1. For remapping sequence reads to the assembled contigs and transcript quantification of 45,505 genes and 99,487 isoforms assembled, we used RSEM  version 1.2.0 software (see Table AF1-2 in Additional file 1 for details).
For annotation purposes, only the principal isoform of each contig generated by the Trinity assembler was used. Various BLAST databases were sequentially queried with the chili pepper fruit sequences to obtain the most likely orthologs. Details pertaining to this process as well as the results of the BLAST queries are presented in Table AF1-3 in Additional file 1.
Blast2GO  was also used to obtain GO biological process information from BLASTx TAIR 10 hits using default parameters for the annotation rule. Metabolic pathway (MP) assignments were carried out based on the KEGG database  using the KAAS server automatic annotator .
The data for the capsaicinoid and ascorbic acid pathways (which are not present in KEGG) were obtained by manual curation of the data. This procedure consisted in identifying in our dataset the chili pepper sequences coding for each one of the enzymes taking part in each one of the two pathways (see [9, 45, 48] for the capsaicinoid pathway and  for the ascorbic acid pathway). We used the relevant polipeptide sequences from the GenBank database  (see Additional file 1 for details). The statistical analyses of these two pathways was performed as described for all the other gene groups.
To confirm our RNA-Seq results, nine genes associated with the carotenoid pathway  were chosen for expression validation using qRT-PCR with gene specific primers. Primers were designed with primer 3 plus software  and are listed in Additional file 5. Total RNA was extracted using a NucleoSpin RNA Plant kit (Macherey-Nagel) from independent samples of whole fruits collected at 10, 20, 40 and 60 DAA. Total RNA (1 μ g per sample) was digested with DNAse I (Invitrogen, Carlsbad, CA) to remove DNA. RNA concentration was measured using a ND-1000 spectrophotometer (NanoDrop products, Wilmington, DE) and 500 ng of total RNA per sample was reverse transcribed using SuperScript III Reverse Transcriptase (Invitrogen). All qRT-PCR reactions were performed using an ABI 7500 Real Time System (Applied Biosystems) using ubiquitin (Capsicum annuum hexameric polyubiquitin 6PU11, ubi11; GenBank accession AY496112) and elongation factor (Capsicum chinense E F 1α mRNA for elongation factor 1 α; GenBank accession AB275381) genes as the internal controls. Each 20 μ L reaction was composed of 1 μ L cDNA (100 ng/uL), 2 μ L of primer mix (10 mM of each forward and reverse primer), 8 μ L of sterile water and 10 μ L of SYBR Green PCR Master Mix reagent (Applied Biosystems). Reactions were performed with an initial incubation at 50°C for 2 min and at 95°C for 10 min, and then cycled at 95°C for 15 s, and 60°C for 60 s for 40 cycles. Duplicates (biological replicates) from independent RNA extractions of each state of development were performed. Relative gene expression was calculated using the method . The qRT-PCR results are presented as fold-changes in gene expression relative to the 10 DAA sample (Figure 6).
Data resulting from sequencing, assembly and annotation procedures were collected into a MySQL Ⓒ relational database (Server version 5.1.49). Expression data for each contig was obtained as the number of reads remapped by the RSEM software  version 1.2.0 (see Table AF1-2 in Additional file 1 for details). No correction for transcript length was applied, given that we are always comparing the same gene under different conditions (stage of development of the chili pepper fruit) . All statistical analyses were performed in R  version 2.15.3 (2013-03-01). Expression data for the 42,401 contigs in the eight sequenced libraries (see Table AF1-2 in Additional file 1) was handled by summing the numbers of reads that mapped to the contigs that shared the same identifier from the BLAST analyses. In other words, we considered contigs with the same BLAST identifier to represent the same gene. This resulted in a data matrix with 34,066 rows (called here “chili pepper genes”, even if some cases they were identified with an orthologous gene from other specie) and eight columns (libraries; see Table AF1-3 in Additional file 1). To estimate the true number of genes expressed in each transcriptome (Table 1) we used equations described in Chao  and Changjiang et al.. To estimate the scaled number of mRNA molecules, , presented in Table 1, we used Equation (8) in Good .
The R package edgeR was used to evaluate differential gene expression (DGE) between adjoining intervals during development (between 10 to 20, 20 to 40 and 40 to 60 DAA). Other statistical contrasts involving not adjoining intervals were not performed because we considered that all relevant information about the changes of expression is already present in the analysis of adjoining intervals. Briefly, for each contrast (adjoining interval) we entered the matrix containing the data (counts of the number of reads for each gene in each library) using the DGEList function, estimated common and tag-wise dispersion and performed the exact test via the exactTest function. P values resulting from the exact test were then fed into the qvalue function  with default parameters, except that we set fdr.level=0.01 to obtain a false discovery rate of 1%. The resulting Q values were used to classify the genes into the patterns of expression presented in Table 2. All information resulting from the analyses was stored in the MySQL relational database for interpretation and complete results are presented in Additional file 2.
To perform differential expression analyses between groups of genes associated with BP or MP of interest, we collapsed the original data matrix (34,066 rows and eight columns) into transformed matrices that additively reflected the expression of each of the gene groups. To achieve this, we first selected and classified all genes with their corresponding category (BP or MP). Given that some of the categories are redundant, i.e., a gene can belong to more than one BP or MP, the numbers of reads per gene (tag counts) for genes belonging to more than one category were divided between the number of genes in the corresponding category. For example, if gene “a” was classified into 5 BP categories, the original number of gene tags for gene “a” in each library was divided by 5. Having the weighted expression for all genes belonging to at least one category we added the weighted expression of genes in each category to obtain the final category expression. The expression (number of reads) of all genes which were not classified into a category were added together to form the last row of the matrix of group expression. This last row is taken as an offset and was not tested. It is important to notice that this procedure conserve the amount of statistical evidence existent in the original matrix, given that the transformed matrices have the same total of reads (sum per column) than the original matrix. Transformed matrices were subjected to the same statistical procedure than the one performed for single genes (see previous paragraph), obtaining the Q values corresponding to each pair of neighboring intervals. All information resulting from these analyses was stored in the MySQL relational database for interpretation and complete results are presented in Additional file 3 for BP and in Additional file 4 for MP categories.
Availability of supporting data
The data sets supporting the results of this article are included within the article (and its additional files). Raw reads for the eight libraries sequenced are available in the NCBI, Gene Expression Omnibus (GEO) repository, http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE54123.
This research was funded by grants to NO-A Conacyt project 177063, OM Conacyt project 165778, LAM-L Conacyt scholarship 262926. We thank Alejandra Aguilar-Barragán for her technical help in NO-A’s lab, Fernando Hernández Godínez for technical help in OM lab and the Data Intensive Academic Grid (DIAG)  for computing facilities used in the transcriptome assembly. We also acknowledge valuable suggestions from six anonymous referees.
- Cloonan N, Forrest AR, Kolle G, Gardiner BB, Faulkner GJ, Brown MK, Taylor DF, Steptoe AL, Wani S, Bethel G, Robertson AJ, Perkins AC, Bruce SJ, Lee CC, Ranade SS, Peckham HE, Manning JM, McKernan KJ, Grimmond SM:Stem cell transcriptome profiling via massive-scale mRNA sequencing. Nature methods. 2008, 5 (7): 613-619. 10.1038/nmeth.1223.PubMedView ArticleGoogle Scholar
- Wang Z, Gerstein M, Snyder M:RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet. 2009, 10: 57-63. 10.1038/nrg2484.PubMed CentralPubMedView ArticleGoogle Scholar
- Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson Da, Amit I, Adiconis X, Fan L, Raychowdhury R, Zeng Q, Chen Z, Mauceli E, Hacohen N, Gnirke A, Rhind N, di Palma F, Birren BW, Nusbaum C, Lindblad-Toh K, Friedman N, Regev A:Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotechnol. 2011, 29 (7): 644-652. 10.1038/nbt.1883.PubMed CentralPubMedView ArticleGoogle Scholar
- Auer PL, Doerge R:Statistical design and analysis of RNA sequencing data. Genetics. 2010, 185 (2): 405-416. 10.1534/genetics.110.114983.PubMed CentralPubMedView ArticleGoogle Scholar
- Ekblom R, Galindo J:Applications of next generation sequencing in molecular ecology of non-model organisms. Heredity. 2010, 107: 1-15.PubMed CentralPubMedView ArticleGoogle Scholar
- Mochida K, Shinozaki K:Advances in omics and bioinformatics tools for systems analyses of plant functions. Plant Cell Physiol. 2011, 52 (12): 2017-2038. 10.1093/pcp/pcr153.PubMed CentralPubMedView ArticleGoogle Scholar
- Hudson NJ, Dalrymple BP, Reverter A:Beyond differential expression the quest for causal mutations and effector molecules. BMC Genomics. 2012, 13: 356-10.1186/1471-2164-13-356.PubMed CentralPubMedView ArticleGoogle Scholar
- Ochoa-Alejo N, Ramírez-Malagón R:In vitro chili pepper biotechnology. In Vitro Cell Dev Biol-Plant. 2001, 37 (6): 701-729. 10.1007/s11627-001-0121-z.View ArticleGoogle Scholar
- Aza-González C, Núñez-Palenius HG, Ochoa-Alejo N:Molecular biology of capsaicinoid biosynthesis in chili pepper (Capsicum, spp.). Plant Cell Rep. 2011, 30 (5): 695-706. 10.1007/s00299-010-0968-8.PubMedView ArticleGoogle Scholar
- Aza-González C, Núñez-Palenius HG, Ochoa-Alejo N:Molecular biology of chili pepper anthocyanin biosynthesis. J Mexican Chem Soc. 2012, 56: 93-98.Google Scholar
- Kim HJ, Baek KH, Lee SW, Kim J, Lee BW, Cho HS, Kim WT, Choi D, Hur CG:Pepper EST database: comprehensive in silico tool for analyzing the chili pepper (Capsicum annuum) transcriptome. BMC Plant Biol. 2008, 8: 101-10.1186/1471-2229-8-101.PubMed CentralPubMedView ArticleGoogle Scholar
- Liu S, Chen C, Chen G, Cao B, Chen Q, Lei J:RNA-sequencing tag profiling of the placenta and pericarp of pungent pepper provides robust candidates contributing to capsaicinoid biosynthesis. Plant Cell, Tissue Organ Culture (PCTOC). 2012, 110: 111-121. 10.1007/s11240-012-0135-8.View ArticleGoogle Scholar
- Liu S, Li W, Wu Y, Chen C, Lei J:De novo transcriptome assembly in chili pepper (Capsicum frutescens) to identify genes involved in the biosynthesis of capsaicinoids. PloS one. 2013, 8: e48156-10.1371/journal.pone.0048156.PubMed CentralPubMedView ArticleGoogle Scholar
- Lu FH, Cho MC, Park YJ:Transcriptome profiling and molecular marker discovery in red pepper,Capsicum annuumL. TF68. Mol Biol Rep. 2012, 39 (3): 3327-3335. 10.1007/s11033-011-1102-x.PubMedView ArticleGoogle Scholar
- Ashrafi H, Hill T, Stoffel K, Kozik A, Yao J, Chin-Wo SR, Van Deynze A:De novo assembly of the pepper transcriptome (Capsicum annuum): a benchmark for in silico discovery of SNPs, SSRs and candidate genes. BMC Genomics. 2012, 13: 571-10.1186/1471-2164-13-571.PubMed CentralPubMedView ArticleGoogle Scholar
- Góngora-Castillo E, Fajardo-Jaime R, Fernández-Cortes A, Jofre-Garfias AE, Lozoya-Gloria E, Martínez O, Ochoa-Alejo N, Rivera-Bustamante R:The capsicum transcriptome DB: a hot tool for genomic research. Bioinformation. 2012, 8: 43-10.6026/97320630008043.PubMed CentralPubMedView ArticleGoogle Scholar
- Lee S, Kim SY, Chung E, Joung YH, Pai HS, Hur CG, Choi D:EST and microarray analyses of pathogen-responsive genes in hot pepper (Capsicum annuum) non-host resistance against soybean pustule pathogen (Xanthomonas axonopodispv. glycines). Funct Integr Genomics. 2004, 4 (3): 196-205.PubMedView ArticleGoogle Scholar
- Lee S, Yun SC:The ozone stress transcriptome of pepper (Capsicum annuumL.). Mol Cells. 2006, 21 (2): 197-PubMedGoogle Scholar
- Góngora-Castillo E, Ibarra-Laclette E, Trejo-Saavedra DL, Rivera-Bustamante RF:Transcriptome analysis of symptomatic and recovered leaves of geminivirus-infected pepper (Capsicum annuum). Virol J. 2012, 9: 295-10.1186/1743-422X-9-295.PubMed CentralPubMedView ArticleGoogle Scholar
- Lee S, Choi D:Comparative transcriptome analysis of pepper (Capsicum annuum) revealed common regulons in multiple stress conditions and hormone treatments. Plant Cell Rep. 2013, 32 (9): 1351-1359. 10.1007/s00299-013-1447-9.PubMedView ArticleGoogle Scholar
- Abraham-Juárez M, Rocha-Granados M, López MG, Rivera-Bustamante RF, Ochoa-Alejo N:Virus-induced silencing of Comt, pAmt and Kas genes results in a reduction of capsaicinoid accumulation in chili pepper fruits. Planta. 2008, 227 (3): 681-695. 10.1007/s00425-007-0651-7.View ArticleGoogle Scholar
- Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson DA, Amit I, Adiconis X, Fan L, Raychowdhury R, Zeng Q, Chen Z, Mauceli E, Hacohen N, Gnirke A, Rhind N, di Palma F, Birren BW, Nusbaum C, Lindblad-Toh K, Friedman N, Regev A:Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotechnol. 2011, 29 (7): 644-652. 10.1038/nbt.1883.PubMed CentralPubMedView ArticleGoogle Scholar
- Langmead B, Trapnell C, Pop M, Salzberg SL:Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009, 10 (3): R25-10.1186/gb-2009-10-3-r25.PubMed CentralPubMedView ArticleGoogle Scholar
- Kuznetsov Va, Knott GD, Bonner RF:General statistics of stochastic process of gene expression in eukaryotic cells. Genetics. 2002, 161 (3): 1321-1332.PubMed CentralPubMedGoogle Scholar
- Good IJ:The population frequencies of species and the estimation of population parameters. Biometrika. 1953, 40 (3): 237-264.View ArticleGoogle Scholar
- Chao A, Lee SM:Estimating the number of classes via sample coverage. J Am Stat Assoc. 1992, 87: 210-217. 10.1080/01621459.1992.10475194.View ArticleGoogle Scholar
- Changjiang X, Luzhou X, Fahong Y, Weihong T, Leonid L, Jian L:Nonparametric estimation of the number of unique sequences in biological samples. IEEE Trans Signal Process. 2006, 54 (10): 3759-3767.View ArticleGoogle Scholar
- Robinson M, McCarthy D, Smyth G:edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010, 26: 139-140. 10.1093/bioinformatics/btp616.PubMed CentralPubMedView ArticleGoogle Scholar
- The Arabidopsis Information Resource. [http://arabidopsis.org],
- The Tomato Genome Consortium:The tomato genome sequence provides insights into fleshy fruit evolution. Nature. 2012, 485 (7400): 635-641. 10.1038/nature11119.View ArticleGoogle Scholar
- R Core Team: R: A Language and Environment for Statistical Computing. 2012, Vienna: R Foundation for Statistical Computing, [http://www.R-project.org/] [ISBN 3-900051-07-0]Google Scholar
- Robinson MD, Smyth GK:Small-sample estimation of negative binomial dispersion, with applications to SAGE data. Biostatistics. 2008, 9: 321-332.PubMedView ArticleGoogle Scholar
- Storey JD, Tibshirani R:Statistical significance for genomewide experiments. Proc Natl Acad Sci. 2003, 100: 9440-9445. 10.1073/pnas.1530509100.PubMed CentralPubMedView ArticleGoogle Scholar
- Miedes E, Lorences EP:Xyloglucan endotransglucosylase/hydrolases (XTHs) during tomato fruit growth and ripening. J Plant Physiol. 2009, 166 (5): 489-498. 10.1016/j.jplph.2008.07.003.PubMedView ArticleGoogle Scholar
- Cenik ES, Zamore PD:Argonaute proteins. Current Biol. 2011, 21 (12): R446-R449. 10.1016/j.cub.2011.05.020.View ArticleGoogle Scholar
- González-Álvarez DL, Vega-Rodríguez MA:Analysing the scalability of multiobjective evolutionary algorithms when solving the motif discovery problem. J Glob Optimization. 2013, 57 (2): 467-497. 10.1007/s10898-013-0069-7.View ArticleGoogle Scholar
- Martínez O, Reyes-Valdés MH:Defining diversity, specialization, and gene specificity in transcriptomes through information theory. Proc Natl Acad Sci USA. 2008, 105 (28): 9709-9714. 10.1073/pnas.0803479105.PubMed CentralPubMedView ArticleGoogle Scholar
- Khatri P, Sirota M, Butte AJ:Ten years of pathway analysis: current approaches and outstanding challenges. PLoS Comput Biol. 2012, 8 (2): e1002375-10.1371/journal.pcbi.1002375.PubMed CentralPubMedView ArticleGoogle Scholar
- Salgado H, Gama-Castro S, Peralta-Gil M, Diaz-Peredo E, Sanchez-Solano F, Santos-Zavaleta A, Martinez-Flores I, Jimenez-Jacinto V, Bonavides-Martinez C, Segura-Salazar J, Martinez-Antonio A, Collado-Vides J:RegulonDB (version 5.0): Escherichia coli K-12 transcriptional regulatory network, operon organization, and growth conditions. Nucleic Acids Res. 2006, 34 (suppl 1): D394-D397.PubMed CentralPubMedView ArticleGoogle Scholar
- Väremo L, Nielsen J, Nookaew I:Enriching the gene set analysis of genome-wide data by incorporating directionality of gene expression and combining statistical hypotheses and methods. Nucleic Acids Res. 2013, 41 (8): 4378-4391. 10.1093/nar/gkt111.PubMed CentralPubMedView ArticleGoogle Scholar
- The Gene Ontology. [http://www.geneontology.org/],
- Kyoto Encyclopedia of Genes and Genomes. [http://www.genome.jp/kegg/],
- Aizat WM, Dias DA, Stangoulis JC, Able JA, Roessner U, Able AJ:Metabolomics of capsicum ripening reveals modification of the ethylene related-pathway and carbon metabolism. Postharvest Biol Technol. 2014, 89: 19-31.View ArticleGoogle Scholar
- Sukrasno N, Yeoman M:Phenylpropanoid metabolism during growth and development ofCapsicum frutescensfruits. Phytochemistry. 1993, 32 (4): 839-844. 10.1016/0031-9422(93)85217-F.View ArticleGoogle Scholar
- Salgado-Garciglia R, Ochoa-Alejo N:Increased capsaicin content in PFP-resistant cells of chili pepper (Capsicum annuumL.). Plant Cell Rep. 1990, 8 (10): 617-620. 10.1007/BF00270067.PubMedView ArticleGoogle Scholar
- Bulley S, Wright M, Rommens C, Yan H, Rassam M, Lin-Wang K, Andre C, Brewster D, Karunairetnam S, Allan AC, Laing WA:Enhancing ascorbate in fruits and tubers through over-expression of the l-galactose pathway gene GDP-l-galactose phosphorylase. Plant Biotechnol J. 2012, 10 (4): 390-397. 10.1111/j.1467-7652.2011.00668.x.PubMedView ArticleGoogle Scholar
- Alós E, Rodrigo MJ, Zacarías L:Transcriptomic analysis of genes involved in the biosynthesis, recycling and degradation of L-ascorbic acid in pepper fruits (Capsicum annuumL.). Plant Sci. 2013, 207: 2-11.PubMedView ArticleGoogle Scholar
- Gómez-García MR, Ochoa-Alejo N:Biochemistry and molecular biology of carotenoid biosynthesis in chili peppers (Capsicum spp.). Int J Mol Sci. 2013, 14: 19025-19053. 10.3390/ijms140919025.View ArticleGoogle Scholar
- Romer S, Hugueney P, Bouvier F, Camara B, Kuntz M:Expression of the genes encoding the early carotenoid biosynthetic-enzymes inCapsicum annuum. Biochem Biophys Res Commun. 1993, 196 (3): 1414-1421. 10.1006/bbrc.1993.2410.PubMedView ArticleGoogle Scholar
- Ha SH, Kim JB, Park JS, Lee SW, Cho KJ:A comparison of the carotenoid accumulation inCapsicumvarieties that show different ripening colours: deletion of the capsanthin-capsorubin synthase gene is not a prerequisite for the formation of a yellow pepper. J Exp Bot. 2007, 58 (12): 3135-3144. 10.1093/jxb/erm132.PubMedView ArticleGoogle Scholar
- Livak KJ, Schmittgen TD:Analysis of relative gene expression data using real-time quantitative PCR and the 2Δ ΔCT method. Methods. 2001, 25 (4): 402-408. 10.1006/meth.2001.1262.PubMedView ArticleGoogle Scholar
- Data Intensive Academic Grid (DIAG). [http://diagcomputing.org/],
- Li B, Dewey C:RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics. 2011, 12: 323-10.1186/1471-2105-12-323.PubMed CentralPubMedView ArticleGoogle Scholar
- Conesa A, Götz S, García-Gómez JM, Terol J, Talón M, Robles M:Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics. 2005, 21 (18): 3674-3676. 10.1093/bioinformatics/bti610.PubMedView ArticleGoogle Scholar
- Moriya Y, Itoh M, Okuda S, Yoshizawa AC, Kanehisa M:KAAS: an automatic genome annotation and pathway reconstruction server. Nucleic Acids Res. 2007, 35 (suppl 2): W182-W185.PubMed CentralPubMedView ArticleGoogle Scholar
- GenBank. [http://www.ncbi.nlm.nih.gov/genbank/],
- Untergasser A, Nijveen H, Rao X, Bisseling T, Geurts R, Leunissen JA:Primer3Plus, an enhanced web interface to Primer3. Nucleic Acids Res. 2007, 35 (suppl 2): W71-W74.PubMed CentralPubMedView ArticleGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.