Transcriptome sequencing, sequence quality control and de novo assembly
Illumina 101 bp Paired-End sequencing run representing the cDNA library from leaf and flower tissue produced 23,200,000 reads for flower (PYT_F) and 28,500,110 reads for leaf (PYT_L) respectively. Total reads encompassed nearly 2.5 Gb of sequencing data in FASTAq format. Sequence data was filtered to remove low quality reads and reads containing adapter sequences. After quality control a total of 11,617,581 high quality sequencing reads for flower and 14,290,110 for leaf tissues remained, which were assembled into 65,968 and 80,972 unique sequences (contigs) for flower and leaf tissues, respectively. The lengths of these unique sequences were sufficient to enable functional annotations with high accuracy. The reads obtained were assembled by using Trinity software [16] (Fig. 1).
Several de novo assembly output parameters de novo were analyzed including total number of contigs, contigs with smallest length, N50 length, N80 length, longest contig length and smallest contig length as a function of k-mer. For above mentioned data set, N50 length was 575 and 1293, N80 length was 303 and 615, largest and smallest lengths were 8718 and 7717 and 201 and 201 for flower (PYT_F) and leaf (PYT_L) respectively. Total contig bases found for flower and leaf comprised 31.57e6 and 69.24e6 bp respectively.
Gene ontology and functional annotation
A total of 65,968 and 80,972 unique sequences from the flower and leaf tissue were assigned with gene ontology (GO) terms based on sequence similarity to proteins in TAIR database. The T. cinerariifolium transcripts were assigned for GO terms to describe functions of genes and associated gene products into three, major categories namely; biological process, molecular function, and cellular component, and their sub-categories using plant specific GO that broadly provides an overview of the ontology content of the genes related to the pyrethrin biosynthetic pathway. The molecular function, biological process, and cellular component categories included 26190, 23862 and 22328 unigenes, respectively which were assigned into 34, 44 and 34 GO terms, respectively (Additional file 1).
In biological process group, 719, 283, 210 and 178 transcripts were assigned to metabolism, biosynthesis, nucleic acid metabolism and transport categories respectively. Similarly, 415, 381, 346 and 191 transcripts were assigned to cellular component cellular protein intracellular and cytoplasmic components categories respectively. In molecular function category, a total of 1409, 150, 363 and 271 transcripts were assigned to molecular activity, catalytic activity, transferase activity and hydrolase activity respectively (Additional files 2, 3, and 4; Fig. 2). These GO annotations provide a substantial information on potential functions of the transcripts identified in the T. cinerariifolium tissues. For the annotation and validation of the assembled unigenes, all the assembled unigenes were searched against the NCBI non-reduntant (Nr) and Swissprot protein databases using BLASTX program with an E-value threshold of 1E-5 (Fig. 3). A total of 9,304 unigenes were assigned to different GO category in leaf and flower tissues of T. cinerariifolium. Out of these 4558 unigenes belongs to flower and 4746 unigenes are found in leaf tissue separately (Fig. 4a, b).
Comparison of proteome of T. cinerariifolium with the proteome of other plant species
To further evaluate the quality of the sequenced data, the comparison of the differential proteome data of T. cinerariifolium leaf and flower tissue with the published proteome data of other plants viz; Arabidopsis thaliana, Sorghum bicolor, Vitis vinifera, Oryza sativa and Solanum tuberosum was performed. Total 63,960 clustered transcripts (contigs) from flower were used for proteome comparison studies. Out of these contigs of flower, 39,214, 36,712, 38,084, 39,946 and 37,704 showed a match with O. sativa, A. thaliana, S. tuberosum, V. vinifera and S. bicolor respectively. Similarly, in leaf tissue 41629, 44246, 43379, 45554 and 42765 contigs showed match with O. sativa, A. thaliana, S. tuberosum, V. vinifera and S. bicolor respectively. Leaf tissue transcripts showed comparatively higher match with other plants. In both the tissues, maximum match was found with Vitis vinifera protein (Fig. 5).
HPLC analysis
High performance liquid chromatography was used to confirm the de novo biosynthesis of pyrethrin in flower verses leaf tissue of T. cinerariifolium. UV-visible absorption spectrum of flower and leaf extracts as well as of standard pyrethrin was recorded at 225 nm. The chromatograms of the standard pyrethrin and T. cinerariifolium (PYT_L and PYT_F) extracts recorded peaks corresponding to pyrethrin (Fig. 6). All pyrethrin esters were separated well in the sequential order as Cinerarin II, Pyrethrin II, Jasmolin II and subsequently followed by Cinerarin I, Pyrethrin I, Jasmolin I respectively.
Metabolic pathway analysis through KEGG database
In order to identify the biological pathways present in T. cinerariifolium, the assembled unigenes were annotated with corresponding enzyme commission (EC) numbers through BLASTX (NCBI, USA) analysis, against the (Kyoto Encyclopedia of Genes and Genomes) KEGG database [17]. Pathway analysis helped us to understand the presence of biological function and interaction of genes. After assembly, 4,443 unigenes in flower and 8,901 unigenes in leaf were found to have match with KEGG database and assigned to approximately 13344 KEGG pathways (Additional file 5). These data provide a valuable resource in finding out unigenes involved in secondary metabolic pathways specially terpenoid biosynthetic pathway for pyrethrin biosynthesis (Fig. 7).
Identification and quantification of up/down regulated transcripts involved in pyrethrin biosynthetic pathways
The pyrethrin belongs to monoterpenoid backbone and constitutes two moieties; acid and alcohol moiety. The formation of acid (chrysanthemic/pyrethric) moiety utilizes both methylerythritol 4-phosphate (MEP) as well as mevalonate (MVA) pathways, however, the formation of rethrolones utilize oxylipin pathway. In studied transcriptome data, most of the unigenes related to pyrethrin biosynthesis pathway were successfully identified with their respective gene ontology.
The identified 291 enzymes (unigenes) in flower and 484 enzymes (unigenes) in leaves involved in pyrethrin biosynthesis pathway. Out of 565 unigenes, involved in MEP pathway; 211 and 354 upregulated transcripts in the flowers and leaves respectively were assigned for the formation of acid moiety of pyrethrin. Also 43 and 106 upregulated unigenes in flower and leaf respectively encoded enzymes involved in the oxylipin pathway, which forms the rethrolone moiety of the pyrethrin compound.
Analysis of metabolic pathway genes which might be involved in pyrethrin biosynthesis
Biosynthesis of pyrethrin constitutes three pathways involving MVA, Oxylipin pathway and MEP pathway (Fig. 7). Total number of enzymes involved in MVA was more in flower (37 transcripts) as compared to leaf (24 transcripts). There are two steps working in the MEP pathway i.e. Step 1; the number of enzymes in leaf (36 transcripts) was higher than in flower (19 transcripts). Step 2 of MEP pathway the number of enzymes in flower (58 transcripts) was more than in leaf (48 transcripts). The enzymes involved in the oxylipin pathway of leaf were found to be expressed more than two folds than in flower. This result indicates that the potential candidate genes for pyrethrin synthesis in MEP and MVA pathways were more expressed in flower than in leaf. Although, the genes/enzymes involved in the synthesis of rethrolones were higher in leaf than in flower i.e. 106 and 43 respectively. In addition to this, 270 transcripts related to CyP450 were found to be expressed in leaves and 134 transcripts related to CyP450 with oxidoreductase activity were found to be expressed in flowers. The exact biosynthetic pathway involved in pyrethrin biosynthetic pathway in planta is not yet well known.