The 454 pyrosequencing technology is regarded as a prime choice for novel gene discovery in non-model organisms. In the present study, this technology was applied with the main goal of identifying the P450s and UGTs involved in the biosynthesis of SSs in B. chinense. In previous reports, CYP93E1 from Glycine max was shown to hydroxylate β-amyrin and sophoradiol to form olean-12-ene-3β,24-diol and soyasapogenol B, respectively. CYP88D6 from Glycyrrhiza uralensis was identified as a β-amyrin 11-oxidase. UGT73K1 and UGT71G1 from Medicago truncatula [26, 27] and UGT74M1 from Saponaria vaccaria have been shown to be involved in triterpene biosynthesis. Thus far, however, no P450 or UGT has been identified in any SS-producing plant species. All known triterpene and sterol hydroxylases have been classified into the CYP71 and CYP85 clans [24, 25, 51, 52]. In our 454 dataset, 114 unique sequences in 8 families and 52 unique sequences in 4 families belong to the CYP71 and CYP85 clans, respectively. Of the unique sequences in the dataset, 49 were UGTs representing nine families, namely, UGT71, UGT72, UGT73, UGT74, UGT76, UGT84, UGT85, UGT91, and UGT94. Our data therefore provide a promising opportunity for identifying the P450s and UGTs involved in SS biosynthesis. In the present study, 14 unique P450 sequences and 20 unique UGT sequences were screened. Based on MeJA inducibility and tissue-specific expression patterns, two P450s and three UGTs that may be involved in the biosynthesis of SSs were found. Their functions are currently being characterized by heterologous expression in Escherichia coli or yeast, as well as by overexpression and gene silencing in transgenic B. chinense plants. More candidate SS-related P450s and UGTs may be found among the annotated P450s and UGTs. Along with the identified P450s and UGTs, our results may also help to reveal how the diverse monomer SSs are formed and to elucidate other saponin biosynthetic pathways.
In the present study, the full-length cDNA clones of seven P450s and seven UGTs were obtained. Two of the P450s belong to the CYP736 family, and the other five belong to the CYP82, CYP712, CYP90, CYP707, and CYP716 families. The catalytic function of the CYP736 family is still unknown. Recent reports have shown that CYP736B in grapes may be involved in the host response to Xylella fastidiosa infection. CYP736A34 in soybean is also highly co-expressed with genes involved in root and Rhizobium-induced nodule development. CYP82 and CYP712 are part of the CYP71 clan. Some members of the CYP82 family were found to mediate plant-specific alkaloid pathways; for example, CYP82E4 and CYP82E5v2 in tobacco were shown to have nicotine N-demethylase activity [56, 57]. Arabidopsis CYP82C2 and CYP82C4 are 8-methoxypsoralen hydroxylases that mediate modifications of toxic furanocoumarin. However, a recent study has shown that CYP82G1 functions in the terpene pathway as a homoterpene synthase that produces the C11 homoterpene (E)-4,8-dimethyl-1,3,7-nonatriene (DMNT) and the C16 homoterpene (E,E)-4,8,12-trimethyltrideca-1,3,7,11-tetraene (TMTT). CYP712 and the CYP93s may catalyze successive steps in the same pathway(s) in different plants. One of the CYP93s, CYP93E1, was found to participate in the triterpene pathway. CYP90, CYP707, and CYP716 are part of the CYP85 clan. CYP90 is the first family of CYPs required for brassinosteroid synthesis. CYP90Bs, -As, -Ds, and -Cs act successively in the brassinosteroid pathway [55, 60]. The CYP707s inactivate ABA via 8'-hydroxylation to form phaseic acid and thereby play a key role in the regulation of ABA-mediated physiological processes. The CYP716s do not have a known function, but their closest non-plant relatives, the CYP26As, are involved in the hydroxylation of retinoic acid. Based on sequence similarity, CYP716 was close to CYP725 in the neighbor-joining tree (Figure 4).
A previous study using a broader range of plants also showed some overlap between CYP716 and CYP725. This overlap is evidence of the extensive divergence occurring within this subset of genes in the CYP85 clan. CYP725A has been shown to act on taxane diterpenoids. However, it is still unclear whether these two families share similar functions. The seven UGTs for which full-length cDNAs were generated in the present study have sequence similarities with members of different UGT families, which implies that they may belong to these different families. Based on the neighbor-joining tree (Figure 5), BcUGT3 was found to be close to members of the UGT73 family, in particular to GmSGT2 (UGT73P2), MtGT3 (UGT73F3), and GeGT (UGT73F1); BcUGT6 was close to a UGT709 member. BcUGT2 and BcUGT7 were also close to UGT73 members and to other UGTs without definite family assignments. BcUGT1 was close to a UGT90 member. Previous studies [62, 63] have indicated that UGT73 and UGT90 belong to the same orthologous group, OG1. UGT73B2 was shown to exhibit flavonoid 7-O-glucosyltransferase activity, UGT73A7 has been reported to exhibit 4,2',4',6'-tetrahydroxychalcone 4'-glucosyltransferase activity, and UGT90A7 was shown to exhibit luteolin 4'/7-O-glucosyltransferase activity. BcUGT4 and BcUGT5 may belong to the UGT94 family because they have sequence similarities with UGT94s. Previous studies have shown that UGT94D1 has UDP-glucose:sesaminol 2'-O-glucoside-O-glucosyltransferase activity and that UGT94F1 has anthocyanin 3-O-glucoside-2''-O-glucosyltransferase activity. Although the definite functions of the seven P450s and seven UGTs from B. chinense identified in the present study still have to be verified by further experiments, the isolation of their full-length cDNAs will be significant for elucidating their biological functions in the growth and development of B. chinense.
The biosynthesis and regulation of bioactive components were the main focus of the present study on B. chinense. In addition to the pharmacologically active SSs isolated from members of the genus Bupleurum, several other groups of secondary metabolites with relevant biological activity have been characterized, for example, polysaccharides with anti-ulcer activity and lignans with anti-proliferative activity. Genes related to polysaccharides and lignans were searched for in the present 454 dataset. Enzymes encoded by the genes related to polysaccharides include (1,3)-beta-D-glucan synthase, alpha-1,6-xylosyltransferase, alpha-(1,4)-galacturonosyltransferase, and xylan 1,4-beta-xylosidase, among others. Enzymes encoded by the genes related to lignans include phenylalanine ammonia lyase, cinnamate 4-hydroxylase, 4-coumarate-CoA ligase, hydroxycinnamoyl CoA:shikimate/quinate hydroxycinnamoyltransferase, caffeoyl-CoA O-methyltransferase, isoeugenol synthase, and dirigent protein oxidase. Therefore, the present 454 dataset is valuable not only for exploring genes involved in SS biosynthesis, but also for discovering genes involved in other bioactive secondary metabolites derived from the genus Bupleurum. Additionally, the agronomic traits of B. chinense, such as drought resistance, have been investigated [71, 72]. In our 454 dataset, 2,933 and 3,280 unique sequences were annotated, based on GO terms, as related to responses to abiotic or biotic stimulus and to stress, respectively. These sequence data may be beneficial to further molecular studies on the stress response of B. chinense. Further, a total of 415 and 209 unique sequences were annotated with transcription factor activity and signal transduction, respectively. Some of these sequences may play roles in regulating SS metabolism and the stress response, and they deserve to be cloned and functionally analyzed in future studies.
Currently, 454 pyrosequencing is considered a rapid and economical method for generating large quantities of sequence data. Although a large number of 454 reads were obtained from a quarter run in the present study, nearly a quarter of the ESTs from the 3,111 Sanger-sequenced clones of our previous cDNA library were not represented among them. This difference may be explained, to some extent, by the different cDNA libraries used (the Sanger-sequenced root cDNA library versus the 454-sequenced combined cDNA library from roots, seeds, and seedlings) and by the fact that only the 5' end of each cDNA was sequenced in the Sanger sequencing. In some reports that compared 454 pyrosequencing and traditional Sanger sequencing, bias was found because of differences between the two sequencing methods. Combinations of the two methods have been used in some studies: (1) to generate a large number of good-quality ESTs with improved clustering analysis and more full-length sequences; (2) to obtain a less biased method for the identification and diversity analysis of microbes and fungi [74, 75]; and (3) to assemble genome sequences. Recently, Radix bupleuri has aroused global interest, especially in Europe. However, studies on the molecular biology of Bupleurum are still limited. More transcriptome data will facilitate a deeper understanding and enable the rapid development of Radix bupleuri applications.