Identification and functional analysis of cytochrome P450 complement in Streptomyces virginiae IBL14

Background As is well known, both natural and synthetic steroidal compounds are powerful endocrine disrupting compounds (EDCs) that can cause reproductive toxicity and affect cellular development in mammals, and they are therefore generally regarded as serious contributors to water pollution. Streptomyces virginiae IBL14 is an effective degradative strain for many steroidal compounds and can also catalyze the C25 hydroxylation of diosgenin, the first biotransformation found on the F-ring of diosgenin.
Results To fully elucidate the hydroxylation function of the cytochrome P450 genes (CYPs) found during the biotransformation of steroids by S. virginiae IBL14, the whole genome of this strain was sequenced using a 454 sequencing system. BLASTP analysis showed that strain IBL14 contains 33 CYPs, 7 ferredoxins and 3 ferredoxin reductases in its 8.0 Mb linear chromosome. The CYPs from S. virginiae IBL14 are phylogenetically close to those of Streptomyces sp. Mg1 and Streptomyces sp. C. One new subfamily was identified, because CYP Svu001 of S. virginiae IBL14 shares 66% identity only with the protein ZP_05001937 from Streptomyces sp. Mg1. Further analysis showed that among the 33 CYPs in S. virginiae IBL14, three are clustered with ferredoxins, one with a ferredoxin and a ferredoxin reductase, three with ATP/GTP binding proteins, and four with transcriptional regulatory genes; one is located upstream of an ATP-binding protein and transcriptional regulators, and four are associated with other functional genes involved in secondary metabolism and degradation.
Conclusions The characteristics of the CYPs from S. virginiae IBL14 show that the EXXR motif in the K-helix is not absolutely conserved in the CYP157 family and that the I-helix is not absolutely essential for the CYP structure.
Experimental results showed that CYP Svh01 and CYP Svu022 are both hydroxylases, capable of bioconverting diosgenone into isonuatigenone and β-estradiol into estriol, respectively.


Background
Cytochrome P450 (CYP) genes encode a superfamily of iron-containing hemoproteins with an absorption maximum near 450 nm, often characterized by a conserved Cys residue in hydrophobic pocket(s) [1]. Most CYP ORFs have three distinct characteristics that are often used for their identification and analysis: the I-helix (containing a highly conserved threonine involved in oxygen activation), the conserved EXXR motif located in the K-helix, and the cysteine heme-iron ligand signature motif (GXXXCXG, with exceptions) [2]. According to a widely accepted taxonomy, CYPs within a family share more than 40% amino acid identity and members of a subfamily share more than 55% amino acid identity [3]. Occasionally, the decision to accept a sequence into a known family depends more on how it clusters on a tree than on the absolute amino acid identity [4].
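The identity thresholds described above can be written as a simple rule of thumb. The following Python sketch is illustrative only: the function name `classify_by_identity` and the example identity values are hypothetical, and, as noted above, clustering on a tree can override the numeric cutoffs.

```python
def classify_by_identity(percent_identity: float) -> str:
    """Apply the >40% (family) and >55% (subfamily) amino acid identity rule of thumb."""
    if not 0.0 <= percent_identity <= 100.0:
        raise ValueError("percent identity must lie between 0 and 100")
    if percent_identity > 55.0:
        return "same subfamily"
    if percent_identity > 40.0:
        return "same family"
    return "different family"


if __name__ == "__main__":
    # Hypothetical identity values, chosen only to exercise each branch.
    for pid in (80.0, 48.0, 35.0):
        print(f"{pid:.1f}% identity -> {classify_by_identity(pid)}")
```

Such a cutoff is only a first pass; borderline sequences would still need to be placed on a phylogenetic tree before a family is assigned.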
CYPs have been found in all eukaryotic (human, animals, plants, fungi, etc.) and prokaryotic organisms (bacteria and archaea), and even in viruses [5][6][7][8]. They are often monooxygenases involved in the oxidation of a range of endogenous compounds, such as cholesterol, lipids and steroidal hormones, as well as xenobiotics such as drugs and toxic chemicals in the environment [9][10][11]. CYPs catalyse diverse reactions, including C-H hydroxylation, epoxidation, heteroatom oxidation, aromatic ring oxidation and dealkylation [11][12][13]. In the catalytic reaction of a P450 monooxygenase, one atom of O2 is inserted into the substrate while the other is reduced to H2O. CYP genes responsible for secondary metabolism are often located in antibiotic biosynthetic gene clusters, where they catalyze the stereo- and regio-specific conversion of substrates to related derivatives.
The biotransforming capabilities of bacterial CYPs have been widely studied. P450soy (CYP105D1) from Streptomyces griseus is involved in the degradation of a diverse array of complex agrochemicals and environmental pollutants [14]. CYP105C1 from Actinomycete spp. can transform benanomicin A into two derivatives, 10-hydroxybenanomicin A and 11-O-demethylbenanomicin [15]. The functions of several CYP107 family members have also been reported: CYP107E from Micromonospora griseorubida governs the hydroxylation and epoxidation steps in mycinamicin biosynthesis [16], P450 Terf (107L) from Streptomyces platensis catalyzes the hydroxylation of terfenadine [17], and the hydroxylase PikC (107L1) of Streptomyces venezuelae converts narbomycin to picromycin [18]. CYP124 of Mycobacterium tuberculosis shows omega-hydroxylase activity towards methyl-branched lipids [19]. YbdT (CYP152A) of Bacillus subtilis is involved in fatty acid beta-hydroxylation [20]. CYP154 of Nocardia farcinica IFM10152 catalyzes the O-dealkylation and ortho-hydroxylation of formononetin [21], and 154H1 from Clostridium acetobutylicum performs biocatalytic reactions with different aliphatic and aromatic substrates [22].
Genome sequencing is an effective way to predict and annotate all the possible CYP genes in an organism. Streptomyces coelicolor A3(2), a model strain often used to study physiological function and antibiotic production, was the first Streptomyces species to be sequenced, in 2001. Its 8.7 Mb linear chromosome [23] contains 7825 open reading frames (ORFs), including 18 putative CYPs [24]. S. avermitilis, known for producing the antiparasitic agent avermectin, contains 7600 ORFs with 33 putative CYPs in its 9 Mb chromosome [25,26]. The 8.7 Mb genome of Streptomyces peucetius ATCC27952 contains 19 putative CYPs [27].
S. virginiae IBL14, isolated from activated sludge used to treat waste from a steroidal drug factory, effectively degrades various steroidal compounds, including progesterone, isotestosterone, dihydrotestosterone, hydrocortisone, cholesterol and estrone [28]. To comprehensively understand the function of the CYPs of S. virginiae IBL14 in the degradation and biotransformation of diosgenin, the whole genome of this strain, which was isolated by our laboratory, was sequenced for the first time. Using in silico methods, we predict and annotate all of the putative CYPs of S. virginiae IBL14 and analyze them evolutionarily and functionally by comparison with those of other Streptomyces species. Furthermore, the functions and characteristics of the CYP genes svh01 and svu022 in this strain are experimentally identified and analyzed.

Results and discussion
Genome sequencing and CYPs in S. virginiae IBL14
In silico analysis of the newly sequenced 8.0 Mb genome of S. virginiae IBL14 identified 8288 ORFs, with a total GC content exceeding 70%. Annotation with Rpsblast shows a total of 33 putative CYPs in the genome of strain IBL14, accounting for approximately 0.4% of all coding sequences. The number of CYPs is identical to that in S. avermitilis and almost twice that in S. coelicolor A3(2) and S. peucetius ATCC27952 (18 and 19 CYPs, respectively). Such a high level of CYP diversity suggests a high diversity of secondary metabolism pathways in S. virginiae IBL14.
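The proportions quoted above follow directly from the reported counts. As a minimal check, the Python sketch below recomputes them; the variable and function names are hypothetical, and only the counts (33 CYPs, 8288 ORFs, and 18 and 19 CYPs in the comparison strains) are taken from the text.

```python
TOTAL_ORFS_IBL14 = 8288  # ORFs reported for S. virginiae IBL14
CYPS_IBL14 = 33          # putative CYPs reported for S. virginiae IBL14

# CYP counts of the comparison strains, as cited in the text.
OTHER_STRAINS = {
    "S. coelicolor A3(2)": 18,
    "S. peucetius ATCC27952": 19,
}


def cyp_share_percent(cyps: int, orfs: int) -> float:
    """Percentage of ORFs annotated as CYPs."""
    return 100.0 * cyps / orfs


if __name__ == "__main__":
    share = cyp_share_percent(CYPS_IBL14, TOTAL_ORFS_IBL14)
    print(f"CYP share of ORFs: {share:.2f}%")
    for strain, count in OTHER_STRAINS.items():
        print(f"IBL14 / {strain}: {CYPS_IBL14 / count:.2f}-fold")
```

The share evaluates to just under 0.4%, and the fold differences are about 1.8 and 1.7, consistent with "almost twice".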
Features of CYPs from S. virginiae IBL14
Table 2 displays the three characteristic motifs of the CYPs of S. virginiae IBL14. The critical residues are highlighted in bold: threonine (T) in the GXXTT motif of the I-helix, glutamic acid (E) and arginine (R) in the EXXR motif of the K-helix, and cysteine (C) in the GXXXCXG heme-binding domain signature.
Table 2 shows that the I-helix is absent in Svu001 (new family), and that both the I-helix and the K-helix are missing in Svu002 (105L, often associated with hydroxylation activity) [29], which indicates that the I-helix is not absolutely essential for the CYP structure. The two members of the CYP157 family, Svu023 (E276VLW279)/157A and Svu024 (E284QILW288)/157C, do not have an arginine residue in the K-helix, like CYP157C1 from S. coelicolor A3(2), which has the motif E297QSLW [30], and CYP157A2 and CYP157C2 from S. avermitilis, which exhibit a 257EVLW motif and a 257EQSLW motif, respectively [26]. That the CYP157 family proteins lack consensus EXXR motifs but are genetically linked to their upstream conservons implies that they have functions linked to the upstream pathway(s) [30]. In addition, Svu002, Svu018, Svu021, Svu023 and Svu031 do not strictly follow the GXXXCXG heme-binding motif.
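A naive way to screen a protein sequence for the three signature motifs discussed above is a regular-expression scan. The sketch below is only an illustration and not the procedure used in this study: the pattern set, the function name `scan_motifs` and both toy sequences are hypothetical, and a short pattern such as EXXR will also match by chance outside the K-helix in real proteins.

```python
import re

# "X" in the motif names means any residue, written as "." in the patterns.
MOTIFS = {
    "I-helix (GXXTT)": re.compile(r"G..TT"),
    "K-helix (EXXR)": re.compile(r"E..R"),
    "heme-binding (GXXXCXG)": re.compile(r"G...C.G"),
}


def scan_motifs(protein: str) -> dict[str, list[tuple[int, str]]]:
    """Return (1-based start, matched text) pairs for every motif pattern."""
    seq = protein.upper()
    return {
        name: [(m.start() + 1, m.group()) for m in pattern.finditer(seq)]
        for name, pattern in MOTIFS.items()
    }


if __name__ == "__main__":
    # Made-up peptides, not real CYP sequences.
    canonical = "MAAGLDTTSNLEVLRPAAFGAGIHLCLGAQ"
    cyp157_like = "MAAGLDTTSNLEVLWPAAFGAGIHLCLGAQ"  # EVLW in place of EVLR
    for label, seq in (("canonical", canonical), ("CYP157-like", cyp157_like)):
        print(label, scan_motifs(seq))
```

In the second toy sequence the K-helix pattern returns no hit, mirroring the EVLW-type motifs reported for the CYP157 family members.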

Multiple alignments and phylogenetic analysis
The phylogenetic tree of the combined CYPs of S. virginiae IBL14, S. avermitilis MA-4680, S. venezuelae ATCC 10712 and Streptomyces sp. Mg1 is presented in Figure 1. From Figure 1 [...] Figure 1, respectively) have a closer evolutionary relationship. Further, the paralogous relationships among the 33 CYPs in S. virginiae IBL14 were inferred with the neighbor-joining method (Clustal W and MEGA 5.0). Figure 2 shows that svh01, svu03 and svu04, as well as svu022 and svu005, in S. virginiae IBL14 have the closest homologous evolutionary relationships, respectively. It is worth noting that most members of the same CYP family cluster together as expected, e.g., the 11 members of the CYP107 family.

The prediction of functions of CYPs in S. virginiae IBL14
An identity above 70% between protein sequences reasonably suggests that they may have similar functions [26]. As shown in Table 1, a total of 26 CYP sequences of S. virginiae IBL14 have their best matches in other Streptomyces, which is helpful for function prediction. CYP105 and CYP107 are the most studied bacterial cytochrome families and are associated with the degradation and biotransformation of a diverse array of xenobiotics and with antibiotic biosynthesis. Analysis of the CYP sequences of S. virginiae IBL14 shows that 11 CYPs belong to CYP107, five to CYP105, four to CYP197, three to CYP191, two to CYP157 and one to each of the other families, which indicates the diversity and importance of the two groups CYP105 and CYP107. The predicted functions of several putative CYPs in S. virginiae IBL14, combined with reported experimental evidence, are listed in Table 3.

CYPs in S. virginiae IBL14 and their ferredoxin reductase and ferredoxin
The catalytic activity of CYPs depends greatly on the ferredoxin and/or ferredoxin reductase with which they are associated. It has been reported that there are three, six and four ferredoxin reductase genes and six, nine and two ferredoxin genes in S. coelicolor A3(2), S. avermitilis and S. peucetius, respectively. In S. coelicolor A3(2), only CYP105D5 is arranged in an operon with a ferredoxin gene [24]. In S. peucetius, CYP147F is clustered with a ferredoxin reductase [27]. In S. avermitilis, both CYP105P1 and CYP105D6 are clustered with a ferredoxin, CYP147B1 is arranged in an operon with a ferredoxin and a ferredoxin reductase, CYP105Q1 is located in an operon containing both a ferredoxin and a ferredoxin reductase, and CYP102 is fused to a P450 reductase [26].
Annotation of the S. virginiae IBL14 genome revealed three ferredoxin reductase genes and seven ferredoxin genes. That is, the activities of many of the CYPs in S. virginiae IBL14 are supported by different combinations of the three ferredoxin reductases and seven ferredoxins. In S. virginiae IBL14, svu005 (CYP105D), svh01 (CYP105C) and svu019 (CYP124B) are found to cluster with the ferredoxins svf03, svf09 and svf07, respectively, and svu020 (CYP147A) clusters with the ferredoxin reductase svfr03 and the ferredoxin svf06. These facts suggest that the function of the CYPs Svu005, Svh01, Svu019 and Svu020 requires the participation of electron transfer. The results of the homology analysis by BLAST searching of GenBank are listed in Table 4.

Regulatory elements and functional genes clustered with CYPs
CYPs clustered with regulatory elements have been reported in S. peucetius ATCC27952 [27]. In the annotation of the gene arrangement around the putative CYPs on the S. virginiae IBL14 chromosome, svu022, svu023 and svu024 were found to cluster with genes for ATP/GTP binding proteins (which have a phosphate-binding loop for energy-requiring metabolic reactions) [34]; svu001 and svu015 with a LysR-family transcriptional regulator (regulating a diverse set of genes, including those involved in virulence, metabolism, quorum sensing and motility) [35]; svu011 with two-component transcriptional regulators, LuxR family (quorum sensing signals in Gram-negative bacteria are often regulated by acylated homoserine lactones) [36]; svu018 with a transcriptional regulator, AraC family (transcriptional regulators with diverse functions ranging from carbon metabolism to stress responses to virulence) [37] and two-component transcriptional regulators, LuxR family; and svu020 with the ATP-binding protein fbpC and TetR-family transcriptional regulators (bacterial regulators with an HTH DNA-binding motif involved in the transcriptional control of multidrug efflux pumps, pathways for the biosynthesis of antibiotics, response to osmotic stress and toxic chemicals, control of catabolic pathways, differentiation processes, and pathogenicity) [38].
As described above, the CYPs on the S. virginiae IBL14 chromosome are linked to the transcriptional regulation of many functional genes related to primary and secondary metabolism, as well as to responses to environmental factors, as expected. In addition, CYPs are clustered with other functional genes. svh01 is adjacent to the genes of MdlB, an ABC-type multidrug transport system with ATPase and permease components, which may be involved in the transport of substrates [39]. svu009 lies next to an alcohol dehydrogenase gene, suggesting that svu009 may take part in alcohol bioconversion and biodegradation. svu013 is next to 4,5-DOPA dioxygenase, a member of the class III extradiol dioxygenase family (a group of enzymes that use a non-heme Fe(II) to cleave aromatic rings between a hydroxylated carbon and an adjacent non-hydroxylated carbon), suggesting that the combination of svu013 and 4,5-DOPA dioxygenase may be responsible for the biodegradation of substrates with aromatic rings. svu026 is adjacent to an MbtH-like protein, which is found in known antibiotic synthesis gene clusters [40]. The cholesterol oxidase ChoL from S. virginiae IBL14, which is responsible for the conversion of diosgenin to diosgenone (a 4-ene-3-keto steroid) via a coupled C3-dehydrogenation and C4-5 isomerization during the bioconversion and biodegradation of diosgenin, has been reported [41]. In S. virginiae IBL14 the

Functional identification and characteristics of svh01 and svu022
To elucidate the functions of all putative CYPs in S. virginiae IBL14, four CYP genes of strain IBL14 were selected first. Among them, the functional identification of the CYP genes svh01 (105C1) and svu022 (154H) has been completed. The cytochrome P450 Svh01 (responsible for the C25-hydroxylation of diosgenin) [32] belongs to the class I (prokaryotic/mitochondrial) P450 systems, in which electrons are transferred from NADPH or NADH to ferredoxin reductase and ferredoxin. Sequence analysis revealed that the complete sequence of svh01, with ATG as the start codon, has a G + C content of 70%. A possible ribosome-binding site is located upstream of svf09 (the ferredoxin gene clustered with svh01).
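The G + C content quoted for a gene is simply the percentage of G and C bases in its sequence. A minimal Python sketch follows; the function name `gc_content_percent` and the short test sequence are hypothetical, since the svh01 sequence itself is not reproduced here.

```python
def gc_content_percent(dna: str) -> float:
    """Percentage of G and C among the unambiguous bases (A, C, G, T) of a sequence."""
    bases = [b for b in dna.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no A, C, G or T bases")
    gc = sum(1 for b in bases if b in "GC")
    return 100.0 * gc / len(bases)


if __name__ == "__main__":
    toy_orf = "ATGGCCGCGACC"  # made-up 12-base fragment, not taken from svh01
    print(f"G + C content: {gc_content_percent(toy_orf):.1f}%")
```

Ambiguous symbols such as N are ignored here, which is one of several possible conventions.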
Sequence analysis showed that svh01 and svf09 contain 1200 bp and 243 bp, respectively. To obtain their expression products, the svh01 and svf09 sequences were first ligated together into the pET22b vector to generate the expression plasmid pET22b-svh01-svf09, which was then transformed into E. coli JM109 (DE3) to form the recombinant strain E. coli IBL161 [JM109 (DE3)/pET22b-svh01-svf09]. The PCR products of svh01 and svf09 from the recombinant strain E. coli IBL161 were analyzed by gel electrophoresis (Figure 3A and B) and confirmed by sequencing.
The svu022 gene (clustering with the gene for an ATP/GTP binding protein) consists of 1239 nucleotides and has a G + C content of 73%. Similarly, the complete sequence of svu022 was first inserted into the shuttle plasmid pHCMC05 to form the recombinant plasmid pHCMC05-svu022, which was then introduced into B. subtilis WB800N (to improve the extracellular expression level of Svu022 for the analysis of enzymatic biotransformation) to produce the recombinant strain B. subtilis IBL 241 [WB800N/pHCMC05-svu022]. The PCR result for svu022 from the recombinant strain B. subtilis IBL 241 is shown in Figure 3C.
Svh01 (105C1) is a peptide of 399 amino acids, with a molecular weight of 44.04 kDa and a pI of 4.97 as estimated with the ExPASy Compute pI/Mw tool. To obtain and characterize its expression product, the recombinant strain E. coli IBL161 was cultured and induced. The expression of Svh01 is shown in Figure 4A. On SDS-PAGE, the two distinct additional protein bands should be Svh01, with an MW of about 44 kDa, and Svf09, with an MW of about 8.0 kDa. The function of Svh01/FcpC of S. virginiae IBL14, hydroxylating the C25 tertiary carbon of diosgenin to form isonuatigenone, has been experimentally confirmed [32].
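The gene lengths and protein sizes reported in this section can be cross-checked with simple codon arithmetic. The sketch below assumes that each quoted length includes the stop codon and uses a rule-of-thumb average residue mass of 110 Da; both are assumptions made for illustration, as are the function names, and the masses it gives are only rough stand-ins for the ExPASy estimates.

```python
AVERAGE_RESIDUE_MASS_DA = 110.0  # rule-of-thumb average, an assumption


def residues_from_orf(length_bp: int, includes_stop_codon: bool = True) -> int:
    """Number of amino acids encoded by an ORF of the given length."""
    if length_bp <= 0 or length_bp % 3 != 0:
        raise ValueError("ORF length must be a positive multiple of 3")
    codons = length_bp // 3
    return codons - 1 if includes_stop_codon else codons


def rough_mass_kda(residues: int) -> float:
    """Very rough protein mass estimate in kDa."""
    return residues * AVERAGE_RESIDUE_MASS_DA / 1000.0


if __name__ == "__main__":
    # Gene lengths as reported in the text.
    for gene, length_bp in {"svh01": 1200, "svf09": 243, "svu022": 1239}.items():
        n = residues_from_orf(length_bp)
        print(f"{gene}: {length_bp} bp -> {n} residues, roughly {rough_mass_kda(n):.1f} kDa")
```

Under these assumptions, svh01 and svu022 give 399 and 412 residues, matching the reported protein lengths, and svf09 gives 80 residues. The rough masses (about 43.9, 45.3 and 8.8 kDa) fall in the same range as the reported values.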
Svu022 (154H) is a deduced protein of 412 amino acids that shares 91% identity with its homologue in Streptomyces sp. Mg1. The estimated MW and pI of Svu022 are 44.59 kDa and 5.00, respectively. Similarly, the recombinant strain B. subtilis IBL 241 was cultured and induced to study the expression product and its characteristics. The expression of Svu022 in the recombinant strain B. subtilis IBL 241 is shown in Figure 4B. SDS-PAGE displays a distinct protein band with an MW of about 45.0 kDa, as expected. Further experimental results from TLC, HPLC and LC/MS indicated that CYP Svu022 is able to biotransform β-estradiol into estriol. Figure 5 shows the HPLC profiles of the biotransformation of β-estradiol by strains B. subtilis WB800N and B. subtilis IBL 241. The functional identification of Svu005 (CYP105D) and Svu019 (CYP124B) is in progress.
[...] IBL14 belong to the CYP107 (11 members) family and CYP105 (5 members) family. Compared phylogenetically with CYPs from 3 typical Streptomycete spp., the CYPs of S. virginiae IBL14 appear to be closest to those of Streptomyces sp. Mg1.
Further analysis showed that among the 33 CYPs in S. virginiae IBL14, three are clustered with ferredoxins, one with a ferredoxin and a ferredoxin reductase, three with ATP/GTP binding proteins, and four with transcriptional regulatory genes; one is located upstream of an ATP-binding protein and transcriptional regulators, and four are associated with other functional genes involved in secondary metabolism and degradation.
The new characteristics found in the CYPs from S. virginiae IBL14 suggest that the EXXR motif in the K-helix is not absolutely conserved in the CYP157 family, as reported [30], and that the I-helix is not absolutely essential for the CYP structure. In particular, one new family was found on the basis of CYP Svu001 in S. virginiae IBL14, which shares 66% identity only with that from Streptomyces sp. Mg1.

Strains and plasmids
S. virginiae IBL14 (CCTCC M 206045) [42] was the strain of interest for cytochrome P450 gene identification and functional analysis. E. coli JM109, E. coli JM109 (DE3) and B. subtilis WB800N were used as hosts for plasmid construction and target protein expression in the functional identification of the CYPs. The vector pET22b was used for cloning and expression of genes of interest in E. coli. The shuttle plasmid pHCMC05 was used for the expression of target proteins in B. subtilis (regarded as GRAS by the FDA). The features of the bacterial strains and plasmids used in this study are listed in Table 5.

Sequencing and in-silico identification analyses of CYPs
The S. virginiae IBL14 genome was sequenced for the first time on a 454 platform (Encode Genomics Co. Ltd., Suzhou, China); the sequence data will be published step by step. All ORFs of this genome were predicted using Glimmer 3.0 and Prodigal. To collect all possible functional information on the CYP genes in S. virginiae IBL14, the genome sequence of the strain was compared with the SWISS-PROT, TrEMBL and KEGG databases using Blastp and with the CDD and COG databases using Rpsblast.
The deduced amino acid sequences of the putative CYPs of S. virginiae IBL14 were aligned with the CYPs from S. avermitilis MA-4680, S. venezuelae ATCC 10712 and Streptomyces sp. Mg1 using ClustalW [43]. Molecular evolution and phylogenetic analyses by the neighbor-joining method were then carried out using MEGA5.0 [44]. To predict possible functions in secondary metabolism, all putative CYPs of S. virginiae IBL14 were also compared with homologues in other organisms using Blastp.
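Neighbor-joining starts from a matrix of pairwise distances between aligned sequences. As an illustration of that input only, the sketch below computes uncorrected p-distances with pairwise deletion of gapped columns. The distance model actually used in MEGA5.0 for this study is not stated, so the model, the function names and the toy alignment are all assumptions.

```python
from itertools import combinations


def p_distance(aligned_a: str, aligned_b: str) -> float:
    """Fraction of differing residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = differences = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # pairwise deletion: skip columns with a gap in either sequence
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differences / compared


def distance_matrix(alignment: dict[str, str]) -> dict[tuple[str, str], float]:
    """Pairwise p-distances for every pair of sequences in an alignment."""
    return {
        (x, y): p_distance(alignment[x], alignment[y])
        for x, y in combinations(alignment, 2)
    }


if __name__ == "__main__":
    # Made-up aligned fragments, not real CYP sequences.
    toy_alignment = {"cypA": "MKTLLVA-", "cypB": "MKTLLVAG", "cypC": "MRTLIVAG"}
    for pair, distance in distance_matrix(toy_alignment).items():
        print(pair, round(distance, 3))
```

A tree-building program would then join the closest pairs iteratively; sequences with the smallest distances, such as cypA and cypB in the toy data, end up as nearest neighbours.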
Using the three motifs described above as criteria, the CYP gene candidates of S. virginiae IBL14 were searched with BLAST against the GenBank non-redundant protein database to identify their closest bacterial homologues and to tentatively assign all of the CYPs of S. virginiae IBL14 to the corresponding family or subfamily [26]. A similar procedure was applied to the putative ferredoxin and ferredoxin reductase genes to identify their closest bacterial homologues.

Construction and cloning of expression plasmids
The svh01, svf09 and svu022 genes were amplified from the genomic DNA of S. virginiae IBL14 by PCR (Pfu DNA Polymerase, Fermentas, Thermo Fisher Scientific Inc.); the primers used are listed in Table 6. The PCR products of svh01 and svf09 were digested with NdeI/EcoRI and EcoRI/HindIII, respectively, then ligated [...]
After the addition of 0.2 mM IPTG in the logarithmic growth phase, the culture was cultivated for another 48 h under the same conditions. The harvested recombinant cells were resuspended in 50 mM PBS (pH 7.4), disrupted by ultrasonication, and then centrifuged at 6000 rpm for 5 min. The supernatant was analysed by SDS-PAGE.

Biotransformation and product extraction
After E. coli IBL161 had been induced with IPTG at 25°C for 2 h, one milliliter of β-estradiol/diosgenin was added to each flask (to a final concentration of 0.2 mg/ml) for biotransformation analysis. After cultivation for another 24 h under the same conditions, the cultures were extracted twice with a half volume of 100% ethyl acetate (Sinopharm Chemical Reagent Co., Ltd). The extracts were evaporated to dryness, re-dissolved in 1 ml of anhydrous ethanol, and analyzed by thin layer chromatography (TLC), high performance liquid chromatography (HPLC) and liquid chromatography-mass spectrometry (LC-MS).

DNA and protein analytical methods
DNA electrophoresis for recombinant plasmid analysis was carried out in agarose gels at 110 V for 30 min [45]. SDS-PAGE for expressed protein analysis was run on a 15% (w/v) acrylamide gel at 110 V for 2 h according to Schagger [46]. The bands were visualized by Coomassie R-250 staining.

HPLC analysis of biotransformation products
High performance liquid chromatography (HPLC) was used to identify and analyze the metabolites. Briefly, a 10 μl sample was loaded onto a Symmetry C18 column (4.6 mm × 250 mm, Waters Co., USA) and eluted with ethanol/water (60/40, v/v). The flow rate, UV detection wavelength and column temperature of the HPLC system (Breeze 1525 series, Waters Co., USA) were set to 1 ml/min, 245 nm and 35°C, respectively. The biotransformation products were analyzed qualitatively and quantitatively by comparison with the corresponding standards.
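Quantification against a standard can be done in several ways, and the text does not specify which was used. One common option is single-point external standard calibration, sketched below. It assumes a linear detector response through the origin and identical injection volumes; the function name and all peak areas are hypothetical values chosen for illustration, not measurements from this study.

```python
def concentration_from_standard(
    sample_peak_area: float,
    standard_peak_area: float,
    standard_conc_mg_per_ml: float,
) -> float:
    """Single-point external standard: scale the standard's concentration by the area ratio."""
    if standard_peak_area <= 0:
        raise ValueError("standard peak area must be positive")
    if sample_peak_area < 0 or standard_conc_mg_per_ml < 0:
        raise ValueError("peak area and concentration must be non-negative")
    return sample_peak_area / standard_peak_area * standard_conc_mg_per_ml


if __name__ == "__main__":
    # Hypothetical peak areas (arbitrary units) for a product and its standard.
    estimated = concentration_from_standard(
        sample_peak_area=1200.0,
        standard_peak_area=2400.0,
        standard_conc_mg_per_ml=0.2,
    )
    print(f"Estimated product concentration: {estimated:.2f} mg/ml")
```

A multi-point calibration curve would be the more robust choice whenever the response is not strictly proportional over the working range.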