
Contrasted evolutionary constraints on secreted and non-secreted proteomes of selected Actinobacteria



Actinobacteria have adapted to contrasting ecological niches, including the soil and, among others, plants or animals, with which they interact as pathogens or symbionts. The genus Mycobacterium contains mostly pathogens that cause a variety of mammalian diseases, among which the well-known leprosy and tuberculosis; it also has saprophytic relatives. The genus Streptomyces comprises mostly soil microbes known for their secondary metabolites; it also contains plant pathogens, animal pathogens and symbionts. Frankia, a nitrogen-fixing actinobacterium, establishes a root symbiosis with dicotyledonous pioneer plants. Pathogens and symbionts live inside eukaryotic cells and tissues and interact with their cellular environment through secreted proteins and effectors transported through transmembrane systems; nevertheless they also need to avoid triggering host defense reactions. A comparative genome analysis of the secretomes of symbionts and pathogens allows a thorough investigation of selective pressures shaping their evolution. In the present study, the ratios of non-silent to silent mutation rates in secretory proteins were assessed in different strains of Frankia, Streptomyces and Mycobacterium, of which several genomes have recently become publicly available.


It was found that secreted proteins as a whole have a stronger purifying evolutionary rate (non-synonymous to synonymous substitutions or Ka/Ks ratio) than the non-secretory proteins in most of the studied genomes. This difference becomes statistically significant in cases involving obligate symbionts and pathogens. Amongst the Frankia, secretomes of symbiotic strains were found to have undergone evolutionary trends different from those of the mainly saprophytic strains. Even within the secretory proteins, the signal peptide part has a higher Ka/Ks ratio than the mature part. Two contrasting trends were noticed amongst the Frankia genomes regarding the relation between selection strength (i.e. Ka/Ks ratio) and the codon adaptation index (CAI), a predictor of the expression rate, in all the genes belonging to the core genome as well as the core secretory protein genes. The genomes of pathogenic Mycobacterium and Streptomyces also had reduced secretomes relative to saprophytes, as well as in general significant pairwise Ka/Ks ratios in their secretomes.


In marginally free-living facultative symbionts or pathogenic organisms under consideration, secretory protein genes as a whole evolve at a faster rate than the rest and this process may be an adaptive life-strategy to counter the host selection pressure. The higher evolutionary rate of signal peptide part compared to mature protein provides an indication that signal peptide parts may be under relaxed purifying selection, indicative of the signal peptides not being secreted into host cells. Codon usage analysis suggests that in actinobacterial strains under host selection pressure such as symbiotic Frankia, ACN, FD and the pathogenic Mycobacterium, codon usage bias was negatively correlated to the selective pressure exerted on the secretory protein genes.


Frankia is a taxon comprising nitrogen-fixing actinobacteria that establish a root symbiosis with dicotyledonous plants belonging to 8 plant families [1]. The phylogeny of these bacteria as determined by classic 16S rRNA sequence analysis groups them into 4 clusters [2]. The genus Frankia appears to have emerged from a group of soil and rhizosphere actinobacterial genera [3], many of which are extremophiles such as the thermophilic Acidothermus [2], the gamma-radiation resistant Geodermatophilus [4] or the compost-inhabiting Antarctic-dwelling Sporichthya [5]. Frankia exhibits only a few distinct morpho-physiological features including a distinctive wall sugar [6] and unique specialized structures (termed vesicles) that are surrounded by an envelope containing oxygen-impermeable bacteriohopanetetrol, which serves to protect nitrogenase [7].

Little is known about the nature of the symbiotic determinants involved in the actinorhizal symbiosis. On the host plant side, the SymRK gene, a transmembrane kinase, has been identified and shown to control development of both actinorhizal nodulation and the mycorrhizal infection processes [8]. Furthermore, the actinorhizal host plants, Alnus and Casuarina, have homologs of this whole symbiotic cascade [9]. On the bacterial side of the symbiosis, the absence of a well-established reliable genetic system has hindered attempts to identify essential genes involved in the process. Moreover, sequence analysis of three Frankia genomes representing contrasting host specificity ranges failed to reveal the presence of genes homologous to the Rhizobium nod genes [10] or symbiotic islands [11]. These genomes were found to have undergone contrasting evolutionary pressures resulting in marked differences in their size, transposase content and loss or gain of several determinants. Relative to their saprophytic neighbors, Frankia genomes have a reduced number of secreted proteins [12], although these predictions have not been consistently confirmed experimentally [13]. This is evocative of a genome-wide strategy to keep a chemical “low profile” inside host plant cells.

The genus Streptomyces is emblematic of soil microbes, with a rich array of secondary metabolites that have been exploited as powerful drugs ever since streptomycin was characterized [14]. Most Streptomyces species are described as saprophytes except for a few such as the potato pathogen S. scabiei and related species [15] and the human pathogen S. somaliensis [16]. A number of strains have also recently been described as symbionts or commensals, but the exact nature of their interaction with their host is still not clear.

The genus Mycobacterium is best known for its two major disease-causing species, M. tuberculosis, agent of tuberculosis [17], and M. leprae, agent of leprosy [18], as well as a few less well-known ones such as M. ulcerans. Besides these pathogens, there are a number of saprophytic species such as the pyrene-degrading soil bacterium M. vanbaalenii [19] or the commensal/environmental M. smegmatis [20].

Intracellular bacteria interact in an intimate fashion with host cells, thus facing a paradoxical challenge. Their interactions with their cellular environment through secreted proteins and effectors transported through transmembrane systems may trigger a host defense response that they would then need to fight off. Host cells have elaborate sensing systems to detect motifs that are specific for different classes of pathogens and subsequently trigger defense reactions including synthesis of cysteine-rich defensins, oxygen radicals, or toxic aromatics, which would be detrimental to symbionts [21]. Pathogenic microbes have thus evolved in close interaction with their hosts, in a gene-for-gene pattern that effectively restricts the pathogen to a subset of hosts and modulates genetic diversity as a function of host resistance [22]. For certain lineages, Frankia has been shown to have coevolved with its host plants [23] and to dramatically alter its transcriptome upon symbiosis onset [24], and is thus expected to have undergone pressures at the level of gene composition. One way of monitoring evolutionary pressures on genomes is to follow rates of silent and non-silent mutations [25]. For pathogens, both diversifying (positive) selection [26, 27] and purifying (negative) selection [28, 29] have been reported. The situation in symbionts has not been extensively studied, except for a few brief reports on Wolbachia or Rhizobium [30, 31]. We undertook this investigation on genomes of three important but diverse genera of Actinobacteria to analyze the selection pressures acting on them, and have also looked into the evolutionary rate of secreted proteins to assess their biochemical adaptations to the environment. The genera include Frankia, a predominantly plant symbiont; Streptomyces, a group of soil-dwelling, mostly free-living actinobacteria with a few pathogens; and Mycobacterium, which contains both free-living species and pathogens.

Results and discussion

Background of the strains chosen for the analysis

Candidatus Frankia datiscae (FD) is a non-isolated symbiont that forms effective nodules in Rosaceae, Coriariaceae and Datiscaceae [2]. Hundreds of attempts at isolating these bacteria in pure culture have failed [32] and these strains are thus considered by many to be obligate symbionts [33]. The extent of cospeciation is unknown because cross-inoculation assays have yielded conflicting results. Frankia CcI3 (CcI) can be isolated and grown in defined media [34]; however, it belongs to a homogeneous clade that is in general difficult to isolate and culture [2]. Historically, several attempts to isolate the Casuarina microsymbionts failed or else yielded atypical strains, later found to belong to cluster 3, that could not fulfill Koch’s postulates [35]. On the other hand, Frankia alni ACN14a (ACN) can be isolated in pure culture and can nodulate Alnus and Myricaceae [36]. It is abundant in soils devoid of host plants and will grow well in the rhizosphere of Betula, a close relative of Alnus [37]. The two Elaeagnus isolates, EAN1pec (EAN) and EuI1c (EuI), grow well and rapidly in pure culture, are abundant in soils without host plants and have the most extensive host range, which includes Elaeagnaceae, Myricaceae, Casuarinaceae (Gymnostoma) and Rhamnaceae, as well as Datiscaceae and Coriariaceae where they are present as co-inoculants. The range of substrates on which these strains grow is more extensive than that of other groups [1]. Based on the above information and on genome size, we have divided the Frankia strains into 3 major groups: Group A, predominantly free-living facultative symbionts; Group B, partly free-living facultative symbionts; and Group C, marginally free-living or obligate symbionts (Table 1).

Table 1 Grouping of Frankia strains based on characteristic features

Mycobacterium species include both pathogenic and non-pathogenic ones. The pathogenic strains considered are M. leprae TN, M. tuberculosis CDC1551 and M. ulcerans Agy99. The non-pathogenic strains are M. smegmatis MC2 155 and M. vanbaalenii PYR-1, both fast-growing mycobacteria that exist as saprophytes in the environment (Table 2).

Table 2 Grouping of Mycobacterium species based on characteristic features

The Streptomyces species considered in the analysis are S. coelicolor A3(2), S. avermitilis MA-4680, S. griseus NBRC 13350, S. scabiei 87.22 and S. somaliensis DSM 40738. The first three are soil-dwelling saprophytes that are grown in chemostat cultures for the industrial production of various secondary metabolites, including a wide range of antibiotics. The last two are pathogens, either of plants (S. scabiei 87.22) or of animals (S. somaliensis DSM 40738) (Table 3). In all three tables (Tables 1, 2 and 3), (√) denotes present, (≠) denotes absent and (-) denotes not known.

Table 3 Grouping of Streptomyces species based on characteristic features

Core genome

For the five Frankia genomes examined, a core Frankia genome of 982 genes was identified. Since the Frankia EuI genome was devoid of any nif genes, the core genome did not include them. This Frankia strain will induce nodule formation on its host plant, Elaeagnus umbellata, but produces ineffective nodules that are unable to fix nitrogen [38]. Amongst the Mycobacterium genomes, 665 genes were identified as belonging to their core genomes. Since the Mycobacterium leprae genome is undergoing reductive evolution, its inclusion in the analysis may have resulted in a considerable decrease in the number of genes in the core genome for the Mycobacterium. The five genomes of Streptomyces contain 1304 genes in the core genome. Table 4 shows the Average Ka/Ks values of all of the gene orthologs belonging to the core genome.
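The text does not spell out how the core genome was assembled from the pairwise ortholog sets, so the following is only a minimal sketch under one common assumption: a gene of a chosen reference genome belongs to the core if it has an ortholog in every other genome of the genus. The function name, the gene identifiers and the input layout are hypothetical.

```python
def core_genome(reference_genes, orthologs_by_genome):
    """Return the reference genes that have an ortholog in every other genome.

    reference_genes: iterable of gene IDs from the reference genome.
    orthologs_by_genome: {genome_name: {reference_gene_id: ortholog_id}}
    (hypothetical layout; one mapping per compared genome).
    """
    core = set(reference_genes)
    for mapping in orthologs_by_genome.values():
        # Keep only genes for which this genome also carries an ortholog.
        core &= set(mapping)
    return sorted(core)


# Toy example with made-up gene identifiers.
reference = ["g1", "g2", "g3"]
orthologs = {
    "genome_B": {"g1": "b7", "g2": "b9"},
    "genome_C": {"g2": "c1", "g3": "c4"},
}
core = core_genome(reference, orthologs)
```

Under this definition, adding a strongly reduced genome can only shrink the core, which is consistent with the remark above about M. leprae.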

Table 4 Average Ka/Ks value of all the orthologous genes belonging to the core genome

The silent mutation rate (Ks) of all Frankia strains was found to range from 6.458 substitutions/site between ACN and CcI to 39.412 between EuI and EAN, evocative of saturation. The non-silent rate (Ka) was much lower, ranging from 0.092 substitutions/site between ACN and CcI to 0.205, or twice as much, between FD and EuI. The Ka/Ks fluctuated in a narrow range of 0.029-0.047, a very low value indicative of strong purifying selection, lower than that seen in the pol gene of the bovine immunodeficiency virus [39]. This greater than 20-fold difference in mutation rates also illustrates why protein-based phylogenies are better for reconstructing distant relationships than DNA-based ones.

The trends are broadly similar in Mycobacterium. The silent mutation rate in Mycobacterium ranges from 2.155 between M. tuberculosis and M. leprae to 28.61 between M. tuberculosis and M. smegmatis. The non-silent rate ranges from 0.097 between M. tuberculosis and M. ulcerans to 0.196 between M. vanbaalenii and M. leprae. The silent mutation rate of Mycobacterium is thus in general lower than that of Frankia, while the non-silent rates are comparable between the two taxa. The Ka/Ks fluctuated in a range of 0.026 to 0.089, wider than in Frankia.

The silent mutation rate in Streptomyces ranges from 7.197 between S. scabiei and S. avermitilis to 28.933 between S. somaliensis and S. avermitilis. The non-silent rate ranges from 0.098 between S. scabiei and S. avermitilis to 0.156 between S. somaliensis and S. coelicolor. The Ka/Ks fluctuated in a range of 0.035 to 0.057, narrower than in Mycobacterium and comparable to that in Frankia.

The core secretome (Additional file 1: Table S1) of Frankia is represented by 69–89 genes with the nitrogen-fixing symbiotic strains having between 69 and 79 while the non-efficient cluster 4 EuI has 89 genes. The COG categories (besides the poorly defined “R” and “S”) that were mostly represented in the Frankia core secretome were “M” (Cell wall/membrane/envelope biogenesis), E (Amino acid transport and metabolism), O (Posttranslational modification, protein turnover, chaperones) and U (Intracellular trafficking, secretion, and vesicular transport). The categories that varied the most between the symbiotic strains and the more saprophytic ones were M and V (Defense mechanisms). Mycobacterium had a smaller core secretome of 31–40 genes with the pathogenic M. leprae and M. tuberculosis having the smallest number of genes. The COG categories that were most abundant were M, E and C (Energy production and conversion). Streptomyces had the largest core secretome of the three genera with 72–89 genes. The COG categories that were most represented were M, E, P (Inorganic ion transport and metabolism) and T (Signal transduction mechanisms). A correspondence analysis shows those strains that interact closely with eukaryotic hosts have their secretome positioned close to one another (FD and MT) and away from the more saprophytic strains (Figure 1). Curves joining the genomes as a whole to the secretomes were horizontal in the case of the FD and MT genomes while they were more vertical in the other cases.

Figure 1

Factorial correspondence analysis of protein coding genes (all), core genome (cor) and secretory protein genes in the core genome (sec) for Frankia EAN1pec (FE), Frankia DG (FD), S. coelicolor (SC), S. scabiei (SS), M. tuberculosis (MT), and M. vanbaalenii (MV), within functional COG groups. The horizontal axis explains 37.7% of the total inertia and the second one 28.2%.

Secretory proteins evolve faster than non-secretory proteins

The non-synonymous mutation rate (Ka) of secretory proteins was found to be higher than that of the non-secretory proteins except in one pair (Frankia CcI/EuI) where it was equal. The Ka/Ks ratio reflects the rate of adaptive evolution against the background rate. This parameter has been widely studied in the analysis of adaptive molecular evolution, and is regarded as a general method of measuring the rate of sequence evolution. To assess the intensity of mutational constraints, we have considered all of the genes belonging to the core genome for all studied strains of Frankia, Mycobacterium and Streptomyces. When these core genes of all Frankia, Mycobacterium and Streptomyces genomes were studied in all possible pairwise combinations, separately for each genus, we found statistically significant differences in Ka/Ks ratios between the secretory and non-secretory protein genes (Mann–Whitney U test, significant at the P < 0.001 level) in the Frankia ACN/CcI and CcI/FD pairs. The complete list of signal-peptide-bearing genes belonging to the core genome of Frankia, along with their annotation, is provided in Additional file 1: Table S1. For the other Frankia pairs, the differences were not significant. A similar analysis of the Mycobacterium genomes showed significant differences in the M. tuberculosis/M. leprae and M. tuberculosis/M. ulcerans pairings, while in the Streptomyces genomes significant differences were found in the S. coelicolor/S. scabiei and S. avermitilis/S. scabiei pairings. Interestingly, all of the Frankia and Mycobacterium genomes and some of the Streptomyces genomes that showed significant evolutionary rate differences between secretory and non-secretory protein genes were either pathogens, marginally free-living facultative symbionts or at least partly free-living facultative symbionts (for grouping refer to Tables 1, 2 and 3). This observation prompted us to study these genomes in greater detail through pairwise Ka/Ks ratio analysis of all the orthologous genes, both core and non-core (see the ‘Secretory protein vs. non-secretory protein in pairwise comparison’ section).

The distribution of Ka/Ks values for the secretory protein genes is somewhat skewed relative to a normal (Gaussian) curve (data not shown). This skew may be associated with biochemical adaptations to the environment. There have been many instances where Ka/Ks values were found to be skewed. For instance, secreted proteins were found to be under low purifying selection in human-mouse sequence alignments [40]. On the other hand, essential genes of E. coli were shown to be under strong purifying selection [41], whereas plant R genes [42], CHIK envelope proteins [43] and a Shigella effector gene [44] were shown to be under diversifying selection. In some cases, such as the flu virus HA protein, purifying and diversifying selection occur at the same time at different sites [45].

Signal peptides evolve faster than mature regions

A secretory protein is functional only when it reaches the appropriate cellular compartment. The translocation of secretory proteins across the bacterial cytoplasmic membrane can be mediated by N-terminal signal peptides. After translocation across the membrane, signal peptides are normally cleaved from the preprotein by signal peptidases, and it has even been suggested that signal peptides may end up in the membrane, there to play a role unrelated to that of the rest of the protein [46]. Numerous analyses have indicated that there are considerable rate variations among genes and across different gene regions or subdomains [47]. This suggests that signal peptide (SP) parts might have rates of molecular evolution that differ from those of the mature peptide (MP) parts. In all possible Frankia pairs, significant differences were found in the degree of evolutionary change (i.e. Ka/Ks) between SP and MP (Mann–Whitney U test, P < 0.001) (Table 5). The Frankia ACN/EAN and ACN/EuI pairings showed the most prominent differences between signal peptide and mature parts. Similar trends were also observed among the Mycobacterium and Streptomyces genomes. In many cases, the Ka/Ks values of signal peptides were found to be 2–7 times higher than those of the mature proteins. Similar results for an increased rate of evolution of signal peptides were reported for yeast [48] and avian growth hormone genes [49]. Although there might be a tendency, we failed to find a strong correlation between the Ka/Ks values of the mature and signal peptides. For all of our datasets, the Ka/Ks value of the signal peptide was found to co-vary strongly with the Ka/Ks value of the entire peptide. Thus, it seems that the rate of evolution of the entire peptide may be correlated with the rate of evolution of the signal peptide.

Table 5 The rate of synonymous (Ks) and non-synonymous (Ka) nucleotide substitution for secretory (signal peptide and mature peptide) and non-secretory proteomes

Distribution into COGs

To determine whether the core genome and conserved secretome had similar contents, they were distributed into functional categories (COGs) and compared with the whole genome of two representative strains for each of the three genera Frankia, Streptomyces and Mycobacterium. It appears that the two intracellular bacteria (MT and FD) share a similar distribution of their secretomes into COGs. The full genomes have similar tendencies in that the pairs of genomes belonging to each of the three genera were close to one another, especially for Streptomyces and Frankia, and were associated with category I (lipid transport) in the case of Mycobacterium, with categories P (Inorganic ion transport and metabolism) and C (Energy production and conversion) in the case of Frankia, and with categories T (signaling), K (transcription) and V (defense) in the case of Streptomyces. When core secretomes were considered, the three pairs were not maintained: the three strains (MT, FD and SS) that are pathogens or symbionts were close to one another while the saprophytes were more distant (Figure 1). With regard to their secreted proteomes, Frankia FD and Mycobacterium MT were closer to one another than either was to Streptomyces (Figure 1).

Codon usage bias affecting the selection pressure

We examined whether evolutionary constraints on the genes are influenced by codon usage bias. For Frankia ACN and Frankia FD (Figure 2), the evolutionary rate, specifically the Ka/Ks ratio, was negatively correlated with CAI values for all of the genes belonging to the core, including the secretory protein genes (Pearson correlation coefficient, R = -0.017 for core genes and R = -0.16 for secretory protein genes). Similar trends were also found with the Mycobacterium strains. One explanation for this negative correlation is that codon usage bias is positively correlated with the intensity of purifying selection [50]. Therefore, genes with a stronger codon usage bias (i.e. with a high CAI value) will undergo higher negative selection pressure and thus the evolutionary rate will be slower at non-synonymous or synonymous sites. On the other hand, Frankia strains CcI, EAN and EuI, and Streptomyces, showed a negative correlation between Ka/Ks ratio and CAI values for the core genes as a whole, while the secretory proteins exhibited the reverse trend (i.e. the Ka/Ks ratio was positively correlated with CAI, with R values ranging from 0.188 to 0.262). This kind of unusual relationship between evolutionary rate and CAI value in signal-peptide-bearing genes was reported earlier for Streptomyces [48]. The authors proposed that the intensity of purifying selection was significantly relaxed in such genes.
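The correlation reported above is a plain Pearson coefficient between per-gene CAI and Ka/Ks values. As a minimal illustration (not the authors' script), it can be computed as follows; the function name and the numbers are made up for the example.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


# Illustrative (made-up) per-gene values: higher CAI paired with lower Ka/Ks.
cai_values = [0.45, 0.52, 0.60, 0.71, 0.80]
kaks_values = [0.09, 0.07, 0.06, 0.04, 0.03]
r = pearson_r(cai_values, kaks_values)  # negative for this toy data
```

A coefficient close to zero, such as the R = -0.017 quoted for core genes, indicates a very weak linear relationship even when its sign is negative.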

Figure 2

Scatter plot of Ka/Ks value versus codon adaptation index (CAI) in various Actinobacteria (left to right and top to bottom): (A) Frankia EAN1pec (B) Frankia FD (C) Frankia ACN14a (D) Frankia EuI1c (E) Frankia CcI3 (F) S. coelicolor (G) M. tuberculosis. The x-axis represents the Ka/Ks value and the y-axis represents the CAI value.

Secretory protein vs. non-secretory protein in pairwise comparison

Various combinations of Frankia, Mycobacterium and Streptomyces genes were used for pairwise calculation of Ka/Ks. For this analysis, we first identified the orthologous gene pairs between genome pairs and then calculated the Ka/Ks value for all orthologous gene pairs. From these data, the secretory protein genes were identified as those predicted to have a signal peptide in both members of the orthologous pair. Their Ka/Ks values were compared to those of the rest of the genes. Average Ka/Ks values for pairwise genome comparisons among Frankia, Mycobacterium and Streptomyces are provided in Additional file 2: Table S2. In Table 6, a matrix format is provided with each cell representing the difference between the average Ka/Ks value of secretory protein genes and that of non-secretory protein genes. Among the marginally free-living facultative symbiont Frankia strains (Group C strains, i.e. Frankia CcI and FD), the difference in evolutionary rates between secretory and non-secretory proteins was quite robust. The Mann–Whitney U test showed that the difference was highly significant (P < 0.001 in a two-tailed test). Similarly, all combinations of Frankia ACN, CcI and FD showed statistically significant differences. In contrast, the other pairings involving Frankia ACN and the two Elaeagnus-infecting strains, which are predominantly free-living facultative symbionts (Group A strains), showed no significant differences in the Ka/Ks values of secretory and non-secretory proteins, with the exception of the EAN/CcI pairwise combination, where the difference was significant at P < 0.05. A similar trend was observed for the other actinobacterial strains analyzed. Only pairings composed of pathogenic Mycobacterium strains (i.e. M. tuberculosis, M. leprae and M. ulcerans) showed a significant difference between the evolutionary rates of secretory and non-secretory proteins. The difference in Ka/Ks ratio between secretory and non-secretory proteins was not significant for pairings among non-pathogenic strains (such as M. vanbaalenii and M. smegmatis) or for combinations of a pathogenic and a non-pathogenic strain (such as M. tuberculosis/M. vanbaalenii or M. leprae/M. smegmatis). Analysis of the Streptomyces pairings also showed significant differences in the Ka/Ks ratio of secretory and non-secretory proteins in pairs such as S. coelicolor/S. scabiei, S. avermitilis/S. scabiei and S. scabiei/S. somaliensis. Incidentally, S. scabiei and S. somaliensis are the pathogenic strains of Streptomyces.
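The comparisons above rest on a two-tailed Mann–Whitney U test between the Ka/Ks values of secretory and non-secretory genes. The statistical software used is not named in the text, so the following is only a sketch of the test itself, using the normal approximation with a tie correction and no continuity correction (an assumption that is reasonable for samples of a few dozen genes or more). The function name and data are illustrative.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U statistic for sample x, two-sided p-value).
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    tie_term = 0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        mid_rank = (i + 1 + j) / 2  # average of ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = mid_rank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    n = n1 + n2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # all values identical: no evidence of a difference
        return u_x, 1.0
    z = (u_x - mean_u) / math.sqrt(var_u)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return u_x, p_two_sided


# Made-up Ka/Ks values for secretory and non-secretory genes of one pair.
secretory = [0.08, 0.11, 0.09, 0.14, 0.10, 0.12]
non_secretory = [0.03, 0.05, 0.04, 0.06, 0.02, 0.07]
u_stat, p_value = mann_whitney_u(secretory, non_secretory)
```

A rank-based test is a natural choice here because, as noted earlier, the Ka/Ks distribution of secretory genes is skewed rather than Gaussian.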

Table 6 Pairwise comparison of Ka/Ks ratio in various strains of studied genera

Taken together, these results indicate an overall trend: the evolutionary constraints on secretory proteins as a whole in marginally free-living facultative symbionts or pathogenic strains were significantly increased compared to those occurring in saprophytic or free-living organisms. A possible explanation for this trend is that the high Ka/Ks ratios of secretory proteins in pathogens and symbionts may reflect adaptive evolution of their sequences.


A definite trend emerged from our analysis of the evolutionary rates and patterns for various gene types among five Frankia, five Mycobacterium and five Streptomyces genomes. Secretory protein genes of obligate symbionts, marginally free-living facultative symbionts or pathogenic organisms evolved significantly faster than non-secretory protein genes, whereas genomes of saprophytes or predominantly free-living facultative symbionts did not exhibit significant changes in rate. This difference may be a telling genomic signature of loss of autonomy. Although robust purifying selection was encountered in most of the analyses, the secretory protein genes were found to be under stronger evolutionary selection pressure than non-secretory protein genes in symbiotic and pathogenic strains. This difference could be an adaptive strategy for them to interact better with their hosts. Further, within the secretory protein genes, the evolutionary rate (Ka/Ks) of the signal peptide was, on average, 2–7 times higher than that of the mature protein. This result suggests that signal peptides might be under relaxed purifying selection. Codon usage analysis of actinobacterial strains under host selection pressure (such as the symbiotic Frankia ACN and FD and the pathogenic Mycobacterium) suggests that codon usage bias had a negative impact on the selective pressure exerted on the secretory protein genes. These organisms remain in continuous cross-talk with their host, particularly through the signal peptides. It thus appears that symbiotic and pathogenic bacteria try to remain in a discreet expression mode to avoid eliciting host defense responses, while concurrently accumulating evolutionarily neutral synonymous substitutions.

The expected arrival of a large number of genomes, in particular in the genus Frankia and its relatives, may yield more closely related genomes from which a larger number of conserved genes can be identified than is possible with strains that have different host infectivity spectra and have diverged for several million years, leaving a reduced core genome. This should help identify proteins and domains subject to strong evolutionary constraints, among which those determinants involved in symbiotic interactions, in particular in lineages where few or no isolates are available.


Selection of genomes used in this study

The nucleotide sequences and deduced amino acid sequences of all protein-coding genes were downloaded from the JGI-IMG database for five Frankia strains, namely ACN14a (NC_008278), CcI3 (NC_007777), EAN1pec (NC_009921), EuI1c (NC_014666) and the symbiont of Datisca glomerata (CP002801), hereafter referred to as ACN, CcI, EAN, EuI and FD, respectively; five Streptomyces strains: S. coelicolor A3(2) (NC_003888), S. avermitilis MA-4680 (NC_003155), S. griseus NBRC 13350 (NC_010572), S. scabiei 87.22 (NC_013929.1) and S. somaliensis DSM 40738 (AJJM01000000) [12]; and five Mycobacterium strains: M. leprae TN (NC_002677), M. tuberculosis CDC1551 (NC_002755), M. ulcerans Agy99 (NC_008611.1), M. smegmatis MC2 155 (NC_008596) and M. vanbaalenii PYR-1 (NC_008726).

Identification of orthologous genes

Orthologous genes were identified with the Reciprocal Best Hits (RBH) approach applied to the amino acid sequences of all protein-coding genes, with an E-value threshold of 1e-10 and an identity ≥ 50% over at least 50% of the alignable region. This approach and these parameters had been used previously for screening orthologs in Streptomyces [51].
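The reciprocal best hit logic can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the `Hit` record, its field names, the use of bit score to rank hits and the reading of "50% of the alignable region" as an alignment coverage percentage are all assumptions made for the example.

```python
from collections import namedtuple

# Hypothetical record for one pairwise protein hit, e.g. parsed from a
# tabular similarity-search output; field names are illustrative only.
Hit = namedtuple("Hit", "query subject evalue identity coverage bitscore")


def best_hits(hits, max_evalue=1e-10, min_identity=50.0, min_coverage=50.0):
    """Return {query: subject} for the top-scoring hit passing all filters."""
    best = {}
    for h in hits:
        if (h.evalue > max_evalue or h.identity < min_identity
                or h.coverage < min_coverage):
            continue
        if h.query not in best or h.bitscore > best[h.query].bitscore:
            best[h.query] = h
    return {query: h.subject for query, h in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a, **thresholds):
    """Pairs (a, b) such that b is a's best hit and a is b's best hit."""
    forward = best_hits(a_vs_b, **thresholds)
    reverse = best_hits(b_vs_a, **thresholds)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Toy data with made-up gene identifiers and scores.
a_vs_b = [
    Hit("a1", "b1", 1e-50, 80.0, 95.0, 300.0),
    Hit("a1", "b2", 1e-20, 55.0, 60.0, 120.0),
    Hit("a2", "b2", 1e-5, 70.0, 90.0, 50.0),   # fails the E-value filter
]
b_vs_a = [
    Hit("b1", "a1", 1e-48, 80.0, 94.0, 298.0),
    Hit("b2", "a1", 1e-20, 55.0, 60.0, 120.0),
]
ortholog_pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
```

Requiring reciprocity discards one-way matches such as b2 to a1 in the toy data, which is what keeps paralogs from being paired with a gene that already has a better partner.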

Identification of secretory protein genes

Secretory protein genes belonging to the core genome were identified using the SignalP 3.0 [52] and TMHMM 2.0 [53] software. Only those genes predicted as secretory proteins by both the artificial neural network and the hidden Markov model methods were selected. Sequences predicted to contain a signal peptide by SignalP were analyzed with TMHMM 2.0 to determine the number of transmembrane (TM) domains. Those having 0–2 transmembrane domains were retained, as done by Mastronunzio et al. [12]. Selected genes were examined individually to ensure that only genes with a viable leader peptide were kept. For the comparison of evolutionary rates of the mature part and the signal peptide part, a dataset of orthologs in which a signal peptide cleavage site had been detected in both members was compiled. Mature peptides (complete sequence minus signal peptide) were analyzed by editing out the predicted signal peptide from the alignment file using a Perl script developed by us.
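The selection rule and the signal peptide trimming step can be sketched as below. The authors' Perl script is not shown in the text, so this Python version is only an illustration: the function names are hypothetical, the inputs stand in for parsed SignalP and TMHMM results, and cutting the pairwise alignment at the later of the two cleavage columns is an assumption made here so that no signal peptide residue of either ortholog ends up in the mature part.

```python
def is_secretory(nn_signal, hmm_signal, tm_domains, max_tm=2):
    """Keep a gene only if both predictors call a signal peptide and the
    protein has at most `max_tm` predicted transmembrane domains."""
    return bool(nn_signal and hmm_signal and 0 <= tm_domains <= max_tm)


def cleavage_column(aligned_seq, n_signal_residues):
    """Alignment column just after the n-th ungapped residue of a row."""
    if n_signal_residues == 0:
        return 0
    seen = 0
    for col, ch in enumerate(aligned_seq):
        if ch != "-":
            seen += 1
            if seen == n_signal_residues:
                return col + 1
    raise ValueError("signal peptide is longer than the sequence")


def split_pair(aln_a, aln_b, sp_len_a, sp_len_b):
    """Split a pairwise protein alignment into signal and mature parts.

    sp_len_a / sp_len_b are the predicted signal peptide lengths (in
    residues) of the two orthologs.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned rows must have the same length")
    cut = max(cleavage_column(aln_a, sp_len_a),
              cleavage_column(aln_b, sp_len_b))
    signal = (aln_a[:cut], aln_b[:cut])
    mature = (aln_a[cut:], aln_b[cut:])
    return signal, mature


# Toy alignment with made-up sequences and signal peptide lengths.
signal_part, mature_part = split_pair("MK-RLAAQTS", "MKTRL-AQTS", 4, 5)
```

For codon-level analysis the same column index can be multiplied by three to cut the corresponding codon alignment.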

Evolutionary rate analysis

Orthologous gene alignments were used for the evolutionary rate analyses. The numbers of non-synonymous and synonymous substitutions per site (Ka and Ks, respectively) and their ratio (Ka/Ks) were estimated with codeml in the PAML software package [54]. A BioPerl script was used with the PAML program to estimate the pairwise Ka and Ks values. The script first translated the coding sequences into proteins and aligned the protein sequences. The protein alignments were then projected back into nucleotide coordinates and used by the PAML module to calculate the Ka/Ks ratio using the maximum likelihood method. To study the evolutionary rates of the signal peptide part and the mature part of a protein, the Ka/Ks value of each component of the protein was determined separately.
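The step in which a protein alignment is projected back into nucleotide coordinates can be illustrated with a short sketch. This is not the BioPerl script used in the study; the function name is hypothetical and the example assumes the coding sequence starts in frame and matches the aligned protein residue for residue.

```python
def protein_to_codon_alignment(aligned_protein, cds):
    """Thread an ungapped coding sequence onto a gapped protein alignment row.

    Each residue is replaced by its codon and each gap by '---', so that
    codon boundaries are preserved for Ka/Ks estimation.
    """
    n_residues = sum(ch != "-" for ch in aligned_protein)
    if len(cds) < 3 * n_residues:
        raise ValueError("coding sequence is too short for the protein row")
    codon_row = []
    pos = 0
    for ch in aligned_protein:
        if ch == "-":
            codon_row.append("---")
        else:
            codon_row.append(cds[pos:pos + 3])
            pos += 3
    # Any trailing nucleotides (e.g. a stop codon) are deliberately dropped.
    return "".join(codon_row)


# Toy example: Met, gap, Lys, Leu threaded onto ATG AAA CTG (+ stop codon).
codon_alignment = protein_to_codon_alignment("M-KL", "ATGAAACTGTAA")
```

Aligning at the protein level and only then restoring codons avoids frame-breaking gaps, which would otherwise distort the synonymous and non-synonymous site counts.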

Codon bias analysis

The codon adaptation index (CAI) is a measure of directional synonymous codon usage bias [55]. The index uses a reference set of highly expressed genes from a species to assess the relative usage of each codon, and the score of each gene is calculated from the frequency of use of all codons in that gene. The index assesses the extent to which selection has been effective in molding the pattern of codon usage. The CAI value of each gene belonging to the core genome was calculated with CAI Calculator 2 [56].
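The index described above can be written as a geometric mean of per-codon weights. The following Python sketch follows the Sharp and Li definition [55] and does not reproduce CAI Calculator 2; the function names, the pseudocount of 0.5 for codons absent from the reference set, and the exclusion of Met, Trp and stop codons from scoring are choices made for this sketch.

```python
import math
from collections import Counter
from itertools import product
from typing import Dict, Iterable, List

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codons(seq: str) -> List[str]:
    """Split an in-frame coding sequence into codons, dropping a partial tail."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def relative_adaptiveness(reference_cds: Iterable[str]) -> Dict[str, float]:
    """Weight of each codon: its count divided by the count of the most used
    synonymous codon in the reference set of highly expressed genes."""
    counts = Counter(c for cds in reference_cds for c in codons(cds)
                     if c in CODON_TABLE)
    weights: Dict[str, float] = {}
    for aa in set(CODON_TABLE.values()) - {"*", "M", "W"}:
        family = [c for c, a in CODON_TABLE.items() if a == aa]
        top = max(counts[c] for c in family)
        if top == 0:
            continue  # amino acid absent from the reference set
        for c in family:
            # Pseudocount avoids log(0) for unseen codons (assumption of this sketch).
            weights[c] = max(counts[c], 0.5) / top
    return weights


def cai(cds: str, weights: Dict[str, float]) -> float:
    """Geometric mean of the weights of all scorable codons in the gene."""
    logs = [math.log(weights[c]) for c in codons(cds) if c in weights]
    if not logs:
        raise ValueError("no scorable codons in sequence")
    return math.exp(sum(logs) / len(logs))
```

A gene built only from the preferred codon of each amino acid scores 1.0, and the score falls as rarer synonymous codons are used, which is why the index is read as a proxy for expression level.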


References

  1. Benson DR, Silvester WB: Biology of Frankia strains, actinomycete symbionts of actinorhizal plants. Microbiol Rev. 1993, 57 (2): 293-319.

  2. Normand P, Orso S, Cournoyer B, Jeannin P, Chapelon C, Dawson J, Evtushenko L, Misra AK: Molecular phylogeny of the genus Frankia and related genera and emendation of the family Frankiaceae. Int J Syst Bacteriol. 1996, 46 (1): 1-9. 10.1099/00207713-46-1-1.

  3. Normand P, Chapelon C: Direct characterization of Frankia and of close phyletic neighbors from an Alnus viridis rhizosphere. Physiol Plant. 1997, 99: 722-731. 10.1111/j.1399-3054.1997.tb05377.x.

  4. Normand P: Geodermatophilaceae fam. nov., a formal description. Int J Syst Evol Microbiol. 2006, 56: 2277-2278. 10.1099/ijs.0.64298-0.

  5. Babalola OO, Kirby BM, Le Roes-Hill M, Cook AE, Cary SC, Burton SG, Cowan DA: Phylogenetic analysis of actinobacterial populations associated with Antarctic Dry Valley mineral soils. Environ Microbiol. 2009, 11 (3): 566-576. 10.1111/j.1462-2920.2008.01809.x.

  6. Mort A, Normand P, Lalonde M: 2-O-methyl-D-mannose, a key sugar in the taxonomy of Frankia. Can J Microbiol. 1983, 29: 993-1002. 10.1139/m83-156.

  7. Berry AM, Harriott OT, Moreau RA, Osman SF, Benson DR, Jones AD: Hopanoid lipids compose the Frankia vesicle envelope, presumptive barrier of oxygen diffusion to nitrogenase. Proc Natl Acad Sci USA. 1993, 90: 6091-6094. 10.1073/pnas.90.13.6091.

  8. Gherbi H, Markmann K, Svistoonoff S, Estevan J, Autran D, Giczey G, Auguy F, Peret B, Laplaze L, Franche C, et al: SymRK defines a common genetic basis for plant root endosymbioses with arbuscular mycorrhiza fungi, rhizobia, and Frankia bacteria. Proc Natl Acad Sci USA. 2008, 105: 4928-4932. 10.1073/pnas.0710618105.

  9. Hocher V, Alloisio N, Auguy F, Fournier P, Doumas P, Pujic P, Gherbi H, Queiroux C, Da Silva C, Wincker P, et al: Transcriptomics of actinorhizal symbioses reveals homologs of the whole common symbiotic signaling cascade. Plant Physiol. 2011, 156: 1-12. 10.1104/pp.111.900410.

  10. Normand P, Queiroux C, Tisa LS, Benson DR, Cruveiller S, Rouy Z, Medigue C: Exploring the genomes of Frankia sp. Physiol Plant. 2007, 13: 331-343.

  11. Normand P, Lapierre P, Tisa LS, Gogarten JP, Alloisio N, Bagnarol E, Bassi CA, Berry AM, Bickhart DM, Choisne N, et al: Genome characteristics of facultatively symbiotic Frankia sp. strains reflect host range and host plant biogeography. Genome Res. 2007, 17 (1): 7-15.

  12. Mastronunzio JE, Tisa LS, Normand P, Benson DR: Comparative secretome analysis suggests low plant cell wall degrading capacity in Frankia symbionts. BMC Genomics. 2008, 9: 47. 10.1186/1471-2164-9-47.

  13. Alloisio N, Félix S, Maréchal J, Pujic P, Rouy Z, Vallenet D, Medigue C, Normand P: Frankia alni proteome under nitrogen-fixing and nitrogen-replete conditions. Physiol Plant. 2007, 13: 440-453.

  14. Waksman SA, Reilly HC, Johnstone DB: Isolation of streptomycin-producing strains of Streptomyces griseus. J Bacteriol. 1946, 52: 393-397.

  15. Bouchek-Mechiche K, Gardan L, Andrivon D, Normand P: Streptomyces turgidiscabies and Streptomyces reticuliscabiei: one genomic species, two pathogenic groups. Int J Syst Evol Microbiol. 2006, 56 (Pt 12): 2771-2776.

  16. Kirby R, Sangal V, Tucker NP, Zakrzewska-Czerwinska J, Wierzbicka K, Herron PR, Chu CJ, Chandra G, Fahal AH, Goodfellow M, et al: Draft genome sequence of the human pathogen Streptomyces somaliensis, a significant cause of actinomycetoma. J Bacteriol. 2012, 194 (13): 3544-3545. 10.1128/JB.00534-12.

  17. Cole ST, Brosch R, Parkhill J, Garnier T, Churcher C, Harris D, Gordon SV, Eiglmeier K, Gas S, Barry CE, et al: Deciphering the biology of Mycobacterium tuberculosis from the complete genome sequence. Nature. 1998, 393 (6685): 537-544. 10.1038/31159.

  18. Cole ST, Eiglmeier K, Parkhill J, James KD, Thomson NR, Wheeler PR, Honore N, Garnier T, Churcher C, Harris D, et al: Massive gene decay in the leprosy bacillus. Nature. 2001, 409 (6823): 1007-1011. 10.1038/35059006.

  19. Kweon O, Kim SJ, Jones RC, Freeman JP, Adjei MD, Edmondson RD, Cerniglia CE: A polyomic approach to elucidate the fluoranthene-degradative pathway in Mycobacterium vanbaalenii PYR-1. J Bacteriol. 2007, 189 (13): 4635-4647. 10.1128/JB.00128-07.

  20. Martino A, Sacchi A, Volpe E, Agrati C, De Santis R, Pucillo LP, Colizzi V, Vendetti S: Non-pathogenic Mycobacterium smegmatis induces the differentiation of human monocytes directly into fully mature dendritic cells. J Clin Immun. 2005, 25 (4): 365-375. 10.1007/s10875-005-4188-x.

  21. Baron C, Zambryski PC: The plant response in pathogenesis, symbiosis, and wounding: variations on a common theme?. Ann Rev Genet. 1995, 29: 107-129. 10.1146/

  22. Brown JK, Tellier A: Plant-parasite coevolution: bridging the gap between genetics and ecology. Ann Rev Phytopathol. 2011, 49: 345-367. 10.1146/annurev-phyto-072910-095301.

  23. Simonet P, Navarro E, Rouvier C, Reddell P, Zimpfer J, Dommergues Y, Bardin R, Combarro P, Hamelin J, Domenach AM, et al: Co-evolution between Frankia populations and host plants in the family Casuarinaceae and consequent patterns of global dispersal. Environ Microbiol. 1999, 1 (6): 525-533. 10.1046/j.1462-2920.1999.00068.x.

  24. Alloisio N, Queiroux C, Fournier P, Pujic P, Normand P, Vallenet D, Médigue C, Yamaura M, Kakoi K, Kucho KI: The Frankia alni symbiotic transcriptome. Mol Plant Microb Interactions. 2010, 23: 593-607. 10.1094/MPMI-23-5-0593.

  25. Stukenbrock EH, Jorgensen FG, Zala M, Hansen TT, McDonald BA, Schierup MH: Whole-genome and chromosome evolution associated with host adaptation and speciation of the wheat pathogen Mycosphaerella graminicola. PLoS Genetics. 2010, 6 (12): e1001189. 10.1371/journal.pgen.1001189.

  26. Gong X, Padhi A: Evidence for positive selection in the extracellular domain of human cytomegalovirus encoded G protein-coupled receptor US28. J Med Virol. 2011, 83 (7): 1255-1261. 10.1002/jmv.22098.

  27. Huang Y, Temperley ND, Ren L, Smith J, Li N, Burt DW: Molecular evolution of the vertebrate TLR1 gene family: a complex history of gene duplication, gene conversion, positive selection and co-evolution. BMC Evol Biol. 2011, 11: 149. 10.1186/1471-2148-11-149.

  28. Delport W, Scheffler K, Seoighe C: Frequent toggling between alternative amino acids is driven by selection in HIV-1. PLoS Pathogens. 2008, 4 (12): e1000242. 10.1371/journal.ppat.1000242.

  29. Shabab M, Shindo T, Gu C, Kaschani F, Pansuriya T, Chintha R, Harzen A, Colby T, Kamoun S, van der Hoorn RA: Fungal effector protein AVR2 targets diversifying defense-related cys proteases of tomato. Plant Cell. 2008, 20 (4): 1169-1183. 10.1105/tpc.107.056325.

  30. Bailly X, Olivieri I, De Mita S, Cleyet-Marel JC, Bena G: Recombination and selection shape the molecular diversity pattern of nitrogen-fixing Sinorhizobium sp. associated to Medicago. Molec Ecol. 2006, 15 (10): 2719-2734. 10.1111/j.1365-294X.2006.02969.x.

  31. Brownlie JC, Adamski M, Slatko B, McGraw EA: Diversifying selection and host adaptation in two endosymbiont genomes. BMC Evol Biol. 2007, 7: 68. 10.1186/1471-2148-7-68.

  32. Baker D, Torrey J: The isolation and cultivation of actinomycetous root nodule endophytes. Symbiotic Nitrogen Fixation in the Management of Temperate Forests. Edited by: Gordon JC, Wheeler CT, Perry DA. 1979, Corvallis, OR: Oregon State University, Forest Research Laboratory, 38-56.

  33. Becking JH: Frankiaceae fam. nov. (Actinomycetales) with one new combination and six new species of the genus Frankia Brunchorst 1886, 174. Int J Syst Bacteriol. 1970, 20: 201-220. 10.1099/00207713-20-2-201.

  34. Lancelle S, Torrey J, Hepler P, Callaham D: Ultrastructure of freeze-substituted Frankia strain HFPCcI3, the actinomycete isolated from root nodules of Casuarina cunninghamiana. Protoplasma. 1985, 127: 64-72. 10.1007/BF01273702.

  35. Gauthier D, Diem H, Dommergues Y: In vitro nitrogen fixation by two actinomycete strains isolated from Casuarina nodules. Appl Environ Microbiol. 1981, 41: 306-308.

  36. Normand P, Lalonde M: Evaluation of Frankia strains isolated from provenances of two Alnus species. Can J Microbiol. 1982, 28: 1133-1142. 10.1139/m82-168.

  37. Smolander A, Ronkko R, Nurmiaho-Lassila E-L, Haahtela K: Growth of Frankia in the rhizosphere of Betula pendula, a nonhost tree species. Can J Microbiol. 1990, 36: 649-656. 10.1139/m90-111.

  38. Baker D, Newcomb W, Torrey JG: Characterization of an ineffective actinorhizal microsymbiont, Frankia sp. EuI1 (Actinomycetales). Can J Microbiol. 1980, 26: 1072-1089. 10.1139/m80-180.

  39. Cooper CR, Hanson LA, Diehl WJ, Pharr GT, Coats KS: Natural selection of the Pol gene of bovine immunodeficiency virus. Virology. 1999, 255 (2): 294-301. 10.1006/viro.1998.9572.

  40. Waterston RH, Lander ES, Sulston JE: On the sequencing of the human genome. Proc Natl Acad Sci USA. 2002, 99 (6): 3712-3716. 10.1073/pnas.042692499.

  41. Jordan IK, Rogozin IB, Wolf YI, Koonin EV: Essential genes are more evolutionarily conserved than are nonessential genes in bacteria. Genome Res. 2002, 12 (6): 962-968.

  42. Bergelson J, Kreitman M, Stahl EA, Tian D: Evolutionary dynamics of plant R-genes. Science. 2001, 292 (5525): 2281-2285. 10.1126/science.1061337.

  43. Schuffenecker I, Ando T, Thouvenot D, Lina B, Aymard M: Genetic classification of "Sapporo-like viruses". Arch Virol. 2001, 146 (11): 2115-2132. 10.1007/s007050170024.

  44. Lan R, Lumb B, Ryan D, Reeves PR: Molecular evolution of large virulence plasmid in Shigella clones and enteroinvasive Escherichia coli. Inf Imm. 2001, 69 (10): 6303-6309. 10.1128/IAI.69.10.6303-6309.2001.

  45. Shahsavandi S, Salmanian AH, Ghorashi SA, Masoudi S, Ebrahimi MM: Evolutionary characterization of hemagglutinin gene of H9N2 influenza viruses isolated from Asia. Res Vet Sci. 2012, 93 (1): 234-239. 10.1016/j.rvsc.2011.07.033.

  46. Hegde RS: Targeting and beyond: new roles for old signal sequences. Molec Cell. 2002, 10 (4): 697-698. 10.1016/S1097-2765(02)00692-5.

  47. Graur D, Li WH: Fundamentals of Molecular Evolution. 2nd edition. 2000, Sunderland, MA: Sinauer Associates

  48. Li YD, Xie ZY, Du YL, Zhou Z, Mao XM, Lv LX, Li YQ: The rapid evolution of signal peptides is mainly caused by relaxed selection on non-synonymous and synonymous sites. Gene. 2009, 436 (1–2): 8-11.

  49. Buggiotti L, Primmer CR: Molecular evolution of the avian growth hormone gene and comparison with its mammalian counterpart. J Evol Biol. 2006, 19 (3): 844-854. 10.1111/j.1420-9101.2005.01042.x.

  50. Gu Z, David L, Petrov D, Jones T, Davis RW, Steinmetz LM: Elevated evolutionary rates in the laboratory strain of Saccharomyces cerevisiae. Proc Natl Acad Sci USA. 2005, 102 (4): 1092-1097. 10.1073/pnas.0409159102.

  51. Li W, Wu J, Tao W, Zhao C, Wang Y, He X, Chandra G, Zhou X, Deng Z, Chater KF, et al: A genetic and bioinformatic analysis of Streptomyces coelicolor genes containing TTA codons, possible targets for regulation by a developmentally significant tRNA. FEMS Microbiol Lett. 2007, 266 (1): 20-28. 10.1111/j.1574-6968.2006.00494.x.

  52. Bendtsen JD, Nielsen H, von Heijne G, Brunak S: Improved prediction of signal peptides: SignalP 3.0. J Molec Biol. 2004, 340 (4): 783-795. 10.1016/j.jmb.2004.05.028.

  53. Krogh A, Larsson B, von Heijne G, Sonnhammer EL: Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J Molec Biol. 2001, 305 (3): 567-580. 10.1006/jmbi.2000.4315.

  54. Yang Z: PAML 4: phylogenetic analysis by maximum likelihood. Molec Biol Evol. 2007, 24 (8): 1586-1591. 10.1093/molbev/msm088.

  55. Sharp PM, Li WH: The codon Adaptation Index–a measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Res. 1987, 15 (3): 1281-1295. 10.1093/nar/15.3.1281.

  56. Wu G, Culley DE, Zhang W: Predicted highly expressed genes in the genomes of Streptomyces coelicolor and Streptomyces avermitilis and the implications for their metabolism. Microbiology. 2005, 151 (Pt 7): 2175-2187.


Acknowledgements

ST acknowledges CSIR for a CSIR-SRF Fellowship. PN received funding from the ANR Sesam (SVSE7 2011–13). LST was supported in part by Hatch grant NH530 and USDA NIFA 2010-65108-20581. AS is grateful to the DBT, Government of India, for providing the CREST Award and financial help in setting up the Bioinformatics Centre in the Department of Botany, University of North Bengal. We also thank Mr. Ayan Roy (University of North Bengal) for developing some of the Perl scripts used in this paper.

Author information

Corresponding author

Correspondence to Arnab Sen.

Additional information

Competing interest

The authors declare that they have no financial or non-financial competing interests of any sort.

Authors’ contributions

VD and ST performed preliminary analyses; PN, LST and AS conceived the experimental design; ST and AS performed data analyses; PN and AS drafted the manuscript. All authors prepared the final manuscript and approved the final version.

Electronic supplementary material


Additional file 1: Table S1: Signal peptide-bearing genes belonging to the core genome of Frankia, Mycobacterium and Streptomyces. The COG category to which these genes belong was obtained from the IMGer site. (XLS 244 KB)


Additional file 2: Table S2: Average Ka/Ks values for pairwise genome comparisons among Frankia, Mycobacterium and Streptomyces. Ka/Ks was computed for secreted proteins (those with a leader peptide) and for other proteins, and the difference between the two was tested for significance with the Mann–Whitney test. (XLSX 11 KB)

Rights and permissions

Open Access. This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Thakur, S., Normand, P., Daubin, V. et al. Contrasted evolutionary constraints on secreted and non-secreted proteomes of selected Actinobacteria. BMC Genomics 14, 474 (2013).


Keywords

  • Actinobacteria
  • Pathogenesis
  • Purifying selection
  • Secretome
  • Symbiosis