Iron-related gene mutations driving global Mycobacterium tuberculosis transmission revealed by whole-genome sequencing

Background: Iron plays a crucial role in the growth of Mycobacterium tuberculosis (M. tuberculosis). However, the precise regulatory mechanism governing this system requires further elucidation. Additionally, limited studies have examined the impact of gene mutations related to iron on the transmission of M. tuberculosis globally. This research aims to investigate the correlation between mutations in iron-related genes and the worldwide transmission of M. tuberculosis.
Results: A total of 13,532 isolates of M. tuberculosis were included in this study. Among them, 6,104 (45.11%) were identified as genomic clustered isolates, while 8,395 (62.04%) were classified as genomic clade isolates. Our results showed that a total of 12 single nucleotide polymorphisms (SNPs) showed a positive correlation with clustering, such as Rv1469 (ctpD, C758T), Rv3703c (etgB, G1122T), and Rv3743c (ctpJ, G676C). Additionally, seven SNPs, including Rv0104 (T167G, T478G), Rv0211 (pckA, A302C), Rv0283 (eccB3, C423T), Rv1436 (gap, G654T), ctpD C758T, and etgB C578A, demonstrated a positive correlation with transmission clades across different countries. Notably, our findings highlighted the positive association of Rv0104 T167G, pckA A302C, eccB3 C423T, ctpD C758T, and etgB C578A with transmission clades across diverse regions. Furthermore, our analysis identified 78 SNPs that exhibited significant associations with clade size.
Conclusions: Our study reveals the link between iron-related gene SNPs and M. tuberculosis transmission, offering insights into crucial factors influencing the pathogenicity of the disease. This research holds promise for targeted strategies in prevention and treatment, advancing research and interventions in this field.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12864-024-10152-1.


Background
Tuberculosis (TB) is an airborne infectious disease caused by Mycobacterium tuberculosis (M. tuberculosis) and is a leading cause of death worldwide among infectious diseases. Despite great progress over the past decades, TB remains a major global health problem. In 2022, TB remained the second leading cause of death from a single infectious agent, after coronavirus disease (COVID-19), and caused almost twice as many deaths as HIV/AIDS [1]. Globally, there were 7.5 million newly diagnosed cases of TB reported in 2022. Additionally, the total number of deaths attributed to TB, including those among individuals with HIV, reached 1.30 million during the same year [1]. Despite the immense global burden of tuberculosis, our understanding of the factors influencing its transmission remains limited. Therefore, a deeper insight into the mechanisms underlying the transmission of M. tuberculosis is needed to guide effective strategies for tuberculosis control and ultimately reduce the societal burden of this disease.
Iron is an essential element for nearly all living organisms because of its involvement in a wide range of metabolic processes, including oxygen transport, DNA synthesis, and electron transfer [2]. In M. tuberculosis, iron is essential for growth. The bacterium acquires iron from host reservoirs such as transferrin, lactoferrin, and ferritin, and then assimilates and uses it within the cell. Key components in the acquisition of iron (in the form of ferric ion) and its initial transfer into the mycobacterial cell are extracellular iron-binding agents known as siderophores. In pathogenic mycobacteria, carboxymycobactins fulfill this role, while exochelins perform analogous functions in saprophytic mycobacteria [3,4]. After acquisition, iron must be transported across the mycobacterial cell membrane, and M. tuberculosis employs specialized systems to facilitate this process. Inside the cell, iron is used in diverse metabolic pathways, functioning as a cofactor for enzymes engaged in critical processes such as DNA synthesis, respiration, and energy production [3,5]. Furthermore, iron plays a crucial role in regulating gene expression and maintaining redox homeostasis [6]. M. tuberculosis exploits iron to disrupt host immune responses, thereby enhancing its survival and dissemination. In summary, iron contributes to the establishment and survival of M. tuberculosis within the host. By utilizing host iron resources, the bacterium can better adapt to the host environment and increase its transmission capacity. Iron thus plays a key role in the growth, pathogenicity, immune evasion, and host adaptation of M. tuberculosis. However, the specific regulatory mechanisms of iron-related genes involved in the dissemination of M. tuberculosis remain unclear. Further research is needed to uncover these mechanisms, providing insights into the pathogenesis of tuberculosis and facilitating the development of more effective treatment strategies.
Whole-genome sequencing (WGS) is increasingly being used to investigate the transmission dynamics of M. tuberculosis. In this study, we employed WGS to analyze the impact of mutations in iron-related genes on the global transmission of M. tuberculosis. Specifically, genomic clusters and clades were used to represent the transmission of M. tuberculosis.

Sample collection
Between 2011 and 2018, a total of 1,550 culture-positive cases of M. tuberculosis were collected from two medical institutions in China, namely the Shandong Public Health Clinical Research Center (SPHCC) and the Weifang Respiratory Clinical Hospital (WRCH). Culture-positive cases that had previously been evaluated and treated were not included in the study.

DNA extraction and sequencing
Genomic DNA was extracted from the isolates using the cetyltrimethylammonium bromide (CTAB) method, and quality control (QC) procedures were conducted on the extracted DNA prior to analysis. In total, 103 isolates of M. tuberculosis were excluded from further analysis because of improper handling during DNA extraction or poor quality of the extracted DNA, leaving 1,447 isolates for inclusion in this study. The genomes of these isolates were sequenced on the Illumina HiSeq 4000 system. The resulting sequence data were deposited in the National Center for Biotechnology Information (NCBI) under BioProject PRJNA1002108. In addition to these isolates, this study also included a larger dataset consisting of 13,267 isolates of M. tuberculosis collected from 52 countries and 18 regions worldwide, as reported in previous studies [7][8][9][10][11][12][13][14][15]. Sequencing reads were mapped to the reference genome of the standard isolate M. tuberculosis H37Rv using BWA-MEM (version 0.7.17-r1188). Our analysis focused solely on samples with a coverage rate of 98% or higher and a minimum depth of at least 20× [16]. In summary, a total of 13,532 genomes were analyzed in this study; please refer to Additional file 1: Tables S14-S15 for the specific sample numbers.
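The inclusion criterion described above (coverage of at least 98% and depth of at least 20×) can be illustrated with a minimal Python sketch. The record layout, the sample identifiers, and the interpretation of "depth" as mean depth are assumptions made for illustration and are not taken from the study's pipeline.

```python
from dataclasses import dataclass

MIN_COVERAGE = 0.98  # fraction of the H37Rv reference covered (threshold from the text)
MIN_DEPTH = 20.0     # sequencing depth in x (threshold from the text)


@dataclass
class SampleQC:
    sample_id: str   # hypothetical identifier
    coverage: float  # fraction of reference positions covered by reads
    depth: float     # assumed here to be the mean depth across the genome


def passes_qc(sample: SampleQC) -> bool:
    """Return True when a sample meets both inclusion thresholds."""
    return sample.coverage >= MIN_COVERAGE and sample.depth >= MIN_DEPTH


# Toy records, invented for illustration only.
samples = [
    SampleQC("iso_A", 0.991, 85.0),  # passes both thresholds
    SampleQC("iso_B", 0.975, 60.0),  # fails on coverage
    SampleQC("iso_C", 0.985, 18.5),  # fails on depth
]
kept = [s.sample_id for s in samples if passes_qc(s)]
print(kept)
```

Both thresholds are treated as inclusive, matching the wording "98% or higher" and "at least 20×".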

Single nucleotide polymorphism (SNP) analysis
We performed variant calling using Samclip (version 0.4.0) and SAMtools (version 1.15). Following variant calling, the resulting variants were further filtered using FreeBayes (version 1.3.2) and BCFtools (version 1.15.1). To ensure the accuracy of our analysis, we excluded SNPs located within repeat regions. These included polymorphic GC-rich sequences found in PE/PPE genes, direct repeat SNPs, and repeat bases identified using Tandem Repeats Finder (version 4.09) and RepeatMasker (version 4.1.2-P1) [17,18]. Finally, SNPs were annotated using SnpEff (version 4.1l), and the resulting output was processed using Python [19].
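The exclusion of SNPs located in repeat regions can be sketched as an interval lookup. The following Python example is a simplified illustration only: the function names, the 1-based inclusive interval convention, and the toy coordinates are assumptions, and the study's actual filtering relied on the tools named above.

```python
from bisect import bisect_right


def merge_intervals(intervals):
    """Merge overlapping or adjacent 1-based inclusive (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def drop_repeat_snps(positions, repeat_intervals):
    """Keep only SNP positions that fall outside every repeat interval."""
    merged = merge_intervals(repeat_intervals)
    starts = [start for start, _ in merged]
    kept = []
    for pos in positions:
        # Index of the last interval whose start is <= pos, or -1 if none.
        i = bisect_right(starts, pos) - 1
        in_repeat = i >= 0 and pos <= merged[i][1]
        if not in_repeat:
            kept.append(pos)
    return kept


# Toy coordinates, invented for illustration only.
repeats = [(100, 200), (150, 260), (1000, 1100)]
snps = [50, 100, 260, 261, 999, 1050, 1101]
print(drop_repeat_snps(snps, repeats))
```

Merging the intervals first keeps the lookup to one binary search per SNP, which matters when many thousands of genomes are screened against the same repeat annotation.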

Phylogenetic analysis
The isolates in this study were classified into lineages according to Coll et al. [20] (Additional file 2: Tables S14-S15). To construct the maximum likelihood phylogenetic tree, we used the IQ-TREE software package (version 1.6.12). The JC nucleotide substitution model and the gamma model of rate heterogeneity were used, with 100 bootstrap replicates for statistical support [21]. Mycobacterium canettii CIPT140010059 was used as the outgroup. The resulting phylogenetic tree was visualized using iTOL (https://itol.embl.de/).

Propagation analysis
We employed cluster and clade analysis to investigate the impact of mutations in iron-related genes on the transmission of M. tuberculosis [22]. Following previous studies [23], transmission clusters were defined using a threshold of less than 12 SNPs. Additionally, transmission clades were defined using a threshold of less than 25 SNPs. To further categorize the transmission clades, we adopted a previously described classification system in which clades were divided into three groups based on size: large (above the 75th percentile), medium (between the 25th and 75th percentiles), and small (below the 25th percentile) [24]. For a comprehensive analysis of the global distribution patterns and transmission dynamics of M. tuberculosis isolates, we classified clades as cross-country or within-country. Cross-country clades consist of isolates from two or more different countries. Furthermore, based on geographic location, clades were classified as cross-regional or within-regional using the United Nations standard regions (UN M.49). Cross-regional clades include isolates from two or more different regions.
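The definitions above can be made concrete with a short Python sketch. It assumes single-linkage grouping (an isolate joins a group if it is within the threshold of any member), which the text does not specify, and it uses a strict "less than" comparison as worded above. All function names and the toy sequences, countries, and clade sizes are hypothetical.

```python
from itertools import combinations
from statistics import quantiles


def snp_distance(a, b):
    """Number of differing sites between two equal-length SNP alignments."""
    return sum(x != y for x, y in zip(a, b))


def transmission_groups(seqs, threshold):
    """Single-linkage groups (an assumption): link pairs with distance < threshold."""
    parent = {name: name for name in seqs}

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    for a, b in combinations(seqs, 2):
        if snp_distance(seqs[a], seqs[b]) < threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in seqs:
        groups.setdefault(find(name), set()).add(name)
    # Singletons are not transmission groups.
    return [g for g in groups.values() if len(g) >= 2]


def size_class(size, q25, q75):
    """Large above the 75th percentile, small below the 25th, otherwise medium."""
    if size > q75:
        return "large"
    if size < q25:
        return "small"
    return "medium"


def is_cross_country(group, country_of):
    """True when the group contains isolates from two or more countries."""
    return len({country_of[name] for name in group}) >= 2


# Toy data, invented for illustration only.
seqs = {
    "iso1": "A" * 20,
    "iso2": "A" * 15 + "C" * 5,   # 5 SNPs from iso1
    "iso3": "C" * 12 + "A" * 8,   # 12 SNPs from iso1
    "iso4": "G" * 20,             # 20 SNPs from every other isolate
}
country_of = {"iso1": "China", "iso2": "China", "iso3": "Peru", "iso4": "India"}

clusters = transmission_groups(seqs, threshold=12)
clades = transmission_groups(seqs, threshold=25)

clade_sizes = [2, 2, 3, 5, 8, 40]  # hypothetical clade sizes
q25, _, q75 = quantiles(clade_sizes, n=4)
labels = [size_class(s, q25, q75) for s in clade_sizes]
```

In this toy example the 12-SNP threshold yields one within-country cluster (iso1 and iso2), while the 25-SNP threshold merges all four isolates into a single cross-country clade. The percentile method used by `statistics.quantiles` is only one of several conventions and may differ from the one used in the study.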

Acquisition of iron-related genes
Iron-related genes of M. tuberculosis that had been reported previously were obtained from the NCBI database. These genes encode iron uptake transporters, iron storage proteins, iron-regulated transcription factors, and enzymes involved in iron-dependent processes. A total of 59 iron-related genes were retrieved. Python was used to detect mutations in these iron-related genes (Additional file 1: Table S16).

Modeling and statistical analysis
The data were presented as percentages. Positions with a mutation frequency below 0.01 in the iron-related genes were excluded from the analysis [25]. For statistical analysis, we employed generalized linear mixed models (GLMMs) in the R statistical language (R 4.2.3). To further analyze the data, random forest and gradient boosting decision tree algorithms were implemented using Python 3.7.4 with the Scikit-learn library (Python Software Foundation, USA; Packt Publishing, UK). The dataset was randomly divided into a training set and a test set in a 7:3 ratio. To assess the impact of mutations in iron-related genes on clade size, Spearman's rank correlation analysis was performed using R version 4.2.3. Confounding factors such as lineage and geographical location were taken into account during all analyses. All statistical analyses were conducted using SPSS 26.0. Two-tailed tests were used, and statistical significance was defined as a P-value below 0.05.
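Two of the steps above, the mutation-frequency filter and Spearman's rank correlation, can be illustrated with a self-contained Python sketch. This is not the study's R or SPSS code: the function names, the 0/1 genotype encoding, and the toy data are assumptions, and the sketch does not adjust for lineage or geographical location.

```python
from math import sqrt
from statistics import mean


def filter_by_frequency(genotypes, min_freq=0.01):
    """Drop SNPs whose mutation frequency is below min_freq.

    genotypes maps a SNP name to a list of 0/1 calls, one per isolate.
    """
    return {
        snp: calls
        for snp, calls in genotypes.items()
        if sum(calls) / len(calls) >= min_freq
    }


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rho computed as the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy data, invented for illustration only (200 isolates).
genotypes = {
    "snp_rare": [1] * 1 + [0] * 199,    # frequency 0.005, excluded
    "snp_common": [1] * 4 + [0] * 196,  # frequency 0.02, retained
}
retained = filter_by_frequency(genotypes)

# Mutation carried (0/1) per clade versus clade size, hypothetical values.
carried = [0, 0, 1, 1]
clade_size = [2, 3, 5, 9]
rho = spearman_rho(carried, clade_size)
```

Because the genotype is binary, many ranks are tied, which is why average ranks are used rather than the shortcut formula that assumes no ties.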

Sample description
We included a total of 13,532 isolates of M. tuberculosis from around the world, with 1,445 isolates collected between 2011 and 2018 at the Shandong Public Health Clinical Research Center (SPHCC) and the Weifang Respiratory Clinical Hospital (WRCH). The highest proportion of isolates came from Eastern Asia (n = 3,172, 23.44%), followed by Eastern Africa (n = 1,728, 12.77%) and Northern America (n = 1,646, 12.16%), as depicted in Fig. 1. As expected, the majority of these isolates belonged to lineage 4 (n = 6,499, 48.03%) and lineage 2 (n = 5,135, 37.95%). Isolates were divided into clusters based on a threshold of ≤ 12 single nucleotide polymorphisms (SNPs). Accordingly, a total of 6,104 isolates clustered together, resulting in a clustering rate of 0.45. Within the lineage 4 group, 2,971 (45.71%) isolates formed clusters, while within the lineage 2 group, 2,131 (41.50%) isolates formed clusters. When a threshold of 25 SNPs was applied for clades, a total of 8,395 isolates grouped into clades, resulting in a clade rate of 0.62. These isolates were grouped into 2,218 clades, with the number of isolates per clade ranging from 2 to 224. Among these clades, there were 177 cross-country clades, spanning 2 to 4 countries, and 171 cross-regional clades, spanning 2 to 4 regions, as shown in Table 1. The phylogenetic tree of the M. tuberculosis isolates is shown in Fig. 2.
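The clustering and clade rates reported above follow directly from the stated counts. The short Python check below recomputes them; the helper name is hypothetical, and only counts given in the text are used.

```python
def percent(part, whole):
    """Format part/whole as a percentage with two decimal places."""
    return f"{100 * part / whole:.2f}%"


TOTAL_ISOLATES = 13532

clustering_rate = percent(6104, TOTAL_ISOLATES)   # clustered isolates, all lineages
clade_rate = percent(8395, TOTAL_ISOLATES)        # isolates assigned to clades
lineage4_clustered = percent(2971, 6499)          # clustered isolates within lineage 4
lineage2_clustered = percent(2131, 5135)          # clustered isolates within lineage 2

print(clustering_rate, clade_rate, lineage4_clustered, lineage2_clustered)
```

The rounded values agree with the percentages given in the text (45.11%, 62.04%, 45.71%, and 41.50%).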

Relationship between iron-related gene mutations and transmission clusters of lineages
After excluding positions with a mutation frequency below 0.01, we included a total of 40 SNPs for further analysis. We analyzed the relationship between these 40 SNPs and clustering among lineage 2 isolates, comparing clustered with non-clustered isolates. The GLMM revealed that five SNPs were significantly associated with clustering (P < 0.05) (Table 2). Among these, two nonsynonymous SNPs and two synonymous SNPs displayed a positive correlation with clustering. These significant SNPs included Rv0197 T2247G, Rv1553 (frdB, C87T), and Rv2869c (rip, C957T, G775T). Two prediction models were established using random forest and gradient boosting decision tree algorithms (Additional file 1: Table S5, Table S10 and Additional file 2: Fig. S2). Rv0197 T2247G, frdB C87T, and rip (C957T, G775T) contributed significantly to both the random forest and gradient boosting decision tree models. Overall, our results indicated that the SNPs Rv0197 T2247G, frdB C87T, and rip (C957T, G775T) were positively correlated with transmission clusters among M. tuberculosis isolates of lineage 2.

Relationship between iron-related gene mutations and clade size
After excluding sites with a mutation frequency below 0.01, we included a total of 90 iron-related gene SNPs. The results showed that 78 SNPs were significantly associated with clade size (P < 0.05), among which 22 nonsynonymous SNPs and 11 synonymous SNPs were positively correlated with clade size, including eccB3 C423T, ctpD C758T, etgB C578A, rip C957T, etgB G1122T, and ctpJ G676C. For further details, refer to Fig. 3.

Discussion
We have identified a relationship between iron-related gene mutations and the transmission of M. tuberculosis in this study. Similarly, within the clades defined by 25 SNPs, lineage 4 (n = 4577, 54.52%) and lineage 2 (n = 2999, 35.72%) constitute the majority. This suggests that the transmission of M. tuberculosis was primarily driven by lineage 2 and lineage 4. Moreover, our findings also revealed 176 cross-country transmission clades. Among these, eight transmission clades involved three countries, while transmission clade 254 extended across four nations: Peru, South Africa, India, and Thailand (see Fig. 4). These patterns of cross-continental transmission extend beyond the typical spread observed between neighboring countries and are likely linked to modern social activities such as international trade, travel, and other forms of social interaction. According to our study, two nonsynonymous SNPs in Rv0197, G344T and T2247G, increased the risk of transmission clusters. We also noticed that the T2247G SNP in Rv0197 was positively associated with transmission clusters of lineage 2 and lineage 4; this SNP has previously been shown to be associated with enhanced transmissibility in vivo [26]. In addition, the frequent and independent occurrence of the Rv0197 T2247G mutation in the Lineage4.3/Latin American and Mediterranean sub-lineage clonal complex (TUN4.3_CC1) could have contributed to its evolutionary success. The protein encoded by the Rv0197 gene, a putative iron oxidoreductase, is thought to play a critical role in bacterial metabolism and respiration. Therefore, these SNP variations could potentially lead to structural or functional changes in the protein, thus influencing bacterial physiology [27]. Furthermore, we speculate that these SNP variations might help the bacteria adapt and survive in specific environments or hosts, possibly through   [29]. Based on these collective findings, we hypothesize that these two mutations (G775T and C957T) potentially induce functional changes in the Rv2869c protein, affecting the formation and structure of the bacterial cell envelope and thereby influencing the transmission potential of M. tuberculosis. The gene Rv0338c encodes IspQ, a membrane-bound protein containing 2Fe-2S and 4Fe-4S centers, which is believed to serve as an iron-sulfur binding oxidoreductase. Given its essential role in the β-oxidation process of M. tuberculosis, mutations in Rv0338c have the potential to affect oxygen reduction reactions involved in bacterial metabolism and respiration. Our study demonstrated a positive correlation between the synonymous SNP C2478T in Rv0338c and transmission clusters, particularly within lineage 4. This indicates that this specific SNP may have contributed to adaptation and transmission dynamics within distinct lineages. Further research is needed to fully comprehend the specific consequences of this synonymous mutation for the functionality of Rv0338c and its impact on bacterial physiology. Notably, studies have shown that mutants lacking the etfD gene, which interacts with Rv0338c, exhibit impaired growth on fatty acids or cholesterol, as well as reduced survival and growth in murine infection models [30,31].
Our findings further underscore the significance of Rv0338c and its associated genes in mycobacterial physiology and pathogenesis. In our study, we also found that the nonsynonymous SNP G676C in Rv3743c and the nonsynonymous SNPs G1122T and C578A in Rv3703c were positively associated with transmission clusters, specifically among lineage 4 isolates. Rv3743c is known as a cation transporter/ATPase, while Rv3703c is classified as an iron(II)-dependent oxidoreductase. However, the precise functional roles of these genes in M. tuberculosis are not yet fully understood and require further investigation.
In our analysis of transmission clades, which covered cross-regional transmission, cross-country transmission, and clade size, we found a positive correlation between the nonsynonymous SNP C758T in Rv1469 and all three measures. Additionally, the nonsynonymous SNP T980G and the synonymous SNP C1350T in Rv1469 were positively associated with clade size. Rv1469 is one of the coding genes for homologous P1B4-ATPase [32]. It belongs to the ATPase superfamily and functions as a transmembrane protein involved in the transport and regulation of metal ions. The Rv1469 gene encodes a membrane protein annotated as the M. tuberculosis paralog of Rv1469, a member of the metal cation-transporting P1B4-ATPase subgroup. It plays an essential role in M. tuberculosis survival within the host. Specifically, Rv1469 acts as a high-affinity Fe2+ exporter required for overcoming redox stress and adapting to the host environment [32,33]. The nonsynonymous SNPs T167G in Rv0104, A302C in Rv0211, and C578A in Rv3703c were positively correlated with cross-regional transmission, cross-country transmission, and clade size. However, the specific functions of Rv0104, Rv0211, and Rv3703c remain unclear. These associations suggest a potential link between these genetic variations and the spread of tuberculosis across different regions and countries. However, without a clear understanding of the functions of these genes, it is difficult to determine the exact mechanisms underlying this correlation.
Additionally, our study identified associations between SNPs in other iron-related genes and the transmission of M. tuberculosis. These genetic mutations have the potential to alter diverse physiological functions of the bacterium that are closely linked to its transmission. By altering these iron-related pathways, SNPs in these genes may impact the fitness, virulence, or adaptive capabilities of the bacterium. This, in turn, could influence its ability to establish infections, replicate, evade host immune responses, and transmit to new hosts. Furthermore, our findings support the view that both synonymous and nonsynonymous mutations can affect the transmission of M. tuberculosis. This indicates that synonymous mutations in iron-related genes are not universally neutral, which aligns with a previous study by Xukang Shen suggesting that synonymous mutations in yeast genes are predominantly strongly non-neutral [34].
In this study, we have identified correlations between mutations in iron-related genes and the transmission of M. tuberculosis. However, it is important to acknowledge several limitations of our research. Firstly, although we have established these correlations, the specific impact of these mutations on the transmission dynamics of M. tuberculosis lacks experimental validation. Further research is needed to investigate the functional significance of these mutations and their direct influence on the transmission of the bacteria. Moreover, mutations in iron-related genes may also affect other factors related to pathogenesis, such as bacterial virulence and the immune response. These potential influences warrant further in-depth investigation. Understanding the broader implications of these mutations requires additional studies of their effects on various aspects of TB pathogenesis.

Conclusion
The findings of this study indicate that mutations in iron-related genes could potentially elevate the risk of M. tuberculosis transmission, underscoring the importance of conducting additional research to explore the impact of these mutations on the control and dissemination of M. tuberculosis. These results offer significant insights that can inform the development of therapeutic interventions for tuberculosis.

Fig. 2 Phylogenetic tree for the Mycobacterium tuberculosis isolates from China

Fig. 3 Correlation analysis of iron-related gene mutations and clade size

and YWH helped draft the manuscript. YWH, QLH, and YZZ overviewed and supervised the project. All authors read and approved the final manuscript.

Funding
This research was supported by the Natural Science Foundation of Shandong Province (No. ZR2020KH013; No. ZR2021MH006; No. ZR2022QH259), the Department of Science & Technology of Shandong Province (CN) (No. 2007GG30002033; No. 2017GSF218052), and the Jinan Science and Technology Bureau (CN) (No. 201704100).

Table 1
The characteristics of Mycobacterium tuberculosis isolates

Fig. 1 Distribution of Mycobacterium tuberculosis in various regions

Table 2
Generalized linear mixed model analysis on clustered and non-clustered isolates in the lineage 2 cohort
OR, odds ratio; CI, confidence interval

Table 3
Generalized linear mixed model analysis on cross-country transmission clades