Skip to main content

Identification of novel single nucleotide variants in the drug resistance mechanism of Mycobacterium tuberculosis isolates by whole-genome analysis



Tuberculosis (TB) represents a major global health challenge. Drug resistance in Mycobacterium tuberculosis (MTB) poses a substantial obstacle to effective TB treatment. Identifying genomic mutations in MTB isolates holds promise for unraveling the underlying mechanisms of drug resistance in this bacterium.


In this study, we investigated the roles of single nucleotide variants (SNVs) in MTB isolates resistant to four antibiotics (moxifloxacin, ofloxacin, amikacin, and capreomycin) through whole-genome analysis. We identified the drug-resistance-associated SNVs by comparing the genomes of MTB isolates with reference genomes using the MuMmer4 tool.


We observed a strikingly high proportion (94.2%) of MTB isolates resistant to ofloxacin, underscoring the current prevalence of drug resistance in MTB. An average of 3529 SNVs were detected in a single ofloxacin-resistant isolate, indicating a mutation rate of approximately 0.08% under the selective pressure of ofloxacin exposure. We identified a set of 60 SNVs associated with extensively drug-resistant tuberculosis (XDR-TB), among which 42 SNVs were non-synonymous mutations located in the coding regions of nine key genes (ctpI, desA3, mce1R, moeB1, ndhA, PE_PGRS4, PPE18, rpsA, secF). Protein structure modeling revealed that SNVs of three genes (PE_PGRS4, desA3, secF) are close to the critical catalytic active sites in the three-dimensional structure of the coding proteins.


This comprehensive study elucidates novel resistance mechanisms in MTB against antibiotics, paving the way for future design and development of anti-tuberculosis drugs.

Peer Review reports


Tuberculosis (TB), primarily caused by Mycobacterium tuberculosis (MTB), is one of the major epidemics worldwide, with a high mortality rate surpassing that of any other infectious disease [1]. In 2021, the World Health Organization estimated that around 10 million people were affected by tuberculosis, leading to 1.6 million deaths [2]. TB is a treatable and curable disease, often managed through a combination of antibiotics. However, a critical challenge in TB treatment lies in the widespread drug-resistance mechanisms manifested by MTB [3].

Drug-resistant tuberculosis can be classified into various categories, including multidrug-resistant tuberculosis (MDR-TB) and extensively drug-resistant tuberculosis (XDR-TB) [4]. MDR-TB exhibits resistance to isoniazid and rifampicin, while XDR-TB exhibits resistance to all first-line drugs and at least one second-line drug [5]. Gupta et al. reported that ofloxacin resistance is significantly high among multidrug-resistant MTB strains across India, most of which were especially associated with the Beijing genotype and carried gyrA mutations [6]. Fluoroquinolone is the classical representative of first-line drugs in TB treatment. MTB acquires resistance to fluoroquinolones mainly through mutations in the quinolone resistance-determining region of the pknB gene [7]. Moxifloxacin is a common second-line drug in the treatment of pneumonia and tuberculosis infections [8]. Clinical trials have shown that moxifloxacin can improve standard treatment regimens with better bactericidal activity [9]. Understanding the mechanisms of drug resistance in MTB is crucial for developing new treatment strategies and improving the management of drug-resistant TB cases.

MTB is characterized by several genomic features that contribute to its adaptability, virulence, and ability to survive within host organisms. In 2013, the reference genome assembly ASM19595v2 of MTB H37Rv was published. Within the genomic landscape of MTB, high-frequency genomic mutations play pivotal roles in drug-resistance mechanisms [10]. In our previous studies, we have identified the function of hypothetical proteins in the MTB genome and explored the drug-resistance mechanism of insertions and deletions in 1110 MTB isolates [11, 12]. In bacteria, gene mutations can contribute to genetic diversity and lead to the emergence of strains resistant to specific drugs. For instance, it was reported that mutations in gene rpoB, which codes for the RNA polymerase beta subunit, were associated with rifampin resistance in MTB [13]. Mutations in gene katG were linked to isoniazid resistance [14]. Thus, investigation of genomic mutations in drug-resistant isolates could help uncover novel drug-resistance mechanisms in MTB.

Single nucleotide variant (SNV) is a mutation at a single position of the genome sequence, often used as a marker to study the association between drug-resistance features [15]. SNVs can be divided into synonymous mutations and non-synonymous mutations, according to their functional consequences. Non-synonymous mutations are genetic variations that result in a change to the amino acid sequence of the encoded protein, potentially altering the structure and function of the protein.

In this study, we analyzed the whole genome of 716 clinical MTB isolates to identify the non-synonymous SNVs in XDR strains. We determined SNVs associated with drug resistance by comparing the whole genomes of XDR isolates against the MTB reference genome. We compared the frequencies of resistant and susceptible strains for different antibiotics using Fisher’s exact test. The non-synonymous mutations in some key genes were reported and discussed. Our findings suggested a novel view of drug-resistance mechanisms in MTB.


WGS data analysis on antibiotic resistance

Our focus centered on exploring the resistance mechanisms of MTB isolates resistant to two fluoroquinolones, namely moxifloxacin and ofloxacin, as well as two second-line drugs, amikacin and capreomycin. The whole-genome sequences of 716 MTB isolates which were tested with at least one drug mentioned above were collected from the BV-BRC database (Fig. 1). Resistance to different drugs varies significantly between geographical locations. For example, for amikacin, Belarus, Russia, and South Korea show relatively high resistance, accounting for 40.01%, 29.2%, and 21.9% respectively, while Iran, Romania, and Uzbekistan have lower resistance. Other drugs such as capreomycin, moxifloxacin, and ofloxacin showed similar trends (Fig. 1C). In this study, a group of 83 MTB isolates were tested by all four antibiotics. The testing results revealed that a set of 251 MTB isolates exhibited resistance to moxifloxacin, while a larger set of 645 isolates demonstrated resistance to ofloxacin (Table 1). Additionally, 283 MTB isolates exhibited resistance to amikacin, and the same number of MTB isolates displayed resistance to capreomycin. The isolates resistant to ofloxacin accounted for the highest proportion, with 94.2% in testing samples, indicating that MTB is more prone to mutations induced by exposure to ofloxacin.

Table 1 Summary of testing results of drug resistance in obtained MTB isolates
Fig. 1
figure 1

Whole-genome sequencing samples obtained in this study. (A) Number of MTB isolates tested by at least 1, 2, 3, or 4 antibiotics; (B) Venn diagram of samples tested by four antibiotics; (C) prevalence of drug resistance

SNV calling by MuMer4

The WGS data of MTB isolates resistant to antibiotics were compared against the H37Rv reference genome using MUMmer4 to identify possible SNVs. The frequency and proportion of SNVs in MTB isolates resistant to the four antibiotics are also calculated (Table 2). Frequency represents the count of a certain mutation that occurs, while proportion represents the proportion of this mutation in the total count of mutations. As a result, we found that the mutation patterns were similar in the MTB isolates resistant to the four antibiotics. Notably, the proportions of T-> A mutations were consistently low across all antibiotics, accounting for only 1.6%, 1.53%, 1.55%, and 1.61%, respectively (Fig. 2A). In contrast, A-> G mutations were the most predominant, which accounted for around 14% of isolates resistant to three antibiotics (ofloxacin, moxifloxacin, and amikacin). Proportions of all the other mutations fell within the range of 2%-14%.

The average mutation number of the four antibiotics is presented in Fig. 2B. The greatest number of mutations were found in the ofloxacin-resistant MTB (an average of 3529 mutations in each MTB isolate). The fewest mutation number was found in the moxifloxacin-resistant MTB (an average of 1370 mutations in each MTB isolate). Considering the size of MTB genome (4,411,532 base pairs [16]), we estimated the mutation rate of one base as 0.080% in ofloxacin-resistant isolates and 0.031% in moxifloxacin-resistant MTB.

Table 2 Frequency and proportion of identified SNV mutations in MTB isolates
Fig. 2
figure 2

Mutation summary in antibiotic-resistant isolates. (A) Average proportion of different mutations in total amount; (B) Average mutation number in MTB isolates resistant to four different antibiotics

SNV screening by Fisher’s exact test

After identifying mutations in MTB, we employed Fisher’s exact test to determine SNVs that distributed differently between resistant and susceptible strains. The detailed number of significant SNVs and the type of mutation for each antibiotic are presented in Table 3. The number of non-synonymous (NS) mutations is larger than that of the synonymous mutations in all MTB isolates. The proportion of NS mutations is around 70% in three antibiotics (moxifloxacin, ofloxacin, amikacin) and 66.8% in capreomycin. A total of 3,536 and 3,541 NS mutations were found in MTB isolates resistant to moxifloxacin and ofloxacin, respectively. In addition, the frequency of screened SNVs showed significant differences in distribution between moxifloxacin-susceptible and moxifloxacin-resistant strains (P ≤ 0.001), but such differential distribution was not observed in the other three strains (Fig. 3). In moxifloxacin-resistant strains, the mutation frequency ranged from 0% to 20%, while in moxifloxacin-susceptible strains, it spanned from 5% to 15%. The median mutation frequency in susceptible strains surpassed that in resistant strains. These results indicated that moxifloxacin pressure induced more NS mutations and affected the evolution mechanism in MTB isolates.

Table 3 Distribution of mutation types of identified SNVs in MTB isolates resistant to each antibiotic
Fig. 3
figure 3

Box plot of frequency rate of SNVs in susceptible and resistant MTB isolates of four drugs

Common SNVs related to four antibiotics

By overlap analysis of Venn diagram, a small set of 18 significant SNVs was found in MTB strains resistant to all four antibiotics, accounting for 0.2% of total identified SNVs (Fig. 4A). The details of these 18 significant SNVs are detailed in Table 4. Among these, three SNVs (A300922G, T3379432C, T3381356C) are located in non-coding regions. Another SNV T103849C located in the coding region of Rv0094c, which encoded a hypothetical protein (accession number NP_214608.1). Information of codon mutations and amino acid mutations was also obtained. The rest 14 significant SNVs were specifically enriched in the coding region of gene PE_PGRS4. Among them, nine SNVs were NS mutations that cause amino acid changes in the encoded protein and five SNVs were synonymous mutations. Of the nine NS mutations, five SNV mutations resulted in an amino acid change from hydrophobic (pho) AA to hydrophilic (phi) AA, while three SNVs led to a reverse mutation from phi AA to pho AA (Fig. 4B). These results suggest that most SNVs could change the physiochemical characteristics of coding amino acids.

Fig. 4
figure 4

Common SNVs in MTB isolates are resistant to four antibiotics. (A) Venn diagram of common SNVs in MTB isolates resistant to different drugs; (B) Hydrophilic and hydrophobic analysis of translated amino acid of common SNVs.

Table 4 Details of 18 common significant SNVs in MTB isolates resistant to four antibiotics. AA: amino acid; S: synonymous; NS: non-synonymous; pho: hydrophobic; phi: hydrophilic; Ref: nucleotide of the reference genome; Alt: nucleotide of the alternative genome

Common genes in XDR-TB

The XDR-TB is defined as isolates that are resistant to first-line drugs (moxifloxacin and ofloxacin) and a least one second-line drug, i.e., the isolates resistant to three antibiotics (moxifloxacin, ofloxacin, capreomycin) or (moxifloxacin, ofloxacin, amikacin). In the case of XDR-TB resistant to moxifloxacin, ofloxacin, and capreomycin, we only found PE_PGRS4 and Rv0094c. In another case of XDR-TB resistant to moxifloxacin, ofloxacin, and amikacin, we identified more genes mutated, with a set of 60 significant SNVs (Fig. 5A and Table S1). Of the 60 SNVs, 51 were located within the coding region while the remaining nine were found in non-coding regions. This result indicated that the drug-resistance-associated SNVs were mostly enriched in the coding region of genes. In addition, 20 SNVs were NS mutations corresponding to 20 genes (ctpI, desA3, mce1R, moeB1, ndhA, PPE18, rpsA, secF, Rv0311, Rv0347, Rv0654, Rv0698, Rv0888, Rv0923c, Rv1431, Rv1672c, Rv1692, Rv1836c, Rv2028c, Rv3630). The details of these 20 genes are shown in Table 5. Out of these mutations, seven SNVs cause the coding AA to change from hydrophobic to hydrophilic, and other six SNVs do not change hydrophobic features (Fig. 5B). These results suggest that MTB proteins tend to mutate from hydrophilic to evade the hydrolysis by the host enzymes.

Table 5 Details of 20 genes in XDR-TB. The table did not include the PE_PGRS4 and Rv0094c, which are shown in Table 4. AA: amino acid; pho: hydrophobic; phi: hydrophilic
Fig. 5
figure 5

Common SNVs in XDR-TB with resistance to three antibiotics (moxifloxacin, ofloxacin, and amikacin). (A) Venn diagram of common SNVs in three antibiotics; (B) Hydrophilic and hydrophobic analysis of translated amino acid of common SNVs

Three-dimensional protein structures of identified genes

Understanding three-dimensional (3D) structures of proteins is crucial for unraveling their functions and interactions in biological processes. We so far have identified 22 genes related to drug-resistance mechanisms in MTB, including 13 genes with only locus information (Rv0094c, Rv0311, Rv0347, Rv0654, Rv0698, Rv0888, Rv0923c, Rv1431, Rv1672c, Rv1692, Rv1836c, Rv2028c, Rv3630) and nine genes with more detailed information (ctpI, desA3, mce1R, moeB1, ndhA, PE_PGRS4, PPE18, rpsA, secF). We constructed the 3D structure of coding proteins for these nine genes by SWISS-MODEL and the parameters of these structures are shown in Table 6. Results showed the sequence identity scores with all templates were larger than 70%, indicating a high similarity between the template and our proteins. In particular, the structural templates of six proteins (PE_PGRS4, ctpI, moeB1, rpsA, ndhA, and PPE18) were identical to the homologous protein. The Global Model Quality Estimation (GMQE) value, is common index used for model quality evaluation in SWISS-MODEL. The GMQE values of all proteins are larger than 0.6 in 3D model. Specifically, the GMQE values of five coding proteins (PE_PGRS4, desA3, mce1R, moeB1, and ndhA) are larger than 0.9, suggesting the robust and reliable of AlphaFold method in the prediction of the three-dimensional structures in SWISS-MODEL.

Table 6 Summary of three-dimensional structures of proteins related to drug resistance mechanisms predicted by SWISS-MODEL. GMQE: Global Model Quality Estimation; AFDB: AlphaFold Protein Structure Database

Our analysis revealed four important SNV cluster regions within the PE_PGRS4 gene, highlighting its key role in various antibiotic-resistant strains. The 3D structure analysis showed that the SNV mutations of PE_PGRS4 were exclusively located within its β-folding region (Fig. 6). Results showed that Arg at position 174 exhibited three types of mutations (Arg-> Gly, Arg-> Met, and Arg-> Ser), while Ala at position 188 had two types of mutations (Ala-> Asp and Ala-> Thr). Additionally, Thr, Thr, Ala, and Ala at positions 177, 179, 182, and 196 experienced mutations to Arg, Ile, Ser, and Gly, respectively. The active site residue Ala is at position 196. Taken together, the amino acid positions from 174 to 188 in β-folding region of PE_PGRS4 gene were speculated to serve as the important catalytic sites in the protein function. In addition, the other eight genes found in isolates resistant to the other three antibiotics testing also displayed a solitary NS mutation, as illustrated in Fig. 7. Mutations in the five genes (ctpI, moeB1, rpsA, secF, PPE18) were situated within the α-helix structure, while mutations of rest genes (desA3, mce1R, and ndhA) located within the β-sheet structure.

Fig. 6
figure 6

The positions of the amino acid mutation are caused by all non-synonymous SNVs in the 3D structure of the PE_PGRS4 protein

Fig. 7
figure 7

The positions of the amino acid mutations were caused by non-synonymous SNVs in 3D structure of the coding proteins of identified genes

Identify critical active sites affected by SNVs

To uncover possible drug-resistance mechanisms, active sites of coding proteins of above genes were found by PrankWeb. The active sites affected by SNVs are shown in Table 7. Among these proteins, six (PE_PGRS4, desA3, mce1R, moeB1, secF, ndhA) of them were found at active sites nearby the position of amino acid mutations translated by SNVs. In particular, the position of mutated amino acids of three genes (PE_PGRS4, DesA3, secF) were closely near the active sites. The active site 172 of PE_PGRS4 was only two residues away from the mutation site 174. The mutation site 196 of desA is also the active site. The mutation site 79 of secF was located between active sites 77 and 78. These findings underscored the structural alterations induced by genetic mutations in key MTB genes, shedding light on potential implications for protein function and, by extension, drug resistance mechanisms.

Table 7 Active sites of coding proteins predicted by PrankWeb. Three proteins (ctpI, rpsA, PPE18) were not found active sites nearby mutated amino acids


Mycobacterium tuberculosis is a bacterium that causes tuberculosis (TB), a potentially serious infectious disease that primarily affects the lungs. TB remains a significant global health concern, and efforts to control and eliminate the disease are ongoing. MTB is known for its slow growth rate compared to many other bacteria. However, MTB could be spread through the air when an infected person coughs or sneezes, releasing tiny droplets containing the bacteria [17]. Understanding the genome of MTB is essential for unraveling the mechanisms of its pathogenicity, drug resistance, and host interactions.

MTB have developed resistance to various drugs through a combination of genetic mutations and selective pressures [18]. The emergence of drug-resistant strains poses a significant challenge to TB control and treatment efforts. The drug resistance phenomenon could decrease the sensitivity of MTB strains and diminish the efficacy of available antibiotics [19]. This pathogen can carry antigenic mutation, allowing it to evade the immune response of host. Such mutations involve changes in surface antigens, making it more challenging for the immune system to recognize and eliminate the bacterium [20]. Understanding the mechanisms of drug resistance in MTB is crucial for developing new treatment strategies and improving the management of drug-resistant TB cases.

In this study, whole-genome sequencing (WGS) data of 716 clinical MTB isolates were analyzed for their drug-resistance mechanisms to two first-line drugs (moxifloxacin and ofloxacin) and two second-line drugs (amikacin and capreomycin). Moxifloxacin and ofloxacin are both fluoroquinolone antibiotics that target bacterial DNA synthesis and protein synthesis [21]. Amikacin inhibits translocation by binding peptide tRNA at the ribosomal A-site, thereby suppressing protein synthesis and rendering bacteria unable to survive [22]. Capreomycin, a ribosome-targeting peptide antibiotic, inhibits tRNA binding by interacting with the ribosome, thereby inhibiting protein synthesis [23]. We found a high proportion (94.2%) of testing isolates showed resistance to ofloxacin, indicating that this drug may not be suitable for the treatment of MTB. These MTB isolates were compared with H37Rv reference genome by MuMmer4 to identify SNVs. An average number of 3529 SNVs were observed per ofloxacin-resistant MTB isolate, with a mutation rate of around 0.08% under the selection pressure of ofloxacin.

As only a minority of SNVs influence MTB drug resistance mechanisms, Fisher’s exact test was employed to compare the mutation frequencies in the resistant and susceptible strains exposed to the four antibiotics (moxifloxacin, ofloxacin, amikacin, and capreomycin). At a significance threshold of p-value < 0.05, we found a total of 3536 and 3541 SNVs in moxifloxacin-resistant and ofloxacin-resistant MTB isolates respectively. The number of NS mutations is larger than that of the synonymous mutations in all four antibiotics.

To understand the resistance mechanisms in MTB, we examined 18 shared SNVs associated with these four antibiotics. Of the 18 SNVs, we found nine non-synonymous SNVs located in the coding region of PE_PGRS4 and Rv0094c. Previous study showed that the transcriptional expression profile of PE_PGRS4 was constitutively expressed and up-regulated under the circumstances of many antibiotics [24]. Besides, the mutations of PE_PGRS4 gene have been previously demonstrated in drug-resistant MTB [25]. Other SNVs in non-coding regions, such as A300922G, T3379432C, and T3381356C, may also have potential functional relevance. While these mutations may not directly impact protein coding, they could play a regulatory role in MTB. Regulatory variants have potential to exert a substantial influence on phenotype, emphasizing their significance in understanding the broader implications of these genetic mutations [26].

To further analyze the drug-resistant mechanism in XDR-TB, we identified a set of 60 shared SNVs related to three antibiotics (moxifloxacin, ofloxacin, and capreomycin). In addition to the 18 shared SNVs identified in MTB resistant to four antibiotics, we also identified 42 SNVs located in the coding region of eight genes (ctpI, desA3, mce1R, moeB1, ndhA, PPE18, rpsA, and secF). Previous studies indicated that these genes were highly related to drug-resistant mechanisms in XDR-TB, for example, rpsA, secF, and desA3. The rpsA protein plays a crucial role in translation initiation and mRNA binding during protein synthesis. Due to its essential role in protein synthesis, protein rpsA is considered a potential target for the development of antimicrobial drugs [27]. The secF protein is involved in the process of protein secretion across the bacterial inner membrane. The disruptions in protein secretion processes can potentially impact the physiology of the bacterium, thus the secF mutation is considered to be related to drug resistance in MTB [28]. The desA3 gene is associated with the biosynthesis of oleic acids, which is essential for the formation of the cell wall of actively replicating bacteria [29, 30]. The mutation of these genes may highly affect their function, which results in drug-resistant events in MTB.

We then constructed the 3D model of the coding protein of these nine genes. 3D structural analysis showed that SNV locations of three genes (PE_PGRS4, DesA3, and secF) were close to the active sites. These structural changes may have implications for protein structure and function, including protein stability, folding state, and interactions. For example, the mutations at position 174 are from arginine to glycine, resulting in the atom number also changed significantly. Specifically, arginine has six atoms on its side chain, while glycine has only one atom on its side chains. Previous studies showed that glycine-to-arginine mutations could cause conformational change and impaired transition metal transport in bacteria [31]. This result enhanced that the identified SNVs could be highly associated with drug-resistance mechanisms in MTB.


In this study, we obtained 716 MTB isolates tested by four antibiotics and identified the mutated SNVs by MuMmer4. Fisher’s exact test was applied to whole-genome sequencing data from clinically relevant MTB isolates to identify SNVs related to drug resistance. Non-synonymous mutations within the PE_PGRS4 gene were found in strains resistant to four antibiotics, highlighting the potential of PE_PGRS4 as a biomarker of broad-spectrum resistance. Our exploration of XDR-TB isolates revealed distinctive roles of another eight genes (ctpI, desA3, mce1R, moeB1, ndhA, PPE18, rpsA, and secF) in resisting second-line drugs. This study illustrates the complexity of drug-resistant tuberculosis and emphasizes the significance of non-synonymous SNVs in MTB drug resistance. These findings not only advance our comprehension of MTB resistance mechanisms but also offer insights for targeted intervention development.

Materials and methods

WGS data retrieving and analysis

Whole-genome sequencing (WGS) data of 716 MTB isolates (Table S2) were obtained from the Bacterial and Viral Bioinformatics Resource Center (BV-BRC) database [32]. The MTB H37Rv reference genome was downloaded from the National Center for Biotechnology Information (NCBI) Genome database. The gene annotation file was also retrieved from the NCBI database.

SNV calling by MuMmer4

SNVs are the most common type of genetic mutation among pathogens and are often used as biomarkers to study the association between genetic mutations and bacterial traits. These SNVs represent mutations arising from point mutations at precise gene locations, resulting in resistance against certain antibiotics in bacteria [33]. The WGS data of MTB isolates resistant to antibiotics were compared against the H37Rv reference genome using MUMmer4 to identify possible SNVs by the script ‘nucmer reference_file.fasta query_file.fasta show-snps -Clr’. MUMmer4 presents a notable performance in processing large genomes, enhanced speed, compatibility with scripting languages, and suitability for SNV calling [34].

SNV screening by Fisher’s exact test

While most genomic mutations have minimal effects on phenotype, only a few SNVs were expected to strongly influence complex traits of bacteria [35]. In this study, a group of 716 strains were categorized as either resistant or sensitive to four antibiotics (moxifloxacin, ofloxacin, amikacin, and capreomycin). Subsequently, Fisher’s exact test was employed to compare the frequencies of resistant and non-resistant strains for each antibiotic and SNVs were screened at a threshold of statistical p-value < 0.05. Fisher’s exact test, which is recognized for its better accuracy with a small sample test, was chosen to assess the null hypothesis of SNVs in drug-resistant features [36]. For isolates resistant to moxifloxacin, ofloxacin, amikacin, and capreomycin, we identified 4996, 5026, 1730, and 2864 significant SNVs, respectively.

The formula for Fisher’s exact test is as follows:

$$\displaylines{P = {{\left( {\matrix{A \cr {A + C} \cr } } \right)\left( {\matrix{B \cr {B + D} \cr } } \right)} \over {\left( {\matrix{{A + B} \cr N \cr } } \right)}} = {{\left( {\matrix{B \cr {A + B} \cr } } \right)\left( {\matrix{D \cr {C + D} \cr } } \right)} \over {\left( {\matrix{{B + D} \cr N \cr } } \right)}} \cr = {{\left( {A + B} \right)!\left( {C + D} \right)!\left( {A + C} \right)!\left( {B + D} \right)!} \over {A!B!C!D!}} \cr}$$

Presence of a SNV

Absence of a SNV


Resistant MTB strains



A + B

Non-resistant MTB strains



C + D


A + C

B + D

A + B + C + D = N

  1. A: The number of strains with a given SNV in resistant strains.
  2. B: The number of strains without the given SNV in resistant strains.
  3. C: The number of strains with the given SNV in non-resistant strains.
  4. D: The number of strains without the given SNV in non-resistant strains.

Common SNVs related to four antibiotics

SNV cluster regions, defined by the presence of multiple neighboring SNVs in the same or adjacent genomic regions, potentially influence MTB resistance [37]. We investigated the common resistance mechanism among the four antibiotic-resistant strains to identify shared SNVs among them. The genes associated with these SNVs were also identified based on their genomic location. Only PE_PGRS4 and Rv0094c were found in the SNVs related to four antibiotic-resistant isolates. Thus, the biological parameters of these two genes were further analyzed.

Common genes in XDR-TB

The XDR-TB was defined as those cases resistant to all first-line drugs (mainly fluoroquinolones) and a least one second-line drug [5]. To further investigate the resistance mechanisms associated with XDR-TB, we analyzed the shared significant SNVs for two fluoroquinolones (moxifloxacin and ofloxacin) and one of second-line drugs (amikacin and capreomycin). Thus, the two cases of XDR-TB (two fluoroquinolones with ofloxacin, and two fluoroquinolones with capreomycin) were further analyzed. The involved transcribed amino acids and proteins of these genes were identified to distinguish synonymous and non-synonymous (NS) mutations.

Construction of three-dimensional structures of identified proteins

Protein sequences, coupled with their three-dimensional (3D) structures, provide critical information for understanding protein functions, interactions, and biological processes [38]. In this study, three-dimensional structures of proteins with non-synonymous mutations were constructed by SWOSS-MODEL, the online server for automated protein homology modeling [39]. The SWISS-MODEL could incorporate features to predict the structure and stoichiometry parameters of complexes based on the amino acid sequences of interacting proteins. A novel modeling engine (AlphaFold) and a local model quality assessment method (QMEANDisCo) were applied to enhance the accuracy of protein modeling in SWISS-MODEL [40].

Identify critical active sites affected by SNVs

Mutations in the active site can lead to dramatic changes in protein activity and affect the efficiency of binding drugs, thus identification of the active site is necessary. In this study, protein active sites were predicted by PrankWeb, which adopted deep learning model to characterize the binding site of protein and the ligand [41]. PrankWeb is a user-friendly web tool that allows users to enter a UniProt accession number as the input. The predicted active sites were then compared with the locus of amino acid changes caused by SNVs to identify critical active sites.

Data availability

All genome sequencing data for this analysis are available in the BV-BRC database (



Animo acid




Mycobacterium tuberculosis


Whole-genome sequencing

NS mutation:

Non-synonymous mutation

S mutation:

Synonymous mutation


Single nucleotide variant






Extensively drug-resistant tuberculosis


Multidrug-resistant tuberculosis


National Center for Biotechnology Information


Bacterial and Viral Bioinformatics Resource Center




Global Model Quality Estimation


AlphaFold Protein Structure Database


  1. Fernandes GF, et al. Tuberculosis drug discovery: challenges and new horizons. J Med Chem. 2022;65(11):7489–531.

    Article  CAS  PubMed  Google Scholar 

  2. Ludi Z, et al. Diagnosis and biomarkers for ocular tuberculosis: from the present into the future. Theranostics. 2023;13(7):2088.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  3. Miotto P, et al. Transcriptional regulation and drug resistance in Mycobacterium tuberculosis. Front Cell Infect Microbiol. 2022;12:990312.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  4. Prasanna A, Niranjan V. Classification of Mycobacterium tuberculosis DR, MDR, XDR isolates and identification of signature mutationpattern of drug resistance. Bioinformation. 2019;15(4):261.

    Article  PubMed  PubMed Central  Google Scholar 

  5. Organization WH. Meeting report of the WHO expert consultation on the definition of extensively drug-resistant tuberculosis, 27–29 October 2020

  6. Gupta A, et al. Genotype analysis of ofloxacin-resistant multidrug-resistant Mycobacterium tuberculosis isolates in a multicentered study from India. Indian J Med Res. 2020;151(4):361–70.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  7. Gupta A, et al. PknB remains an essential and a conserved target for drug development in susceptible and MDR strains of M. Tuberculosis. Ann Clin Microbiol Antimicrob. 2017;16:1–10.

    Article  Google Scholar 

  8. Prasad R, Singh A, Gupta N. Adverse drug reactions with first-line and second-line drugs in treatment of tuberculosis. Ann Natl Acad Med Sci (India). 2021;57(01):15–35.

    Article  Google Scholar 

  9. Budak M, et al. Optimizing tuberculosis treatment efficacy: comparing the standard regimen with moxifloxacin-containing regimens. PLoS Comput Biol. 2023;19(6):e1010823.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  10. Wang L, et al. Whole-genome sequencing of Mycobacterium tuberculosis for prediction of drug resistance. Epidemiol Infect. 2022;150:e22.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  11. Yang Z, Zeng X, Tsui K-WS. Investigating function roles of hypothetical proteins encoded by the Mycobacterium tuberculosis H37Rv genome. BMC Genomics. 2019;20(1):394.

    Article  PubMed  PubMed Central  Google Scholar 

  12. Zeng X, et al. Whole genome sequencing data of 1110 Mycobacterium tuberculosis isolates identifies insertions and deletions associated with drug resistance. BMC Genomics. 2018;19:1–10.

    Article  Google Scholar 

  13. Ghosh A, S. N, and, Saha S. Survey of drug resistance associated gene mutations in Mycobacterium tuberculosis, ESKAPE and other bacterial species. Sci Rep. 2020;10(1):8957.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  14. Hsu L-Y, et al. Two novel katG mutations conferring isoniazid resistance in Mycobacterium tuberculosis. Front Microbiol. 2020;11:1644.

    Article  PubMed  PubMed Central  Google Scholar 

  15. Chaiyachat P, et al. Whole-genome analysis of drug-resistant Mycobacterium tuberculosis reveals novel mutations associated with fluoroquinolone resistance. Int J Antimicrob Agents. 2021;58(3):106385.

    Article  CAS  PubMed  Google Scholar 

  16. Maladan Y, et al. The whole-genome sequencing in predicting Mycobacterium tuberculosis drug susceptibility and resistance in Papua, Indonesia. BMC Genomics. 2021;22(1):844.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  17. Patterson B, Wood R. Is cough really necessary for TB transmission? Tuberculosis. 2019;117:31–5.

    Article  PubMed  Google Scholar 

  18. Leong KW, et al. Comparative genomic analyses of multi-drug resistant Mycobacterium tuberculosis from Nepal and other geographical locations. Genomics. 2022;114(2):110278.

    Article  CAS  PubMed  Google Scholar 

  19. Waller NJ, et al. The evolution of antibiotic resistance is associated with collateral drug phenotypes in Mycobacterium tuberculosis. Nat Commun. 2023;14(1):1517.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  20. Huang H, et al. Analyzing the Mycobacterium tuberculosis immune response by T-cell receptor clustering with GLIPH2 and genome-wide antigen screening. Nat Biotechnol. 2020;38(10):1194–202.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  21. Bhatt S, Chatterjee S. Fluoroquinolone antibiotics: occurrence, mode of action, resistance, environmental detection, and remediation–A comprehensive review. Environ Pollut. 2022;315:120440.

    Article  CAS  PubMed  Google Scholar 

  22. Takahashi Y, Igarashi M. Destination of aminoglycoside antibiotics in the ‘post-antibiotic era’. J Antibiot. 2018;71(1):4–14.

    Article  CAS  Google Scholar 

  23. Polikanov YS, et al. The mechanisms of action of ribosome-targeting peptide antibiotics. Front Mol Biosci. 2018;5:48.

    Article  PubMed  PubMed Central  Google Scholar 

  24. De Maio F, et al. PE_PGRS proteins of Mycobacterium tuberculosis: a specialized molecular task force at the forefront of host–pathogen interaction. Virulence. 2020;11(1):898–915.

    Article  PubMed  PubMed Central  Google Scholar 

  25. Bekier A, et al. Imidazole-thiosemicarbazide derivatives as potent anti-mycobacterium tuberculosis compounds with antibiofilm activity. Cells. 2021;10(12):3476.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  26. Xu C, et al. Quantifying functional impact of non-coding variants with multi-task bayesian neural network. Bioinformatics. 2020;36(5):1397–404.

    Article  CAS  PubMed  Google Scholar 

  27. Shi W, et al. Introducing RpsA point mutations ∆438A and D123A into the chromosome of Mycobacterium tuberculosis confirms their role in causing resistance to pyrazinamide. Antimicrob Agents Chemother. 2019;63(6):e02681.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  28. Miller BK, Zulauf KE, Braunstein M. The Sec pathways and exportomes of Mycobacterium tuberculosis. Tuberculosis Tuber Bacillus, 2017: 607–25.

  29. Singh A, et al. Identification of a desaturase involved in mycolic acid biosynthesis in Mycobacterium smegmatis. PLoS ONE. 2016;11(10):e0164253.

    Article  PubMed  PubMed Central  Google Scholar 

  30. Comín J, et al. Transcriptomic profile of the most successful Mycobacterium tuberculosis strain in Aragon, the MtZ strain, during exponential and stationary growth phases. Microbiol Spectr. 2023;11(6):e04685.

    Article  PubMed  PubMed Central  Google Scholar 

  31. Bozzi AT, et al. Crystal structure and conformational change mechanism of a bacterial nramp-family divalent metal transporter. Structure. 2016;24(12):2102–14.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  32. Olson RD, et al. Introducing the bacterial and viral bioinformatics resource center (BV-BRC): a resource combining PATRIC, IRD and ViPR. Nucleic Acids Res. 2023;51(D1):D678–89.

    Article  CAS  PubMed  Google Scholar 

  33. Petkau A, et al. SNVPhyl: a single nucleotide variant phylogenomics pipeline for microbial genomic epidemiology. Microb Genomics. 2017;3(6):e000116.

    Article  Google Scholar 

  34. Marçais G, et al. MUMmer4: a fast and versatile genome alignment system. PLoS Comput Biol. 2018;14(1):e1005944.

    Article  PubMed  PubMed Central  Google Scholar 

  35. Visscher PM, et al. 10 years of GWAS discovery: biology, function, and translation. Am J Hum Genet. 2017;101(1):5–22.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  36. Connelly LM. Fisher’s exact test. MedSurg Nurs. 2016;25(1):58–60.

    PubMed  Google Scholar 

  37. Liu L, et al. Analysis of tigecycline resistance development in clinical Acinetobacter baumannii isolates through a combined genomic and transcriptomic approach. Sci Rep. 2016;6(1):26930.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  38. Kuhlman B, Bradley P. Advances in protein structure prediction and design. Nat Rev Mol Cell Biol. 2019;20(11):681–97.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  39. Silva MA, et al. Comparative homology of Pleurotus Ostreatus laccase enzyme: Swiss model or Modeller? J Biomol Struct Dynamics. 2023;41(18):8927–40.

    Article  CAS  Google Scholar 

  40. Waterhouse A, et al. SWISS-MODEL: homology modelling of protein structures and complexes. Nucleic Acids Res. 2018;46(W1):W296–303.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  41. Jakubec D, et al. PrankWeb 3: accelerated ligand-binding site predictions for experimental and modelled protein structures. Nucleic Acids Res. 2022;50(W1):W593–7.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

Download references


This research was funded by National Natural Science Foundation of China (grant number 61903107).

Author information

Authors and Affiliations



Zhiyuan Yang and Stephen Tsui conceptualized and designed the project. Weiye Qian, Nan Ma, Xi Zeng, Mai Shi, and Mingqiang Wang analyzed data and draw the figures. Weiye Qian wrote the draft manuscript and Zhiyuan Yang reviewed the manuscript. All authors have read and approved the final manuscript.

Corresponding authors

Correspondence to Zhiyuan Yang or Stephen Kwok-Wing Tsui.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit The Creative Commons Public Domain Dedication waiver ( applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Qian, W., Ma, N., Zeng, X. et al. Identification of novel single nucleotide variants in the drug resistance mechanism of Mycobacterium tuberculosis isolates by whole-genome analysis. BMC Genomics 25, 478 (2024).

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: