Whole genome sequencing analysis to evaluate the influence of T2DM on polymorphisms associated with drug resistance in M. tuberculosis

Background Type 2 diabetes mellitus (T2DM) has been associated with treatment failure, and the development of drug resistance in tuberculosis (TB). Also, whole-genome sequencing has provided a better understanding and allowed the growth of knowledge about polymorphisms in genes associated with drug resistance. Considering the above, this study analyzes genome sequences to evaluate the influence of type 2 diabetes mellitus in the development of mutations related to tuberculosis drug resistance. M. tuberculosis isolates from individuals with (n = 74), and without (n = 74) type 2 diabetes mellitus was recovered from online repositories, and further analyzed. Results The results showed the presence of 431 SNPs with similar proportions between diabetics, and non-diabetics individuals (48% vs. 52%), but with no significant relationship. A greater number of mutations associated with rifampicin resistance was observed in the T2DM-TB individuals (23.2% vs. 16%), and the exclusive presence of rpoBQ432L, rpoBQ432P, rpoBS441L, and rpoBH445L variants. While these variants are not private to T2DM-TB cases they are globally rare highlighting a potential role of T2DM. The phylogenetic analysis showed 12 sublineages, being 4.1.1.3, and 4.1.2.1 the most prevalent in T2DM-TB individuals but not differing from those most prevalent in their geographic location. Four clonal complexes were found, however, no significant relationship with T2DM was observed. Samples size and potential sampling biases prevented us to look for significant associations. Conclusions The occurrence of globally rare rifampicin variants identified only in isolates from individuals with T2DM could be due to the hyperglycemic environment within the host. Therefore, further studies about the dynamics of SNPs’ generation associated with antibiotic resistance in patients with diabetes mellitus are necessary. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-022-08709-z.


Introduction
Tuberculosis (TB) maintains a severe impact on global public health; according to the World Health Organization (WHO), in 2020 it caused 1.5 million deaths, and 10 million people sickened. One of the factors associated with the disease´s high mortality is the development of drug resistance. In 2018, approximately 500,000 Open Access *Correspondence: robzencue@gmail.com individuals developed resistance to rifampicin, and from this patient's group, 82% developed multi-drug resistance [1].
Studies have observed an increased risk of treatment failure, and the development of drug resistance in people with type 2 diabetes mellitus (T2DM). However, the impact of this condition on the development of drug resistance remains unclear [2][3][4]. A possible explanation may lie in the altered pharmacokinetics of T2DM-TB cases leading to heterogenous drug exposure during treatment [3].
Whole-genome sequencing (WGS) has opened the possibility to increase our understanding of drug resistance mechanisms. Particularly, the identification of the presence of polymorphisms in target genes associated with the performance of anti-tuberculosis drugs will aid in the development of the predictive diagnosis of drug resistance in TB [5].
Considering all the above, the present study aims to characterize the polymorphisms associated with drug resistance, and to determine the influence of diabetes in the occurrence of these mutations by analyzing whole genome sequences of M. tuberculosis (Mtb) from individuals with TB, and T2DM-TB binomial.

Patient characteristics
From more than a 1 million Mtb WGS currently available in the repositories analyzed, only 827 genomes had clinical information related to the resistance profile and confirmation of the presence or absence of T2DM. From this group, only 74 genomes from individuals with T2DM had the required sequencing depth, lineage, and coverage to be included in the study. An additional set of 74 WGS from individuals with drug resistance and without T2DM were retrieved and randomly included to match both study groups. Individuals carrying the 148 Mtb strains were mostly male 84 (57.4%), with a mean age of 47 ± 13 years at the time of sputum sample collection.
During this study were identified 431 SNPs confirmed to be associated with resistance [6]  Fifty-six high confidence resistance-related variants were identified, from which, 16 had a proportion greater than 1% in the dataset. Genotypic resistance was given by genomic variants associated with anti-tuberculosis drugs, and mainly concentrated in genes associated with this property. Resistance to isoniazid was given by in 58 isolates (78.3%) from the T2DM group, vs 55 on the isolates (71.6%) with only TB (p = 0.3368), rifampicin 50 (67.5%) vs 53 (71.6%) (p = 0.2874) and ethambutol 27 (36.4%) vs 37 (50%) (p = 0.2753).
Three mutations were observed with a frequency greater than 15% in both groups; katG S315T was found in 91 sequences (61.4%), and fabG1-inhA in 62 (41.8%); these were the only genes associated with resistance to isoniazid observed in the dataset. These mutations were followed by rpoB S450L in 58 isolates (39.1%) ( Table 2). Nevertheless, no differences on these mutations were observed in the groups. However, when comparing between the groups, no significant differences were observed in the variants associated with resistance to isoniazid (p = 0.9445) and (p = 0.6253), respectively, nor for rifampicin (p = 0.9675).
The occurrence of rifampicin resistance-associated variants identified only in patients with T2DM had a heterogeneous phylogenetic distribution suggesting that they are not driven by geographically prevalent strains. The variants observed in a single isolate were rpoB S441L, and rpoB Q432P, belonging to sublineage 4.10 (PGG3), whereas rpoB Q432L, was observed in another isolate, and was found in

Discussion
Drug resistance generation in Mtb is caused by chromosomal mutations resulting in clones that proliferate, and subsequently fixate on the host [7]. The mutation rate of Mtb is low with an estimated 0.3-0.5 SNPS changes per year [8,9], but this condition seems to be affected by the internal environment of the host, and by the pharmacokinetic variability of the plasma concentration of antituberculosis drugs [10]. In this study, genome sequences of Mtb from individuals with drug resistant tuberculosis, and with or without T2DM, were analyzed to determine the impact of T2DM on the generation of drug resistance, given that diabetes increases the risk of developing drug resistance, and multidrug resistance [2][3][4]11], also modifies the host's internal environment by promoting the generation of reactive oxygen species in mycobacteria [12], which could lead to changes in the mutation rate in these subjects.
The polymorphism analysis showed that the most frequent variants were associated with the first-line drugs, isoniazid, and rifampicin. Mutations in the katG S315T and fabG1-inhA genes showed the highest proportions in both groups of individuals, this is consistent with previous studies [13,14], and could be due to the fact that these variants are the first to appear in the genome of resistant strains from different parts of the world [15].
In the T2DM group four specific rpoB gene variants were found: H445L, S441L, Q432P, and Q432L. These variants have been reported in studies where comorbidities are not specified [16][17][18][19]. According to the recent Catalogue of mutations related with drug resistance in Mycobacterium tuberculosis complex of WHO, these mutations have a low prevalence (1.17%, 0.26%, 0.29% and 0.21% respectively), and have been classified as SNPs associated with phenotypic resistance against rifampicin [20]. These variants were found in four different sublineages: 4.10 (PGG3), 4.3.4.2 (LAM11), 4.1.2 (T1; H1) and 4.3.3 (LAM9). Only two isolates with the H445L variant were found in a clonal complex (C1), which could indicate that remaining mutations were not acquired through clonal transmission but were the product of processes specific of the host, possibly by consequence to the complications in the drug treatment in T2DM individuals, and partially explaining the increased risk for the development of drug resistance [2][3][4]. Also with relation to RIF resistance, the rpoB Q432L, and rpoB Q432P mutations have been identified in isolates with a high minimum inhibitory concentration (MIC), while rpoB Legend shows resistance classification. The scale represents the number of substitutions per site for each position of the alignment S441L, and rpoB H445L showed medium, and low MIC respectively [18]. Likewise, pharmacokinetics studies have found that individuals with T2DM had 25% lower average plasma concentration, and an increase in time to reach the maximum concentration of rifampicin [3]. The rpoB Q432L and rpoB S441L variants, have been associated with a moderate fitness cost in in vitro studies [21], the previously mentioned changes in drug distribution could favor the proliferation of bacterial population carrying these mutations.
As explained above, the specific occurrence of these mutations in the T2DM group could be due to poor glycemic control that alters tissue perfusion, and rifampicin pharmacokinetics; however, the sample size in which we found these variants prevented us from looking for associations between the groups. Further studies with a larger number of individuals are needed to determine whether certain variants in the rifampicin resistance determining region in subjects with diabetes could be due to internal environmental conditions resulting from a hyperglycemic condition.
The phylogenetic analysis showed heterogeneity in the 12 sublineages found in both groups of individuals. The 4.1.1.3 (X3/X1), and 4.1.2.1 (Haarlem) sublineages included the higher number of isolates coming from individuals with T2DM. In particular, the largest difference in the isolates' proportion between the groups was observed in the X3/X1 lineage, this could be linked to the country of origin of these isolates, since 83% came from Mexico. A similar situation happened with Haarlem sublineage, where 66% of the isolates from this sublineage, came from Mexico, and Indonesia. A total of 33 isolates were grouped into four clonal complexes, of which only C3, was composed of sublineage 4.1.1.3 (X3/X1) and showing a higher proportion of isolates from individuals with T2DM (six TB-T2DM vs. three TB). The high proportion, and differences observed in the distribution of isolates between diabetic, and non-diabetic individuals in the lineages from these countries could be explained with the fact that Mexico, and Indonesia are among the top 3 places in the worldwide prevalence of T2DM [22].
It has been described that the phylogenetic sublineages of M. tuberculosis complex are associated with specific geographic regions [23,24]; therefore, the fact that diabetes was found with higher proportions in certain sublineages could be a consequence of the disease burden in the specific countries; moreover, although the relationship between lineage and glycemic level has not been characterized, a possible association between dysglycemia, and the distribution of Haarlem strains has been recently described [25]. To establish a clearer relationship, are required studies incorporating a larger number of WGSs from individuals with T2DM from different sublineages.
The strength of this study lies in the fact that, to the best of our knowledge, there are no studies using WGS to analyze the impact of T2DM on the emergence of SNPs associated with drug resistance in L4 isolates the largest Mtb lineage distributed globally [26]. Furthermore, the main limitation was the lack of information about comorbidities in genomic sequences of M. tuberculosis isolates available in repositories. From the 827 drug resistant isolates initially identified, only 74 fulfilled the criteria to be included in the TB-T2DM group, this stands out the need to include information related to comorbidity in the metadata available in the genome repositories. This would be of great help to improve further analysis of genomes, also to evaluate the specific effect of the host in the development of drug resistant in TB.
Finally, the lack of differences in the polymorphisms between the T2DM-TB, and TB groups could be due to the fact that these sequences come from isolates that had already been diagnosed with a resistance profile, as evidenced by the high proportion of fixed SNPs identified in the sample [27], which hinders the identification of the time of appearance of the resistance mutations, and the generation of these changes as consequence of the internal environment caused by type 2 diabetes mellitus in the host.

Conclusions
With this study it was possible to provide preliminary information about the impact of T2DM on drug resistance in whole genome sequences of M. tuberculosis, the results showed no significant association among the number of polymorphisms associated with resistance and the presence of T2DM. However, four variants in rpoB were observed only in individuals with T2DM. Undoubtedly, further studies about the dynamics of SNPs' generation associated with antibiotic resistance in patients with diabetes mellitus are necessary, the results will have an important effect on the establishment of new therapeutic, attention, and control models for tuberculosis in the context of T2DM.

Sample selection
A search for whole genome sequences (WGSs) of Mtb from patients with drug resistance, and with or without T2DM was carried out in public databases; GenBank of the National Center for Biotechnology Information, the European Nucleotide Archive, and TB Portals of the National Institute of Health. In addition, 8 Mtb sequences were kindly provided by Dr. Iñaki Comas Espadas from Centro de Investigaciones Biomédicas de Valencia, Spain (manuscript in preparation). Only the WGS that incorporate the following information in the metadata associated were included in the study: presence/absence of a diagnosis of diabetes in the patient, phenotypic and genotypic profile of drug resistance, coverage greater than 99%, depth greater than 100X, and belonging to lineage 4 (L4). Furthermore, those sequences were grouped according to host characteristics into two groups: 1) Host with TB drug resistant, and diabetes mellitus type 2 (T2DM), and 2) Host with TB drug resistant, and without diabetes mellitus type 2.
Accession numbers of genomes selected are available in the Additional file 1.
Genomic variants that were present in at least 20 reads, and at ≥ 90% of frequency within each isolate were used to detect phylogenetic mutations and confirm the belonging to L4 and sublineage inclusion. Variants (InDels and SNPs) with less than 10 × were classified as low-coverage variants and were discarded from analyses. In addition, SNPs detected in high density variant regions were removed. We defined high-density regions whether we found > 3 SNPs in a window of 10 bp.
Variants with at least 10 reads at ≥ 10% to ≤ 90% frequency were called non fixed-SNP, and used to detect antibiotic resistance, and to confirm or predict the drug resistant profile. To determine drug resistance a combination of previously published mutation catalogs was used [30,31]. The MTBC lineages were identified by matching fixed SNPs (those with a frequency of > 90%) to specific phylogenetic positions as described in other publications [26,32].
The complete analysis pipeline can be found in the next Gitlab repository: https:// gitlab. com/ tbgen omics unit/ ThePi peline