
Table 1 The complete list of polymorphic microsatellites found in the coding regions of the three genomes: M. tuberculosis H37Rv (MTH), M. tuberculosis CDC1551 (MTC) and M. bovis (MB). Microsatellites in the intergenic regions are not reported here. The table lists the ORFs (given by their gene IDs) harboring the polymorphic microsatellites. The first column gives the microsatellite tract and its observed mutation, an insertion or deletion of repeat units leading to expansion or contraction of the microsatellite. As discussed in the text, the evolutionary relationship among the three genomes is not clearly established. We have therefore followed a consensus approach, in which an event is classified as an insertion or a deletion relative to the repeat number conserved in the majority of the genomes (given in bold text). For example, G4↔5 denotes that two of the genomes possess the tract G4 while the third has G5, so the event is regarded as an insertion leading to microsatellite expansion. The effect on the coding region (fusion/fission, premature termination, length variation) is also displayed.
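The consensus rule described in the caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `classify_event` and the genome-to-count mapping are hypothetical and are not taken from the paper, and the example assigns the G5 tract to MB purely for demonstration.

```python
from collections import Counter


def classify_event(repeat_counts):
    """Classify a three-genome observation by majority consensus.

    repeat_counts maps a genome label (e.g. "MTH") to the number of repeat
    units observed in that genome. Returns a tuple
    (consensus_count, variant_genome, event).
    """
    tally = Counter(repeat_counts.values())
    if len(repeat_counts) != 3 or len(tally) != 2:
        raise ValueError("expected three genomes with exactly two distinct counts")

    # The repeat number shared by two genomes is taken as the conserved state.
    consensus, _ = tally.most_common(1)[0]
    variant_genome, variant = next(
        (genome, n) for genome, n in repeat_counts.items() if n != consensus
    )

    # More repeats than the consensus is read as an insertion (expansion),
    # fewer as a deletion (contraction).
    event = "insertion" if variant > consensus else "deletion"
    return consensus, variant_genome, event


# Hypothetical assignment for a G4↔5 entry: two genomes carry G4, one carries G5.
print(classify_event({"MTH": 4, "MTC": 4, "MB": 5}))
```

The sketch deliberately rejects cases where all three genomes differ, since the caption defines the consensus only when two genomes agree.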

From: Microsatellite polymorphism across the M. tuberculosis and M. bovis genomes: Implications on genome evolution and plasticity

I) Mutation leading to ORF splitting (33)

| Mutation | Function | MTH | MTC | MB |
| --- | --- | --- | --- | --- |
| a) Fusion with overlapping ORF (5) | | | | |
| G4↔5 | Membrane protein | *Rv0192A (100), Rv0192 (366) | MT0202 (366), 2nd might be 100 aa # | Mb0198 (352) |
| G4↔5 | Membrane protein mce2b | Rv0590A (84), *Rv0590 (275) | MT0619 (287), 2nd might be 84 aa # | Mb0605 (343) |
| G7↔11 | FrdB, frdC | *Rv1553 (247), *Rv1554 (126) | MT1604 (247), MT1605 (126) | Mb1579 (374) |
| T4↔5 | Hypothetical protein | *Rv3338 (214), *Rv3337 (128) | MT3441 (248) | Mb3370 (297) |
| T2↔3 | Cut5a, cut5b truncated cutinase | Rv3724A (80), *Rv3724B (187) | MT3827 (207) | Mb3751 (233) |
| b) Fusion with non-overlapping ORF (4) | | | | |
| T5↔4 | GmhA | *Rv0113 (196), *Rv0114 (190) | MT0122 (420) | Mb0117 (196), Mb0118 (190) |
| G5↔6 | Pks1, pks15 polyketide synthase | *Rv2946c (1616), *Rv2947c (496) | MT3018 (1620), MT3021.1 (496) | Mb2971c (2112) |
| CG3↔2 | Hypothetical protein | *Rv2974c (470), *Rv2975c (84) | MT3052 (470), MT3052.1 (92) | Mb2999c (553) |
| T2↔3 | *dTDP-glucose 4,6-dehydratase | Rv3784 (326), *Rv3785 (357) | MT3893 (712) | Mb3813 (326), Mb3814 (357) |
| c) Fission with overlapping ORF (8) | | | | |
| G4↔5 | Flavoprotein, electron acceptor | Rv2251A (139), *Rv2251 (475) | MT2311 (529) | Mb2275 (529) |
| G5↔6 | Conserved hypothetical protein | *Rv2879c (189), Rv2880c (275) | MT2947 (364) | Mb2904c (364) |
| G4↔3 | Conserved hypothetical protein | *Rv0740 (175) | MT0765 (82), MT0766 (120) | 31791926 (175) |
| G3↔2 | fusA2 | *Rv0120c (714) | MT0128 (714) | Mb0124c (597), Mb0125c (117) |
| G2↔3 | *Pks6 | Rv0405 (1402) | MT0418 (1402) | Mb0412 (460), Mb0413 (946) |
| T6↔7 | PstB | *Rv0933 (276) | MT0960 (276) | Mb0957 (71), Mb0958 (213) |
| C2↔3 | *drug transporter | Rv1877 (687) | MT1926 (687) | Mb1908 (511), Mb1909 (404) |
| C6↔5 | LppO | *Rv2290 (171) | MT2347 (192) | Mb2312 (51), Mb2313 (121) |
| d) Fission with non-overlapping ORF (16) | | | | |
| T5↔4 | aceAa, aceAb | *Rv1915 (367), *Rv1916 (398) | MT1966 (766) | Mb1950 (766) |
| G5↔6 | Conserved hypothetical protein | *Rv2561 (97), *Rv2562 (129) | MT2638 (212) | Mb2591 (245) |
| T2↔3 | Conserved transmembrane protein | Rv3453 (110), *Rv3454 (422) | MT3561 (562) | Mb3483 (561) |
| C5↔4 | mmpL1 | *Rv0402c (958) | MT0412 (958) | Mb0408c (367), Mb0409c (591) |
| C7↔6 | Hypothetical | *Rv0698 (203) | Might be 203 aa # | Mb0717 (109), Mb0718 (77) |
| T3↔2 | 3-ketosteroid-delta-1-dehydrogenase | *Rv0785 (566) | MT0809 (566) | Mb0807 (191), Mb0808 (368) |
| T2↔3 | cobL | Rv2072c (390) | MT2132 (390) | Mb2098c (294), Mb2099c (62) |
| TG2↔1 | *Probable transposase | Rv2424c (333) | MT2497 (333) | Mb2447c (230), Mb2448c (97) |
| G7↔15 | PE_PGRS | *Rv2490c (1660) | MT2564 (1665) | Mb2517c (1150), Mb2518c (509) |
| GC3↔2 | transglutaminase family protein | Rv2566 (1140) | MT2642 (1156) | Mb2595 (533), Mb2596 (597) |
| T4↔3 | ugpA | *Rv2835c (303) | MT2901 (303) | Mb2859c (180), Mb2860c (123) |
| C2↔3 | fadE22 | *Rv3061c (721) | MT3147 (721) | Mb3087c (600), Mb3088c (114) |
| C4↔3 | mesT | *Rv3176c (318) | MT3265 (339) | Mb3201c (105), Mb3202c (208) |
| C4↔3 | Cyp142 | Rv3518c (398) | MT3619 (372) | Mb3547c (193), Mb3548c (205) |
| A6↔7 | hypothetical protein | *Rv3773c (194) | MT3882 (194) | Mb3801c (114), Mb3802c (78) |
| C2↔3 | conserved membrane protein | *Rv3894c (1396) | MT4010 (1396) | Mb3923c (561), Mb3924c (833) |

II) Mutation leading to premature termination (13)

| Mutation | Function | MTH | MTC | MB |
| --- | --- | --- | --- | --- |
| C7↔8 | Oxidoreductase | *Rv0161 (449) | MT0170 (pt) | Mb0166 (449) |
| T5↔4 | umaA1 | *Rv0469 (286) | MT0485 (pt) | Mb0478 (286) |
| G4↔3 | Cysteine synthase | *Rv0848 (372) | MT0871 (pt) | Mb0871 (372) |
| G3↔2 | Membrane transport | *Rv0849 (419) | MT0872 (pt) | Mb0872 (419) |
| A2↔3 | Hypothetical protein | - | MT1025.1 (46) | - |
| G4↔3 | polyketide synthase pks5 | *Rv1527c (2108) | Prematurely terminated | Mb1554c (2108) |
| G3↔2 | Conserved hypothetical | *Rv1533 (375) | Prematurely terminated | 31792719 (375) |
| G7↔8 | $PE_PGRS (wag22) Antigen | $Rv1759c (914) | $MT1807 (pt) # | $Mb1789c (820), $Mb1790c (94) |
| G3↔2 | PE_PGRS | Rv2126c (256) | MT2185 (pt) | Mb2150c (256) |
| G2↔3 | Hypothetical protein | Not annotated as ORF # | MT2401.2 (69) | Prematurely terminated |
| CGCGC2↔3 | Oxidoreductase | *Rv3093c (334) | MT3177 (pt) | Mb3120c (334) |
| A3↔2 | *Conserved hypothetical protein | Prematurely terminated | MT3855 (314) * | Not annotated as ORF # |
| G3↔2 | MycP2, membrane-anchored serine protease | Rv3886c (550) | MT4001 (pt) | Mb3916c (550) |

III) Mutation leading to ORF splitting where the second split part is annotated as a pseudogene (4)

| Mutation | Function | MTH | MTC | MB |
| --- | --- | --- | --- | --- |
| C7↔8 | Glycolipid sulfotransferase | *Rv1373 (326) | MT1418 (320) | Mb1407 (265), 2nd part is prematurely terminated |
| C6↔5 | Hypothetical | *Rv1718 (272) | MT1757 (386) # | Mb1746 (207), Mb1747 (pt) |
| G7↔8 | GlpK glycerol kinase | *Rv3696c (517) | MT3798 (517) | Mb3721c (pt), Mb3722c (251) |
| C2↔3 | sigM | *Rv3911 (222) | MT4030 (196) | Mb3941 (196), Mb3942 (pt) |

IV) Mutation leading to length variation of ORF (43)

| Mutation | Function | MTH | MTC | MB |
| --- | --- | --- | --- | --- |
| a) Length increase from C-terminal (11) | | | | |
| T5↔4 | CtpI | *Rv0107c (1632) | MT0116 (1625) | Mb0111c (1625) |
| G4↔3 | Hypothetical protein | *Rv0607 (128) | MT0636 (147) | Mb0623 (128) |
| G3↔2 | lldD1 | *Rv0694 (396) | MT0721 (419) | Mb0713 (396) |
| C4↔5 | NusB | *Rv2533c (156) | MT2608 (290) | Mb2562c (156 extra aa) |
| G3↔2 | transport proteins | *Rv3239c (1048) | MT3337 (1065) | Mb3267c (1048) |
| GC4↔5 | Hypothetical protein | *Rv0739 (268) | MT0764 (268) | Mb0760 (282) |
| A3↔2 | Hypothetical protein | *Rv1046c (174) | MT1075.1 (262) | Mb1075c (197) |
| C5↔4 | Conserved hypothetical protein | Rv1760 (502) | MT1809 (531) | Mb1791 (509) |
| GC2↔1 | hflX | *Rv2725c (495) | MT2797 (556) | Mb2744c (495) |
| T4↔3 | integral membrane | *Rv3162c (145) | MT3251 (145) | Mb3187c (196) |
| C3↔2 | $ESAT-6 like protein | $*Rv3890c (95) | $MT4005 (95) | $Mb3919c (124) |
| b) Length increase from N-terminal (3) | | | | |
| G3↔2 | Conserved hypothetical protein | *Rv1246c (97) | MT1284 (143) | Mb1278c (97) |
| AC5↔6 | lprJ | *Rv1690 (127) S prob: 0.939 | MT1729 (127) S prob: 0.939 | Mb1716 (139) S prob: 0.005 |
| G5↔4 | Conserved membrane protein | *Rv3693 (440) S prob: 0.994 | MT3795 (475) S prob: 0.0 | Mb3718 (440) |
| c) Length decrease from N-terminal (6) | | | | |
| G2↔3 | PBP-4 (penicillin binding) | *Rv0907 (532) | MT0930 (562) | Mb0931 (516) |
| G6↔5 | moac2 | *Rv0864 (167) | MT0887 (167) | Mb0888 (142) |
| T3↔2 | Membrane protein | *Rv1101c (385) S prob: 0.708 | MT1133 (385) S prob: 0.708 | Mb1131c (342) S prob: 1 |
| A6↔5 | aroE | *Rv2552c (269) | MT2629 (269) | Mb2582c (260) |
| G2↔3 | Membrane protein | Rv2732c (204) | MT2802.1 (180) S prob: 0.959 | Mb2791c (204) S prob: 0.000 |
| G3↔2 | Conserved membrane protein | *Rv3885c (537) S prob: 0.993 | MT4000 (422) S prob: 0.0 | Mb3915c (537) |
| d) Length decrease from C-terminal (12) | | | | |
| G6↔5 | membrane protein | *Rv0010c (141) | MT0013 (141) | Mb0010c (111) |
| A3↔2 | Conserved hypothetical protein | Rv0025 (120) | MT0028 (90) | Mb0026 (120) |
| C3↔2 | $NLP/P60 Antigen | $*Rv0024 (281) | $MT0027 (281) | $Mb0024 (277) |
| C7↔8 | mce2D | *Rv0592 (508) | MT0622 (508) | Mb0607 (478) |
| A8↔7 | PPE | *Rv0878c (443) | MT0901 (444) | Mb0902 (438) |
| C5↔6 | PPE | *Rv1168c (346) | MT1205 (346) | Mb1201c (180) |
| CG5↔4 | Secretory protein | *Rv1312 (147) | MT1352 (147) | Mb1344 (144) |
| G4↔3 | Hypothetical protein | *Rv1725c (236) | MT1766 (187) | Mb1754c (236) |
| TG2↔1 | SseB | Rv2291 (284) | MT2348 (268) | Mb2314 (256) |
| G2↔3 | UDP-glucosyltransferases | *Rv2958c (428) | MT3034 (428) | Mb2982c (366) |
| G3↔2 | Cyclase | *Rv3377c (501) | MT3487 (501) | Mb3411c (483) |
| G2↔3 | Conserved hypothetical | *Rv3836 (137) | MT3944 (133) | Mb3886 (116) |

V) In-frame mutation (11)

| Mutation | Function | MTH | MTC | MB |
| --- | --- | --- | --- | --- |
| CGGCCC1↔2 | Lipoprotein, s, lipid attach | *Rv0838 (256) | MT0860 (231) | Mb0861 (258) |
| GGC5↔4 | PE_PGRS | *Rv0872c (606) | MT0894 (609) | Mb0896c (608) |
| CGG5↔4 | PPE | *Rv2356c (615) | MT2425 (615) | Mb2377c (614) |
| GCC4↔3 | PE_PGRS | Rv2396 (361) | MT2467.1 (382) | Mb2418 (360) |
| TCGACG1↔2 | Hypothetical protein | *Rv1434 (45) | MT1478 (47) | Mb1469 (45) |
| G8↔11 | membrane protein | *Rv2081c (146) | MT2143 (150) | Mb2107c (147) |
| GGC4↔3 | Gdh | *Rv2476c (1624) | MT2551 (1624) | Mb2503c (1623) |
| G6↔3 | Transcription regulatory | *Rv2621c (224) | MT2696 (224) | Mb2654c (223) |
| GCG5↔4 | PPE | *Rv3159c (590) | MT3247 (603) | Mb3183c (589) |
| TGG4↔5 | Membrane protein | Rv2799 (209) | MT2867.1 (209) | Mb2822 (210) |
| CCG4↔3 | moeZ | *Rv3206c (392) | MT3301 (392) | Mb3231c (391) |

  1. Sp (shown as "S prob" in the table): Signal peptide probability (predicted using SignalP [59])
  2. Pt (shown as "pt" in the table): Prematurely terminated
  3. Underlined: Membrane proteins (predicted using TMHMM [60])
  4. Italic: Second part becomes a pseudogene because of the absence of a Shine-Dalgarno sequence
  5. $: Known antigens (from Tuberculist [31])
  6. *: Expression of ORFs of M. tuberculosis H37Rv known from Tuberculist [31], the Stanford microarray database [30], ArrayExpress [32] and references [33-37]. In some entries in column 2, the * mark denotes known expression reported in the literature but not in microarray data. Expression profile data for MTC and MB are not available in public-domain databases and are therefore not given in this table.
  7. #: Mutation is absent and the region has not been annotated as an ORF
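For readers who want to process the table programmatically, the markers defined in the legend above can be decoded from a single genome cell roughly as follows. This is a minimal sketch, not a tool from the paper: the function name `parse_cell`, the field names, and the regular expression are assumptions made for illustration, and free-text cells such as "Prematurely terminated" or "Not annotated as orf #" simply yield no entries.

```python
import re

# Hypothetical pattern: optional $ and * markers, a gene id such as Rv0192A,
# MT3021.1 or Mb0124c, then a parenthesised length in amino acids or "pt".
ORF_PATTERN = re.compile(
    r"(?P<flags>[$*]*)"
    r"(?P<gene>(?:Rv|MT|Mb)\d+(?:\.\d+)?[A-Za-z]?)"
    r"\s*\((?P<body>[^)]*)\)"
)


def parse_cell(cell):
    """Return one dict per ORF found in a genome cell of the table."""
    entries = []
    for match in ORF_PATTERN.finditer(cell):
        flags = match.group("flags")
        body = match.group("body").strip()
        entries.append(
            {
                "gene": match.group("gene"),
                "expression_known": "*" in flags,  # legend item 6
                "antigen": "$" in flags,           # legend item 5
                # Length is kept only when the parentheses hold a bare number.
                "length_aa": int(body) if body.isdigit() else None,
                "prematurely_terminated": body.lower() == "pt",  # legend item 2
            }
        )
    return entries


for entry in parse_cell("*Rv0192A (100), Rv0192 (366)"):
    print(entry)
```

Formatting cues that did not survive in plain text (bold, underline, italic) cannot be recovered this way, so the sketch covers only the character markers.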