Although peripheral blood transcriptional signatures discriminating between TB, LTBI and HC subjects have been identified in adult studies [4–6], concerns about the specificity of these signature sets have been raised. Furthermore, the performance of these signature sets in children, who show high rates of progressive tuberculosis due to immaturity of the immune response, has not been investigated so far. In this study, we identified a 116-gene signature set that discriminated TB from LTBI and HC with class errors of 11%, 22%, and 0% for the respective classes of TB, LTBI and HC (Table 2). While this 116-gene signature set also showed good discriminative value between TB and LTBI in adults from South Africa, The Gambia and the United Kingdom, signature sets that were identified in those adult cohorts were unable to discriminate TB from LTBI in our childhood cohort (Table 3).
Gene clusters that were enriched in our signature set included genes in the categories of (programmed) cell death and calcium binding (Additional file 2: Table S2). Both the Gambian and the South African studies of Maertzdorf et al. [5, 6] also described enrichment of genes involved in cell death. Other similarities between the functional annotations in the South African study and our study are the enrichment of genes involved in regulation of cell proliferation, regulation of caspase activity and protein kinase activity. In the South African study, CD64 was identified as the most powerful discriminating gene separating TB from LTBI cases. As CD64 has also been identified as a marker for general innate immune response activity and sepsis, this marker may not be specific to TB. Berry et al. observed that genes downstream of type I interferon-αβ receptor signaling were over-represented in patients with active TB. However, type I interferon signaling is also induced in response to respiratory viruses and Streptococcus pneumoniae, which calls into question the specificity of genes involved in type I interferon receptor signaling as biomarkers for active TB.
The enrichment of genes involved in calcium signaling in our TB biomarker set has not been described before in adult studies using whole-blood gene expression [4–6], nor in studies based on transcriptional profiling of peripheral blood mononuclear cells (PBMCs) [31, 32]. A close relation between abnormal calcium metabolism and radiological extent of disease has been described in pulmonary TB patients [33, 34]. Alterations in serum calcium, particularly hypercalcemia, have been observed in adult TB patients [33–35]. Hypercalcemia in pediatric TB patients is an infrequently recognized and poorly understood phenomenon. In lung tissue, several processes related to calcium homeostasis are thought to contribute to M. tuberculosis persistence and the aggregation of macrophages in granulomas. Over-production of 1,25-dihydroxyvitamin D3, a classical regulator of calcium metabolism, by alveolar macrophages in granulomas has a protective effect against oxidative injury caused by the nitric oxide burst of granulomatous macrophages [37–39]. Furthermore, M. tuberculosis inhibits a calcium-dependent phagolysosome formation pathway, thereby preventing the maturation of M. tuberculosis-containing phagosomes into phagolysosomes. This process, referred to as M. tuberculosis phagosome maturation arrest, is critical for M. tuberculosis persistence in the human host. S100P, which significantly discriminated TB from LTBI in the children in our study, and TAS2R46, which significantly distinguished TB cases from HC, are genes involved in calcium signaling [21, 23, 24]. Possibly, altered expression of these genes in TB patients reflects M. tuberculosis-mediated changes in calcium metabolism in lung tissue that can be measured in peripheral whole blood.
Although the TB, LTBI and HC groups were reasonably well age-matched (Table 1), we cannot exclude the possibility that age-dependent differences in immune responses have influenced gene expression profiles. Age-related differences in both innate and antigen-specific responses to M. tuberculosis are well recognised [40, 41]. Alveolar macrophage antimicrobial activity, recruitment of monocytes, production of cytokines and certain aspects of antigen presentation appear to be less efficient in young children. This is particularly true in children younger than one year of age. Therefore, the exclusion of children younger than one year of age from our study is likely to have prevented a substantial influence of age-related immune differences on gene expression results. Furthermore, the signature set that we identified showed good discriminative value between TB and LTBI in adults from several regions [4–6]. This indicates that the genes selected in our analysis make up a signature set that performs well across age groups.
We identified a minimal set of 42 genes that was able to separate TB cases from LTBI and HC in all previously described (adult) cohorts [4–6] as well as in our childhood cohort. However, as this minimal set was possibly over-optimized to fit exactly those datasets that were used for its composition, it might not perform well in a newly identified cohort from a different geographic region. As the datasets used for the composition of the minimal set were based on European, African and South American populations, the minimal set may not be applicable to individuals from Asia, even though this region carries almost two-thirds of the global TB burden. Furthermore, this signature set could be indicative only of damage to the lung epithelium, similar to what has been described for the overlap between the gene set determined by Berry et al. and the biosignature characteristic of sarcoidosis. Therefore, we used bootstrapping procedures to select a robust set of ten genes that had a high discriminative value in our population, in the two populations described by Maertzdorf et al. [5, 6] and in the comparison between TB, LTBI, HC and other inflammatory and infectious diseases in the dataset of Berry et al. Although this approach probably leads to less overfitting of the selected set to the source datasets and less overlap with other infectious diseases than the minimal gene set, the discriminatory power of the ten-gene set is lower than that of the minimal set (Table 3). Future cohorts can help to reduce the 116-gene set to a set that matches the minimal gene set in discriminating TB from LTBI, HC and other inflammatory diseases, without overfitting to the source datasets.
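The idea behind bootstrap-based selection of a robust gene set can be sketched as follows. This is a minimal illustration on synthetic data, not the procedure used in this study: the function names (`score_genes`, `bootstrap_select`), the scoring rule (a standardized difference in class means), the gene labels and all parameter values are assumptions made for the example. Samples are resampled with replacement, genes are ranked in each resample, and only genes that reach the top of the ranking in a large fraction of resamples are retained.

```python
import random
from collections import Counter
from statistics import mean, pstdev


def score_genes(samples, labels, genes):
    """Score each gene by the absolute difference in class means,
    divided by the overall spread (an assumed stand-in ranking rule)."""
    scores = {}
    for g in genes:
        pos = [s[g] for s, y in zip(samples, labels) if y == 1]
        neg = [s[g] for s, y in zip(samples, labels) if y == 0]
        spread = pstdev(pos + neg) or 1.0  # guard against zero variance
        scores[g] = abs(mean(pos) - mean(neg)) / spread
    return scores


def bootstrap_select(samples, labels, genes, top_k,
                     n_boot=200, min_freq=0.8, seed=0):
    """Return genes ranked in the top_k in at least min_freq of resamples."""
    rng = random.Random(seed)
    counts = Counter()
    n = len(samples)
    done = 0
    while done < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        boot_labels = [labels[i] for i in idx]
        if len(set(boot_labels)) < 2:
            continue  # a resample must contain both classes
        boot_samples = [samples[i] for i in idx]
        scores = score_genes(boot_samples, boot_labels, genes)
        ranked = sorted(genes, key=lambda g: scores[g], reverse=True)
        counts.update(ranked[:top_k])
        done += 1
    return sorted(g for g in genes if counts[g] / n_boot >= min_freq)


# Synthetic data: G1 and G2 differ between classes, N1..N4 are pure noise.
data_rng = random.Random(42)
genes = ["G1", "G2", "N1", "N2", "N3", "N4"]
samples, labels = [], []
for label in (0, 1):
    for _ in range(20):
        row = {g: data_rng.gauss(0.0, 1.0) for g in genes}
        if label == 1:
            row["G1"] += 5.0
            row["G2"] -= 5.0
        samples.append(row)
        labels.append(label)

selected = bootstrap_select(samples, labels, genes, top_k=2)
print(selected)
```

Requiring a gene to be selected in most resamples favors genes whose ranking does not depend on a few individual samples, which is the sense in which such a set is expected to be less overfitted than one optimized on the full data.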
From the ten-gene set, a combination of five genes (S100P, HBD, PIGC, CHRM2 and ACOT7) could be used in decision tree analysis to differentiate TB from LTBI, HC and non-TB pneumonia with 78% sensitivity and 96% specificity in our dataset (Figure 2). Additionally, the expression profile of children who were treated for TB shifted from an active TB classification (oval in Figure 2) towards a classification as not suffering from active TB (hexagon in Figure 2) at five months after treatment initiation. This indicates that these biomarkers reflect a dynamic response that changes as mycobactericidal activity diminishes.
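For reference, sensitivity and specificity of a binary classification (active TB versus not active TB) follow directly from the counts of correct and incorrect calls. The sketch below uses made-up labels purely for illustration; the counts are not those of this study, and the function name `sensitivity_specificity` is hypothetical.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = active TB)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)          # TB called TB
    fn = sum(1 for t, p in pairs if t and not p)      # TB missed
    tn = sum(1 for t, p in pairs if not t and not p)  # non-TB called non-TB
    fp = sum(1 for t, p in pairs if not t and p)      # non-TB called TB
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    return tp / (tp + fn), tn / (tn + fp)


# Illustrative labels only: 10 TB cases (8 detected), 20 non-TB (19 correct).
y_true = [1] * 10 + [0] * 20
y_pred = [1] * 8 + [0] * 2 + [0] * 19 + [1] * 1
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Sensitivity is the fraction of true TB cases classified as TB, and specificity is the fraction of non-TB subjects (here LTBI, HC and non-TB pneumonia taken together) classified as not having active TB.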
The discriminatory value of the 116-gene signature set for the classification of cases in the cohort described by Berry and colleagues was significantly better in people from London than in people from South Africa. An explanation for the greater similarity of our study population to people from London than to people from South Africa comes from population-genetic studies in which a decrease in the level of genetic variation between populations is observed with increasing geographic distance from Africa, consistent with the out-of-Africa spread of human populations. The finding that previously published signature sets based on individuals from South Africa [4, 6] do not provide good discriminatory value between TB, LTBI and HC in The Gambia points towards a high heterogeneity in the TB immune response between different African countries. A high-resolution survey of genotype variation based on single-nucleotide polymorphisms, copy-number variants and haplotype analysis of a worldwide sample of 29 populations revealed that the genetic distance between individuals from Asia and Native American or Colombian individuals is significantly smaller than the genetic distance between Asian and South African populations. Bayesian cluster analysis grouped individuals from East Asia together with Native American or Colombian individuals, indicating their close phylogenetic relationship. Clustering of Native American individuals with Asian individuals based on their genetic similarities was also observed in a recently published quantitative assessment of human genetic variation worldwide. Therefore, we speculate that the applicability of our signature set in Asian populations might be better than that of sets identified in African or European populations.