Patient information and P. aeruginosa infection patterns
Four CF patients were enrolled in this study (median age 24 years, range 15–31, at the time of metagenome sampling). From each patient we had previously genome sequenced 9 to 27 longitudinally collected P. aeruginosa isolates covering 1–7 years of infection [13]. From the four patients we collected either one (n = 3) or two (n = 1) sputum samples for metagenomic analysis (Fig. 1a): sputum samples S1, S2, and S3 were sampled from patients P41M3, P99F4, and P92F3, respectively, and sputum samples S4a and S4b, separated by two weeks, were sampled from patient P82M3. The sputum samples used for metagenome sequencing were collected approximately 1 year after the most recent genome-sequenced single isolate. The length of this interval is not critical, since the main question addressed here is whether the genotypes of the single isolates can be rediscovered in the metagenomic samples.
Three of the four patients (P41M3, P92F3, and P82M3) have infection patterns that are characteristic of the majority of P. aeruginosa-infected CF patients at the Copenhagen CF Center at Rigshospitalet [27], with a single primary clone type throughout the collection period. One patient (P99F4) showed a change in clone type, where one clone type was outcompeted by another (Fig. 1a). At the time of metagenome sampling, all four patients had recently been diagnosed as chronically infected with P. aeruginosa according to the Copenhagen definitions [17].
Processing of sputum sample reads
The metagenome sequences were aligned to a database containing all bacterial, fungal, and viral genome sequences deposited at NCBI (see Methods). With a median of 96 % of all bacterial reads (Additional file 2: Table S2), P. aeruginosa was the dominant microbial species in the patients, consistent with their clinical diagnosis as chronically infected with P. aeruginosa. We further aligned reads from the sputum metagenomes to the P. aeruginosa PAO1 reference genome, as we have previously done for the single isolates [13]. On average, the metagenomes covered 5.99 Mbp (range: 5.90–6.04 Mbp) of the 6.3 Mbp PAO1 reference genome with >3 reads and a Phred score >30 (Additional file 3: Table S3). This high genomic coverage ensured that the presence or absence of polymorphisms in the metagenomes could be determined at the majority of genomic positions. On average, sequenced positions were covered by 10 to 31 reads, allowing us to identify subpopulations that make up more than 10 % of the population even at the positions with the lowest coverage (Additional file 3: Table S3).
To compare the P. aeruginosa population structure and diversity displayed by the single isolates with those of the metagenomic read assemblies, we conducted a three-step analysis: 1) identification of the dominant clone type(s) in the sputum samples; 2) investigation of whether mutations in the genomes of the single isolates were also identified in the metagenomes, i.e. rediscovery of SNPs in the metagenomes; and 3) comparison of diversity measurements of the populations represented by the single isolates and the metagenomes.
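The 10 % figure follows from the lowest average depth: with 10 reads at a position, a single variant read already corresponds to one tenth of the reads. A minimal sketch of that arithmetic is given here, assuming that one supporting read is enough to register a variant. The function name and the one-read threshold are illustrative assumptions and are not the variant-calling criteria described in Methods.

```python
def min_detectable_fraction(depth: int, min_supporting_reads: int = 1) -> float:
    """Smallest subpopulation fraction that can contribute
    `min_supporting_reads` reads at a position covered by `depth` reads.

    Assumption (not from the paper): a variant counts as detected as soon
    as `min_supporting_reads` reads carry it.
    """
    if depth <= 0:
        raise ValueError("depth must be a positive number of reads")
    if min_supporting_reads <= 0:
        raise ValueError("min_supporting_reads must be positive")
    # A fraction cannot exceed the whole population.
    return min(1.0, min_supporting_reads / depth)


# Depths taken from the text: average depth ranged from 10 to 31 reads.
for depth in (10, 31):
    fraction = min_detectable_fraction(depth)
    print(f"depth {depth:>2}: detectable from {fraction:.1%} of the population")
```

At a depth of 10 reads the limit is 10 %, and at 31 reads it drops to roughly 3 %, so the 10 % threshold is the conservative bound set by the least deeply sequenced positions.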
Identification of the dominant clone type(s)
To identify the P. aeruginosa clone types represented in the metagenomes, de novo assemblies of single isolates and metagenomes were compared. For each patient the clone types represented by the single isolates were compared with the metagenome(s) from the same patient.
For all four patients, the clone type of the most recently sampled single isolate corresponded to the clone type identified from the metagenome, with fewer than 528 SNP differences (median 131 SNPs, range 91–527 SNPs). In contrast, metagenomes and single isolates of other clone types differed by more than 16,268 SNPs (median 17,844 SNPs, range 16,269–30,918 SNPs) (Additional file 4: Table S4A and Table S4B).
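Because the within-clone-type and between-clone-type distances do not overlap, the dominant clone type can be read off as the nearest neighbour in SNP distance. The sketch below illustrates that idea only. The function name and the dictionary labels are hypothetical, and the two distances are the extremes reported above (the largest same-clone-type distance and the smallest other-clone-type distance), not a row from Table S4.

```python
from typing import Dict, Tuple


def closest_clone_type(snp_distances: Dict[str, int]) -> Tuple[str, int]:
    """Return the clone type with the fewest SNP differences to the
    metagenome, together with that SNP count."""
    if not snp_distances:
        raise ValueError("no SNP distances supplied")
    clone = min(snp_distances, key=snp_distances.get)
    return clone, snp_distances[clone]


# Worst case from the reported ranges: 527 SNPs to the same clone type,
# 16,269 SNPs to a different clone type. Labels are placeholders.
distances = {"same_clone_type": 527, "other_clone_type": 16269}

clone, n_snps = closest_clone_type(distances)
gap = distances["other_clone_type"] / distances["same_clone_type"]
print(f"assigned to {clone} ({n_snps} SNPs); next candidate is {gap:.1f}x further away")
```

Even in this worst case the nearest other clone type is more than 30 times further away, which is why no fine-tuned threshold is needed to make the assignment.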
This shows that for each patient the most recent clone type identified by the genome of the single isolate matches the dominating clone type in the P. aeruginosa population identified in the sputum sample metagenome.
Rediscovery of SNPs in the metagenomes
Previous investigations of genome evolution in the clonal lineages of P. aeruginosa strains from each of the four patients [13] identified SNPs accumulating in the clonal populations. If these SNPs are indeed present in actual propagating lineages of the P. aeruginosa population of these patients, they should also be present in the metagenome(s). When looking at all the SNPs identified in all the single isolates, it is expected that the ratio of rediscovery of SNPs between single isolates and metagenomes from the same patients should exceed the ratio determined between single isolates and metagenomes of different patients. Further, this ratio should reach a value of one if all mutations found in the single isolates are also present in the metagenome.
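The rediscovery ratio described above can be written as the fraction of single-isolate SNPs that are also observed in a metagenome. The following is a minimal sketch under stated assumptions: SNPs are encoded as (reference position, alternative base) pairs, the function name is hypothetical, and the SNP sets are made-up toy values rather than data from the study.

```python
from typing import Iterable, Tuple

# Illustrative encoding (an assumption): reference position and alternative base.
Snp = Tuple[int, str]


def rediscovery_ratio(isolate_snps: Iterable[Snp],
                      metagenome_snps: Iterable[Snp]) -> float:
    """Fraction of SNPs found in the single isolates that are also
    present in the metagenome. A value of 1.0 means every isolate SNP
    was rediscovered."""
    isolate_set = set(isolate_snps)
    if not isolate_set:
        raise ValueError("no single-isolate SNPs supplied")
    rediscovered = isolate_set & set(metagenome_snps)
    return len(rediscovered) / len(isolate_set)


# Toy data only: four isolate SNPs and two hypothetical metagenomes.
isolates = {(100, "A"), (250, "T"), (900, "G"), (1200, "C")}
same_patient_metagenome = {(100, "A"), (250, "T"), (900, "G"), (5000, "T")}
other_patient_metagenome = {(250, "T")}

print(rediscovery_ratio(isolates, same_patient_metagenome))   # same patient
print(rediscovery_ratio(isolates, other_patient_metagenome))  # different patient
```

Under the expectation stated above, the first ratio should exceed the second, and it would equal one if every single-isolate SNP were present in the metagenome.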
With the exception of patients P99F4 and P92F3, who are infected with the same clone type (DK26), the rediscovery of SNPs from the single isolates in the metagenome(s) of the same patient was found to be significantly higher than between patients (Fig. 2, p < 0.05, Fisher’s exact test with Holm correction). This supports the specific linkage between single isolates and the P. aeruginosa population as a whole, as hypothesised above.
In one case (S2 from P99F4), the ratio of the rediscovery of SNPs reached one, suggesting that all SNPs identified in the single isolates are present in more than 10 % of the whole population. In all other cases the ratio was below one, which could be due to 1) not all mutations being fixed in the population, i.e. they were lost between the sampling of the single isolates (harbouring the mutations) and the sampling of the metagenome, or 2) some of the mutations being present in only a small fraction (<10 %) of the population and therefore not sampled by the metagenomic reads. In the case of P92F3 the SNPs that were not rediscovered were only present in 11–22 % (Additional file 5: Table S5) of the single isolates, and their absence could thus be explained by mutations not being fixed in the population.
The metagenomes S4a and S4b from patient P82M3 illustrate both explanations above. Firstly, the much lower ratio of rediscovery of SNPs in patient P82M3 compared with the other patients may be explained by the presence of hypermutators in the P. aeruginosa population of P82M3. Hypermutators are known to accumulate many unfavourable mutations [28], which are not expected to remain in the population, thus leading to a low ratio of rediscovery (assuming that the mutations are not hitch-hiking with more favourable mutations). Secondly, the low coverage of the metagenomic samples (Additional file 3: Table S3) resulted in a higher percentage of the SNPs being rediscovered in the later metagenome (S4b) than in the early metagenome (S4a): 26 % (122 of 461) and 12 % (54 of 461), respectively (Additional file 5: Table S5). This appears contradictory, since the mutations were previously identified in the single isolates and must therefore be present to some degree in S4a in order to be identified in S4b. It suggests that the subpopulation represented by the S4b metagenome is present below the limit of detection in the S4a metagenome sequences and is therefore not identified.
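For readers unfamiliar with the statistics named above, the sketch below shows a two-sided Fisher's exact test on a 2x2 table followed by Holm's step-down correction, written in plain Python. It is an illustration under assumptions, not the authors' analysis script: the table layout (rediscovered versus not rediscovered, same patient versus other patient), the function names, and all counts are hypothetical.

```python
from fractions import Fraction
from math import comb
from typing import List, Sequence


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denominator = comb(row1 + row2, col1)

    def table_probability(x: int) -> Fraction:
        # Exact probability of seeing x in the top-left cell.
        return Fraction(comb(row1, x) * comb(row2, col1 - x), denominator)

    observed = table_probability(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    total = sum(
        p for p in (table_probability(x) for x in range(lowest, highest + 1))
        if p <= observed
    )
    return float(total)


def holm_adjust(p_values: Sequence[float]) -> List[float]:
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, index in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[index])
        running_max = max(running_max, candidate)  # keep adjusted values monotone
        adjusted[index] = running_max
    return adjusted


# Hypothetical counts: (rediscovered, not rediscovered) for the same patient,
# then (rediscovered, not rediscovered) for another patient.
toy_tables = [(40, 10, 12, 38), (30, 20, 25, 25), (45, 5, 5, 45)]
raw = [fisher_exact_two_sided(*table) for table in toy_tables]
for table, p_raw, p_adj in zip(toy_tables, raw, holm_adjust(raw)):
    print(table, f"raw p = {p_raw:.3g}", f"Holm-adjusted p = {p_adj:.3g}")
```

Exact fractions are used inside the test so that the comparison with the observed table's probability is not affected by floating-point rounding.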
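Explanation 2 can be made concrete with a simple sampling model: if a mutation is carried by a fraction f of the population and a position is covered by n reads, the chance that no read carries it is (1 - f)^n. The sketch below assumes that reads sample the population independently and ignores sequencing error. Both are simplifying assumptions introduced here, and the function name is hypothetical.

```python
def probability_missed(variant_fraction: float, depth: int) -> float:
    """Probability that none of `depth` reads carries a variant present
    in `variant_fraction` of the population.

    Assumes independent sampling of reads and no sequencing errors.
    """
    if not 0.0 <= variant_fraction <= 1.0:
        raise ValueError("variant_fraction must lie between 0 and 1")
    if depth < 0:
        raise ValueError("depth cannot be negative")
    return (1.0 - variant_fraction) ** depth


# Depths from the text (10 to 31 reads) and a few illustrative fractions.
for fraction in (0.02, 0.05, 0.10):
    for depth in (10, 31):
        missed = probability_missed(fraction, depth)
        print(f"fraction {fraction:.0%}, depth {depth:>2}: missed with probability {missed:.2f}")
```

Under this model, low-frequency mutations are easily missed at the shallower depths, which is in line with a ratio of rediscovery below one for minority subpopulations.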
For patients P99F4 and P92F3 the similar rediscovery ratios of SNPs between the metagenomes and the single isolates can be explained by a co-infection of the same clone type, DK26. This relationship was noted previously and seems to be the consequence of a patient-to-patient transmission event of the DK26 clone from P92F3 to P99F4 [13], explaining the lack of differentiation between the two P. aeruginosa populations. However, despite this close relationship between the populations, Fig. 3 shows that it is possible to distinguish between the SNPs of the single isolates and the respective metagenomes.
We have identified SNPs in genome sequences of longitudinal single isolates, which seem to be characteristic and representative of the patient community, including cases of infections caused by patient-to-patient transmitted clones. This patient-specific relationship between metagenomes and single isolates is further documented by the phylogenetic analysis of the single isolates and metagenomes of the hypermutator population of patient P82M3 (Fig. 4), which shows that despite the highly increased mutation rate, the metagenomes are placed within the phylogeny of the single isolates from the patient. This phylogeny also shows that the single isolates do not cluster according to their origin of sampling, indicating that the population is mixed between the upper and lower airways and that the different subpopulations are not limited to a specific spatial position in the airways.
Diversity of the P. aeruginosa populations
In the single isolates, the diversity of the P. aeruginosa populations was determined from the phylogenies as the mean distance to the Line of Descent (LOD) (Fig. 5). For the metagenome-estimated diversity (Fig. 6) we used the number of polymorphisms normalised to the number of positions covered in the PAO1 genome in order to correct for differences in coverage between the different metagenomes (see Methods for details). Because S4a and S4b (patient P82M3) are representative of the same population, we chose to merge the samples to carry out the inter-patient comparison of diversity (Fig. 6: “S4, avg.”). In both the LOD calculations and the number of polymorphisms, we find that the hypermutator population of patient P82M3 had the highest diversity and that the patient with the shortest period of infection (P92F3), as expected, harboured the least diverse population, to some degree validating our method of diversity calculation. For patients P82M3 and P92F3, respectively, we calculated mean distances to LOD of 34.89 and 1.33 for the single isolate populations, and diversity ratios of 7.08E-05 and 4.20E-05 for the metagenome populations. Thus, for both diversity measurements (single isolates and metagenomes), we saw a significant difference between the diversity of the P. aeruginosa populations of patients P82M3 and P92F3 (p < 0.05, Fisher’s exact test with Holm correction) (Figs. 5 and 6).
When further analysing the single population of patient P82M3, the diversity calculations for samples S4a and S4b illustrate that exhaustive sampling is essential, not only when using single isolates but also for metagenomic samples, in order to get a true picture of the population diversity. Because these two metagenomes represent a non-mutator and a hypermutator subpopulation, respectively, they have significantly different diversity ratios (4.90E-05 and 9.25E-05, respectively; p < 0.05, Fisher’s exact test with Holm correction).
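The metagenome diversity measure is a simple normalisation: the number of polymorphic positions divided by the number of PAO1 positions covered. The sketch below illustrates it and also checks that the mean of the two reported S4a and S4b ratios is consistent with the "S4, avg." value of 7.08E-05. The function names and the polymorphism count in the first example are hypothetical, and the interpretation of the merged value as a plain mean is an assumption (see Methods for the actual procedure).

```python
from typing import Sequence


def diversity_ratio(n_polymorphic_positions: int, n_covered_positions: int) -> float:
    """Number of polymorphic positions normalised to the number of
    reference positions covered by the metagenome."""
    if n_covered_positions <= 0:
        raise ValueError("at least one covered position is required")
    return n_polymorphic_positions / n_covered_positions


def mean_ratio(ratios: Sequence[float]) -> float:
    """Arithmetic mean of several diversity ratios."""
    if not ratios:
        raise ValueError("no ratios supplied")
    return sum(ratios) / len(ratios)


# Hypothetical counts, for illustration only.
example = diversity_ratio(n_polymorphic_positions=300, n_covered_positions=6_000_000)
print(f"example ratio: {example:.2e}")

# Ratios reported in the text for S4a and S4b.
s4_mean = mean_ratio([4.90e-05, 9.25e-05])
print(f"mean of S4a and S4b: {s4_mean:.3e}")
```

The mean of the two reported ratios is 7.075E-05, which agrees with the merged value of 7.08E-05 to the precision given.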