Legionella pneumophila pangenome reveals strain-specific virulence factors
© D'Auria et al; licensee BioMed Central Ltd. 2010
Received: 11 September 2009
Accepted: 17 March 2010
Published: 17 March 2010
Legionella pneumophila subsp. pneumophila is a gram-negative γ-Proteobacterium and the causative agent of Legionnaires' disease, a form of epidemic pneumonia. It has a water-related life cycle. In industrialized cities L. pneumophila is commonly encountered in refrigeration towers and water pipes. Humans are always infected via contaminated aerosols. Although many efforts have been made to eradicate Legionella from buildings, it still contaminates water systems. The town of Alcoy (Valencian Region, Spain) has had recurrent outbreaks since 1999. The strain "Alcoy 2300/99" is a particularly persistent and recurrent strain that was isolated during one of the most significant outbreaks, in 1999-2000.
We have sequenced the genome of the particularly persistent L. pneumophila strain Alcoy 2300/99 and have compared it with four previously sequenced strains known as Philadelphia (USA), Lens (France), Paris (France) and Corby (England).
Pangenome analysis facilitated the identification of strain-specific features, as well as some that are shared by two or more strains. We identified: (1) three islands related to anti-drug resistance systems; (2) a system for transport and secretion of heavy metals; (3) three systems related to DNA transfer; (4) two CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) systems, known to provide resistance against phage infections, one similar in the Lens and Alcoy strains, and another specific to the Paris strain; and (5) seven islands of phage-related proteins, five of which seem to be strain-specific and two shared.
The dispensable genome disclosed by the pangenomic analysis seems to be a reservoir of new traits that have mainly been acquired by horizontal gene transfer and could confer evolutionary advantages over strains lacking them.
Legionella pneumophila is a gram-negative facultative intracellular pathogen, identified in 1977 as the infectious agent of Legionnaires' disease (LD), or legionellosis. It is found in aquatic environments parasitizing its natural hosts, amoebae and protozoa. From this environment, Legionella can colonize water treatment plants, such as refrigeration towers, potable water pipes, etc., and can cause infections in humans when infected aerosols are inhaled [2, 3]. Despite efforts to keep water systems free of Legionella, this pathogen is still causing infection throughout the world, including Spain, where it is endemic in some areas. From 1989 to 2005, around 310 outbreaks with 2,974 cases were recorded worldwide. In 2002 and 2005 there were two important epidemic events with 1,461 and 1,292 cases respectively. In Alcoy, an industrial town in the Valencian Region (Spain), a large outbreak occurred during 1999-2000. A strain that had caused several outbreaks and many cases, named "Alcoy 2300/99", was isolated from a patient in that outbreak. Since then, Alcoy 2300/99 has been found in the recurrent epidemics in Alcoy.
Currently, the genomes of five L. pneumophila strains are available: Philadelphia (Lpg, USA), Lens (Lpl, France), Paris (Lpp, France), Corby (Lpc, England) and Alcoy (Lpa, Spain) (reported in this work). As with the majority of other pathogenic Legionella strains, immunoassay analysis defined them as belonging to serogroup 1. A phylogeny based on Multi Locus Sequence Typing (MLST) showed that all strains are closely related, Alcoy and Corby being the closest.
Several features relating to the virulence of L. pneumophila are well known. Examples include the mechanisms responsible for entry into macrophages [10, 11], the intracellular (host) trafficking of effectors and the membrane-associated protein involved in virulence [9, 13]. The data available disclose an almost complete physiology of this organism and its relationships with protozoa and human macrophages. An interesting question relating to L. pneumophila is its high rate of DNA exchange, not only within the species and with other closely related bacteria, but also with eukaryotic organisms. Comparative genomics can give clues about the extent of this process. Nowadays, the genome sequencing of strains belonging to the same species offers the possibility of defining their pangenome, which helps in understanding the evolutionary dynamics of microbial species. The pangenome comprises the core genome, made up of the genes shared by all strains, and the accessory or dispensable genome compartment, consisting of the genes that are strain-specific or shared by only some of the strains. Pangenome studies can disclose characteristics that are not easily perceptible using standard annotation analysis. For example, pangenome studies have facilitated identification of virulence factors or anti-drug systems in Escherichia coli and Streptococcus agalactiae [17, 18]. The dispensable genome compartment can provide evidence of lateral gene transfer events that have occurred during the evolutionary history of a strain, probably offering additional evolutionary potential to the organism.
In this work, we report the main genomic features of L. pneumophila strain Alcoy 2300/99 and compare it with the four previously sequenced strains. A detailed description of the L. pneumophila pangenome is provided, and strain-specific features are catalogued in terms of "islands". Several islands containing virulence factors were identified and, where possible, their evolutionary origins were also hypothesized. Although the strains are phylogenetically closely related, the pangenomic approach allowed identification of distinctive features, such as anti-drug related islands, strain-specific transport or secretion systems, DNA transfer-related islands, CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) systems, and integrated phage insertions.
Results and Discussion
General features of the L. pneumophila Alcoy (Lpa) genome
Main features of L. pneumophila genomes: genome length (bp), GC content (%), coding genes (%), islands bp (%), average GC of islands (%).
Pangenome of L. pneumophila
Lpa and Lpc are the strains that share most genes: 2,560 out of 3,196 in Lpa (80%) and 3,207 in Lpc (79.8%). This result is in agreement with the phylogenetic tree obtained using MLST. Compared with the remaining genomes, Lpa and Lpc share 2,208 (69.1%) and 2,181 (68%) genes with Lpl; 2,271 (71.1%) and 2,284 (71.2%) with Lpp; and 1,802 (56.4%) and 1,776 (55.4%) with Lpg, respectively. Similarly, Lpp and Lpl also seem to be closely related, with a shared genome of 2,207 genes out of 2,877 for Lpl (76.7%) and 3,026 for Lpp (73%). Finally, Lpg seems to be the most distantly related, sharing 2,207 genes with Lpl (75%) and 1,792 (60.1%) with Lpp.
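The percentages quoted for the Alcoy/Corby comparison follow directly from the shared-gene count and the per-genome gene totals given in this paragraph. The following minimal sketch reproduces that arithmetic; the function name `shared_fraction` is our own and is not part of the original analysis.

```python
def shared_fraction(shared: int, total: int) -> float:
    """Percentage of a genome's genes that are shared with another strain."""
    if total <= 0:
        raise ValueError("total gene count must be positive")
    return 100.0 * shared / total


# Gene counts quoted in the text for the Alcoy (Lpa) / Corby (Lpc) comparison.
LPA_TOTAL, LPC_TOTAL = 3196, 3207
LPA_LPC_SHARED = 2560

if __name__ == "__main__":
    print(f"Lpa: {shared_fraction(LPA_LPC_SHARED, LPA_TOTAL):.1f}%")
    print(f"Lpc: {shared_fraction(LPA_LPC_SHARED, LPC_TOTAL):.1f}%")
```

Because each percentage is relative to the gene total of one genome, the same shared count yields a slightly different value for each member of a pair.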
Functional classification of core and dispensable genes
Accessory genes islands
Resistance related islands (R)
Contains several multi-drug related proteins
68252 - 87883
69153 - 88336
68096 - 77398
Antibiotic persistence related system HipB/HipA; phage related proteins Xre, and multi-drug efflux pump
66766 - 85124
Methylase, prophage integrase, TraK homologous
68898 - 83207
Stability system StbDE
1755441 - 1767402
Cobalt/zinc/cadmium efflux transporter helABC; ArsR regulatory proteins
1251900 - 1268281
Cobalt/zinc/cadmium efflux transporter helABC
2781725 - 2832225
DNA transfer (DT)
181578 - 230836
tra and trb conjugal transfer proteins, Rac integrase proteins, htpX protease, prophage regulatory protein alpA
181312 - 241161
609680 - 648761
Putative RNA helicase, two putative restrictases, tra and trb conjugal transfer proteins.
613277 - 656674
Plasmid-like elements containing lvh, lvr
183831 - 234043
1353613 - 1400796
172914 - 239494
CRISPR systems (C)
3226572 - 3248046
Very similar CRISPR system
1169086 - 1179252
*Part of above P2 island
Integrated phage related (PR)
MviN virulence factor
1292822 - 1329842
1278158 - 1324830
1167775 - 1185079
173401 - 183804
2493848 - 2532367
Ankyrin-containing domain
2486005 - 2509743
Plasmid maintenance killer/antidote system
1190582 - 1219661
2756698 - 2784130
Type IV secretion system
2296937 - 2366483
Probable phage integration
2408503 - 2419758
Not well defined (ND)
325750 - 334842
Carbon storage regulator
1192799 - 1199183
Nickel/Cobalt type II transport systems
1353408 - 1356362
No clear role
1439890 - 1450778
Mainly transposases and phages integrases
2892417 - 2904871
No clear role
1182245 - 1189318
Incomplete HipA/HipB system
2605322 - 2617191
No clear role
1733202 - 1746135
No clear role
2654264 - 2661604
No clear role
2725059 - 2776774
3003085 - 3015471
3443118 - 3451379
No clear role
2824095 - 2849378
No clear role
2833804 - 2842585
Resistance-related islands (R)
We have identified two types of resistance-related islands, R1 and R2 (see Figure 4 and Figure 5, blue track). R1 maintains the same position, around 60 kb at the beginning of the chromosome, in the five strains, inserted in a tRNAasn site, although its organization and content differ. In Lpa, Lpc and Lpg, the island is similar and contains several hypothetical proteins as well as a methylase (lpa00094, LPC_0075, lpg0060), followed by a multi-drug resistance protein (lpa00095, LPC_0076, lpg0061). In the Paris strain, the island also begins with a methylase (lpp0063) and in the Lens strain with a putative transposase (lpl0064). However, although the position of the gene coding for the methylase is quite similar in both genomes, the alignment is different, due to a relatively large deletion in Lpl, whereas in Lpp it is followed by the antibiotic persistence-related system HipB/HipA (lpp0065, lpp0064) with no homolog in other Legionella strains. Although the mechanism of this system is still not well known, much evidence points to HipA as a toxin with bacteriostatic activity which binds DNA/RNA, blocking macromolecule synthesis until HipB binds HipA, releasing the DNA/RNA so microbial cells can survive extended exposure to drugs [22, 23]. This island is followed by various phage-related proteins, elements of the Xre family of transcriptional regulators, an LvrA protein, and three transporters (lpp0077, lpp0078, lpp0079), of which lpp0077 is similar to the acriflavine multi-drug efflux pump. Finally, the island ends with three hypothetical proteins and an IS4-family transposase (lpp0083). Several genes, such as TraK, the LvrA-related protein and the phage-related integrases, are also maintained in the same positions in the Alcoy, Corby, Lens and Philadelphia strains. A similar system (R2) was also found in the Lens strain. It is a small region containing transposases as well as two homologs of a stability system StbDE (lpl1587, lpl1588). This island was originally associated with plasmids, but it has also been found on the chromosomes of other pathogenic bacteria.
Transport/secretion systems (TS)
Only one TS island has been found in the Lpa and Lpc genomes. TS1 in the Alcoy strain is composed of 16 ORFs (lpa01590 to lpa01614). Three of these ORFs, lpa01601, lpa01599 and lpa01598, are related to the cobalt/zinc/cadmium efflux HelABC transport system that provides resistance against these heavy metals. They are followed by lpa01604, which codes for the metallo-regulator ArsR that, in the presence of metals, derepresses the operator/promoter DNA, thereby activating the transcription of downstream genes. As happens in other islands, this one ends with a phage integrase and three transposase-related ORFs, indicating a possible exogenous origin in Legionella. It is worth mentioning that all five genomes carry a HelABC operon belonging to the core genome, while the Alcoy and Corby strains also possess the two additional above-mentioned systems.
In the Corby strain, TS1 is larger than in the Alcoy strain, spanning about 50 kb. It contains the previously mentioned HelABC operon (LPC_1847-LPC_1849), in addition to the HelABC systems present as part of the core genome (LPC_02269, LPC_02270, LPC_02271), a transposase (LPC_1856) and a phage repressor protein (LPC_1857). The first 9 ORFs of the TS1c island (mainly hypothetical proteins and one transposase) are syntenic with the Paris ND9 island (see below). The island continues for about 19 kb with apparently no synteny with other genomes and then regains synteny with the Paris ND10 island.
Interestingly, the TS1a-HelABC genes seem to be more similar to the core Lpl operon than to that of Lpc TS1, while the TS1c-HelABC genes are more similar to those of the Lpg genome. Conversely, the core HelABC genes in Lpa and Lpc are highly related.
DNA transfer-related islands (DT)
Three DT islands have been identified (DT1, DT2 and DT3). DT1 and DT2 correspond to the Trb-2 and Trb-1 islands described by Glöckner and collaborators for the Corby strain. We have found that both islands are also present in the Alcoy strain, although DT2a is shorter than DT2c, suggesting that Lpa and Lpc acquired these systems via DNA transfer prior to their divergence (see Figure 5, green track). Some remarkable features of DT1 are: a phage repressor (lpa00219, LPC_0166), a set of tra and trb (conjugal transfer protein) operons, a putative lambdoid prophage Rac integrase (lpa00266, LPC_0199), another integrase (lpa00270, LPC_0202), an htpX protease (lpa00275, LPC_0205), and a prophage regulatory protein alpA (lpa00278, LPC_0208). In the DT2 island, both strains share a putative RNA helicase (lpa00835, LPC_2785), two putative restriction enzymes (LPC_2788, LPC_2790, lpa00832, and lpa_00829) and the set of tra and trb operons. Glöckner and collaborators reported that Trb-1 (here DT2) is active and could be transferred to other Legionella. In both genomes, DT1 and DT2 are integrated at the tmRNA and tRNApro sites, respectively.
Finally, we have identified a DNA transfer island, DT3, in all strains with the exception of the Corby and Alcoy ones. It is worth mentioning that this island was previously described in Lpg and Lpp as an integrated plasmid-like element [5, 29]. It contains lvh (Legionella vir homolog), a type IV secretion system involved in conjugation [30, 31]. The lvr (Legionella vir region) is located downstream, where LvrA is homologous to the CsrA repressor, important for the inhibition of post-exponential phase activity (such as DNA transfer). A CRISPR system was identified at the beginning of this island.
Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) systems (C)
CRISPR are bacteriophage resistance systems that have been identified in the Alcoy and Lens (C1), and in Paris (C2) strains (Figure 5, pink track). Lpl also possesses one almost identical CRISPR system on its plasmid. Phylogenetic analyses reveal that the Alcoy CAS (CRISPR-associated) genes are more closely related to those of the plasmid-borne CRISPR of the Lens strain than to the chromosomal one. The CAS genes identified were Cas1 (lpl2837, plpl0052, lpa01472), Cas3-helicase (plpl0051, lpl2838, lpa01473), and lpa01474 and lpa01475, which are two proteins conserved in bacteria but with no relatives in other Legionella genomes. lpa01476, plpl0047 and lpl2842 belong to the CRISPR-associated Csy4 family proteins (see Additional file 2). The repeats of the CRISPR system begin downstream of this cluster of genes (see Additional file 3). The Alcoy and Lens strain repeats are identical except for one base (adenine in the Alcoy strain and cytosine in the Lens strain: GTT(A/C)ACTGCCGCACAGGCAGCTTAGAA). Fifty-seven direct repeats were found in Lpa, 53 on the Lpl plasmid, and two clusters of 52 and 12 on the Lpl chromosome. None of the Alcoy spacers were found to be similar to other spacers from the CRISPR database.
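As an illustration of how the repeat quoted above delimits a CRISPR array, the following sketch scans a sequence for the repeat (allowing A or C at the variable position) and returns the spacers lying between consecutive repeats. It is not the procedure used in this work: the function names are our own, the spacer sequences in the toy array are invented, and real CRISPR finders tolerate additional mismatches and search both strands.

```python
import re

# Repeat quoted in the text: GTT(A/C) followed by a conserved stretch.
# The variable base is A in the Alcoy strain and C in the Lens strain.
CONSERVED = "ACTGCCGCACAGGCAGCTTAGAA"
REPEAT = re.compile("GTT[AC]" + CONSERVED)


def find_repeats(seq: str):
    """Return (start, end) coordinates of every exact repeat match."""
    return [(m.start(), m.end()) for m in REPEAT.finditer(seq.upper())]


def extract_spacers(seq: str):
    """Return the sequences lying between consecutive repeat matches."""
    seq = seq.upper()
    hits = find_repeats(seq)
    return [seq[prev_end:next_start]
            for (_, prev_end), (next_start, _) in zip(hits, hits[1:])]


# Toy array for illustration only: both spacers are invented sequences.
alcoy_repeat = "GTTA" + CONSERVED
lens_repeat = "GTTC" + CONSERVED
toy_array = alcoy_repeat + "ACGTACGTAC" + lens_repeat + "TTGGCCAATT" + alcoy_repeat
```

Calling `find_repeats(toy_array)` locates the three repeats and `extract_spacers(toy_array)` returns the two invented spacers, which is the kind of list that would then be compared against a spacer database.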
The Paris strain genome hosts a CRISPR system, located at the beginning of the DT3 island. The first CRISPR-related protein is coded by lpp0160 and shows a weak similarity to a putative CRISPR-associated large protein. The cluster is followed by Cas1 (lpp0161), Cas2 (lpp0162), Cas4 (lpp0163) and 34 direct repeats. Interestingly, the repeats and the CRISPR-associated genes are not related to the ones found in the Lpa and Lpl genomes.
Phage related islands (PR)
Up to seven different phage-related islands have been found. The Alcoy and Corby strains share PR1 and PR3, whereas all the other PR islands are strain-specific. PR1 in Lpa and Lpc is almost syntenic and could be considered the result of an ancient infection that took place before the split of the two lineages; a related region is also present in the Lpg genome. This island contains several phage-related proteins, several transposases belonging to the IS4 family, an MviN virulence factor (lpa01685, LPC_2173, lpg1087), as well as other not well-defined proteins containing DNA cleavage or binding related domains. The MviN-related protein is additional, but not equivalent, to the constitutive one present on the chromosomes (see Additional file 4) of the five strains (lpa0385, LPC_0506, lpg2635, lpl2560, and lpp2688). MviN has been described as an important virulence factor in Salmonella typhimurium and Burkholderia pseudomallei. Although its role in pathogenicity is still not clear, this protein is a homolog of the proposed lipid II flippase protein [36, 37] that has no virulence activity.
PR2 is specific to the Philadelphia strain and lies in a probable hot-spot region. It contains six residual transposase-related ORFs followed by four hypothetical proteins that correspond to the beginning of the DT1ac islands.
PR3, as stated above, has been found only in the Corby and Alcoy strains. It begins with an ankyrin repeat-containing protein (LPC_1606, lpa03089). Ankyrin-domain-containing proteins have a eukaryotic origin and are related to intracellular trafficking. In Legionella ankyrin-containing proteins may be secreted by the Dot/Icm system and could play a role in intracellular bacterial replication [14, 38]. The island also contains several hypothetical proteins (LPC_1607-LPC_1615, lpa03090-lpa03095) and a transposase (LPC_1616, lpa03096).
PR4 is specific to the Lens strain genome and spans around 29 kb. It seems to be a residual plasmid integrated into the chromosome, based on the presence of several proteins relating to DNA organization, such as lpl1068 containing SNF2-related protein domains that seem to be similar to helicases, and an Omp/MotB domain-containing protein (lpl1070) that may be related to structural flagella membrane proteins. Several phage-related proteins and transposases were found, together with another predicted inner membrane protein (lpl1073). Moreover, this island contains an ORF similar to DNA damage-inducible protein J (lpl1084), and the last two ORFs (lpl1092 and lpl1093) have high homology to other plasmid maintenance killer/antidote systems. The latter has been identified in several gram-negative-related plasmids and it is known as a regulator of bacterial programmed cellular death, although in some cases (e.g. E. coli) it is also integrated into the chromosome .
The Alcoy-specific island PR5 is integrated into a [CAT]-tRNAile site. The first three ORFs code for proteins related to transposases, whereas the remainder code for hypothetical proteins with no clear function. None of these proteins appear to have any relationship with other Legionella-related ORFs, and they seem to be a clear case of acquisition by HGT. Three genes with no clear function, a TraK gene (lpa03390), an inner membrane protein (lpa03394), and one hypothetical protein (lpa03395), have been found to be syntenic with the Lpg and Lpl genomes (lpg2365, lpg2366, and lpg2367; lpl0071, lpl0070, and lpl006, respectively). lpa03400 is similar to a phage-related integrase. The next ORFs (on the reverse strand) are related to a cluster of genes involved in the synthesis of bacilysin, known as one of the simplest antibiotic peptides active against some bacteria and fungi (see Additional file 5).
PR6 in the Philadelphia genome includes homologs of a type IV secretion system, mobile genetic elements, and virulence factors. It has been described in detail by Brassinga and collaborators (2003) as a 65 kb pathogenicity island.
Finally, the island PR7 in the Paris genome codes almost exclusively for hypothetical proteins. Only a putative primase/helicase (lpp2117) and a phage integrase homolog (lpp2123) seem to relate this island to phage integration with an unpredicted function.
Not defined island (ND)
Up to thirteen islands for which it has not been possible to establish a clear role have been identified. Interestingly, all but one (ND11 in Lpa and Lpc) are strain-specific islands. ND1, ND2, ND3, ND8, ND9, and ND10 are found in the Paris genome. ND1 hosts a complete cytochrome o cluster (subunits II, I, III, IV on lpp0294, lpp0295, lpp0296, lpp0297, respectively) and a glycine/betaine transporter. ND2 contains hypothetical proteins and a CsrA (carbon storage regulator), and is located close to the chromosomal heavy metals regulatory genes (HelABC). ND3 is formed by two hypothetical proteins followed by the HupE/UreJ membrane protein (lpp12118), a homolog of nickel/cobalt type II transport systems. These elements are followed by a thiocyanate hydrolase cluster for subunits gamma, alpha and beta (lpp1219, lpp1220 and lpp122, respectively) that was previously identified as unique to the Paris genome when compared to that from Lens. This enzyme catalyzes the first key step in thiocyanate degradation and is important in detoxification processes. ND8 is a small island composed mainly of transposases. Finally, for ND9 and ND10, it was not possible to propose a role. ND4 and ND5 are specific to the Philadelphia strain. ND4 is a small island containing hypothetical proteins and transposases, whereas ND5 contains a whole set of transposases and phage integrases as well as hypothetical proteins. ND6 and ND13 are specific to the Corby strain. ND6 spans a 7 kb region in the Corby genome with no apparently exogenous genes; it contains mainly hypothetical proteins, although some ORFs seem to be related to acetyltransferases. ND13 is syntenic with the terminal part of the Lpp island ND10 and, similarly, consists of hypothetical proteins. ND11 was found to be syntenic in both the Alcoy and Corby genomes and contains transposases and hypothetical proteins. The ND7 and ND12 islands were found only in the Lens strain. ND7 is another not well-defined island consisting mostly of hypothetical proteins. It contains an ORF related to filamentation induced by a cAMP protein (lpl2288), an incomplete homolog of a HipA system, as well as the R1p island HipA/HipB system. Finally, ND12 is made up exclusively of hypothetical proteins.
The virulence and persistence of L. pneumophila are mainly due to specific mechanisms coded by part of its core genome that make L. pneumophila able to infect, survive and replicate in macrophages [14, 44–46]. Lpc has been described as one of the most virulent strains, while Lpp is responsible for sporadic cases but is frequently recognized worldwide. Lpl was responsible for important outbreaks in France during 2003-2004, with 86 registered cases resulting in 17 deaths. Finally, although the Lpg strain was the first isolated strain whose genome sequence was determined, it turned out to be less virulent than the others.
Comparative genomics of five L. pneumophila strains isolated in different parts of the world disclosed the presence of several HGT-related islands and an evident history of recombination events. Here, we reported a number of features connected to virulence that could have been exchanged, or acquired by the strains, during their evolution. The traces of these events are mainly part of the dispensable genome compartment.
The islands encountered in the dispensable pangenome compartment of the five genomes revealed factors that can give additional virulence to each strain. The Alcoy and Corby strains are those in which the most islands related to virulence and DNA transfer activities have been found. Multi-drug efflux systems have been found in Lpa, Lpc and Lpg, while stability systems have been found in the Lpl and Lpp genomes (R1). The Lpa and Lpc strains are probably more resistant in the presence of heavy metals, due to an additional HelABC system in the TS1 island. Moreover, Lpa and Lpc seem to have acquired, before their lineages split, the ability to be more successful in DNA transfer through the DT1ac and DT2ac systems. Interestingly, the Alcoy strain also acquired a complete bacilysin system (PR5 island), probably by previous phage contact after separation from the Corby lineage, which could represent an environmentally competitive advantage for this virulent strain. Lpa, Lpc and Lpg also carry an additional MviN virulence factor, although there is no experimental evidence of its activity (island PR1). Moreover, Alcoy, as well as Lens and Paris, proved to carry phage resistance systems (CRISPR on the C1al and C2p islands). Several additional specific features have also been reported, although their role could not be predicted.
Finally, the data reported in this work show that the Alcoy strain possesses additional features that make it different from the other previously sequenced genomes, even from the most closely related Corby strain. This finding could be related to the recurrent and sometimes fatal outbreaks recorded in the Spanish town of Alcoy.
Strains used in this work
L. pneumophila strain Alcoy 2300/99 was isolated from the sputum of a patient with Legionnaires' disease (LD) and associated with the LD outbreak detected in Alcoy (Spain) in 1999. It belongs to the most predominant serogroup, serogroup 1. The same strain was further isolated in other successive LD outbreaks in 2000 and 2002. The publicly available genomes of L. pneumophila used for comparison were retrieved from the GenBank database (http://www.ncbi.nlm.nih.gov/Genbank/index.html). Abbreviations and accession numbers are reported in Table 1.
DNA extraction, shotgun clone libraries and sequencing
DNA from L. pneumophila Alcoy was extracted as described in D'Auria et al. (2008). Cloning and sequencing were carried out as follows: two libraries (inserts of 1-2 and 2-10 kb) were generated by sonication of genomic DNA, followed by cloning of the fragments using the TOPO XL PCR Cloning Kit (Invitrogen, #K4700-10). Plasmid DNA purification was done with a Montage Plasmid Miniprep96 kit (Millipore, #LSKP09624) on a MULTIPROBE II-Robot Liquid Handling System (Packard Bioscience). Sequencing reactions were mainly performed using the ABI PRISM BigDye Terminator v3.0 Ready Reactions Kit and resolved using the 3730xl Genetic Analyzer (Applied Biosystems). To complete the assembly we used 454 pyrosequencing (Roche) performed on one half of a GS-FLX PicoTiter plate, obtaining a total of 52 Mb. The combination of both sequencing methods allowed the genome to be defined in 4 contigs. Finally, inverse PCR was employed to fill the remaining gaps and close the genome.
Genome assembly and annotation
Base-calling of each Sanger read was carried out with the "Pregap4" interface from the Staden Package. All reads were then checked manually with the "Trev" program, and the Sanger sequences were assembled with the Cap4 program, both from the Staden Package. The 454 reads were assembled with the Newbler assembler (http://www.roche-applied-science.com) and then integrated with the previous Sanger assembly. Open reading frames were predicted with the Glimmer3 program, assigning the "lpa" locus tag to each sequence. All CDSs were compared by BLAST against the non-redundant GenBank database, the Cluster of Orthologous Groups and the Kyoto Encyclopedia of Genes and Genomes. Annotation was then improved by homology searches against the previously sequenced genomes of the Philadelphia, Lens, Paris and Corby strains (see Table 1 for genome accession numbers). Ribosomal genes were identified by BLAST searches against the "nt" database. tRNAs were identified with the tRNAscanSE software. tRNA genes with anticodon CAT (tRNAIle, tRNAMet and tRNAfMet) were identified by the method described by Silva and collaborators.
CDSs from each genome were considered orthologous when reciprocal BLAST best hits gave at least 70% overlap with a minimum of 80% similarity. A catalogue of orthologs was compiled. The GenomeViz2 software was employed to draw genome plots. Several Perl scripts were written in our laboratory for massive data handling (available upon request).
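The ortholog criterion above can be sketched as a reciprocal-best-hit filter. The code below is an illustration, not the Perl scripts used in this work. Several details are assumptions on our part: hits are ranked by bit score, the thresholds are applied before the best hit is chosen, overlap is measured as the percentage of the query covered by the alignment, and the `Hit` record and all function names are hypothetical.

```python
from typing import Dict, Iterable, NamedTuple, Set, Tuple


class Hit(NamedTuple):
    """One pairwise BLAST hit (hypothetical record layout)."""
    query: str
    subject: str
    similarity: float  # percent similarity of the alignment
    overlap: float     # percent of the query covered (assumed definition)
    bitscore: float


def best_hits(hits: Iterable[Hit],
              min_similarity: float = 80.0,
              min_overlap: float = 70.0) -> Dict[str, str]:
    """Map each query to its highest-scoring subject that passes both thresholds."""
    best: Dict[str, Hit] = {}
    for h in hits:
        if h.similarity < min_similarity or h.overlap < min_overlap:
            continue
        if h.query not in best or h.bitscore > best[h.query].bitscore:
            best[h.query] = h
    return {q: h.subject for q, h in best.items()}


def reciprocal_best_hits(a_vs_b: Iterable[Hit],
                         b_vs_a: Iterable[Hit]) -> Set[Tuple[str, str]]:
    """Return (gene in A, gene in B) pairs that are each other's best hit."""
    ab = best_hits(a_vs_b)
    ba = best_hits(b_vs_a)
    return {(a, b) for a, b in ab.items() if ba.get(b) == a}
```

With this definition, a gene whose best hit points to a partner that itself prefers a different gene is not reported as an ortholog, which keeps paralogs out of the catalogue at the cost of missing some true orthologs.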
To define the coverage of the L. pneumophila pangenome, rarefaction curves were calculated from pools of CDSs from each genome. In ecology, rarefaction is a technique applied in order to standardize and compare species richness computed from samples of different size. Here, it is applied to compare gene cluster richness among multiple genomes from the same species. The L. pneumophila pangenome was then compared with pangenomes from strains belonging to E. coli (8 genomes), Streptococcus pyogenes (8 genomes), Staphylococcus aureus (9 genomes) and Streptococcus agalactiae (8 genomes) (accession numbers are reported in Additional file 6). For each genome, the BLASTCLUST software was used to define gene clusters (70% similarity and 70% overlap). Gene abundance within each cluster was used to calculate rarefaction curves with the RarefactWin.exe program (http://www.uga.edu/~strata/software/Software.html).
A comparative analysis among the five strains was carried out using the Mauve multiple genome alignment software.
All CDSs from each genome were pooled together and clustered with the CD-HIT-EST software with at least 70% overlap and a minimum of 80% similarity. One gene from each cluster was characterized by its RPSBLAST best match (e-values lower than 10^-15) against the COG (Cluster of Orthologous Groups) database.
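For readers unfamiliar with rarefaction, the expected number of gene clusters observed in a random subsample of n genes can be written in closed form from the cluster abundances. The sketch below uses the standard analytic (hypergeometric) rarefaction formula from ecology; we assume, but have not verified, that this matches what the RarefactWin program computes, and the function names are our own.

```python
from math import comb
from typing import List, Sequence, Tuple


def rarefaction(abundances: Sequence[int], n: int) -> float:
    """Expected number of clusters seen when n genes are drawn without replacement.

    abundances[i] is the number of genes in cluster i. A cluster is missed
    only if all n draws come from the other clusters, which happens with
    probability C(N - a_i, n) / C(N, n), where N is the total gene count.
    """
    total = sum(abundances)
    if not 0 <= n <= total:
        raise ValueError("n must be between 0 and the total number of genes")
    denom = comb(total, n)
    return sum(1.0 - comb(total - a, n) / denom for a in abundances)


def rarefaction_curve(abundances: Sequence[int],
                      step: int = 1) -> List[Tuple[int, float]]:
    """Return (sample size, expected cluster count) points along the curve."""
    total = sum(abundances)
    return [(n, rarefaction(abundances, n)) for n in range(0, total + 1, step)]
```

A curve that flattens as n grows indicates that most clusters have already been sampled, whereas a curve that is still rising suggests that additional genomes would keep adding new gene clusters to the pangenome.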
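Once every CDS has been assigned to a cluster, each cluster can be placed in the core or dispensable compartment according to the strains represented in it. The following sketch assumes the clustering output has already been reduced to a mapping from a cluster identifier to the set of strains with at least one member; the identifiers, the three-way split and the function name are illustrative and are not taken from the original pipeline.

```python
from typing import Dict, FrozenSet, List, Set

# Strain abbreviations used throughout the text.
STRAINS: FrozenSet[str] = frozenset({"Lpa", "Lpc", "Lpg", "Lpl", "Lpp"})


def classify_clusters(clusters: Dict[str, Set[str]],
                      strains: FrozenSet[str] = STRAINS) -> Dict[str, List[str]]:
    """Split clusters into core, shared-dispensable and strain-specific sets.

    core:     present in every strain
    shared:   present in two or more strains, but not all
    specific: present in exactly one strain
    """
    result: Dict[str, List[str]] = {"core": [], "shared": [], "specific": []}
    for cluster_id in sorted(clusters):
        present = clusters[cluster_id] & strains
        if present == strains:
            result["core"].append(cluster_id)
        elif len(present) == 1:
            result["specific"].append(cluster_id)
        elif present:
            result["shared"].append(cluster_id)
    return result
```

The "shared" and "specific" lists together make up the dispensable genome as defined in the introduction; separating them is convenient because the strain-specific clusters are the ones that seed candidate islands.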
Determination of specific islands
Discontinuities in the homology (synteny) between the CDSs of a given genome and their orthologs in every comparison were used to define islands. Generally, islands were defined when more than 5 consecutive CDSs were found to be specific to one strain. Syntenic Alcoy/Corby orthologous genes which did not match in the other three genomes were also considered islands. Islands were named according to their proposed function. A lowercase letter was added to the end of the name to indicate the genome to which the island belonged; letters were chosen according to the official locus tag definition ("a" for Alcoy, "c" for Corby, "g" for Philadelphia, "l" for Lens and "p" for Paris; e.g. "TS1a": Transport/Secretion island number 1 from the Alcoy genome). Because the original annotations of the Lpc, Lpg, Lpl and Lpp genomes often report CDSs as "hypothetical protein", similarity searches of genes within these islands were carried out against an updated RefSeq (GenBank) database (http://www.ncbi.nlm.nih.gov/RefSeq/).
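The run-length rule and the naming convention described above can be sketched as follows. This is a simplified illustration: it treats the chromosome as linear, applies the "more than 5 consecutive CDSs" threshold strictly (whereas the text says "generally"), and takes as input a per-CDS flag, in genome order, saying whether the CDS lacks an ortholog in the other strains. The function names are our own.

```python
from typing import Dict, List, Sequence, Tuple

# Genome suffixes quoted in the text.
SUFFIX: Dict[str, str] = {
    "Alcoy": "a", "Corby": "c", "Philadelphia": "g", "Lens": "l", "Paris": "p",
}


def find_islands(specific: Sequence[bool],
                 min_run: int = 6) -> List[Tuple[int, int]]:
    """Return inclusive (first, last) CDS indices of strain-specific runs.

    min_run=6 encodes "more than 5 consecutive CDSs".
    """
    islands: List[Tuple[int, int]] = []
    start = None
    for i, flag in enumerate(specific):
        if flag and start is None:
            start = i                      # a new candidate run begins
        elif not flag and start is not None:
            if i - start >= min_run:       # run just ended; keep it if long enough
                islands.append((start, i - 1))
            start = None
    if start is not None and len(specific) - start >= min_run:
        islands.append((start, len(specific) - 1))
    return islands


def island_name(category: str, number: int, strain: str) -> str:
    """Build names such as 'TS1a' from category, index and strain."""
    return f"{category}{number}{SUFFIX[strain]}"
```

In practice the resulting index ranges would be converted to chromosome coordinates and then inspected manually, since short conserved genes inside a larger acquired region would otherwise split one island into several.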
This work has been funded by contract with Consellería de Sanidad of Valencian Government to AL and AM, and by grants BFU2009-12895-CO2-01 and SAF2009-13032-CO2-01 from Ministerio de Ciencia e Innovación (MICINN) to AL and AM, respectively. NJH is recipient of a fellowship from Carlos III and GD has a research contract from CIBERESP. Sanger sequencing was carried out using facilities of the SCSIE from University of Valencia.
- Fields BS, Benson RF, Besser RE: Legionella and Legionnaires' disease: 25 years of investigation. Clin Microbiol Rev. 2002, 15: 506-526. 10.1128/CMR.15.3.506-526.2002.PubMed CentralPubMedView Article
- Lu H, Clarke M: Dynamic properties of Legionella-containing phagosomes in Dictyostelium amoebae. Cell Microbiol. 2005, 7: 995-1007. 10.1111/j.1462-5822.2005.00528.x.PubMedView Article
- Sabria M, Alvarez J, Dominguez A, Pedrol A, Sauca G, Salleras L, Lopez A, Garcia-Nunez MA, Parron I, Barrufet MP: A community outbreak of Legionnaires' disease: evidence of a cooling tower as the source. Clin Microbiol Infect. 2006, 12: 642-647. 10.1111/j.1469-0691.2006.01447.x.PubMedView Article
- Fernandez JA, Lopez P, Orozco D, Merino J: Clinical study of an outbreak of Legionnaire's disease in Alcoy, Southeastern Spain. Eur J Clin Microbiol Infect Dis. 2002, 21: 729-735. 10.1007/s10096-002-0819-9.PubMedView Article
- Chien M, Morozova I, Shi S, Sheng H, Chen J, Gomez SM, Asamani G, Hill K, Nuara J, Feder M: The genomic sequence of the accidental pathogen Legionella pneumophila. Science. 2004, 305: 1966-1968. 10.1126/science.1099776.PubMedView Article
- Cazalet C, Rusniok C, Bruggemann H, Zidane N, Magnier A, Ma L, Tichit M, Jarraud S, Bouchier C, Vandenesch F: Evidence in the Legionella pneumophila genome for exploitation of host cell functions and high genome plasticity. Nat Genet. 2004, 36: 1165-1173. 10.1038/ng1447.PubMedView Article
- Glockner G, Albert-Weissenberger C, Weinmann E, Jacobi S, Schunder E, Steinert M, Hacker J, Heuner K: Identification and characterization of a new conjugation/type IVA secretion system (trb/tra) of Legionella pneumophila Corby localized on two mobile genomic islands. Int J Med Microbiol. 2008, 298: 411-428. 10.1016/j.ijmm.2007.07.012.PubMedView Article
- Yu VL, Plouffe JF, Pastoris MC, Stout JE, Schousboe M, Widmer A, Summersgill J, File T, Heath CM, Paterson DL, Chereshsky A: Distribution of Legionella species and serogroups isolated by culture in patients with sporadic community-acquired legionellosis: an international collaborative survey. J Infect Dis. 2002, 186: 127-128. 10.1086/341087.PubMedView Article
- D'Auria G, Jimenez N, Peris-Bondia F, Pelaz C, Latorre A, Moya A: Virulence factor rtx in Legionella pneumophila, evidence suggesting it is a modular multifunctional protein. BMC Genomics. 2008, 9: 14-10.1186/1471-2164-9-14.PubMed CentralPubMedView Article
- Horwitz MA: Phagocytosis of the Legionnaires' disease bacterium (Legionella pneumophila) occurs by a novel mechanism: engulfment within a pseudopod coil. Cell. 1984, 36: 27-33. 10.1016/0092-8674(84)90070-9.PubMedView Article
- Tachado SD, Samrakandi MM, Cirillo JD: Non-opsonic phagocytosis of Legionella pneumophila by macrophages is mediated by phosphatidylinositol 3-kinase. PLoS One. 2008, 3: e3324-10.1371/journal.pone.0003324.PubMed CentralPubMedView Article
- Ensminger AW, Isberg RR: Legionella pneumophila Dot/Icm translocated substrates: a sum of parts. Curr Opin Microbiol. 2009, 12: 67-73. 10.1016/j.mib.2008.12.004.PubMed CentralPubMedView Article
- Cirillo SL, Yan L, Littman M, Samrakandi MM, Cirillo JD: Role of the Legionella pneumophila rtxA gene in amoebae. Microbiology. 2002, 148: 1667-1677.PubMedView Article
- Gomez-Valero L, Rusniok C, Buchrieser C: Legionella pneumophila: population genetics, phylogeny and genomics. Infect Genet Evol. 2009, 9: 727-739. 10.1016/j.meegid.2009.05.004.PubMedView Article
- Medini D, Donati C, Tettelin H, Masignani V, Rappuoli R: The microbial pan-genome. Curr Opin Genet Dev. 2005, 15: 589-594. 10.1016/j.gde.2005.09.006.PubMedView Article
- Rocha EP: Evolutionary patterns in prokaryotic genomes. Curr Opin Microbiol. 2008, 11: 454-460. 10.1016/j.mib.2008.09.007.PubMedView Article
- Rasko DA, Rosovitz MJ, Myers GS, Mongodin EF, Fricke WF, Gajer P, Crabtree J, Sebaihia M, Thomson NR, Chaudhuri R: The pangenome structure of Escherichia coli: comparative genomic analysis of E. coli commensal and pathogenic isolates. J Bacteriol. 2008, 190: 6881-6893. 10.1128/JB.00619-08.PubMed CentralPubMedView Article
- Tettelin H, Masignani V, Cieslewicz MJ, Donati C, Medini D, Ward NL, Angiuoli SV, Crabtree J, Jones AL, Durkin AS: Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae: implications for the microbial "pan-genome". Proc Natl Acad Sci USA. 2005, 102: 13950-13955. 10.1073/pnas.0506758102.PubMed CentralPubMedView Article
- Lefebure T, Stanhope MJ: Evolution of the core and pan-genome of Streptococcus: positive selection, recombination, and genome composition. Genome Biol. 2007, 8: R71-10.1186/gb-2007-8-5-r71.PubMed CentralPubMedView Article
- Coscolla M, Gonzalez-Candelas F: Comparison of clinical and environmental samples of Legionella pneumophila at the nucleotide sequence level. Infect Genet Evol. 2009, 9: 882-888. 10.1016/j.meegid.2009.05.013.PubMedView Article
- Harrison TG, Afshar B, Doshi N, Fry NK, Lee JV: Distribution of Legionella pneumophila serogroups, monoclonal antibody subgroups and DNA sequence types in recent clinical and environmental isolates from England and Wales (2000-2008). Eur J Clin Microbiol Infect Dis. 2009, 28: 781-791. 10.1007/s10096-009-0705-9.PubMedView Article
- Correia FF, D'Onofrio A, Rejtar T, Li L, Karger BL, Makarova K, Koonin EV, Lewis K: Kinase activity of overexpressed HipA is required for growth arrest and multidrug tolerance in Escherichia coli. J Bacteriol. 2006, 188: 8360-8367. 10.1128/JB.01237-06.PubMed CentralPubMedView Article
- Korch SB, Hill TM: Ectopic overexpression of wild-type and mutant hipA genes in Escherichia coli: effects on macromolecular synthesis and persister formation. J Bacteriol. 2006, 188: 3826-3836. 10.1128/JB.01740-05.PubMed CentralPubMedView Article
- Mazzariol A, Cornaglia G, Nikaido H: Contributions of the AmpC beta-lactamase and the AcrAB multidrug efflux system in intrinsic resistance of Escherichia coli K-12 to beta-lactams. Antimicrob Agents Chemother. 2000, 44: 1387-1390. 10.1128/AAC.44.5.1387-1390.2000.PubMed CentralPubMedView Article
- Hayes F: A family of stability determinants in pathogenic bacteria. J Bacteriol. 1998, 180: 6415-6418.PubMed CentralPubMed
- Jiang Y, Yang F, Zhang X, Yang J, Chen L, Yan Y, Nie H, Xiong Z, Wang J, Dong J: The complete sequence and analysis of the large virulence plasmid pSS of Shigella sonnei. Plasmid. 2005, 54: 149-159. 10.1016/j.plasmid.2005.03.002.PubMedView Article
- McClain MS, Hurley MC, Brieland JK, Engleberg NC: The Legionella pneumophila hel locus encodes intracellularly induced homologs of heavy-metal ion transporters of Alcaligenes spp. Infect Immun. 1996, 64: 1532-1540.PubMed CentralPubMed
- Busenlehner LS, Pennella MA, Giedroc DP: The SmtB/ArsR family of metalloregulatory transcriptional repressors: Structural insights into prokaryotic metal resistance. FEMS Microbiol Rev. 2003, 27: 131-143. 10.1016/S0168-6445(03)00054-8.PubMedView Article
- Doleans-Jordheim A, Akermi M, Ginevra C, Cazalet C, Kay E, Schneider D, Buchrieser C, Atlan D, Vandenesch F, Etienne J, Jarraud S: Growth-phase-dependent mobility of the lvh-encoding region in Legionella pneumophila strain Paris. Microbiology. 2006, 152: 3561-3568. 10.1099/mic.0.29227-0.PubMedView Article
- Samrakandi MM, Cirillo SL, Ridenour DA, Bermudez LE, Cirillo JD: Genetic and phenotypic differences between Legionella pneumophila strains. J Clin Microbiol. 2002, 40: 1352-1362. 10.1128/JCM.40.4.1352-1362.2002.PubMed CentralPubMedView Article
- Segal G, Russo JJ, Shuman HA: Relationships between a new type IV secretion system and the icm/dot virulence system of Legionella pneumophila. Mol Microbiol. 1999, 34: 799-809. 10.1046/j.1365-2958.1999.01642.x.PubMedView Article
- Molofsky AB, Swanson MS: Legionella pneumophila CsrA is a pivotal repressor of transmission traits and activator of replication. Mol Microbiol. 2003, 50: 445-461. 10.1046/j.1365-2958.2003.03706.x.PubMedView Article
- Barrangou R, Fremaux C, Deveau H, Richards M, Boyaval P, Moineau S, Romero DA, Horvath P: CRISPR provides acquired resistance against viruses in prokaryotes. Science. 2007, 315: 1709-1712. 10.1126/science.1138140.PubMedView Article
- Grissa I, Vergnaud G, Pourcel C: The CRISPRdb database and tools to display CRISPRs and to generate dictionaries of spacers and repeats. BMC Bioinformatics. 2007, 8: 172-10.1186/1471-2105-8-172.PubMed CentralPubMedView Article
- Ling JM, Moore RA, Surette MG, Woods DE: The mviN homolog in Burkholderia pseudomallei is essential for viability and virulence. Can J Microbiol. 2006, 52: 831-842. 10.1139/W06-042.PubMedView Article
- Inoue A, Murata Y, Takahashi H, Tsuji N, Fujisaki S, Kato J: Involvement of an essential gene, mviN, in murein synthesis in Escherichia coli. J Bacteriol. 2008, 190: 7298-7301. 10.1128/JB.00551-08.PubMed CentralPubMedView Article
- Ruiz N: Bioinformatics identification of MurJ (MviN) as the peptidoglycan lipid II flippase in Escherichia coli. Proc Natl Acad Sci USA. 2008, 105: 15553-15557. 10.1073/pnas.0808352105.PubMed CentralPubMedView Article
- Al-Khodor S, Price CT, Habyarimana F, Kalia A, Abu Kwaik Y: A Dot/Icm-translocated ankyrin protein of Legionella pneumophila is required for intracellular proliferation within human macrophages and protozoa. Mol Microbiol. 2008, 70: 908-923.PubMed CentralPubMed
- Jensen RB, Gerdes K: Programmed cell death in bacteria: proteic plasmid stabilization systems. Mol Microbiol. 1995, 17: 205-210. 10.1111/j.1365-2958.1995.mmi_17020205.x.PubMedView Article
- Kenig M, Abraham EP: Antimicrobial activities and antagonists of bacilysin and anticapsin. J Gen Microbiol. 1976, 94: 37-45.PubMedView Article
- Brassinga AK, Hiltz MF, Sisson GR, Morash MG, Hill N, Garduno E, Edelstein PH, Garduno RA, Hoffman PS: A 65-kilobase pathogenicity island is unique to Philadelphia-1 strains of Legionella pneumophila. J Bacteriol. 2003, 185: 4630-4637. 10.1128/JB.185.15.4630-4637.2003.PubMed CentralPubMedView Article
- Rodionov DA, Hebbeln P, Gelfand MS, Eitinger T: Comparative and functional genomic analysis of prokaryotic nickel and cobalt uptake transporters: evidence for a novel group of ATP-binding cassette transporters. J Bacteriol. 2006, 188: 317-327. 10.1128/JB.188.1.317-327.2006.PubMed CentralPubMedView Article
- Bezsudnova EY, Sorokin DY, Tikhonova TV, Popov VO: Thiocyanate hydrolase, the primary enzyme initiating thiocyanate degradation in the novel obligately chemolithoautotrophic halophilic sulfur-oxidizing bacterium Thiohalophilus thiocyanoxidans. Biochim Biophys Acta. 2007, 1774: 1563-1570.PubMedView Article
- Bandyopadhyay P, Liu S, Gabbai CB, Venitelli Z, Steinman HM: Environmental mimics and the Lvh type IVA secretion system contribute to virulence-related phenotypes of Legionella pneumophila. Infect Immun. 2007, 75: 723-735. 10.1128/IAI.00956-06.PubMed CentralPubMedView Article
- Franco IS, Shuman HA, Charpentier X: The perplexing functions and surprising origins of Legionella pneumophila type IV secretion effectors. Cell Microbiol. 2009, 11 (10): 1435-43. 10.1111/j.1462-5822.2009.01351.x.PubMedView Article
- Skriwan C, Fajardo M, Hagele S, Horn M, Wagner M, Michel R, Krohne G, Schleicher M, Hacker J, Steinert M: Various bacterial pathogens and symbionts infect the amoeba Dictyostelium discoideum. Int J Med Microbiol. 2002, 291: 615-624. 10.1078/1438-4221-00177.PubMedView Article
- Shevchuk O, Batzilla C, Hagele S, Kusch H, Engelmann S, Hecker M, Haas A, Heuner K, Glockner G, Steinert M: Proteomic analysis of Legionella-containing phagosomes isolated from Dictyostelium. Int J Med Microbiol. 2009, 299 (7): 489-508. 10.1016/j.ijmm.2009.03.006.PubMedView Article
- Cazalet C, Jarraud S, Ghavi-Helm Y, Kunst F, Glaser P, Etienne J, Buchrieser C: Multigenome analysis identifies a worldwide distributed epidemic Legionella pneumophila clone that emerged within a highly diverse species. Genome Res. 2008, 18: 431-441. 10.1101/gr.7229808.PubMed CentralPubMedView Article
- Jepras RI, Fitzgeorge RB, Baskerville A: A comparison of virulence of two strains of Legionella pneumophila based on experimental aerosol infection of guinea-pigs. J Hyg (Lond). 1985, 95: 29-38.View Article
- Staden R: The Staden sequence analysis package. Mol Biotechnol. 1996, 5: 233-241. 10.1007/BF02900361.PubMedView Article
- Delcher AL, Harmon D, Kasif S, White O, Salzberg SL: Improved microbial gene identification with GLIMMER. Nucleic Acids Res. 1999, 27: 4636-4641. 10.1093/nar/27.23.4636.PubMed CentralPubMedView Article
- Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol. 1990, 215: 403-410.PubMedView Article
- Tatusov RL, Fedorova ND, Jackson JD, Jacobs AR, Kiryutin B, Koonin EV, Krylov DM, Mazumder R, Mekhedov SL, Nikolskaya AN: The COG database: an updated version includes eukaryotes. BMC Bioinformatics. 2003, 4: 41-10.1186/1471-2105-4-41.PubMed CentralPubMedView Article
- Moriya Y, Itoh M, Okuda S, Yoshizawa AC, Kanehisa M: KAAS: an automatic genome annotation and pathway reconstruction server. Nucleic Acids Res. 2007, 35: W182-185. 10.1093/nar/gkm321.PubMed CentralPubMedView Article
- Lowe TM, Eddy SR: tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 1997, 25: 955-964. 10.1093/nar/25.5.955.PubMed CentralPubMedView Article
- Silva FJ, Belda E, Talens SE: Differential annotation of tRNA genes with anticodon CAT in bacterial genomes. Nucleic Acids Res. 2006, 34: 6015-6022. 10.1093/nar/gkl739.PubMed CentralPubMedView Article
- Ghai R, Hain T, Chakraborty T: GenomeViz: visualizing microbial genomes. BMC Bioinformatics. 2004, 5: 198-10.1186/1471-2105-5-198.PubMed CentralPubMedView Article
- Sanders HL: Marine Benthic Diversity: A Comparative Study. The American Naturalist. 1968, 102: 243-282. 10.1086/282541.View Article
- Darling AC, Mau B, Blattner FR, Perna NT: Mauve: multiple alignment of conserved genomic sequence with rearrangements. Genome Res. 2004, 14: 1394-1403. 10.1101/gr.2289704.PubMed CentralPubMedView Article
- Li W, Godzik A: Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics. 2006, 22: 1658-1659. 10.1093/bioinformatics/btl158.PubMedView Article
- Hilton MD, Alaeddinoglu NG, Demain AL: Synthesis of bacilysin by Bacillus subtilis branches from prephenate of the aromatic amino acid pathway. J Bacteriol. 1988, 170: 482-484.PubMed CentralPubMed
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.