The complete and fully assembled genome sequence of Aeromonas salmonicida subsp. pectinolytica and its comparative analysis with other Aeromonas species: investigation of the mobilome in environmental and pathogenic strains

Background Due to the predominant usage of short-read sequencing to date, most bacterial genome sequences reported in the last years remain at the draft level. This precludes certain types of analyses, such as the in-depth analysis of genome plasticity. Results Here we report the finalized genome sequence of the environmental strain Aeromonas salmonicida subsp. pectinolytica 34mel, for which only a draft genome with 253 contigs is currently available. Successful completion of the transposon-rich genome critically depended on the PacBio long read sequencing technology. Using finalized genome sequences of A. salmonicida subsp. pectinolytica and other Aeromonads, we report the detailed analysis of the transposon composition of these bacterial species. Mobilome evolution is exemplified by a complex transposon, which has shifted from pathogenicity-related to environmental-related gene content in A. salmonicida subsp. pectinolytica 34mel. Conclusion Obtaining the complete, circular genome of A. salmonicida subsp. pectinolytica allowed us to perform an in-depth analysis of its mobilome. We demonstrate the mobilome-dependent evolution of this strain’s genetic profile from pathogenic to environmental. Electronic supplementary material The online version of this article (10.1186/s12864-017-4301-6) contains supplementary material, which is available to authorized users.


Transposons are suitable for ISFinder only if they are perfect. The element must be complete at both termini, must carry a transposase gene, and this gene must not show any evidence of gene disruption, such as in-frame stop codons, frameshifts, or premature termination.
ISFinder names are based on the species name followed by a serial number; historical elements are an exception to this rule. Currently, a three-letter abbreviation is assigned to each species (e.g. ISAve for Aeromonas veronii). If the corresponding three-letter combination is already in use, a four-letter abbreviation is assigned (e.g. ISAeme for Aeromonas media). Historical transposons may have a name that is not based on a species name (e.g. IS5) or that uses only two letters to indicate the species (e.g. ISAs for Aeromonas salmonicida). Once a transposon name is assigned, it is maintained even if the basis for the assignment becomes invalid (e.g. ISApu1 is from Aeromonas caviae, formerly known as Aeromonas punctata). More recent elements are named after the revised species name (e.g. ISAeca1 from Aeromonas caviae).
In ISFinder, all transposons that differ by less than 5% in sequence are considered the same transposon, even if they are found in distinct species. Thus, A. salmonicida contains copies of the transposons IS5 and ISAhy1. To make the host strain explicit, we refer to these as Aersa_IS5 and Aersa_ISAhy1, the prefix Aersa_ being used for A. salmonicida subsp. pectinolytica strain 34mel. Analogously, we use e.g. As449_ as the prefix for A. salmonicida subsp. salmonicida strain A449.
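The 5% rule and the strain prefixes can be illustrated with a minimal sketch. It assumes that the two sequences have already been aligned to equal length and that every mismatched or gapped column counts as one difference; how ISFinder itself measures the difference is not specified here. All function names are hypothetical.

```python
def percent_difference(aligned_a: str, aligned_b: str) -> float:
    """Percentage of alignment columns in which two pre-aligned sequences differ.

    Assumption: gaps ('-') are simply treated as mismatching characters.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    if not aligned_a:
        raise ValueError("sequences must not be empty")
    mismatches = sum(
        1 for x, y in zip(aligned_a.upper(), aligned_b.upper()) if x != y
    )
    return 100.0 * mismatches / len(aligned_a)


def same_transposon(aligned_a: str, aligned_b: str, threshold: float = 5.0) -> bool:
    """True if the difference is below the threshold (less than 5% by default)."""
    return percent_difference(aligned_a, aligned_b) < threshold


def strain_specific_name(prefix: str, isfinder_name: str) -> str:
    """Prepend a strain prefix such as 'Aersa_' to an ISFinder name."""
    return f"{prefix}{isfinder_name}"


# Toy sequences, not real transposon data.
reference = "A" * 100
variant = "A" * 96 + "C" * 4
print(same_transposon(reference, variant))
print(strain_specific_name("Aersa_", "IS5"))
```

With the toy sequences above, the variant differs in 4 of 100 columns and is therefore treated as the same transposon, and the strain-specific name becomes Aersa_IS5.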
Different copies of the same transposon are distinguished by one serial letter (from _A onwards) or, if more than 26 copies are found in a genome, by two serial letters (from _AA onwards). If an element is split into several fragments, commonly because another transposon has inserted into it, the fragments receive serial numbers, with 1 being the most 5' fragment.
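The copy and fragment labels can be generated mechanically, as in the following sketch. It assumes that two-letter suffixes run in plain alphabetical order (_AA, _AB, ..., _AZ, _BA, ...) and that fragment start coordinates are given in the orientation of the element, so that the smallest coordinate is the most 5' fragment. The function names are hypothetical, and the sketch does not show how the fragment number is attached to the element name.

```python
from itertools import islice, product
from string import ascii_uppercase


def copy_suffixes(n_copies: int) -> list[str]:
    """Serial-letter suffixes for all copies of one transposon in a genome.

    Up to 26 copies receive one letter (_A ... _Z); with more than 26 copies,
    every copy receives two letters (_AA, _AB, ...).
    """
    if n_copies < 0:
        raise ValueError("n_copies must be non-negative")
    if n_copies <= 26:
        return ["_" + letter for letter in ascii_uppercase[:n_copies]]
    if n_copies > 26 * 26:
        raise ValueError("more copies than two serial letters can encode")
    pairs = product(ascii_uppercase, repeat=2)
    return ["_" + first + second for first, second in islice(pairs, n_copies)]


def number_fragments(fragment_starts: list[int]) -> dict[int, int]:
    """Map each fragment start coordinate to its serial number (1 = most 5')."""
    ordered = sorted(fragment_starts)
    return {start: serial for serial, start in enumerate(ordered, start=1)}


print(copy_suffixes(3))
print(copy_suffixes(27)[:2])
print(number_fragments([500, 100, 300]))
```

Three copies yield _A, _B and _C, whereas a genome with 27 copies starts at _AA.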
Our definition of a "complete" element differs from that of ISFinder, as we only consider the element termini. Thus, an element with complete termini but a large internal deletion is considered complete in this manuscript, while it would be considered disrupted by ISFinder. We use ISFinder-style names for incomplete elements if the sequence similarity to a complete element is high enough.
MITEs (miniature inverted-repeat transposable elements) are mobile elements that are devoid of a transposase gene. A section for MITEs was only recently added to the ISFinder database. Elements are named according to the MITE naming rules of ISFinder.
Elements that are incomplete at one or both termini, or that have a disrupted transposase gene, are not themselves suitable for ISFinder. For these, we attempted to identify a suitable element by BLASTn analysis of the NCBI nr database or the gamma-proteobacteria subsection of NCBI's WGS (whole-genome shotgun) database. If these attempts were not successful, no element suitable for ISFinder could be identified, and the elements were named according to an ad hoc nomenclature [15].
Ad hoc names are built from a strain-specific abbreviation, followed by the term IRS (IS-related sequence) and a serial number. We use AsIRS for elements from A. salmonicida subsp. pectinolytica strain 34mel, As449IRS for elements from A. salmonicida subsp.

Supplementary Text S4: on transposon counting and the meaning of "ISAs11"
We analyzed and counted mobile genetic elements in various strains of Aeromonas.
Vincent et al. likewise reported transposon frequency analyses [16]. For the reasons detailed below, the results of the two studies are not directly comparable.
First, two different methods for transposon counting were used. We analyzed strains with a complete, finalized genome sequence and counted every occurrence of a transposon as one element. In contrast, Vincent et al. counted sequence reads mapped to a selection of IS elements. Results can therefore vary considerably for strains that contain high-copy-number plasmids: we count each element once, while the counts in Vincent et al. are proportional to the plasmid's copy number.
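The effect of plasmid copy number on the two counting approaches can be made concrete with a small sketch. All numbers are hypothetical, and the read-based quantity is only a stand-in for a signal that scales with replicon copy number; it does not reproduce the normalization actually used by Vincent et al.

```python
from dataclasses import dataclass


@dataclass
class Replicon:
    name: str
    element_copies: int  # occurrences of the element in the finished sequence
    copy_number: float   # replicon copies per cell (hypothetical value)


def assembly_count(replicons: list[Replicon]) -> int:
    """Count every occurrence in the finished genome once."""
    return sum(r.element_copies for r in replicons)


def read_proportional_signal(replicons: list[Replicon]) -> float:
    """Signal expected from mapped reads, scaling with replicon copy number."""
    return sum(r.element_copies * r.copy_number for r in replicons)


# Hypothetical strain: three chromosomal copies of an element plus one copy
# on a plasmid that is present at 20 copies per cell.
strain = [
    Replicon("chromosome", element_copies=3, copy_number=1.0),
    Replicon("high-copy plasmid", element_copies=1, copy_number=20.0),
]

print(assembly_count(strain))
print(read_proportional_signal(strain))
```

In this hypothetical strain the element is counted 4 times in the finished genome, whereas the read-proportional signal corresponds to 23 copies, most of which stem from the single plasmid-borne element.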
The methodological differences also preclude analysis of the same set of strains. Our analysis is restricted to strains with a complete, finalized genome sequence. Most of the strains analyzed by Vincent [...] salmonicida nor in A. salmonicida subsp. pectinolytica. We attempted to reconcile this discrepancy and identified severe annotation problems with respect to "ISAs11".
There is confusion between ISAs11 and ISAs3, which seems to recur in Aeromonas annotations and publications. A. salmonicida subsp. salmonicida strain A449 was sequenced by Reith et al. [17]. In Table 2 of that publication, ISAs3 is listed as a 1326 bp element that belongs to the IS256 family. In contrast, ISAs11 is listed as a 2614 bp element that belongs to the IS21 family. ISAs11 is used as an equivalent of ISAs3 in subsequent publications. A rearrangement was detected in a copy of this plasmid (pAsa5 in strain 01-B526, accession KY555069) and was described as being caused by "IS11 from the IS256 family" [18], again referring to the transposon listed as ISAs3 and not to the transposon listed as ISAs11 in Reith et al. Also, in Vincent et al., ISAs11 is held responsible for one key gene inactivation event, which is assigned to the ancestor of the psychrophilic strains. When describing ISAs11, the authors refer to plasmid pAsal1 from strain 01-B526 (GenBank accession: AJ508382, [19]). This plasmid contains a single transposon, namely ISAs3.
[...] Table 2, belongs to the IS21 family, is 2614 bp long, and is reported to have 12 complete plus 3 partial copies in the strain A449 genome. These data are near-identical to those of the transposon that we have submitted to ISFinder as ISAs29. This element belongs to the IS21 family, is 2613 bp long, and we found 12 complete and 2 partial copies in the strain A449 genome. The 34mel genome also contains 4 complete copies of this transposon.
The 28 kb insert in pFBAOT6 Tn1721 starts right after the "basic transposon" Tn1722, separating it from the Tn1721-specific extension. This insert terminates with a sequence that differs by only 3 point mutations from the 6.5 kb transposon ISPa38, which has similarity to Tn3-type transposons. [...] genome as the 4th shared section (named AsIRS13). The Tn1721-specific extension, which follows on plasmid pFBAOT6, is not present in the strain 34mel genome. This extension codes for tetracycline resistance genes.

Supplementary
Only part of the "basic transposon" Tn1722 is shared between pFBAOT6 and the 34mel genome. The Tn1722-specific sequence codes for a gene that is related to methyl-accepting chemotaxis proteins and has been shown to interfere with chemotaxis upon overexpression [20]. In the strain 34mel genome, this is replaced by a different set of passenger genes, so that TnAs1, a distinct complete transposon, is formed. The TnAs1-specific sequence is near-identical to a region from the IncP-9 TOL plasmid pWW0 from Pseudomonas putida (Supplementary Table S6). This exemplifies the high plasticity of Tn3-type transposons.
Due to the size distribution of the gDNA after isolation, the DNA was directly subjected to library preparation. The final long-insert PacBio libraries were size-selected for fragments larger than 7 kb using the BluePippin device. PacBio SMRT sequencing was performed with the P4/C2 chemistry and 180 minutes of sequencing, resulting in 118 K raw subreads, which were subjected to assembly.

Supplementary Tables
         A     B     C     D     E     F     G     H     I     J
version  1     2     2     3     1     4     5     6     7     4
212      C     C     C     C     C     C     C     A     C     C
461      GCG   GCG   GCG   GCG   GCG   GCG   GCG   GCG   ATA   GCG
1006     A     A     A     G     A     A     A     A     G     A
1011     TGC   CCA   CCA   TGT   TGC   TGT   CCA   TGC   TGC   TGT
1020     GC    TG    TG    AC    GC    AC    TG    GC    AC    AC
1138     T     C     C     T     T     T     T     T     T     [...]
[...]    T     T     T     C     T     T     T     T     T     T
174      ATAG  ATAG  ATAG  ATAG  ATAG  ATAG  ACG   ATAG  ATAG  ATAG
1158     T     T     C     T     T     T     T     T     C     C
1167     A     A     G     A     A     A     A     A     G     G
2826     C     T     T     T     T     T     T     T     T     C
2839     T     C     C     C     C     C     C     C     C     T
2842     A     A     G     A     A     G     A     A     A     A
2844     C     T     C     T     T     C     T     T     T     C
2847     T     T     C     T     T     C     T     T     T     T
2849     CA    TG    TG    TG    TG    TG    TG    TG    TG    CA
2863     G     [...]

[...] Figure 4). Each gene fragment is assigned its locus tag.

code | domain name | PFAM number | description
Asalp_45687 | osmC | PF02566 | Osmotically inducible protein C (OsmC) (P23929) is a stress-induced protein found in E. coli. This family also contains an organic hydroperoxide detoxification protein (O68390) that has a novel pattern of oxidative stress regulation [21].
Asalp_45690 | adh_short_C2 | PF13561 | The short-chain dehydrogenases/reductases family (SDR) [22] is a very large family of enzymes, most of which are known to be NAD- or NADP-dependent oxidoreductases.
Asalp_45693 | TetR_N | PF00440 | This entry represents a DNA-binding domain with a helix-turn-helix (HTH) structure that is found in several bacterial and archaeal transcriptional regulators, such as TetR, the tetracycline resistance repressor.
Asalp_45696 | CUPIN_7 (2x) | PF12973 | This clan represents the conserved barrel domain of the 'cupin' superfamily ('cupa' is the Latin term for a small barrel). The cupin fold is found in a wide variety of enzymes, but notably also contains the non-enzymatic seed storage proteins [23,24]. The cupin domain is also found in transcriptional activator ChrR [25] and other proteins.