The complete methylome of Helicobacter pylori UM032

Background: The genome of the human gastric pathogen Helicobacter pylori encodes a large number of DNA methyltransferases (MTases), some of which are shared among many strains, and others of which are unique to a given strain. The MTases have potential roles in the survival of the bacterium. In this study, we sequenced a Malaysian H. pylori clinical strain, designated UM032, using a combination of the PacBio Single Molecule, Real-Time (SMRT) and Illumina MiSeq next-generation sequencing platforms, and used the SMRT data to characterize the set of methylated bases (the methylome).

Results: The N4-methylcytosine and N6-methyladenine modifications detected at single-base resolution using SMRT technology revealed 17 methylated sequence motifs corresponding to one Type I and 16 Type II restriction-modification (R-M) systems. Previously unassigned methylation motifs were assigned to their respective MTase-coding genes. Furthermore, one gene that appears to be inactive in the H. pylori UM032 genome during normal growth was characterized by cloning.

Conclusion: Consistent with previously studied H. pylori strains, we show that strain UM032 contains a relatively large number of R-M systems, including some MTase activities with novel specificities. Additional studies are underway to further elucidate the biological significance of the R-M systems in the physiology and pathogenesis of H. pylori.

Electronic supplementary material: The online version of this article (doi:10.1186/s12864-015-1585-2) contains supplementary material, which is available to authorized users.

have enzymatic activities independent of each other, and which often, but not always, occur on independent polypeptides. When these two activities occur on the same polypeptide, the system is denoted Type IIG. Both DNA methylation and cleavage occur within or close to a defined recognition site. Type III systems have two subunits, which are products of the mod  [14][15][16][17]. H. pylori is naturally competent and able to take in DNA from the environment [18] as well as being subject to bacteriophage infection [19,20]. Thus, the MTases might also serve as part of the defence mechanism that protects the genome integrity of the bacteria against transmissible DNA elements. On the other hand, strain-specific MTases are thought to influence the phenotypic traits or virulence in pathogens, host specificity and adaptability to micro-environment [21,22].

The study of MTases of H. pylori enhances our understanding of the pathogenic mechanisms of this organism. The discovery of hpyIM, which encodes a Type II MTase that recognizes CATG, revealed that the MTases may play a role in H. pylori physiology beyond the methylation function. The expression of hpyIM is growth-phase regulated and required for normal bacterial morphology [23]. It was shown that the deletion of hpyIM altered the expression of the stress-responsive dnaK operon [24]. A Type II MTase, M.HpyAIV, which recognizes GANTC, has been shown to down-regulate the expression of the katA gene that encodes for the catalase, suggesting its importance in the biology of H. pylori [25].

Nucleotide sequence accession number

The first annotated H. pylori UM032 genome sequence was deposited in

The remaining two systems demonstrated novel recognition motifs (GAAAG and CYANNNNNNNTRG), which were not previously described in H. pylori. The detected methylation motifs are summarized in Table 1, along with the corresponding MTase-encoding genes. All but one of the active R-M systems were of Type II, with only one Type I R-M

Table 1, while all MTases not responsible for any activity in the genome or shown to be inactive as clones are shown in Table S3.

K747_03825. This is a BcgI-like Type IIG R-M system, comprising two S subunit genes (S1 and S2) and a hybrid gene (RM) encoding both MTase and REase domains (Figure 1). The two S subunit genes (K747_11950 and K747_11945) are separated by a homopolymeric G repeat, which may have resulted in a previously intact single S subunit becoming split as a result of a frameshift mutation. When the RM, S1, and S2 genes were overexpressed together in E. coli, the palindromic motif CYANNNNNNNTRG was found to be methylated just as in the genome. This R-M system was named HpyUM032XIII. Interestingly, when S1 and S2 were artificially fused by "correcting" the frameshift and overexpressed with the RM, a change of methylation pattern was observed, leading to recognition of CYANNNNNNNTTC.

This is a new specificity that was not detected in the methylome of H. pylori UM032 during normal growth. It was named HpyUM032XIII-mut1, indicating its artificially derived sequence (Figure 1). Expressing S2, but not S1, with the RM gene gave no activity. On the basis of these results S1, which only encodes one TRD, must be responsible for recognition

HpyUM032XIII, which resembles the BcgI system in that it consists of a fused RM protein and a separate S protein, also differs from BcgI in that the genetic system encodes two S genes, each of which is one half of the typical length of such genes. It seemed likely that these "half-genes" resulted from a frameshift that had occurred in an ancestral, full-length S gene. Although such frameshifts often abolish activity, the cloned system, including RM, S1 and S2, demonstrated MTase activity recognizing the palindromic site CYANNNNNNNTRG. Identical activity was observed when the S2 subunit was omitted, and no activity was observed when S1 was omitted, suggesting the activity resulted from a complex of RM and S1 alone. Surprisingly, when S1 and S2 were artificially fused, the recognition sequence changed to CYANNNNNNNTTC (Figure 1). These observations indicate that S1.HpyUM032XIII must contain a TRD capable of recognizing the half-site CYA. Active BcgI, which also recognizes a palindromic sequence, has a   where S is replaced by S1. S.BcgI and S1.HpyUM032XIII must each recognize only a single half-site and therefore require dimerization for functionality. By fusing S1 and S2 into a

Further studies are required to verify these hypotheses.

The assembled genome was scanned for homologs of R-M system genes using in-house, BLAST-based software (E-value < 1e-11) to identify putative MTases as previously

Type I systems have bipartite recognition sequences consisting of two "half-sites."
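The half-site reasoning for the two motifs reported here can be checked with a few lines of sequence arithmetic. The sketch below is illustrative only and is not the software used in the study: the function names are hypothetical, and the complement table is the standard IUPAC nucleotide ambiguity code. It shows that CYANNNNNNNTRG equals its own reverse complement (so a single TRD recognizing CYA on each strand would account for both half-sites), whereas CYANNNNNNNTTC does not.

```python
import re

# Standard IUPAC nucleotide codes mapped to their complements.
IUPAC_COMPLEMENT = {
    "A": "T", "C": "G", "G": "C", "T": "A",
    "R": "Y", "Y": "R", "S": "S", "W": "W",
    "K": "M", "M": "K", "B": "V", "V": "B",
    "D": "H", "H": "D", "N": "N",
}


def reverse_complement(motif: str) -> str:
    """Return the reverse complement of a (possibly degenerate) motif."""
    return "".join(IUPAC_COMPLEMENT[base] for base in reversed(motif.upper()))


def is_palindromic(motif: str) -> bool:
    """A motif is palindromic if it reads the same on both strands."""
    return motif.upper() == reverse_complement(motif)


def split_half_sites(motif: str):
    """Split a bipartite motif into (left half-site, spacer length, right half-site).

    Hypothetical helper: assumes exactly one internal run of N between
    two specific half-sites; returns None otherwise.
    """
    match = re.fullmatch(r"([^N]+)(N+)([^N]+)", motif.upper())
    if match is None:
        return None
    left, spacer, right = match.groups()
    return left, len(spacer), right


for motif in ("CYANNNNNNNTRG", "CYANNNNNNNTTC"):
    left, spacer, right = split_half_sites(motif)
    print(motif, is_palindromic(motif), left, spacer, reverse_complement(right))
```

For the palindromic motif, the right half-site TRG read on the opposite strand is CYA again; for the fused-subunit motif, the right half-site TTC reads as GAA on the opposite strand, which is a different half-site from CYA.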
MTase candidates with predicted specificities were matched where possible with observed motifs found in our motif analyses. If a single candidate MTase existed for an observed motif, then that gene was assumed to be responsible for that particular specificity. If multiple candidates existed for a single motif, no automatic assignment was made. When assigning a novel specificity to a given MTase, the MTase gene sequence was cross-checked against other similar genes in REBASE, and the novel specificity against unassigned SMRT-derived motif data in REBASE. In many cases, the same motif occurred in a different genome with an
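The automatic part of this matching rule (assign when exactly one candidate exists, otherwise defer) can be sketched as follows. This is a minimal illustration under stated assumptions, not the in-house software: the function name, the input structure, and the gene identifiers `gene_A`, `gene_B`, and `gene_C` are all hypothetical, and the REBASE cross-check for novel specificities is not modelled.

```python
from typing import Dict, List, Tuple


def assign_motifs(
    candidates_by_motif: Dict[str, List[str]],
) -> Tuple[Dict[str, str], Dict[str, List[str]]]:
    """Apply the single-candidate rule to observed motifs.

    candidates_by_motif maps each observed motif to the candidate MTase
    genes whose predicted specificity matches it (hypothetical input).
    Returns (assigned, unresolved): motifs with exactly one candidate are
    assigned to that gene; all others are left for manual follow-up.
    """
    assigned: Dict[str, str] = {}
    unresolved: Dict[str, List[str]] = {}
    for motif, genes in candidates_by_motif.items():
        unique_genes = sorted(set(genes))
        if len(unique_genes) == 1:
            assigned[motif] = unique_genes[0]
        else:
            # Zero or several candidates: no automatic assignment is made.
            unresolved[motif] = unique_genes
    return assigned, unresolved


# Hypothetical example input; the gene names are placeholders.
example = {
    "CATG": ["gene_A"],
    "GANTC": ["gene_B", "gene_C"],
    "GAAAG": [],
}
assigned, unresolved = assign_motifs(example)
print(assigned)
print(unresolved)
```

Motifs that land in `unresolved` correspond to the cases the text resolves by other means, such as cloning and expressing individual genes.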

Mutations to correct the frameshift in the S subunit of K747_03825 and silent mutations to stabilize polynucleotide repeat sequences were likewise introduced using Gibson Assembly. For example, in K747_03825, the 12-bp repeat sequence GGGGGGGGGGGG was changed to GGAGGAGGCGG, which simultaneously introduced silent mutations to prevent replication slippage and shortened the length to 11, bringing S2 in frame with S1. The expression of all MTase genes was under the regulation of the same E. coli Plac promoter present in the pRRS vector. Primer sequences are shown in Table S1. Recombinant constructs were used to transform E. coli ER2683. Restriction analysis was performed to confirm that the bacterial transformants carried the desired plasmid construct.
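The two effects of the repeat edit described above, a one-base length change and the break-up of the homopolymer, can be verified directly from the two sequences given in the text. The sketch below is a simple check with hypothetical helper names; it does not verify that the substitutions are silent, because that depends on the codon frame across the repeat, which is not stated here.

```python
# Sequences as given in the text for the K747_03825 repeat edit.
ORIGINAL_REPEAT = "GGGGGGGGGGGG"
EDITED_REPEAT = "GGAGGAGGCGG"


def longest_homopolymer(seq: str) -> int:
    """Length of the longest run of a single repeated base in seq."""
    longest = 0
    run = 0
    previous = None
    for base in seq:
        run = run + 1 if base == previous else 1
        longest = max(longest, run)
        previous = base
    return longest


def length_change(original: str, edited: str) -> int:
    """Net change in length; a value not divisible by 3 shifts the
    downstream reading frame."""
    return len(edited) - len(original)


print(longest_homopolymer(ORIGINAL_REPEAT))  # 12
print(longest_homopolymer(EDITED_REPEAT))    # 2
print(length_change(ORIGINAL_REPEAT, EDITED_REPEAT))  # -1
```

A net change of -1 is not a multiple of three, so everything downstream of the repeat moves into a different reading frame, which is consistent with the statement that the edit brings S2 in frame with S1. Reducing the longest G run from 12 to 2 removes the long homopolymeric tract that the text associates with replication slippage.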

The plasmid constructs were then used to transform E. coli strain ER2796, which lacks endogenous MTase activity. The genomic DNA of the E. coli ER2796 recombinant strain was subjected to SMRT sequencing to determine the resulting methylation pattern. Plasmid sequences were confirmed by re-sequencing the PacBio reads against the plasmid reference.