Genome reannotation of Escherichia coli CFT073 with new insights into virulence
© Luo et al; licensee BioMed Central Ltd. 2009
Received: 25 March 2009
Accepted: 22 November 2009
Published: 22 November 2009
Uropathogenic Escherichia coli strain CFT073 is a human pathogen whose genome was sequenced and published in 2002, a significant step for pathogenic bacterial genomics research. However, the current RefSeq annotation of this pathogen is now partly outdated, because some essential genes associated with its virulence are missing or misannotated. We carried out a systematic reannotation, combining automated annotation tools with manual curation, to provide a comprehensive understanding of virulence in the CFT073 genome.
The reannotation excluded 608 coding sequences from the RefSeq annotation. Meanwhile, a total of 299 coding sequences were newly added; about one third of them lie in genomic island (GI) regions, and more than one fifth lie in virulence-related pathogenicity islands (PAIs). Furthermore, the translational initiation sites (TISs) of 341 genes were relocated, which resulted in high-quality gene start annotation. In addition, the 94 pseudogenes annotated in RefSeq were thoroughly inspected and updated. The number of miscellaneous RNA genes (sRNAs) was updated from 6 in RefSeq to 46 in the reannotation. Based on these adjustments, subsequent analyses, both general and case studies, were conducted on new virulence factors and new virulence-associated genes that are crucial during urinary tract infections (UTIs), including invasion, colonization, nutrient uptake and population density control. Furthermore, the miscellaneous RNAs collected in the reannotation are believed to contribute to the virulence of strain CFT073. The reannotation, including the nucleotide data, the original RefSeq annotation, and all reannotated results, is freely available via http://mech.ctb.pku.edu.cn/CFT073/.
As a result, the reannotation presents a more comprehensive picture of the mechanisms of uropathogenicity of UPEC strain CFT073. The new genes change the view of its uropathogenicity in many respects, particularly through new genes in GI regions and new virulence-associated factors. The reannotation thus serves as an important resource, providing new information about genomic structure and organization and about gene function. Moreover, we expect that the detailed analysis will facilitate the exploration of novel virulence mechanisms and help guide experimental design.
Uropathogenic Escherichia coli (UPEC) strains cause 70-90% of the estimated 150 million community-acquired urinary tract infections (UTIs) that occur annually. The complete genome of the UPEC strain CFT073 (serotype O6:K2:H1) was sequenced in 2002 [GenBank: AE014075.1]; it comprises a 5,231,428 bp chromosome without plasmids and is 590,209 bp longer than that of the well-studied K-12 MG1655 strain. The difference is mostly caused by five unique cryptic inserted prophage genomes that contain a large portion of the virulence or virulence-associated genes, referred to as pathogenicity islands (PAIs). At the time of this writing, the RefSeq release annotates 5,339 protein-coding genes, 89 tRNA genes, 21 rRNA genes, and 6 miscellaneous RNA genes. In-depth analysis reveals that 3,190 genes (3,925,047 bp, 75.0%) are considered conserved backbone genes, while the rest (1,306,391 bp, 25.0%), known as CFT073-specific islands, is inserted into the backbone regions in an extensive mosaic manner. Regarding virulence and virulence-associated genes, the annotation includes 12 types of fimbriae, 7 autotransporters, and toxin operons such as hlyCABD and upxBDA. Since its first release, the annotation has not only presented an overview of the complexity of the pathogen's lifestyle but has also served as a guide for experimental design.
However, several lines of evidence suggest a need to reannotate the Escherichia coli CFT073 genome, in part because of discoveries and corrections accumulated over time that the original RefSeq annotation, even after some minor updates, does not reflect. For example, new autotransporter-encoding genes and some vital population density control factors are missing from the annotation [4, 5], while a growing number of novel small RNAs (sRNAs) have recently been found to add to the complexity of virulence regulatory networks. In addition, a computational estimation suggests that the annotation quality of translation initiation sites is surprisingly lower in this strain than in its close relative, K-12 MG1655. Moreover, a similar observation, along with low annotation quality in CDSs, has been made in other E. coli strains (for example, APEC O1) by syntactic annotation methods. Such an observation suggests that the highly diverse adaptive paths of different E. coli strains call for more sophisticated annotation methods than traditional ones. Because virulence is a systematic issue, research on how CFT073 establishes its virulence during the UTI process needs a comprehensive and precise picture of the genomic structure of this pathogen rather than piecemeal information. Therefore, a thorough reannotation of CFT073 is justified for future studies.
Reannotation is the process of annotating a previously annotated genome by using better bioinformatics methods and more complete databases. Aimed at improving gene structure as well as functional information, genome reannotation has been recognized as important since before the completion of the first genome sequence [9, 10]. However, among all sequenced microbial genomes (845 at the time of writing), examples of genome-wide reannotation are surprisingly rare. Although only a few projects have been documented [11–14], several common features can be summarized. Firstly, the functional examination of already annotated genes has become a common practice in reannotation, thanks to advances in sequence comparison and new experimental data from the literature [11–14]. Secondly, new genes may also be described, with evidence mostly from de novo gene prediction or sequence comparison to public databases like SWISS-PROT, and to a lesser degree from experimental genome analysis data. Finally, almost all projects involve manual effort by expert curators to assign more precise designations, which helps avoid flawed research. In addition to a genome-wide analysis, particular interest may be directed to subsets of genes. For instance, Chen et al. focused on assigning functions to genes recognized as being "hypothetical" in previous annotations.
In this work, we combine automated annotation tools with manual efforts to provide a comprehensive and precise reannotation of the Escherichia coli CFT073 genome. Here we refer to the current release of the RefSeq annotation [RefSeq: NC_004431] as the original annotation for CFT073, although the very first annotation from 2002 has already been updated with some minor corrections. With a focus on virulence genes, the reannotation was achieved through literature curation and the application of several analytical methods, including gene finding tools, sequence/domain similarity search and transmembrane region analysis. As a result, 608 coding sequences (CDSs) annotated in RefSeq were excluded, while a total of 299 CDSs are new to the original annotation, one third of which are found in genomic island (GI) regions. Subsequent analyses, both general and case studies, were conducted on genes that are crucial during the UTI process, including invasion, colonization, nutrient uptake and population density control. Besides virulence factors, miscellaneous RNAs are believed to contribute to the virulence of strain CFT073. Therefore, the reannotation presents a total of 40 new miscellaneous RNA genes based on literature curation and database searching. The CFT073 reannotation resource is freely available via http://mech.ctb.pku.edu.cn/CFT073/. Following the proposal by Salzberg, the reannotation website includes three sections: a brief overview of the methods for reannotation, links to browse the reannotation, and links for data download.
In general, the new CDSs and miscellaneous RNA genes bring new perspectives on the virulence properties of this pathogen. We expect the reannotation to be complementary to the original annotation, in the hope of facilitating the study of new mechanisms of uropathogenicity in CFT073 for a variety of research communities.
Results & Discussion
CDS calling and gene start annotation
Overview of the differences between the original RefSeq annotation and the reannotation
Miscellaneous RNAs a
Backbone genes b
RefSeq annotation: 4,550 (4,440 protein-coding genes, 85 tRNA genes, 21 rRNA genes, and 4 miscellaneous RNA genes)
Reannotation: 4,328 (4,178 protein-coding genes, 85 tRNA genes, 21 rRNA genes, and 44 miscellaneous RNA genes)
Genomic island genes c
RefSeq annotation: 905 (899 protein-coding genes, 4 tRNA genes, and 2 miscellaneous RNA genes)
Reannotation: 851 (845 protein-coding genes, 4 tRNA genes, and 2 miscellaneous RNA genes)
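The kinds of difference reported between the two annotations (CDSs excluded, CDSs newly added, and CDSs kept with a relocated TIS) can be illustrated with a small comparison of two gene sets. The following is a minimal sketch, not the procedure used for the reannotation. It assumes each CDS is reduced to a hypothetical (start, stop, strand) record and that two records sharing the same stop coordinate and strand describe the same gene; the names Cds and compare_annotations are illustrative.

```python
from typing import Dict, NamedTuple, Set


class Cds(NamedTuple):
    start: int   # coordinate where translation begins
    stop: int    # coordinate where translation ends
    strand: str  # "+" or "-"


def compare_annotations(original: Set[Cds], revised: Set[Cds]) -> Dict[str, int]:
    """Count unchanged, TIS-relocated, excluded and added CDSs.

    Assumption: a CDS is identified by (stop, strand), so entries that share
    this key but differ in start are the same gene with a relocated TIS.
    """
    orig_by_key = {(c.stop, c.strand): c for c in original}
    rev_by_key = {(c.stop, c.strand): c for c in revised}

    shared = orig_by_key.keys() & rev_by_key.keys()
    relocated = sum(
        1 for key in shared if orig_by_key[key].start != rev_by_key[key].start
    )
    return {
        "unchanged": len(shared) - relocated,
        "tis_relocated": relocated,
        "excluded": len(orig_by_key.keys() - rev_by_key.keys()),
        "added": len(rev_by_key.keys() - orig_by_key.keys()),
    }


if __name__ == "__main__":
    # Toy coordinates, invented for illustration only.
    original = {Cds(100, 400, "+"), Cds(900, 500, "-"), Cds(1000, 1300, "+")}
    revised = {Cds(100, 400, "+"), Cds(870, 500, "-"), Cds(2000, 2600, "+")}
    print(compare_annotations(original, revised))
```

Keying on the stop coordinate is a common simplification for prokaryotic genes, because the stop codon fixes the reading frame while the start may move; genes that were split or merged would need separate handling.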
We further manually examined all 94 pseudogenes in the original annotation. After shifting, trimming and splitting, some of the pseudogenes were identified as protein-coding genes in the reannotation. For example, the annotated pseudogene c0707 [RefSeq GeneID: 1036199] contains two parts, one of which, a new gene (c0056r), was reannotated as the citrate lyase carrier gene citD by gene context analysis. In the reannotation, this new gene is flanked upstream by the citrate carrier protein coding gene citG [RefSeq protein_id: NP_752632] and the citrate lyase coding genes citXFE [RefSeq: NP_752633; NP_752635; NP_752637], and downstream by the lyase ligase gene citC [RefSeq: NP_752638], the sensor kinase gene citA [RefSeq: NP_752639], and the transcriptional regulatory protein coding gene citB [RefSeq: NP_752640]; furthermore, it is found to be essential to the citrate pathway. Thus the reannotation removes a possible source of false interpretation introduced by the original annotation. As a result of the thorough inspection, 35 of the 94 pseudogenes have been directly identified as coding genes newly added to the reannotation, while 55 of them are associated with dozens of new coding genes due to trimming, elongation, splitting or merging along the genomic DNA strand.
Clusters of genes were also manually analyzed in the reannotation. The most significant characteristic of the E. coli CFT073 genome is its GIs, especially PAIs, which differ from the backbone genome by possessing clusters of alien genes, especially virulence factor and virulence-associated factor genes. The reannotation indicates that more than one third of the newly added protein-coding genes (102/299, 34.11%) are located in such genomic regions. Many of these genes are found to be complementary to other genes in genomic islands at both the regulatory and functional levels. For instance, the new microcin genes mcmAI [GeneID: 4194251] and mchIX [GeneID: 1039907], from genomic island PAI-CFT073-serX, are required for the Fur-regulated, iron concentration-dependent microcin secretion (more details will follow).
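Deciding whether a newly added gene lies in a genomic island amounts to an interval containment check. The sketch below is only an illustration under stated assumptions: genes and islands are given as hypothetical 1-based inclusive (start, end) pairs, a gene counts as an island gene only when it lies entirely inside one island, and the function names are invented here. The actual island boundaries are not reproduced.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[int, int]  # (start, end), 1-based inclusive, start <= end


def in_any_island(gene: Interval, islands: Iterable[Interval]) -> bool:
    """Return True if the gene lies entirely within at least one island."""
    g_start, g_end = gene
    return any(i_start <= g_start and g_end <= i_end for i_start, i_end in islands)


def island_fraction(genes: List[Interval], islands: List[Interval]) -> float:
    """Return the fraction of genes located inside genomic islands."""
    if not genes:
        return 0.0
    hits = sum(in_any_island(gene, islands) for gene in genes)
    return hits / len(genes)


if __name__ == "__main__":
    # Toy coordinates, invented for illustration only.
    islands = [(1000, 5000)]
    genes = [(1200, 1500), (4900, 5100), (6000, 6300)]
    print(island_fraction(genes, islands))
```

With the counts reported in the text, 102 island genes out of 299 newly added genes correspond to 102/299, or 34.11%.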
For TIS annotation, we have previously proposed a computational method to estimate the annotation accuracy of a sequenced genome. The method calculates the accuracy by estimating the contribution of true TISs to the overall sequence pattern around annotated TISs, rather than by simply comparing one set of predictions to another. As found in that study, the accuracy of the RefSeq TIS annotation is surprisingly low for CFT073, which is one of our reasons for reannotating this strain. With the increasing number of experimentally verified TISs in other genomes, these verified genes can serve as references to improve TIS annotation in CFT073. In fact, this has already been implemented, through alignment of N-terminal sequences (21 amino acids, 100% identity), as part of ProTISA, a TIS annotation pipeline previously developed for any genome. To obtain high-quality gene start annotation, we applied the ProTISA pipeline to relocate gene starts in the CFT073 genome.
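The idea of transferring a verified TIS through an exact 21-residue N-terminal match can be sketched as follows. This is an illustration of the criterion stated above (21 amino acids, 100% identity), not the ProTISA implementation, whose internals are not described here. It assumes the candidate region is supplied as an in-frame, coding-strand DNA string running from an upstream position to the stop codon, that ATG, GTG and TTG are the accepted start codons, and that any initiator codon is read as methionine. All function names are hypothetical.

```python
from itertools import product
from typing import Optional

BASES = "TCAG"
# Standard genetic code in TCAG order; amino acid assignments are the same in
# the bacterial code (translation table 11).
AMINO = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
START_CODONS = {"ATG", "GTG", "TTG"}  # assumed set of start codons
N_TERM_LEN = 21  # number of residues compared, as stated in the text


def n_terminus(orf: str, offset: int, length: int = N_TERM_LEN) -> str:
    """Translate `length` codons from `offset`; return "" if the ORF is too short."""
    end = offset + 3 * length
    if end > len(orf):
        return ""
    residues = [CODON_TABLE.get(orf[i:i + 3], "X") for i in range(offset, end, 3)]
    residues[0] = "M"  # an initiator codon is translated as methionine
    return "".join(residues)


def locate_verified_tis(orf: str, verified_n_term: str) -> Optional[int]:
    """Return the offset of the start codon whose N-terminus matches exactly."""
    if len(verified_n_term) != N_TERM_LEN:
        raise ValueError("verified N-terminus must contain 21 residues")
    for offset in range(0, len(orf) - 2, 3):
        is_start = orf[offset:offset + 3] in START_CODONS
        if is_start and n_terminus(orf, offset) == verified_n_term:
            return offset
    return None


if __name__ == "__main__":
    # Toy sequence: a GTG candidate upstream of the start that matches.
    orf = "GTGGCA" + "ATG" + "AAA" * 20 + "GCATAA"
    print(locate_verified_tis(orf, "M" + "K" * 20))
```

Requiring 100% identity over the full window keeps the transfer conservative: a single mismatch leaves the gene to the other evidence categories.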
Briefly, TISs of genes are collected from 1) experimental evidence (including those obtained by alignments of N-terminal sequences; tagged as IPT), 2) conserved domain search (CDC), 3) alignments of orthologous genes (HSC), and 4) predictions from TriTISA for the remaining genes; a complete list can be retrieved from the ProTISA database [22, 24]. Although annotated by computational methods, TISs in both the CDC and HSC categories are believed to be highly reliable [22, 25, 26]. With genes whose TISs are tagged IPT taken as benchmarks, the TriTISA prediction for CFT073 reaches an accuracy of 95.6%, which is 14.1% higher than that of the RefSeq annotation. In addition, according to the computational estimation method described above, the overall TIS accuracy of the reannotation for CFT073 is 19.1% higher than that of the RefSeq annotation (90.0% vs. 70.5%). Both results support the high quality of TIS annotation in the reannotation.
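Benchmark accuracy of the kind quoted above can be computed by comparing predicted start coordinates with verified ones. The following is a minimal sketch under stated assumptions: starts are held in hypothetical dictionaries keyed by gene identifier, only genes present in both sets are scored, and the function name is illustrative. It does not reproduce the pattern-based estimation method used for the overall accuracy figures.

```python
from typing import Dict


def tis_accuracy(predicted: Dict[str, int], benchmark: Dict[str, int]) -> float:
    """Return the fraction of benchmark genes whose predicted start is correct.

    Only genes present in both mappings are scored; a prediction is correct
    when its start coordinate equals the verified start coordinate.
    """
    shared = predicted.keys() & benchmark.keys()
    if not shared:
        raise ValueError("no benchmark genes found among the predictions")
    correct = sum(1 for gene in shared if predicted[gene] == benchmark[gene])
    return correct / len(shared)


if __name__ == "__main__":
    # Toy identifiers and coordinates, invented for illustration only.
    benchmark = {"geneA": 10, "geneB": 20, "geneC": 30, "geneD": 40}
    predicted = {"geneA": 10, "geneB": 25, "geneC": 30, "geneD": 40, "geneE": 5}
    print(f"{tis_accuracy(predicted, benchmark):.1%}")
```

Running the same function on two annotations against one benchmark gives directly comparable figures, which is how a difference between a predictor and an existing annotation would be expressed.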
Finding of missed intricacy in PAIs
List of newly added mobile genetic element-related genes
Transposase for insertion sequence
Phage integrase family protein
Predicted integrase protein
Homologue to Iso-IS1-insB protein
Putative prophage integrase protein, IntD
R6-like transposase protein
Insertion sequence ATP-binding protein
Insertion sequence protein
Transposase IS3/IS911 family protein
Adjustment in virulence factors
As a uropathogenic strain, CFT073 employs a variety of virulence genes for invasion of, adherence to, and colonization of host cells, among other processes. Most current studies on this pathogen focus on virulence factors such as the fim operon and antigen 43, so the absence of such genes from the annotation could be misleading. For example, several critical genes that might contribute to virulence during urinary tract infections, including hokA and hokC, are absent from the original RefSeq annotation. In the reannotation, by contrast, dozens of new protein-coding genes show functions relevant to virulence, including 2 toxic membrane genes, 8 cell-wall associated genes, 7 colicin/microcin genes, 3 fimbrial regulator genes, and 5 outer membrane receptor genes (see Additional File 2). Specifically, Additional File 5 lists a total of 19 new genes that are likely to contribute to the virulence of strain CFT073, including hokA and hokC. In addition, the reannotation adds a set of small RNA (sRNA) genes that play essential roles in the virulence of CFT073, such as oxyS, csrC, and omrAB (see the next subsection).
The newly added colicin and colicin-related genes
Uropathogenic specific S-type colicin
Putative colicin immunity protein
Microcin immunity protein, MchI
Microcin immunity protein, McmI
There are also sets of genes that contribute to virulence indirectly and are considered virulence factors as well, given that their absence would lead to failed infection. In particular, under the extreme conditions of the human urinary tract, such as high osmotic stress and lack of oxygen, genes responsible for self-adaptation are essential for survival during the transition from the intestine to this specific niche. Of the new genes in the reannotation, 2 are toxic membrane genes, 8 are cell-wall associated genes, and 5 are outer membrane receptor genes. It is worth noting that some of these are critical in environment sensing and self-adjustment. For instance, c0247r encodes a membrane permeability altering protein that might help CFT073 overcome the high osmolarity of the urinary tract; c0201r encodes an anaerobic nitric oxide reductase, which would be essential to CFT073 in the urinary tract, where the oxygen concentration is low and nitrogen is very limited. Therefore, we expect that the reannotation of virulence factors will facilitate a more complete and precise understanding of how this pathogen survives in, transfers to, and colonizes the human urinary tract.
Mobile genetic elements are known to be associated with pathogenicity islands in UPEC strains and to play an important role in the transition from an acute to a chronic state of disease [34, 35]. In this regard, the reannotation has also recovered a set of elements such as transposases and integrases. In total, 14 genes were newly added in this category (Table 2). Among them, c0012r is a transposase for insertion sequences; it is located in a prophage repeat region associated with insertion sequence IS629 and another putative transposase gene (c0139 [RefSeq: NP_752091]). This genomic structure is similar to that of several other newly designated integrases/transposases. For example, as insertion elements prefer sites around tRNA genes, both c0053r and c0054r are integrase genes next to an Arg-tRNA gene, rather than two overlapping pseudogenes as reported in the original annotation. Moreover, the region surrounding c0053r and c0054r is a prophage area and contains phage-related genes such as nfrAB [RefSeq: NP_752585; NP_752586] (bacteriophage N4 adsorption genes). However, the picture for these phage-related genes is incomplete in the original annotation because the transposase/integrase genes are missing.
To date, about 80 sRNA molecules have been identified in E. coli, many of which control the transcription of virulence-related genes. However, almost all of the essential small RNAs (sRNAs) are missing from the original RefSeq annotation. To correct this systematic defect, the reannotation updates the sRNA genes. We combined Rfam9.0 prediction with literature investigation for sRNA annotation and thus obtained 46 sRNA genes (see Additional File 6), including the 6 annotated as miscellaneous genes in RefSeq. The functions of most of these sRNAs have been verified experimentally. For instance, gadY [GeneID: 2847729] activates a series of reactions in response to an acidic environment for better resistance to the low pH of the urinary tract, while ryhB [GeneID: 2847761] and fur [RefSeq: NP_752700] (a global iron-related regulator gene) repress each other and thus form a loop that controls the expression of iron concentration-dependent genes. With these newly added sRNA genes, the reannotation provides a more complete view of the regulatory networks in CFT073.
Using a combination of approaches and in-depth analysis, the reannotation of the Escherichia coli CFT073 genome presents a substantial update across the complete genome. To determine the functional annotation of protein-coding genes and RNAs, we used both a series of automated annotation tools and manual efforts, incorporating a wide variety of research information through data integration, literature curation, and genomic comparison against related E. coli strains. Major updates include a substantial correction of the protein-coding genes, with 608 from the RefSeq annotation excluded, 299 added, and 341 with relocated translation initiation sites. In addition, the 94 pseudogenes annotated in RefSeq were thoroughly inspected and updated. Moreover, the number of miscellaneous genes (sRNAs) was updated from 6 in RefSeq to 46 in the reannotation. Based on these adjustments, our analysis concentrated on the new protein-coding genes and sRNAs that are crucial to or associated with the virulence or the UTI process of CFT073. It is apparent that, without the genes newly added in the reannotation, many important functions and regulatory pathways related to the virulence of strain CFT073 cannot be well explained. As a result, the reannotation provides a more comprehensive picture of the mechanisms of uropathogenicity of this UPEC strain. The new genes change the view of its uropathogenicity in several respects, particularly through new genes in GI regions and new virulence-associated factors. The reannotation can thus serve as an important resource, providing new information on genomic structure and organization as well as gene function. We hope that the detailed analysis will facilitate future exploration of novel virulence mechanisms and help guide experimental design.
The genome sequences of E. coli strains CFT073 [RefSeq: NC_004431], K-12 substrain MG1655 [RefSeq: NC_000913], 536 [RefSeq: NC_008253] and UTI89 (with plasmid) [RefSeq: NC_007946] were taken from RefSeq.
Programs and databases
Predictions of EasyGene1.2 were downloaded from its website. The other three gene finders, GeneMark.hmm, MED 2.0, and Glimmer 3.02, were downloaded, installed and run locally. Other programs include RPS-BLAST for conserved domain search (against CDD v2.13), blastp for similarity search (against SWISS-PROT), TriTISA for gene start prediction, the Rfam9.0 database for sRNA gene prediction, and ARTEMIS 9 for genome browsing. For blastp and RPS-BLAST, the thresholds were set to an e-value of 1e-5 and an identity score of 30.
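Applying the two thresholds to similarity search results can be sketched as a simple filter. This is an illustration rather than the script used in this work. It assumes the hits are available in the 12-column tab-separated BLAST tabular layout (query, subject, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, e-value, bit score) and that the identity score of 30 refers to percent identity. The function and constant names are hypothetical.

```python
import io
from typing import Iterable, Iterator, Tuple

E_VALUE_MAX = 1e-5    # e-value threshold stated in the text
IDENTITY_MIN = 30.0   # identity threshold, assumed here to mean percent identity

Hit = Tuple[str, str, float, float]  # (query, subject, identity, e-value)


def filter_hits(
    lines: Iterable[str],
    e_value_max: float = E_VALUE_MAX,
    identity_min: float = IDENTITY_MIN,
) -> Iterator[Hit]:
    """Yield hits from tabular BLAST output that pass both thresholds."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        identity = float(fields[2])   # column 3: percent identity
        e_value = float(fields[10])   # column 11: e-value
        if e_value <= e_value_max and identity >= identity_min:
            yield query, subject, identity, e_value


if __name__ == "__main__":
    # Toy hits, invented for illustration only.
    sample = io.StringIO(
        "q1\ts1\t45.0\t200\t110\t0\t1\t200\t1\t200\t1e-20\t180\n"
        "q1\ts2\t25.0\t200\t150\t0\t1\t200\t1\t200\t1e-30\t90\n"
        "q2\ts3\t60.0\t50\t20\t0\t1\t50\t1\t50\t0.01\t30\n"
    )
    for hit in filter_hits(sample):
        print(hit)
```

Requiring both conditions discards hits that are significant but weakly similar as well as short, highly similar matches with poor e-values.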
Virulence factor prediction
Multiple sequence alignments with virulence factor sequences from VFDB were produced using the EMBOSS suite, Mega3.1 and T-coffee. The alignments were automatically shaded according to the default settings of these programs. The putative transmembrane protein sequences were examined with HTMSRAP.
- E. coli: Escherichia coli
- ORF: Open reading frame
- TIS: Translation initiation site
- UPEC: Uropathogenic Escherichia coli
- UTI: Urinary tract infection
We would like to thank Prof. Zhen-Su She, Lingjie Sang, Xiaobin Zheng, and Binbin Lai for beneficial discussions and help. We are grateful to Jaclyn Boyle for help with manuscript proofreading. This work received partial support from the National Natural Science Foundation of China (30970667, 30770499, 30300071 and 10721403).
- Stamm WE, Norrby SR: Urinary tract infections: disease panorama and challenges. J Infect Dis. 2001, 183 (Suppl 1): S1-4. 10.1086/318850.
- Welch RA, Burland V, Plunkett Gr, Redford P, Roesch P, Rasko D, Buckles EL, Liou SR, Boutin A, Hackett J, Stroud D, Mayhew GF, Rose DJ, Zhou S, Schwartz DC, Perna NT, Mobley HLT, Donnenberg MS, Blattner FR: Extensive mosaic structure revealed by the complete genome sequence of uropathogenic Escherichia coli. Proc Natl Acad Sci USA. 2002, 99 (26): 17020-17024. 10.1073/pnas.252529799.
- Hacker J, Blum-Oehler G, Muhldorfer I, Tschape H: Pathogenicity islands of virulent bacteria: structure, function and impact on microbial evolution. Mol Microbiol. 1997, 23 (6): 1089-1097. 10.1046/j.1365-2958.1997.3101672.x.
- Forsman K, Goransson M, Uhlin BE: Autoregulation and multiple DNA interactions by a transcriptional regulatory protein in E. coli pili biogenesis. EMBO J. 1989, 8 (4): 1271-1277.
- Slechta ES, Mulvey MA: Contact-dependent inhibition: bacterial brakes and secret handshakes. Trends Microbiol. 2006, 14 (2): 58-60. 10.1016/j.tim.2005.12.003.
- Gottesman S: Micros for microbes: non-coding regulatory RNAs in bacteria. Trends Genet. 2005, 21 (7): 399-404. 10.1016/j.tig.2005.05.008.
- Hu GQ, Zheng X, Ju LN, Zhu H, She ZS: Computational evaluation of TIS annotation for prokaryotic genomes. BMC Bioinformatics. 2008, 9: 160. 10.1186/1471-2105-9-160.
- Touchon M, Hoede C, Tenaillon O, Barbe V, Baeriswyl S, Bidet P, Bingen E, Bonacorsi S, Bouchier C, Bouvet O, Calteau A, Chiapello H, Clermont O, Cruveiller S, Danchin A, Diard M, Dossat C, Karoui ME, Frapy E, Garry L, Ghigo JM, Gilles AM, Johnson J, Le Bouguenec C, Lescat M, Mangenot S, Martinez-Jehanne V, Matic I, Nassif X, Oztas S, Petit MA, Pichon C, Rouy Z, Ruf CS, Schneider D, Tourret J, Vacherie B, Vallenet D, Medigue C, Rocha EPC, Denamur E: Organised genome dynamics in the Escherichia coli species results in highly diverse adaptive paths. PLoS Genet. 2009, 5: e1000344. 10.1371/journal.pgen.1000344.
- Ouzounis C, Karp P: The past, present and future of genome-wide re-annotation. Genome Biol. 2002, 3 (2): COMMENT2001. 10.1186/gb-2002-3-2-comment2001.
- Salzberg S: Genome re-annotation: a wiki solution?. Genome Biol. 2007, 8: 102. 10.1186/gb-2007-8-6-r102.
- Gundogdu O, Bentley SD, Holden MT, Parkhill J, Dorrell N, Wren BW: Re-annotation and re-analysis of the Campylobacter jejuni NCTC11168 genome sequence. BMC Genomics. 2007, 8: 162. 10.1186/1471-2164-8-162.
- Dandekar T, Huynen M, Regula J, Ueberle B, Zimmermann C, Andrade M, Doerks T, Sanchez-Pulido L, Snel B, Suyama M, Yuan Y, Herrmann R, Bork P: Re-annotating the Mycoplasma pneumoniae genome sequence: adding value, function and reading frames. Nucleic Acids Res. 2000, 28 (17): 3278-3288. 10.1093/nar/28.17.3278.
- Camus J, Pryor M, Medigue C, Cole S: Re-annotation of the genome sequence of Mycobacterium tuberculosis H37Rv. Microbiology. 2002, 148 (pt 10): 2967-2973.
- Chen L, Ma B, Gao N: Reannotation of hypothetical ORFs in plant pathogen Erwinia carotovora subsp. atroseptica SCRI1043. FEBS J. 2008, 275: 198-206.
- Bairoch A, Apweiler R: The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res. 2000, 28: 45-48. 10.1093/nar/28.1.45.
- Marchler-Bauer A, Anderson JB, Derbyshire MK, DeWeese-Scott C, Gonzales NR, Gwadz M, Hao L, He S, Hurwitz DI, Jackson JD, Ke Z, Krylov D, Lanczycki CJ, Liebert CA, Liu C, Lu F, Lu S, Marchler GH, Mullokandov M, Song JS, Thanki N, Yamashita RA, Yin JJ, Zhang D, Bryant SH: CDD: a conserved domain database for interactive domain family analysis. Nucleic Acids Res. 2007, 35 (Database issue): D237-240. 10.1093/nar/gkl951.
- Nielsen P, Krogh A: Large-scale prokaryotic gene prediction and comparison to genome annotation. Bioinformatics. 2005, 21 (24): 4322-4329. 10.1093/bioinformatics/bti701.
- Besemer J, Borodovsky M: GeneMark: web software for gene finding in prokaryotes, eukaryotes and viruses. Nucleic Acids Res. 2005, 33 (Web Server issue): W451-454. 10.1093/nar/gki487.
- Delcher AL, Bratke KA, Powers EC, Salzberg SL: Identifying bacterial genes and endosymbiont DNA with Glimmer. Bioinformatics. 2007, 23 (6): 673-679. 10.1093/bioinformatics/btm009.
- Zhu H, Hu GQ, Yang YF, Wang J, She ZS: MED: a new non-supervised gene prediction algorithm for bacterial and archaeal genomes. BMC Bioinformatics. 2007, 8: 97. 10.1186/1471-2105-8-97.
- Bott M: Anaerobic citrate metabolism and its regulation in enterobacteria. Arch Microbiol. 1997, 167 (2/3): 78-88. 10.1007/s002030050419.
- Hu GQ, Zheng X, Yang YF, Ortet P, She ZS, Zhu H: ProTISA: a comprehensive resource for translation initiation site annotation in prokaryotic genomes. Nucleic Acids Res. 2008, 36 (Database issue): D114-119.
- Hu GQ, Zheng X, Zhu H, She ZS: Prediction of translation initiation site with TriTISA. Bioinformatics. 2009, 25: 123-125. 10.1093/bioinformatics/btn576.
- ProTISA. [http://mech.ctb.pku.edu.cn/protisa/searchadv.php]
- Frishman D, Mironov A, Mewes HW, Gelfand M: Combining diverse evidence for gene recognition in completely sequenced bacterial genomes. Nucleic Acids Res. 1998, 26: 2941-2947. 10.1093/nar/26.12.2941.
- Makita Y, De Hoon MJL, Danchin A: Hon-yaku: a biology-driven Bayesian methodology for identifying translation initiation sites in prokaryotes. BMC Bioinformatics. 2007, 8: 47. 10.1186/1471-2105-8-47.
- Aoki SK, Pamma R, Hernday AD, Bickham JE, Braaten BA, Low DA: Contact-dependent inhibition of growth in Escherichia coli. Science. 2005, 309 (5738): 1245-1248. 10.1126/science.1115109.
- Tseng TT, Gratwick KS, Kollman J, Park D, Nies DH, Goffeau A, Saier MHJ: The RND permease superfamily: an ancient, ubiquitous and diverse family that includes human disease and development proteins. J Mol Microbiol Biotechnol. 1999, 1: 107-125.
- Rasko DA, Rosovitz MJ, Myers GSA, Mongodin EF, Fricke WF, Gajer P, Crabtree J, Sebaihia M, Thomson NR, Chaudhuri R, Henderson IR, Sperandio V, Ravel J: The pangenome structure of Escherichia coli: comparative genomic analysis of E. coli commensal and pathogenic isolates. J Bacteriol. 2008, 190 (20): 6881-6893. 10.1128/JB.00619-08.
- Dastmalchi S, Beheshti S, Morris MB, Church WB: Prediction of rotational orientation of transmembrane helical segments of integral membrane proteins using new environment-based propensities for amino acids derived from structural analyses. FEBS J. 2007, 274 (10): 2653-2660. 10.1111/j.1742-4658.2007.05800.x.
- Relman DA, Domenighini M, Tuomanen E, Rappuoli R, Falkow S: Filamentous hemagglutinin of Bordetella pertussis: nucleotide sequence and crucial role in adherence. Proc Natl Acad Sci USA. 1989, 86 (8): 2637-2641. 10.1073/pnas.86.8.2637.
- Fexby S, Bjarnsholt T, Jensen PO, Roos V, Hoiby N, Givskov M, Klemm P: Biological Trojan horse: Antigen 43 provides specific bacterial uptake and survival in human neutrophils. Infect Immun. 2007, 75: 30-34. 10.1128/IAI.01117-06.
- Cascales E, Buchanan SK, Duche D, Kleanthous C, Lloubes R, Postle K, Riley M, Slatin S, Cavard D: Colicin biology. Microbiol Mol Biol Rev. 2007, 71: 158-229. 10.1128/MMBR.00036-06.
- Hochhut B, Wilde C, Balling G, Middendorf B, Dobrindt U, Brzuszkiewicz E, Gottschalk G, Carniel E, Hacker J: Role of pathogenicity island-associated integrases in the genome plasticity of uropathogenic Escherichia coli strain 536. Mol Microbiol. 2006, 61 (3): 584-595. 10.1111/j.1365-2958.2006.05255.x.
- Blum G, Ott M, Lischewski A, Ritter A, Imrich H, Tschape H, Hacker J: Excision of large DNA regions termed pathogenicity islands from tRNA-specific loci in the chromosome of an Escherichia coli wild-type pathogen. Infect Immun. 1994, 62 (2): 606-614.
- Lindberg S, Xia Y, Sonden B, Goransson M, Hacker J, Uhlin BE: Regulatory Interactions among adhesin gene systems of uropathogenic Escherichia coli. Infect Immun. 2008, 76 (2): 771-780. 10.1128/IAI.01010-07.
- Griffiths-Jones S, Moxon S, Marshall M, Khanna A, Eddy SR, Bateman A: Rfam: annotating non-coding RNAs in complete genomes. Nucleic Acids Res. 2005, 33 (Database issue): D121-124.
- Blattner FR, Plunkett Gr, Bloch CA, Perna NT, Burland V, Riley M, Collado-Vides J, Glasner JD, Rode CK, Mayhew GF, Gregor J, Davis NW, Kirkpatrick HA, Goeden MA, Rose DJ, Mau B, Shao Y: The complete genome sequence of Escherichia coli K-12. Science. 1997, 277 (5331): 1453-1474. 10.1126/science.277.5331.1453.
- Brzuszkiewicz E, Bruggemann H, Liesegang H, Emmerth M, Olschlager T, Nagy G, Albermann K, Wagner C, Buchrieser C, Emody L, Gottschalk G, Hacker J, Dobrindt U: How to become a uropathogen: comparative genomic analysis of extraintestinal pathogenic Escherichia coli strains. Proc Natl Acad Sci USA. 2006, 103 (34): 12879-12884. 10.1073/pnas.0603038103.
- Chen SL, Hung CS, Xu J, Reigstad CS, Magrini V, Sabo A, Blasiar D, Bieri T, Meyer RR, Ozersky P, Armstrong JR, Fulton RS, Latreille JP, Spieth J, Hooton TM, Mardis ER, Hultgren SJ, Gordon JI: Identification of genes subject to positive selection in uropathogenic strains of Escherichia coli: a comparative genomics approach. Proc Natl Acad Sci USA. 2006, 103 (15): 5977-5982. 10.1073/pnas.0600938103.
- Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997, 25 (17): 3389-3402. 10.1093/nar/25.17.3389.
- Rutherford K, Parkhill J, Crook J, Horsnell T, Rice P, Rajandream MA, Barrell B: Artemis: sequence visualization and annotation. Bioinformatics. 2000, 16 (10): 944-945. 10.1093/bioinformatics/16.10.944.
- Yang J, Chen L, Sun L, Yu J, Jin Q: VFDB 2008 release: an enhanced web-based resource for comparative pathogenomics. Nucleic Acids Res. 2008, 36 (Database issue): D539-542.
- Rice P, Longden I, Bleasby A: EMBOSS: the European Molecular Biology Open Software Suite. Trends Genet. 2000, 16 (6): 276-277. 10.1016/S0168-9525(00)02024-2.
- Kumar S, Nei M, Dudley J, Tamura K: MEGA: a biologist-centric software for evolutionary analysis of DNA and protein sequences. Brief Bioinform. 2008, 9 (4): 299-306. 10.1093/bib/bbn017.
- Notredame C, Higgins DG, Heringa J: T-Coffee: A novel method for fast and accurate multiple sequence alignment. J Mol Biol. 2000, 302: 205-217. 10.1006/jmbi.2000.4042.
- Lloyd AL, Rasko DA, Mobley HLT: Defining genomic islands and uropathogen-specific genes in uropathogenic Escherichia coli. J Bacteriol. 2007, 189 (9): 3532-3546. 10.1128/JB.01744-06.