SHV Lactamase Engineering Database: a reconciliation tool for SHV β-lactamases in public databases

Background: SHV β-lactamases confer resistance to a broad range of antibiotics by accumulating mutations. The number of SHV variants is steadily increasing: 117 SHV variants have been assigned in the SHV mutation table (http://www.lahey.org/Studies/). In addition, information about SHV β-lactamases can be found in the rapidly growing NCBI protein database. The SHV β-Lactamase Engineering Database (SHVED) has been developed to collect the SHV β-lactamase sequences from the NCBI protein database and the SHV mutation table. It serves as a tool for the detection and reconciliation of inconsistencies, and for the identification of new SHV variants and amino acid substitutions.
Description: The SHVED contains 200 protein entries with distinct sequences and 20 crystal structures. 83 protein sequences are included in both the SHV mutation table and the NCBI protein database, while 35 and 82 protein sequences are found only in the SHV mutation table and only in the NCBI protein database, respectively. Of these 82 sequences, 41 originate from microbial sources, and 22 of them are full-length sequences that harbour a mutation profile which has not yet been classified in the SHV mutation table. 27 protein entries from the NCBI protein database were found to have an inconsistency in SHV name identification. These inconsistencies were reconciled using information from the SHV mutation table and stored in the SHVED. The SHVED is accessible at http://www.LacED.uni-stuttgart.de/classA/SHVED/. It provides sequences, structures, and a multisequence alignment of SHV β-lactamases with the corrected annotation. Amino acid substitutions at each position are also provided. The SHVED is updated monthly and supplies all data for download.
Conclusions: The SHV β-Lactamase Engineering Database (SHVED) contains information about SHV variants with reconciled annotation. It serves as a tool for the detection of inconsistencies in the NCBI protein database, helps to identify new mutations resulting in new SHV variants, and thus supports the investigation of sequence-function relationships of SHV β-lactamases.


Background
Since penicillin was introduced into clinical practice in the 1940s, the effectiveness of β-lactam antibiotics has been reduced drastically [1][2][3]. One of the main reasons is the hydrolysis of their β-lactam ring by β-lactamases (EC 3.5.2.6), which results in a loss of function. These enzymes, especially SHV and TEM β-lactamase variants, gradually accumulate mutations [4,5] that allow them to resist β-lactam antibiotics, and they have spread rapidly over the world [6][7][8].
SHV β-lactamases belong to the class A β-lactamases and have a serine in the active site [9]. The precursor protein consists of 286 amino acids. The first 21 amino acids at the N-terminus form the signal sequence and are removed to yield the mature enzyme [10]. SHV β-lactamases were first described in members of the genus Klebsiella as narrow-spectrum β-lactamases active against penicillin [6,11]. Their genes are located either in the bacterial chromosome or on a plasmid [12]. Genes encoding these enzymes have mutated rapidly and have been transferred to other Gram-negative bacteria in different geographical regions [6]. Currently, 117 SHV variants have been described. A list of assigned SHV variants is compiled and maintained by Jacoby and Bush [13]; it is referred to in this paper as the "SHV mutation table". Besides the SHV mutation table, sequence information on SHV β-lactamases can also be found in the NCBI protein database [14]. One of the important data sources of the NCBI protein database is the NCBI nucleotide database, which is open for submission of new sequences without further validation; it is therefore growing rapidly, but contains inconsistencies. In contrast, the SHV mutation table is manually curated by experts in the β-lactamase field and is therefore widely accepted as a reliable and consistent information source. In the SHV mutation table, each SHV variant is characterized by its name and its mutation profile, which is a set of amino acid substitutions at certain positions in the sequence. Positions are identified according to the Ambler numbering scheme [15]. To be listed in the SHV mutation table as a new SHV β-lactamase, a protein must have arisen naturally, be fully sequenced, and harbor a new mutation profile [13]. Engineered proteins are therefore not considered.
The SHV Engineering Database (SHVED) was built as a comprehensive inventory by collecting data on SHV β-lactamases from these two databases. Its purpose is threefold: to facilitate the detection and eventual reconciliation of inconsistencies in entries derived from the NCBI protein database, to detect new SHV β-lactamases with novel mutation profiles, and to identify new amino acid positions at which mutations can occur.

Construction and content
Construction
Development and construction of SHVED
The amino acid sequence of SHV-1 originating from Klebsiella pneumoniae (GenInfo identifier (GI): 4337048) was used as a seed sequence for building up the SHVED. A BLAST search [16] was performed against the NCBI protein database [14] without filtering of low complexity regions and with a low E-value threshold (10^-124) to prevent the occurrence of TEM lactamases and other non-SHV lactamases in the BLAST results. For each hit in the BLAST result, the GI was extracted and the complete XML entry was downloaded from the NCBI protein database. Information on sequence, position-specific annotations, functional descriptions, and source organism was extracted from the entry and parsed by an automated retrieval system into an in-house developed relational database system [17]. For BLAST results representing protein structures, monomers were extracted from the PDB [18] and deposited as structure entries.
Sequences generated from the annotated mutation profiles deposited in the SHV mutation table [13] were also incorporated into the SHVED. Except for 16 assigned SHVs which were "withdrawn" or "not yet released", 117 assigned SHV sequences were generated and parsed into the SHVED using the available information on amino acid exchanges and the reference sequence SHV-1. On the webpage, the "source organism" of these sequences was set to "Clinical sample" and the data source to 'lc' abbreviated from "Lahey Clinic" where the SHV mutation table is hosted.
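As an illustration of how a sequence can be generated from a mutation profile and a reference sequence, the following Python sketch applies substitutions written in the form L35Q to a numbered reference. It is not the SHVED implementation. The function name apply_profile, the seven-residue toy reference, and its position labels are hypothetical and merely stand in for SHV-1 and its Ambler numbering; only the profile L35Q, G238S, E240K (given for SHV-12 in this paper) is taken from the text.

```python
import re

# A simple substitution: original residue, position label, new residue.
SUBSTITUTION = re.compile(r"([A-Z])(\d+)([A-Z])")


def apply_profile(reference, labels, profile):
    """Return the variant obtained by applying substitutions to a reference.

    reference: reference amino acid sequence (one-letter code)
    labels:    one position label per reference residue (numbering may skip)
    profile:   substitutions such as ["L35Q", "G238S"]
    """
    if len(reference) != len(labels):
        raise ValueError("one position label is needed per reference residue")
    index_of = {label: i for i, label in enumerate(labels)}
    residues = list(reference)
    for mutation in profile:
        match = SUBSTITUTION.fullmatch(mutation)
        if match is None:
            raise ValueError(f"not a simple substitution: {mutation}")
        old, label, new = match.group(1), int(match.group(2)), match.group(3)
        if label not in index_of:
            raise ValueError(f"position {label} is not in the reference")
        i = index_of[label]
        if residues[i] != old:
            raise ValueError(f"{mutation}: reference has {residues[i]} at {label}")
        residues[i] = new
    return "".join(residues)


# Hypothetical toy reference, NOT the real SHV-1 sequence.
TOY_REFERENCE = "MALTGEK"
TOY_LABELS = [5, 34, 35, 36, 238, 240, 241]
SHV12_PROFILE = ["L35Q", "G238S", "E240K"]

print(apply_profile(TOY_REFERENCE, TOY_LABELS, SHV12_PROFILE))  # MAQTSKK
```

Checking the original residue before each exchange is a deliberate choice in this sketch: it exposes numbering mismatches between a profile and the reference instead of silently producing a wrong sequence.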

Identification and naming of SHV β-lactamase sequences
Each protein sequence in the SHVED was aligned with SHV-1 using ClustalW [19] to identify its mutation profile. This mutation profile is the set of amino acid exchanges, deletions, and insertions occurring in a certain SHV, e.g. L35Q for the substitution of leucine at position 35 by glutamine. Subsequently, the mutation profile was matched against the mutation profiles listed in the SHV mutation table to identify whether the respective protein sequence is identical to an already assigned SHV. If the mutation profiles were identical, the protein was named accordingly (e.g. "SHV-3").
Otherwise it was named "SHV-like" and its mutation profile was stored. In the case of sequences longer than SHV-1, only the region corresponding to SHV-1 was examined to identify the mutation profile. Amino acid insertions arising inside the protein sequence were annotated, e.g. "-162.1D -162.2R" for the insertion of two residues aspartic acid and arginine after the residue at position 162. The amino acid deletion was annotated with the corresponding residue and position, e.g. "G54-" for the deletion of a glycine at position 54.
For sequences longer than SHV-1, the number of additional residues was recorded, e.g. "C+5" for a sequence 5 residues longer at its C-terminus. Sequences shorter than SHV-1 were considered fragments of the respective SHV or SHV-like sequences, even if they were named differently in the entry of the source database. The number of missing residues at the N- and C-terminus was annotated, e.g. "N-21 C-3" for 21 and 3 residues missing at the N- and C-terminus, respectively.
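The notation described in this section can be made concrete with a short Python sketch that reads a pairwise alignment column by column and emits substitutions, deletions, and insertions in the format used above. This is a minimal sketch under stated assumptions, not the procedure used in the SHVED: the function name mutation_profile, the toy aligned strings, and the four position labels are hypothetical, and terminal overhangs (the "N-21 C-3" and "C+5" cases) are assumed to have been trimmed beforehand.

```python
def mutation_profile(aligned_ref, aligned_query, ref_labels):
    """Derive a mutation profile from two gapped strings of equal length.

    aligned_ref:   reference row of the alignment ('-' marks a gap)
    aligned_query: query row of the alignment ('-' marks a gap)
    ref_labels:    position label of each ungapped reference residue
    """
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned rows must have the same length")
    profile = []
    ref_index = -1  # index of the last reference residue seen
    inserted = 0    # residues inserted after that reference residue
    for ref_res, query_res in zip(aligned_ref, aligned_query):
        if ref_res == "-":
            if query_res == "-":
                continue  # column that is empty in both rows
            if ref_index < 0:
                raise ValueError("leading overhang must be trimmed first")
            inserted += 1
            # Insertion after a reference position, e.g. "-162.1D".
            profile.append(f"-{ref_labels[ref_index]}.{inserted}{query_res}")
            continue
        ref_index += 1
        inserted = 0
        label = ref_labels[ref_index]
        if query_res == "-":
            profile.append(f"{ref_res}{label}-")           # deletion, "G54-"
        elif query_res != ref_res:
            profile.append(f"{ref_res}{label}{query_res}")  # exchange, "L35Q"
    return profile


# Hypothetical toy alignment; the labels are illustrative only.
toy_profile = mutation_profile("LGT--D", "Q-TDRD", [35, 54, 162, 163])
print(" ".join(toy_profile))  # L35Q G54- -162.1D -162.2R
```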

Multisequence alignment and feature annotation
The annotation information was enriched by performing a multisequence alignment using ClustalW [19]. Information on secondary structure calculated using DSSP [20] was also included in the SHVED. Individual residues in the sequences as well as in the alignments were numbered according to the standard scheme suggested by Ambler [15].

Reconciliation of data inconsistencies
A systematic comparison of entries of the NCBI protein database and the SHV mutation table allows a reconciliation of NCBI protein database entries which have an inconsistent annotation. In the SHVED, a wrong name assignment is corrected if the mutation profile of the sequence is already included in the SHV mutation table. A sequence with a new mutation profile is stored in the SHVED as a new SHV β-lactamase, even if the authors gave it a (wrong) SHV name in the NCBI protein database. A link from the reconciled SHVED entry to the original NCBI protein database entry allows the author of the respective entry to correct an erroneous entry.
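The reconciliation rule can be sketched in a few lines of Python: a sequence is named after the table entry whose mutation profile it matches exactly, and is otherwise labeled "SHV-like". The sketch below is an assumption-laden illustration rather than the SHVED code. The names TABLE_EXCERPT and reconcile are hypothetical, the table holds only the few profiles quoted in this paper, and fragments (for which a full profile cannot be determined) are not handled.

```python
# Excerpt of name/profile pairs quoted in this paper; the real SHV mutation
# table is far larger. The empty profile stands for the reference SHV-1.
TABLE_EXCERPT = {
    frozenset(): "SHV-1",
    frozenset({"L35Q", "G238S", "E240K"}): "SHV-12",
    frozenset({"V119I"}): "SHV-48",
    frozenset({"M5L", "R202S"}): "SHV-104",
}


def reconcile(annotated_name, profile, table=TABLE_EXCERPT):
    """Return (reconciled_name, is_consistent) for one full-length entry.

    annotated_name: SHV name given in the source database entry
    profile:        mutation profile observed in the deposited sequence
    """
    reconciled = table.get(frozenset(profile), "SHV-like")
    return reconciled, reconciled == annotated_name


# Profile matches an assigned variant, but the annotated name differs.
print(reconcile("SHV-5", ["L35Q", "G238S", "E240K"]))  # ('SHV-12', False)
# Profile is not in the table excerpt: stored as SHV-like.
print(reconcile("SHV-104", ["R202S"]))                 # ('SHV-like', False)
```

Using a frozenset as the key makes the comparison independent of the order in which the substitutions are listed.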

Content
Data content of the SHVED
452 protein sequence entries from the NCBI protein database and 117 protein sequences from the SHV mutation table were collected and parsed into the SHVED, resulting in 200 distinct protein entries. 20 crystal structures of 2 SHV β-lactamases (SHV-1 and SHV-2) were stored in the SHVED. 19 crystal structures were from SHV-1 with one or two engineered mutations. Apart from one structure (PDB entry 3D4F), which contains the full-length sequence, all crystal structures lack the 21 residues of the N-terminal signal sequence. Two protein sequences (PDB entries 2A3U and 2A49) possess 5 and 4 additional residues, respectively, at their C-terminus (table 1).
Of the 200 proteins, 35 SHV sequences were derived from the SHV mutation table but not from the NCBI protein database, 82 protein sequences were found exclusively in the NCBI protein database, and 83 protein sequences were accessible in both source databases. Of the 82 protein sequences found only in the NCBI protein database, 41 originate from microbial sources and harbor a new mutation profile; 22 of these are full-length sequences (table 2) and 19 are fragments (table 3).
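The classification of distinct sequences by source can be illustrated as follows. This Python sketch groups entries by exact sequence identity and counts how many distinct sequences occur in both sources or in only one of them. It is a simplified sketch: the function names, the source tags, and the toy entries are hypothetical, and treating two entries as the same protein only when their sequence strings are identical is an assumption that ignores the handling of fragments described above.

```python
from collections import defaultdict


def group_by_sequence(entries):
    """Map each distinct sequence to the set of sources it was found in.

    entries: iterable of (source, identifier, sequence) tuples, where
             source is "ncbi" or "lc" (the SHV mutation table).
    """
    sources = defaultdict(set)
    for source, _identifier, sequence in entries:
        sources[sequence].add(source)
    return dict(sources)


def count_by_origin(sources_by_sequence):
    """Count distinct sequences found in both sources or in only one."""
    counts = {"both": 0, "table_only": 0, "ncbi_only": 0}
    for found_in in sources_by_sequence.values():
        if found_in == {"ncbi", "lc"}:
            counts["both"] += 1
        elif found_in == {"lc"}:
            counts["table_only"] += 1
        else:
            counts["ncbi_only"] += 1
    return counts


# Hypothetical toy entries; identifiers and sequences are made up.
toy_entries = [
    ("ncbi", "entry-a", "MAQT"),
    ("ncbi", "entry-b", "MAQT"),  # duplicate of entry-a
    ("lc", "variant-x", "MAQT"),
    ("lc", "variant-y", "MALT"),
    ("ncbi", "entry-c", "MKLT"),
]
toy_groups = group_by_sequence(toy_entries)
print(len(toy_groups))              # 3
print(count_by_origin(toy_groups))  # {'both': 1, 'table_only': 1, 'ncbi_only': 1}
```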

Analysis of amino acid substitutions and substitution positions
In addition to the amino acid substitutions described in the SHV mutation table [13], 27 new substitution positions were found. Not only substitutions at new positions, but also new amino acid exchanges at already known positions were found. As an example, the protein sequence with GI 259038268 harbors a lysine at position 252 instead of a proline. In the SHV mutation table, only the substitution P252G is described.

Data inconsistencies
There are 27 distinct protein entries derived from the NCBI protein database that have inconsistent annotations (table 4). In all cases, the annotated SHV name is inconsistent with the mutation profile of the sequence. For example, the protein sequence with GI 40950644 has three mutations (L35Q, G238S, and E240K) and should therefore be named "SHV-12" according to the SHV mutation table, but it is annotated as "betalactamase SHV-5" in the NCBI protein database. In 12 cases, the protein sequence is a fragment, and there is therefore not enough information to rename it in the SHVED.

Utility
A multisequence alignment of all 200 protein entries was generated using ClustalW. For protein structures, all sequence entries were included and displayed with aligned secondary structure information. Proteins were labeled by their GIs and linked to the NCBI protein database. Annotation of individual residues is visualized by color-coding in the alignment and upon moving the cursor over the respective residue. The SHVED is accessible at http://www.LacED.uni-stuttgart.de/classA/SHVED by a JavaScript-enabled WWW browser. Protein tables provide information on the protein name, mutation, number of residues missing at the N- and C-terminus (in case of fragments), and on the source organism.
Table note: "N-x C-y" indicates that a sequence lacks x amino acids at the N terminus and y amino acids at the C terminus. The mutation profile is given in square brackets.
As an alternative to the multisequence alignment, the SHV variants are visualized as mutations relative to the sequence of SHV-1. Substitution positions are colored and annotated by the exchanged amino acids.

Discussion
Data content of the SHVED
By systematic analysis of protein sequences in the SHVED, 41 protein sequences with a new mutation profile were identified. 22 of them are full-length sequences originating from microbial sources and are therefore candidates for a new SHV number assignment. The new mutations occurred either at new positions in the sequence or as new amino acid exchanges at already described positions.

Detection of novel SHV β-lactamases and novel amino acid substitutions
Except for one new mutation profile originating from a synthetic construct (GI 151861), all new mutation profiles originated from microbial sources. As plasmid-bound genes, the blaSHV genes encoding SHV β-lactamases are easily transferred among Gram-negative bacteria, especially Enterobacteriaceae, because of their close genetic relationship [6]. However, the information about substitutions at new positions found in these fragments could be used in the future to predict the occurrence of new SHV variants.

Data inconsistencies and reconciliation
In all 27 cases of inconsistency, the annotated name differed from the actual mutation profile. However, the reasons for the inconsistency varied. In the case of the protein sequence with GI 154269503, the lysine at position 256 is substituted by an arginine, while it is reported that the lysine is exchanged by an arginine at position 250 (K250R) [21]. In the SHV mutation table, it is listed as SHV-103 and characterized by the substitution of a leucine at position 250 by an arginine (L250R). A mutation at position 256 is not yet recorded in the SHV mutation table, and a mutation at position 250 is listed only for SHV-103. The inconsistency was probably caused by a difference in amino acid numbering between the author of GI 154269503 and the curators of the SHV mutation table at Lahey Clinic. In the case of the protein sequence with GI 161367444, the inconsistency might derive from the primer used. In the sequence, only one mutation, R202S, was found, while it is annotated as SHV-104, which has two mutations (M5L and R202S) according to the SHV mutation table. It is noted in the NCBI entry that the forward primer "ATGCGTTATATTCGCCTGTGTATT" was used to amplify the target DNA, which results in a methionine at position 5. Therefore, the deduced amino acid substitution M5L (if it actually occurred) could not be present in the deposited amino acid sequence, and the deposited amino acid sequence should not be annotated as SHV-104 because it does not harbor the mutation profile 'M5L R202S'. In the case of the protein sequence with GI 15718691, the duplication of the pentapeptide 163DRWET167 was reported [22] and assigned as SHV-16. In addition, however, two mutations, H96T and Y97H, are present in the amino acid sequence. It is therefore not clear whether the actual SHV-16 harbors only the pentapeptide duplication or additionally the mutations H96T and Y97H.
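The primer argument made for GI 161367444 can be checked by translating the forward primer. The following Python sketch does so with the standard genetic code; the helper names are hypothetical, and the primer is the one quoted above with the hyphen removed, on the assumption that the hyphen is a line-break artifact. The first codon, ATG, encodes methionine, so any amplicon generated with this primer carries a methionine as its first residue (position 5 according to the text) regardless of the template.

```python
# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(dna):
    """Translate a DNA string in frame 1; a trailing partial codon is ignored."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Forward primer as quoted in the text (hyphen removed).
FORWARD_PRIMER = "ATGCGTTATATTCGCCTGTGTATT"
primer_peptide = translate(FORWARD_PRIMER)
print(primer_peptide)     # MRYIRLCI
print(primer_peptide[0])  # M
```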
In other cases of inconsistency, the amino acid sequences were submitted to the NCBI protein database without a corresponding publication and showed inconsistencies in their annotation. One example is the protein sequence with GI 30230495. It is annotated as SHV-48, which should harbor the mutation V119I according to the SHV mutation table, while four mutations (L35Q, R191H, G238S, and E240K) were actually found in the deposited amino acid sequence. In the SHV mutation table itself, an inconsistency in residue numbering (positions 253 and 255) was revealed and communicated to the curator for correction.

Conclusion
The SHV Lactamase Engineering Database (SHVED) was established to identify new SHV β-lactamases and to identify inconsistencies in public databases. Based on our analysis, 22 candidates for the assignment of new SHV names were identified. 27 protein entries with inconsistencies were found and reconciled. In addition, three assigned mutation profiles were found to be in doubt: SHV-16, SHV-103, and SHV-104. The SHVED thus supports the scientific community in naming new SHV β-lactamases and in reconciling the existing annotation of SHV β-lactamase sequences.

Availability and requirements
The SHVED is accessible at http://www.LacED.uni-stuttgart.de/classA/SHVED/ by a JavaScript-enabled WWW browser.

Additional material
Additional file 1: Additional_file_1.pdf contains table S1 and table S2 mentioned in the text. They list new mutation profiles of sequences derived from microbial organisms.