Structural classification by the Lipase Engineering Database: a case study of Candida antarctica lipase A
© Widmann et al; licensee BioMed Central Ltd. 2010
Received: 9 December 2009
Accepted: 19 February 2010
Published: 19 February 2010
The Lipase Engineering Database (LED) integrates information on sequence, structure and function of lipases, esterases and related proteins with the α/β hydrolase fold. A new superfamily for Candida antarctica lipase A (CALA) was introduced including the recently published crystal structure of CALA. Since CALA has a highly divergent sequence in comparison to other α/β hydrolases, the Lipase Engineering Database was used to classify CALA in the frame of the already established classification system. This involved the comparison of CALA to similar structures as well as sequence-based comparisons against the content of the LED.
The new release 3.0 (December 2009) of the Lipase Engineering Database contains 24783 sequence entries for 18585 proteins as well as 656 experimentally determined protein structures, including the structure of CALA. Compared to the previous release with 4322 protein and 167 structure entries, this update represents a significant increase in data volume. By comparing CALA to representative structures from all superfamilies, a structure from the deacetylase superfamily was found to be most similar to the structure of CALA. While the α/β hydrolase fold is conserved in both proteins, the major difference is found in the cap region. Sequence alignments between both proteins show a sequence similarity of only 15%. A multisequence alignment of both protein families was used to create hidden Markov models for the cap region of CALA and showed that the cap region of CALA is unique among the proteins of the α/β hydrolase fold. By specifically comparing the substrate binding pocket of CALA to other binding pockets of α/β hydrolases, the binding pocket of Candida rugosa lipase was identified as being highly similar. This similarity also applied to the lid of Candida rugosa lipase in comparison to the potential lid of CALA.
The LED serves as a valuable tool for the systematic analysis of single proteins or protein families. The updated release 3.0 was used for the evaluation of α/β hydrolases. The HTML version of the database with new features is available at http://www.led.uni-stuttgart.de and provides sequences, structures and a set of analysis tools including phylogenetic trees and HMM profiles.
Lipases (triacylglycerol hydrolases, EC 3.1.1.3) are a versatile group of enzymes which catalyze the hydrolysis or synthesis of a broad range of water-insoluble esters.
They belong to the class of α/β-hydrolases, which also contains esterases, acetylcholinesterases, cutinases, carboxylesterases and epoxide hydrolases. Despite their high diversity in sequence and function, the α/β-hydrolases share a common architecture, the α/β-hydrolase fold, and conserved active site signatures, the GxSxG and GxDxG motifs [3, 4]. Two conserved features found in all α/β-hydrolases are the active site, consisting of the catalytic triad of S-D(E)-H, and the oxyanion hole. Depending on the amino acids involved in forming the oxyanion hole, the enzymes can be classified into three classes, the GGGX-, GX-, and the Y-class. The Lipase Engineering Database (LED) is a resource of fully and consistently annotated superfamilies and homologous families of α/β hydrolases including multisequence alignments of all families. The curation and annotation process for the LED is supported by DWARF, an in-house data warehouse system for protein families. The LED is accessible by a web interface at http://www.led.uni-stuttgart.de. It can be browsed on the level of families, organisms, or structures, and BLAST searches can be performed against all sequence entries.
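As an illustration of the GxSxG active site signature mentioned above, the following minimal Python sketch scans a protein sequence for the pattern, where x stands for any amino acid. The function name and the toy sequence are hypothetical and not part of the LED; a pattern match alone does not establish that a protein is an α/β hydrolase.

```python
import re

# "x" in GxSxG stands for any residue; the lookahead lets overlapping
# occurrences be reported as well.
GXSXG = re.compile(r"(?=(G.S.G))")


def find_gxsxg(sequence):
    """Return (1-based start position, matched pentapeptide) for each GxSxG hit."""
    return [(m.start() + 1, m.group(1)) for m in GXSXG.finditer(sequence.upper())]


if __name__ == "__main__":
    # Hypothetical toy sequence, not a real lipase.
    toy_sequence = "MKTAGHSMGGALAVLAG"
    print(find_gxsxg(toy_sequence))
```

The GxDxG motif could be searched in the same way by swapping the central residue in the pattern.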
Prominent members of the α/β hydrolases are the two lipases from Candida antarctica. Lipase B is a versatile and well characterized biocatalyst in many organic syntheses and biotransformations [6–8] and shows a low sequence similarity to other α/β hydrolases. The second lipase from Candida antarctica, lipase A (CALA), shows a number of unique biocatalytic properties among hydrolases, e.g. high thermostability, stability at acidic pH ranges, and the acceptance of tertiary and sterically hindered alcohols. CALA also has a low sequence similarity to other members of the α/β hydrolase fold, including lipase B. Therefore it was not included in previous versions of the LED. Only after its structure was recently determined did a detailed structural analysis identify CALA unambiguously as a member of the α/β hydrolase family. However, in this structure the active site is not accessible to a substrate; therefore the molecular details of substrate binding and the existence of a possible lid are still elusive.
Database content and layout
Release 3.0 of the Lipase Engineering Database (LED) contains 18585 proteins with 24783 sequence and 656 structure entries, of which about 14000 protein and 489 structure entries are new. Six new homologous families and one new superfamily (the "Candida antarctica lipase A like" superfamily) have been added to the LED in the update process. Seed sequences for the new "Candida antarctica lipase A like" superfamily (LED identifier: abH38) included the sequence from the resolved crystal structure (gi: 160286179) and three sequences of homologous lipases from other organisms (Kurtzmanomyces sp. - gi: 20429169, Malassezia furfur - gi: 73765555, Ustilago maydis - gi: 71018653) which showed high sequence similarity to Candida antarctica lipase A (CALA). Most of the sequences of the superfamily abH38 have already been assigned to a common protein family in other protein family databases. In the Pfam database they are included in the LIP (PF03583) family, in the InterPro database in the family IPR005152, and in the ESTHER database in the Fungal-Bact_LIP family. Because we included only sequences of high sequence similarity to guarantee a good alignment of all sequences of individual families, especially of active site residues, our family abH38 contains fewer sequences than the respective protein families of the other databases.
The four largest superfamilies in release 3.0 contain 50% of all proteins in the LED: The "Cytosolic Hydrolases" superfamily (LED identifier: abh08) with 3188 proteins, containing epoxide hydrolases and haloalkane dehalogenases, the "Carboxylesterases" superfamily (LED identifier: abh01) with 2998 proteins, containing a wide range of carboxylesterases, such as acetylcholine esterases and bile salt activated lipases, the "Moraxella lipase 2 like" superfamily (LED identifier: abh04) with 1781 proteins containing mainly lipases and carboxylesterases, and the "Microsomal Hydrolases" superfamily (LED identifier: abh09) with 1336 proteins, containing microsomal epoxide hydrolases and peptidases. The "Cytosolic Hydrolases" and "Microsomal Hydrolases" superfamilies (abh08 and abh09) belong to the GX-class of α/β hydrolases, the "Carboxylesterases" and "Moraxella lipase 2 like" superfamilies (abh01 and abh04) belong to the GGGX-class of α/β hydrolases.
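The release statistics quoted above can be cross-checked with simple arithmetic. The sketch below uses only the numbers stated in the text (including the previous-release counts of 4322 proteins and 167 structures); the variable names are illustrative.

```python
# Protein counts of the four largest superfamilies, as stated in the text.
largest_superfamilies = {"abh08": 3188, "abh01": 2998, "abh04": 1781, "abh09": 1336}

total_proteins = 18585        # release 3.0
total_structures = 656        # release 3.0
previous_proteins = 4322      # previous release
previous_structures = 167     # previous release

combined = sum(largest_superfamilies.values())
share = combined / total_proteins
new_proteins = total_proteins - previous_proteins
new_structures = total_structures - previous_structures

if __name__ == "__main__":
    print(f"{combined} of {total_proteins} proteins ({share:.1%})")  # 9303 of 18585 proteins (50.1%)
    print(f"new proteins: {new_proteins}")      # new proteins: 14263
    print(f"new structures: {new_structures}")  # new structures: 489
```

The results agree with the statements that the four largest superfamilies hold about 50% of all proteins and that about 14000 protein and 489 structure entries are new.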
Candida antarctica lipase A protein family
However, the binding site of CALA shows surprising similarity to another lipase, Candida rugosa lipase (CRL). For CRL, two different structural conformations have been resolved, an open conformation (1CRL) and a closed conformation (1TRH) in which the lid of CRL blocks substrate access to the active site. CRL has a cap region between β-strands 6 and 7, consisting of four α-helices (Figure 2). The substrate binding site of CRL consists of a long tunnel for the acyl moiety of the substrate and provides ample space for the alcohol moiety of the substrate (Figure 4). Although CRL has a lower overall structural similarity to CALA than the B. subtilis deacetylase, the binding sites of CALA and CRL are highly similar (Figure 4). Both provide space for large, bulky alcohol moieties of the substrate and have a tunnel-like binding site for the acyl moiety. Both proteins possess a lid which covers the active site and prevents direct access to the substrate binding site in its closed state. The lid of CRL is formed by an α-helix between β-strands 1 and 2 and is located in the N-terminal region, while the putative lid in CALA is formed by the two C-terminal β-strands 9 and 10 (Figure 2).
The LED contains annotated and systematically classified protein families of α/β hydrolases. It has been shown to be a useful tool for the systematic analysis of protein families. Previous work employed the LED and BLAST in order to identify novel enzymes belonging to the α/β hydrolase fold [19, 20]. A model for the prediction of protein solubility was developed and refined by performing a comprehensive analysis of the protein families of the LED. A further study involved the systematic analysis of protein families of the LED in regard to the distribution and conservation of functionally relevant rare codons.
Since the first release of the LED, more than 14000 new α/β hydrolases became available and were integrated into release 3.0. As a case study for the utility of the highly enriched and annotated database, the newly introduced superfamily of CALA was analysed and compared to other protein structures in the LED. The goal was to characterise the sequence and structure of CALA in comparison to other α/β hydrolases despite its low sequence similarity and to understand the molecular basis of substrate recognition.
While CALA shows structural similarity to the deacetylase family, the substrate specificity of both enzymes differs, which is consistent with the differences observed in the substrate binding sites of both proteins. In contrast, the lipase from C. rugosa (CRL), which shows a lower overall structural similarity to CALA, is remarkably similar in regard to the substrate binding site. The structural similarities and differences are in accordance with experimentally observed substrate specificities of the three enzymes. All three proteins have a spacious alcohol binding site. The B. subtilis deacetylase accepts a wide variety of bulky substrates like cephalosporin C and xylose. CALA and CRL also accept bulky substrates, ranging from primary alcohols to sterically hindered secondary alcohols and even tertiary alcohols [23, 24].
The tunnel-like binding site of CALA allows the enzyme to accept esters of long-chain fatty acids [23, 25]. The similar tunnel-like acyl binding site of CRL also accepts fatty acids up to a chain length of 18. In contrast, the small acyl binding site of the B. subtilis deacetylase is unable to accept large acyl groups and is restricted to acetyl moieties. Experimentally, CALA and CRL have been shown to display interfacial activation [26, 27]. While a lid in CRL has been localized and the open and closed forms of CRL have been crystallized [17, 18], the lid function of the β-strands 9-10 in CALA remains to be experimentally verified. However, the similarities to CRL suggest a substrate access mechanism involving the movement of β-strands 9-10.
The analysis of the newly introduced protein family of Candida antarctica lipase A demonstrates the strength of our database approach, which provides a large set of protein families that share a common protein fold despite an overall low sequence similarity. Combining structural and sequence information of a large number of proteins makes a thorough analysis and classification of proteins of interest possible. The Lipase Engineering Database (LED) is accessible online at http://www.led.uni-stuttgart.de. All information on families of sequence and structure data, as well as alignments, phylogenetic trees, and family-specific profiles, can be accessed by manual download.
Structural comparisons and alignments
Comparisons of structures were carried out using DALI. The structure of CALA was compared against 28 representative structures from all superfamilies. To identify the most closely related superfamilies, only structures which could be aligned to more than 50% of the residues of CALA were considered. Structural alignments of proteins were performed by STAMP.
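The coverage criterion described above (more than 50% of the CALA residues aligned) can be sketched as a simple filter. This is a minimal illustration, not the procedure actually used: the record type, the superfamily labels, the query length and all numerical values are hypothetical, and it assumes that the number of aligned query residues and a similarity score (such as a DALI Z-score) have already been extracted from the comparison output.

```python
from dataclasses import dataclass


@dataclass
class StructuralHit:
    superfamily: str       # label of the representative structure (hypothetical)
    aligned_residues: int  # number of query residues aligned to this structure
    z_score: float         # similarity score reported by the comparison tool


def filter_by_coverage(hits, query_length, min_fraction=0.5):
    """Keep hits aligned to more than `min_fraction` of the query, best score first."""
    kept = [h for h in hits if h.aligned_residues / query_length > min_fraction]
    return sorted(kept, key=lambda h: h.z_score, reverse=True)


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    example_hits = [
        StructuralHit("superfamily_A", 300, 25.0),
        StructuralHit("superfamily_B", 150, 12.0),
        StructuralHit("superfamily_C", 230, 30.1),
    ]
    for hit in filter_by_coverage(example_hits, query_length=400):
        print(hit.superfamily, hit.z_score)
```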
For superfamilies which share a close structural relationship but a low overall sequence identity, a two-step strategy was used in order to obtain a more significant multisequence alignment. First, a multisequence alignment for each of the two superfamilies was carried out separately. Then a structural alignment between reference structures from each protein family was performed using STAMP. The multisequence alignments were then aligned against the structure from their respective protein family.
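One possible reading of this two-step strategy is that the pairwise structural alignment of the two reference proteins acts as a guide for merging the two family alignments column by column. The following Python sketch illustrates that idea under stated assumptions: each family alignment contains its reference sequence, sequence names are unique across both families, and the ungapped rows of the structural alignment match the ungapped reference rows. All names and toy sequences are hypothetical, and this is not the implementation used for the LED.

```python
GAP = "-"


def _gap_columns(ref_row, start):
    """Consecutive column indices from `start` where the reference row has a gap."""
    end = start
    while end < len(ref_row) and ref_row[end] == GAP:
        end += 1
    return list(range(start, end)), end


def merge_alignments(msa_a, ref_a, msa_b, ref_b, struct_a, struct_b):
    """Merge two alignments, guided by a pairwise alignment of their references."""
    row_a, row_b = msa_a[ref_a], msa_b[ref_b]
    columns = []  # pairs (column in msa_a or None, column in msa_b or None)
    ia = ib = 0
    for ca, cb in zip(struct_a, struct_b):
        if ca == GAP and cb == GAP:
            continue
        if ca != GAP:
            # Family-specific insertions relative to reference A come first.
            gaps, ia = _gap_columns(row_a, ia)
            columns += [(g, None) for g in gaps]
        if cb != GAP:
            gaps, ib = _gap_columns(row_b, ib)
            columns += [(None, g) for g in gaps]
        columns.append((ia if ca != GAP else None, ib if cb != GAP else None))
        if ca != GAP:
            ia += 1
        if cb != GAP:
            ib += 1
    # Trailing columns not covered by the structural alignment.
    columns += [(i, None) for i in range(ia, len(row_a))]
    columns += [(None, j) for j in range(ib, len(row_b))]

    merged = {}
    for name, seq in msa_a.items():
        merged[name] = "".join(seq[i] if i is not None else GAP for i, _ in columns)
    for name, seq in msa_b.items():
        merged[name] = "".join(seq[j] if j is not None else GAP for _, j in columns)
    return merged


if __name__ == "__main__":
    family_a = {"refA": "AC-DE", "a2": "ACFDE"}  # toy alignment
    family_b = {"refB": "ACE", "b2": "A-E"}      # toy alignment
    merged = merge_alignments(family_a, "refA", family_b, "refB", "ACDE", "AC-E")
    for name, row in merged.items():
        print(f"{name:5s} {row}")
```

Columns that are aligned structurally are shared between the two families, while all other columns are padded with gaps in the family that lacks them.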
Multisequence alignments for all protein families were generated using ClustalW with gap opening and extension penalties of 10 and 0.2, respectively. Hidden Markov models were created using HMMER.
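The alignment and profile-building steps could be scripted roughly as follows. This is a sketch under assumptions: the executable names (`clustalw2`, `hmmbuild`) and file names are placeholders, the installed HMMER version is assumed to read Clustal-format alignments, and the exact program versions and any further options used for the LED are not stated in the text. Only the gap penalties of 10 and 0.2 are taken from the description above.

```python
import subprocess


def clustalw_command(fasta_in, aln_out, gap_open=10, gap_ext=0.2, exe="clustalw2"):
    """Build a ClustalW call for a multiple alignment with the given gap penalties."""
    return [
        exe,
        f"-INFILE={fasta_in}",
        "-ALIGN",
        f"-GAPOPEN={gap_open}",
        f"-GAPEXT={gap_ext}",
        f"-OUTFILE={aln_out}",
    ]


def hmmbuild_command(aln_in, hmm_out, exe="hmmbuild"):
    """Build an hmmbuild call: output profile first, then the input alignment."""
    return [exe, hmm_out, aln_in]


def build_family_profile(fasta_in, aln_out, hmm_out):
    """Align one family and derive a profile HMM (requires both tools on PATH)."""
    subprocess.run(clustalw_command(fasta_in, aln_out), check=True)
    subprocess.run(hmmbuild_command(aln_out, hmm_out), check=True)


if __name__ == "__main__":
    # Hypothetical file names; printing the commands avoids requiring the tools.
    print(" ".join(clustalw_command("family.fasta", "family.aln")))
    print(" ".join(hmmbuild_command("family.aln", "family.hmm")))
```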
The data model is implemented in Firebird and builds on the previously published data model (Figure S1, Additional file 1). Protein families are organised on the level of homologous families and superfamilies based on their sequence similarity. The database is updated by an automated Perl script. For each sequence entry, it performs a BLAST search against the current version of the non-redundant sequence database at NCBI with an E-value cut-off of 10⁻⁵⁰. Crystal structure information referring to new sequence entries is updated as well. New sequence and structure entries are assigned to homologous families and superfamilies based on sequence similarity. New families which consisted of only one putative protein entry were not included. Annotation information of residues is either taken directly from the corresponding GenBank entry or is transferred to the newly integrated sequences using the DWARF graphical user interface.
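The E-value cut-off used in the update step can be illustrated with a short filter over BLAST results. The update script described above is written in Perl; the Python sketch below is only an illustration and assumes that the hits are available in the standard 12-column tabular BLAST format, in which the E-value is the eleventh field. The identifiers in the example are hypothetical.

```python
E_VALUE_CUTOFF = 1e-50


def significant_hits(tabular_lines, cutoff=E_VALUE_CUTOFF):
    """Return (query, subject, E-value) for hits at or below the cut-off."""
    hits = []
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        e_value = float(fields[10])  # eleventh column of the tabular format
        if e_value <= cutoff:
            hits.append((fields[0], fields[1], e_value))
    return hits


if __name__ == "__main__":
    # Hypothetical hits for illustration only.
    example = [
        "# query\tsubject\tpident\tlength\tmismatch\tgapopen\tqstart\tqend\tsstart\tsend\tevalue\tbitscore",
        "seq1\thitA\t62.5\t400\t150\t3\t1\t400\t5\t404\t2e-75\t270",
        "seq1\thitB\t30.1\t380\t250\t9\t10\t390\t20\t399\t1e-12\t70",
        "seq1\thitC\t98.0\t431\t8\t0\t1\t431\t1\t431\t0.0\t880",
    ]
    for query, subject, e_value in significant_hits(example):
        print(query, subject, e_value)
```

With such a strict threshold, only close homologs of an existing entry pass the filter, which matches the stated aim of keeping family alignments reliable.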
We acknowledge the valuable contribution of Robert Radloff for help in the annotation process and of Florian Wagner for the programming of the dynamic user interface. The work was carried out in the framework of the IP-project 'Sustainable Microbial and Biocatalytic Production of Advanced Functional Materials' (BIOPRODUCTION/NMP-2-CT-2007-026515) funded by the European Commission.
- Fischer M, Thai QK, Grieb M, Pleiss J: DWARF--a data warehouse system for analyzing protein families. BMC Bioinformatics. 2006, 7: 495. 10.1186/1471-2105-7-495.
- Ollis DL, Cheah E, Cygler M, Dijkstra B, Frolow F, Franken SM, Harel M, Remington SJ, Silman I, Schrag J, Sussman J, Verschueren KH, Goldman A: The alpha/beta hydrolase fold. Protein Eng. 1992, 5: 197-211. 10.1093/protein/5.3.197.
- Pleiss J, Fischer M, Peiker M, Thiele C, Schmid RD: Lipase engineering database - Understanding and exploiting sequence-structure-function relationships. Journal of Molecular Catalysis B-Enzymatic. 2000, 10: 491-508. 10.1016/S1381-1177(00)00092-8.
- Barth S, Fischer M, Schmid RD, Pleiss J: The database of epoxide hydrolases and haloalkane dehalogenases: one structure, many functions. Bioinformatics. 2004, 20: 2845-2847. 10.1093/bioinformatics/bth284.
- Fischer M, Pleiss J: The Lipase Engineering Database: a navigation and analysis tool for protein families. Nucleic Acids Res. 2003, 31: 319-321. 10.1093/nar/gkg015.
- Gotor-Fernandez V, Busto E, Gotor V: Candida antarctica lipase B: An ideal biocatalyst for the preparation of nitrogenated organic compounds. Advanced Synthesis & Catalysis. 2006, 348: 797-812.
- Orrenius C, Ohrner N, Rotticci D, Mattson A, Hult K, Norin T: Candida-Antarctica Lipase-B Catalyzed Kinetic Resolutions - Substrate Structure Requirements for the Preparation of Enantiomerically Enriched Secondary Alcanols. Tetrahedron-Asymmetry. 1995, 6: 1217-1220. 10.1016/0957-4166(95)00147-H.
- Orrenius C, Norin T, Hult K, Carrea G: The Candida antarctica lipase B catalysed kinetic resolution of seudenol in non-aqueous media of controlled water activity. Tetrahedron-Asymmetry. 1995, 6: 3023-3030. 10.1016/0957-4166(95)00399-1.
- de Maria PD, Carboni-Oerlemans C, Tuin B, Bargeman G, van der Meer A, van Gemert R: Biotechnological applications of Candida antarctica lipase A: State-of-the-art. Journal of Molecular Catalysis B-Enzymatic. 2005, 37: 36-46. 10.1016/j.molcatb.2005.09.001.
- Ericsson DJ, Kasrayan A, Johansson P, Bergfors T, Sandstrom AG, Backvall JE, Mowbray SL: X-ray structure of Candida antarctica lipase A shows a novel lid structure and a likely mode of interfacial activation. Journal of Molecular Biology. 2008, 376: 109-119. 10.1016/j.jmb.2007.10.079.
- Finn RD, Tate J, Mistry J, Coggill PC, Sammut SJ, Hotz HR, Ceric G, Forslund K, Eddy SR, Sonnhammer EL, Bateman A: The Pfam protein families database. Nucleic Acids Res. 2008, 36: D281-288. 10.1093/nar/gkm960.
- Hunter S, Apweiler R, Attwood TK, Bairoch A, Bateman A, Binns D, Bork P, Das U, Daugherty L, Duquenne L, Finn RD, Gough J, Haft D, Hulo N, Kahn D, Kelly E, Laugraud A, Letunic I, Lonsdale D, Lopez R, Madera M, Maslen J, McAnulla C, McDowall J, Mistry J, Mitchell A, Mulder N, Natale D, Orengo C, Quinn AF: InterPro: the integrative protein signature database. Nucleic Acids Res. 2009, 37: D211-215. 10.1093/nar/gkn785.
- Hotelier T, Renault L, Cousin X, Negre V, Marchot P, Chatonnet A: ESTHER, the database of the alpha/beta-hydrolase fold superfamily of proteins. Nucleic Acids Res. 2004, 32: D145-147. 10.1093/nar/gkh141.
- Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Sayers EW: GenBank. Nucleic Acids Res. 2009, 37: D26-31. 10.1093/nar/gkn723.
- Wei Y, Contreras JA, Sheffield P, Osterlund T, Derewenda U, Kneusel RE, Matern U, Holm C, Derewenda ZS: Crystal structure of brefeldin A esterase, a bacterial homolog of the mammalian hormone-sensitive lipase. Nat Struct Biol. 1999, 6: 340-345. 10.1038/7576.
- Vincent F, Charnock SJ, Verschueren KH, Turkenburg JP, Scott DJ, Offen WA, Roberts S, Pell G, Gilbert HJ, Davies GJ, Brannigan JA: Multifunctional xylooligosaccharide/cephalosporin C deacetylase revealed by the hexameric structure of the Bacillus subtilis enzyme at 1.9A resolution. J Mol Biol. 2003, 330: 593-606. 10.1016/S0022-2836(03)00632-6.
- Grochulski P, Li YG, Schrag JD, Bouthillier F, Smith P, Harrison D, Rubin B, Cygler M: Insights into Interfacial Activation from an Open Structure of Candida-Rugosa Lipase. Journal of Biological Chemistry. 1993, 268: 12843-12847.
- Grochulski P, Li Y, Schrag JD, Cygler M: Two conformational states of Candida rugosa lipase. Protein Sci. 1994, 3: 82-91.
- Lammle K, Zipper H, Breuer M, Hauer B, Buta C, Brunner H, Rupp S: Identification of novel enzymes with different hydrolytic activities by metagenome expression cloning. J Biotechnol. 2007, 127: 575-592. 10.1016/j.jbiotec.2006.07.036.
- Kim EY, Oh KH, Lee MH, Kang CH, Oh TK, Yoon JH: Novel cold-adapted alkaline lipase from an intertidal flat metagenome and proposal for a new family of bacterial lipases. Appl Environ Microbiol. 2009, 75: 257-260. 10.1128/AEM.01400-08.
- Koschorreck M, Fischer M, Barth S, Pleiss J: How to find soluble proteins: a comprehensive analysis of alpha/beta hydrolases for recombinant expression in E. coli. BMC Genomics. 2005, 6: 49. 10.1186/1471-2164-6-49.
- Widmann M, Clairo M, Dippon J, Pleiss J: Analysis of the distribution of functionally relevant rare codons. BMC Genomics. 2008, 9: 207. 10.1186/1471-2164-9-207.
- Kirk O, Christensen MW: Lipases from Candida antarctica: Unique Biocatalysts from a Unique Origin. Org Proc Res. 2002, 6: 446-451. 10.1021/op0200165.
- Akoh CC, Lee GC, Shaw JF: Protein engineering and applications of Candida rugosa lipase isoforms. Lipids. 2004, 39: 513-526. 10.1007/s11745-004-1258-7.
- Pfeffer J, Richter S, Nieveler J, Hansen CE, Rhlid RB, Schmid RD, Rusnak M: High yield expression of Lipase A from Candida antarctica in the methylotrophic yeast Pichia pastoris and its purification and characterisation. Applied Microbiology and Biotechnology. 2006, 72: 931-938. 10.1007/s00253-006-0400-z.
- Martinelle M, Holmquist M, Hult K: On the interfacial activation of Candida antarctica lipase A and B as compared with Humicola lanuginosa lipase. Biochim Biophys Acta. 1995, 1258: 272-276.
- Grochulski P, Li Y, Schrag JD, Bouthillier F, Smith P, Harrison D, Rubin B, Cygler M: Insights into interfacial activation from an open structure of Candida rugosa lipase. J Biol Chem. 1993, 268: 12843-12847.
- Holm L, Sander C: Dali: a network tool for protein structure comparison. Trends Biochem Sci. 1995, 20: 478-480. 10.1016/S0968-0004(00)89105-7.
- Russell RB, Barton GJ: Multiple Protein-Sequence Alignment from Tertiary Structure Comparison - Assignment of Global and Residue Confidence Levels. Proteins-Structure Function and Genetics. 1992, 14: 309-323. 10.1002/prot.340140216.
- Thompson JD, Higgins DG, Gibson TJ: CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 1994, 22: 4673-4680. 10.1093/nar/22.22.4673.
- Eddy SR: Profile hidden Markov models. Bioinformatics. 1998, 14: 755-763. 10.1093/bioinformatics/14.9.755.
- Firebird. [http://sourceforge.net/projects/firebird]
- PERL. [http://www.perl.org/]
- Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol. 1990, 215: 403-410.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.