Construction and analysis of a plant non-specific lipid transfer protein database (nsLTPDB)

Background: Plant non-specific lipid transfer proteins (nsLTPs) are small, basic proteins. Recently, nsLTPs have been reported to be involved in many physiological functions, such as mediating phospholipid transfer, participating in plant defence against bacterial and fungal pathogens, and enhancing cell wall extension in tobacco. However, the lipid transfer mechanism of nsLTPs is still unclear, and comprehensive information on nsLTPs is difficult to obtain. Methods: In this study, we identified 595 nsLTPs from 121 different species and constructed an nsLTP database, nsLTPDB (http://nsltpdb.life.nthu.edu.tw/), which comprises the sequence information, structures, relevant literature, and biological data of all plant nsLTPs. Results: Bioinformatics and statistical methods were used to develop a classification method for nsLTPs based on the patterns of the eight highly conserved cysteine residues, and to suggest strict Prosite-styled patterns for Type I and Type II nsLTPs. The pattern of Type I is C X2 V X5-7 C [V, L, I] X Y [L, A, V] X8-13 CC X G X12 D X [Q, K, R] X2 CXC X16-21 P X2 C X13-15 C, and that of Type II is C X4 L X2 C X9-11 P [S, T] X2 CC X5 Q X2-4 C [L, F] C X2 [A, L, I] X [D, N] P X10-12 [K, R] X4-5 C X3-4 P X0-2 C, where X denotes any amino acid and Xn (or Xm-n) denotes a run of n (or m to n) such residues. Moreover, we compared the Prosite-styled patterns with experimental mutagenesis data previously obtained by our group, and found that the more highly conserved residues play an important role in the structural stability or lipid binding ability of nsLTPs. Conclusions: Taken together, this research suggests potential residues that might be essential for modulating the structural and functional properties of plant nsLTPs. Finally, we propose some biologically important sites of nsLTPs, described by new Prosite-styled patterns that we defined.


Background
Lipids are hydrocarbon-based molecules that are insoluble in water but soluble in organic solvents. They are commonly translocated among subcellular membranes to enable various metabolic activities [1]. Lipid transfer proteins (LTPs) have been found in animals, plants and some fungi, and they occur in many tissues with various sizes and functions [1][2][3]. LTPs play an important role not only in plants but also in humans; in vitro, they mediate the transfer of all common phospholipids, cholesterol and gangliosides between membranes [4,5]. The term "non-specific lipid transfer protein" reflects the ability of plant LTPs to associate with various phospholipids with broad specificity [6]. The first plant lipid transfer protein was isolated from potato tuber by Kader in 1975 [7]. Since then, many more nsLTPs have been found in monocots, dicots, gymnosperms and other plants [3]. Plant nsLTPs are small (usually 6.5 to 10.5 kDa), basic (isoelectric point, or pI, usually between 8.5 and 12) and stable (with four conserved disulfide bonds) proteins. They have been isolated from various plants, e.g., Arabidopsis, rice, barley, wheat, maize, castor bean, and spinach leaf [8][9][10][11][12][13][14][15]. All identified nsLTPs show high protein sequence homology and share similar characteristics. NsLTPs are stabilized by eight conserved cysteine residues that form four disulfide bonds, and they usually contain an N-terminal signal peptide [1]. Previous studies showed that nsLTPs can be divided into two main groups according to their molecular weight: nsLTP1 (9 kDa) and nsLTP2 (7 kDa) [16]. These two groups exhibit different disulfide bond patterns: the Cys1-Cys6 and Cys5-Cys8 linkages of nsLTP1 differ from the Cys1-Cys5 and Cys6-Cys8 linkages of nsLTP2. The major difference is therefore observed at the Cys5-X-Cys6 (CXC) motif.
In the CXC motif of nsLTP1, X is a hydrophilic residue, for example asparagine, whereas in nsLTP2 a hydrophobic residue, such as leucine or phenylalanine, occupies the X position. These conserved hydrophilic or hydrophobic residues may play important roles in the biological functions of nsLTPs [17].
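The hydrophilic/hydrophobic distinction at the CXC motif can be checked mechanically. The following is a minimal Python sketch, assuming a mature sequence whose only cysteines are the eight conserved ones; the helper names and the hydrophobic residue set are illustrative assumptions, not the authors' definitions.

```python
from typing import Optional

# Assumed hydrophobic set for illustration only; published hydrophobicity scales differ.
# Every other residue is counted as "hydrophilic" here, which is a simplification.
HYDROPHOBIC = set("AVILMFW")


def cxc_middle_residue(mature_seq: str) -> Optional[str]:
    """Return X in the Cys5-X-Cys6 motif, or None if the sequence does not fit."""
    cys = [i for i, aa in enumerate(mature_seq) if aa == "C"]
    if len(cys) != 8:
        return None  # this sketch requires exactly eight cysteines
    c5, c6 = cys[4], cys[5]
    if c6 - c5 != 2:
        return None  # Cys5 and Cys6 are not separated by exactly one residue
    return mature_seq[c5 + 1]


def classify_cxc(mature_seq: str) -> str:
    """Label the CXC middle residue as hydrophobic or hydrophilic."""
    x = cxc_middle_residue(mature_seq)
    if x is None:
        return "no CXC motif"
    return "hydrophobic" if x in HYDROPHOBIC else "hydrophilic"


# Toy sequences (not real nsLTPs) carrying N or L at the X position.
print(classify_cxc("CAAACAAACCAAACNCAAACAAAC"))  # hydrophilic
print(classify_cxc("CAAACAAACCAAACLCAAACAAAC"))  # hydrophobic
```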
Several plant nsLTP structures have been determined, and three-dimensional structures of both ligand-free and ligand-bound forms are available [9,10,12,18]. The structure of nsLTP1 is composed of four alpha helices and a flexible C-terminal tail [19,20]. The four helices are connected by flexible loops and stabilized by the four disulfide bonds [10,12,18]. A typical characteristic of nsLTPs is an internal hydrophobic cavity running through the molecule, which allows the binding of one or two monoacylated lipids, diacylated lipids, or other hydrophobic molecules [18,21]. The hydrophobic cavity of nsLTP1 has a tunnel-like shape, whereas that of nsLTP2 is triangular [19]. The major structural difference between nsLTP1 and nsLTP2 is the size of the hydrophobic cavity; the cavity of an nsLTP1 is usually larger than that of an nsLTP2 [10,18,22]. In recent years, an increasing number of studies have reported that plant nsLTPs are involved in many crucial biological functions, but the mechanisms responsible for these functions remain unclear.
Several biological functions of plant nsLTPs have been identified, including mediating phospholipid transfer, participating in plant defence against bacterial and fungal pathogens, and participating in the assembly of hydrophobic protective layers of surface polymers, such as cutin [23,24]. NsLTPs have been found to accumulate at high concentrations at the surface of certain tissues [25], which may be correlated with adaptation to different environmental stresses [26]. Several studies have shown that the expression of nsLTPs can be induced by environmental stresses such as extreme temperatures, osmotic pressure and drought [27]. Furthermore, nsLTPs exhibit defence activities against blight and other pathogens owing to their high thermal stability and resistance to proteases [23,28]. In addition, nsLTPs are involved in the formation of beer foam [29] and in food allergy to processed fruits [30]. Nieuwland et al. postulated that nsLTPs can associate with hydrophobic cell wall compounds and disrupt the cell wall or facilitate its extension [31]. These features suggest that nsLTP functions are very diverse and may stem from their ability to bind and/or carry hydrophobic molecules such as fatty acids or fatty acid derivatives [25].
There is no gold standard for the identification and classification of nsLTPs, because their lipid transfer mechanisms are unclear and publicly available data are insufficient. Over the past twenty years, nsLTPs were mainly categorized into two subfamilies based on their molecular weights, nsLTP1 (~9 kDa) and nsLTP2 (~7 kDa) [3]. Nevertheless, this method is inadequate for categorizing many newly identified nsLTPs [7]. In 2008, Boutrot et al. proposed a new classification based on the putative mature forms of nsLTPs from rice, wheat, and Arabidopsis thaliana, dividing nsLTPs into nine types (I to IX) according to their sequence similarities (see Additional File 1) [32]. Some recent papers slightly modified this classification system using a very limited number of sequences (see Additional File 2) [33,34].
Plant non-specific lipid transfer proteins are among the best-known proteins widely distributed in the plant kingdom. Our laboratory has been studying nsLTPs for years, but much remains unknown about these highly sequence-diverse proteins. Importantly, no database systematically collects and organizes data on nsLTPs. Boutrot et al. identified and classified 267 nsLTP sequences in 2008, but their method still failed to classify many nsLTPs (see Table 1) [32]. This work aimed to establish an nsLTP database, to develop a robust classification method for nsLTPs, and to formulate Prosite signature patterns for identifying nsLTPs and the key residues for their structural stability or lipid binding ability.

Methods
Databases and web-based tools utilized
NCBI http://www.ncbi.nlm.nih.gov/
The National Center for Biotechnology Information provides many public databases and tools related to biotechnology. First, we established a non-redundant protein sequence dataset by retrieving data from NCBI RefSeq and GenBank; we then used BLAST [35] to search for homologs of each sequence in the dataset.
ExPASy http://expasy.org/
The ExPASy (Expert Protein Analysis System) server is operated by the Swiss Institute of Bioinformatics (SIB) and provides access to databases such as UniProtKB, which comprises Swiss-Prot and TrEMBL, developed by SIB and the European Bioinformatics Institute (EBI). Swiss-Prot and TrEMBL provide extensive information on protein sequence, structure and function (e.g., domain structure, post-translational modifications, variants) [36].
TargetP http://www.cbs.dtu.dk/services/TargetP
The TargetP 1.1 Server is a web-based tool that predicts the subcellular location of eukaryotic proteins [37].
SignalP 3.0 Server http://www.cbs.dtu.dk/services/SignalP/
The SignalP 3.0 Server is a prediction server that lets users submit query sequences and reports the presence and location of signal peptide cleavage sites in amino acid sequences from different organisms. In this study, all identified nsLTPs were analyzed with this tool for potential signal peptide cleavage sites. After removing the signal peptides of all nsLTPs in our database, we obtained putative mature-form nsLTP sequences. Each putative mature nsLTP sequence was validated by checking for the 8-cysteine residue motif (8-Cys motif): Cys1-Xn-Cys2-Xn-Cys3Cys4-Xn-Cys5XCys6-Xn-Cys7-Xn-Cys8. After removing proteins unlikely to be nsLTPs, we identified 1,395 putative nsLTP sequences, and we constructed a database with a web-based user interface that collects all these putative nsLTPs and relevant information. Additionally, to make our results more reliable, we removed redundant sequences (100% sequence identity) and finally obtained 595 putative nsLTP sequences, which were used for the subsequent protein analysis and evolutionary study.
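The curation steps above (signal peptide removal, 8-Cys motif check, removal of identical sequences) can be sketched in Python as follows. This is a minimal illustration, assuming the SignalP cleavage position of each precursor is already known and that each Xn gap contains at least one residue and no further cysteines; the record layout and function names are hypothetical, and deduplicating on the mature sequence is our assumption.

```python
import re
from typing import Dict, Tuple

# 8-Cys motif C-Xn-C-Xn-CC-Xn-CXC-Xn-C-Xn-C; assumes each Xn is >= 1 non-Cys residue.
EIGHT_CYS = re.compile(r"C[^C]+C[^C]+CC[^C]+C[^C]C[^C]+C[^C]+C")


def mature_form(precursor: str, signal_len: int) -> str:
    """Strip a predicted signal peptide of length signal_len (0 if none)."""
    return precursor[signal_len:]


def has_8cys_motif(seq: str) -> bool:
    return EIGHT_CYS.search(seq) is not None


def curate(records: Dict[str, Tuple[str, int]]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """records maps a (hypothetical) accession to (precursor, signal peptide length).

    Returns (all putative mature nsLTPs, non-redundant subset).
    """
    putative = {}
    for acc, (precursor, signal_len) in records.items():
        mature = mature_form(precursor, signal_len)
        if has_8cys_motif(mature):
            putative[acc] = mature

    non_redundant, seen = {}, set()
    for acc, seq in putative.items():
        if seq not in seen:  # 100% identical sequences collapse to the first accession
            seen.add(seq)
            non_redundant[acc] = seq
    return putative, non_redundant
```

In this sketch the first accession seen for each identical mature sequence is kept; any other tie-breaking rule (for example, preferring reviewed entries) would be an equally valid design choice.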
Protein Data Bank http://www.pdb.org/
The protein structure files utilized in this work were obtained from the Protein Data Bank (PDB) [38].
Prosite database ftp://ftp.expasy.org/databases/prosite/
The Prosite database is a collection of annotated motif descriptors for protein families and domains [39][40][41][42]. These descriptors, or patterns, are derived from the SWISS-PROT protein database. The patterns are distributed in two files: PROSITE.DAT is a computer-readable text file providing all the information needed by programs that scan sequences with patterns and/or matrices, and PROSITE.DOC contains the textual documentation of the patterns listed in PROSITE.DAT.
We used Prosite version 20.8. After careful testing, we found that, although this version of Prosite contains 1,331 patterns, few of them are related to nsLTPs, and most mature nsLTP sequences could not be recognized by those patterns. In this study, we therefore examined the region spanning the eight well-conserved cysteines of the collected nsLTPs and proposed new Prosite-styled patterns for Type I and Type II nsLTPs.

Data mining and Hidden Markov model
The standalone BLAST (version 2.2.17) [33] was used as the search engine to search all known plant nsLTP sequences against the SwissProt protein sequence database. For every known nsLTP, homologous sequences from plant organisms with sequence identity >15% were considered candidate nsLTP sequences. Candidates without the 8-Cys motif were then filtered out. After further removing redundant sequences (100% sequence identity), we manually examined every remaining candidate and identified 595 nsLTPs.
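The identity filter can be illustrated on BLAST's 12-column tabular output (legacy `blastall -m 8` or BLAST+ `-outfmt 6`), whose second and third columns are the subject ID and percent identity. This Python sketch assumes such output and an optional, hypothetical set of subject IDs already known to come from plants; it is not the authors' script, and the example hits are invented placeholders.

```python
from typing import Iterable, Optional, Set


def candidate_hits(tabular_lines: Iterable[str],
                   min_identity: float = 15.0,
                   plant_ids: Optional[Set[str]] = None) -> Set[str]:
    """Return subject IDs whose percent identity exceeds min_identity.

    Columns follow BLAST tabular output: qseqid, sseqid, pident, length, ...
    """
    candidates = set()
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        subject_id, pident = fields[1], float(fields[2])
        if pident <= min_identity:
            continue  # the text keeps hits with identity > 15%
        if plant_ids is not None and subject_id not in plant_ids:
            continue  # keep only plant sequences when a plant ID set is given
        candidates.add(subject_id)
    return candidates


# Invented example rows (hypothetical IDs and scores).
example = [
    "query1\thitA\t45.20\t90\t45\t2\t1\t90\t1\t92\t1e-20\t80.1",
    "query1\thitB\t12.00\t88\t70\t5\t1\t88\t3\t91\t0.5\t25.0",
    "query1\thitC\t16.50\t85\t60\t4\t2\t86\t1\t85\t0.01\t35.2",
]
print(sorted(candidate_hits(example)))  # ['hitA', 'hitC']
```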

Sequence alignment and phylogenetic tree reconstruction
To examine the phylogenetic relationships of the nsLTPs identified in this study, we used ClustalW (version 2.0.12), a well-known multiple sequence alignment program, to obtain all pairwise sequence similarities between nsLTP sequences. After manually refining the alignments, we used the PHYLIP package v3.67 [43] to construct phylogenetic trees of nsLTPs with the UPGMA (Unweighted Pair Group Method with Arithmetic Mean) [44] and neighbor-joining [45] clustering methods. Finally, the MEGA4 [46] and Dendroscope [47] software packages were used to draw the trees. In the web interface of our database, BioEdit (v7.0.9.0) was also used to compute amino acid identities and visualize sequence alignments.
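For readers unfamiliar with UPGMA, the Python sketch below shows its clustering rule on a small distance matrix: the two closest clusters are merged repeatedly, the new node is placed at half their distance, and distances to the merged cluster are size-weighted averages. It is an illustrative reimplementation, not PHYLIP's code; converting ClustalW similarities into distances (for example, 1 minus fractional identity) is an assumption left to the caller.

```python
from typing import Dict, List, Tuple


def upgma(names: List[str], dist: List[List[float]]) -> str:
    """Cluster taxa with UPGMA and return a Newick string with branch lengths."""
    # cluster id -> (Newick subtree, number of leaves, node height)
    clusters: Dict[int, Tuple[str, int, float]] = {
        i: (name, 1, 0.0) for i, name in enumerate(names)
    }
    # pairwise distances keyed by (smaller id, larger id)
    d: Dict[Tuple[int, int], float] = {
        (i, j): dist[i][j] for i in range(len(names)) for j in range(i + 1, len(names))
    }
    next_id = len(names)

    def key(a: int, b: int) -> Tuple[int, int]:
        return (a, b) if a < b else (b, a)

    while len(clusters) > 1:
        (a, b), dab = min(d.items(), key=lambda kv: kv[1])  # closest pair
        tree_a, size_a, h_a = clusters.pop(a)
        tree_b, size_b, h_b = clusters.pop(b)
        height = dab / 2.0  # ultrametric assumption: the new node sits halfway
        subtree = f"({tree_a}:{height - h_a:.3f},{tree_b}:{height - h_b:.3f})"
        # size-weighted average distance from the merged cluster to the others
        merged = {
            c: (d[key(a, c)] * size_a + d[key(b, c)] * size_b) / (size_a + size_b)
            for c in clusters
        }
        d = {k: v for k, v in d.items() if a not in k and b not in k}
        for c, value in merged.items():
            d[key(c, next_id)] = value
        clusters[next_id] = (subtree, size_a + size_b, height)
        next_id += 1

    tree, _, _ = next(iter(clusters.values()))
    return tree + ";"


toy = [[0, 2, 4, 6], [2, 0, 4, 6], [4, 4, 0, 6], [6, 6, 6, 0]]
print(upgma(["A", "B", "C", "D"], toy))
# (D:3.000,(C:2.000,(A:1.000,B:1.000):1.000):1.000);
```

Because UPGMA assumes a constant evolutionary rate, it always yields an ultrametric tree; neighbor-joining drops that assumption, which is one reason to compare the two methods.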

Classification based on sequence similarities
In this study, we characterized 595 nsLTPs. The presence of a signal peptide in each protein was predicted with the SignalP 3.0 program, and we found that 98% of the nsLTP precursors were synthesized with a signal peptide of 7-49 amino acids. The main characteristic of plant nsLTPs is the presence of eight cysteine residues at highly conserved positions, which together form a common sequence pattern: C-Xn-C-Xn-CC-Xn-CXC-Xn-C-Xn-C (8-Cys motif).
This 8-Cys motif is shared by nsLTPs, but it cannot be used to classify them. To classify nsLTPs, we modified Boutrot's nine-type classification into a five-type system (see Table 2 and Table 3). Analysis of the classified nsLTPs showed that (1) Types I and II are shared by all the species that we identified as possessing nsLTPs; (2) Type III is found only in Oryza sativa and Arabidopsis; and (3) Types IV and V are shared by Triticum aestivum, Oryza sativa, Sorghum bicolor and Arabidopsis.
After making this classification, we further analyzed the pI (isoelectric point) values, Mw (molecular weight) values, net charges and CXC motifs of all available nsLTPs. As shown in Additional File 3A, Type I and Type III nsLTPs were mostly 9 kDa proteins and Type II nsLTPs were 7 kDa proteins; the Mw of Type IV and Type V was much higher than that of Types I-III. Judging from the pI values, Types I, II and III are mostly basic proteins. Type IV nsLTPs are weakly basic and most Type V nsLTPs are acidic (see Additional File 3B). As for the CXC motif, the residue at the X position in Type I nsLTPs was mostly hydrophilic, whereas in Type II, III, IV and V nsLTPs the X position is usually occupied by a hydrophobic residue (see Additional File 3C). There is no obvious difference in the distribution of net charge among the nsLTP types (see Additional File 3D).
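Mw, net charge and pI can all be estimated from the sequence alone. The Python sketch below uses standard average residue masses and a Henderson-Hasselbalch charge model, with bisection to find the pI. The pKa set is an assumption (published sets differ, so pI estimates shift by a few tenths), disulfide bonds are ignored in the mass, and this is not necessarily how the values in Additional File 3 were computed.

```python
# Average residue masses in Da (amino acid minus one water).
AVG_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01524

# Assumed pKa values. In nsLTPs the eight Cys are disulfide-bonded, so one may
# reasonably drop "C" from PKA_NEGATIVE when analysing mature proteins.
PKA_NTERM, PKA_CTERM = 8.6, 3.6
PKA_POSITIVE = {"K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def molecular_weight(seq: str) -> float:
    """Average Mw of a linear peptide (each disulfide bond would subtract ~2.016 Da)."""
    return sum(AVG_RESIDUE_MASS[aa] for aa in seq) + WATER


def net_charge(seq: str, ph: float) -> float:
    """Henderson-Hasselbalch estimate of the net charge at a given pH."""
    charge = 1.0 / (1.0 + 10 ** (ph - PKA_NTERM))
    charge -= 1.0 / (1.0 + 10 ** (PKA_CTERM - ph))
    for aa in seq:
        if aa in PKA_POSITIVE:
            charge += 1.0 / (1.0 + 10 ** (ph - PKA_POSITIVE[aa]))
        elif aa in PKA_NEGATIVE:
            charge -= 1.0 / (1.0 + 10 ** (PKA_NEGATIVE[aa] - ph))
    return charge


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """Bisection on pH in [0, 14]; net charge decreases monotonically with pH."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if net_charge(seq, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


print(round(molecular_weight("G"), 2))  # 75.07
```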

Phylogenetic analysis of nsLTPs
To analyze the phylogenetic relationships of the nsLTP families, we performed multiple sequence alignments of the mature-form nsLTP sequences with ClustalW. Unrooted phylogenetic trees were generated with the UPGMA clustering method implemented in the PHYLIP package. Based on the number of residues separating the eight conserved cysteine residues, the 595 nsLTPs were clustered into five groups. Our phylogenetic analyses supported the classification results: as shown in Additional Files 4 and 5, the five types of nsLTPs were fully separated in the phylogenetic trees.
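The inter-cysteine spacing used for grouping can be computed directly from a mature sequence. This Python sketch assumes exactly eight cysteines and simply groups sequences that share an identical spacing vector; the published clustering also relied on sequence similarity and tree topology, so this only illustrates the idea, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def cysteine_spacing(mature_seq: str) -> Tuple[int, ...]:
    """Residue counts between consecutive cysteines (seven gaps for eight Cys)."""
    cys = [i for i, aa in enumerate(mature_seq) if aa == "C"]
    if len(cys) != 8:
        raise ValueError(f"expected 8 cysteines, found {len(cys)}")
    return tuple(b - a - 1 for a, b in zip(cys, cys[1:]))


def group_by_spacing(seqs: Dict[str, str]) -> Dict[Tuple[int, ...], List[str]]:
    """Group sequence IDs that share the same spacing vector."""
    groups: Dict[Tuple[int, ...], List[str]] = defaultdict(list)
    for name, seq in seqs.items():
        groups[cysteine_spacing(seq)].append(name)
    return dict(groups)


# Toy sequence, not a real nsLTP; the CC doublet gives a gap of 0 and CXC a gap of 1.
print(cysteine_spacing("CAAACAAACCAAACNCAAACAAAC"))  # (3, 3, 0, 3, 1, 3, 3)
```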

Strategies for defining new Prosite-styled patterns for nsLTPs
Functional sites of proteins collected in the Prosite database are expressed as regular expressions. By querying Prosite with the nsLTP sequences that have assigned UniProt IDs, we noticed that many of them (86 cases) matched the pattern PS00597. However, this pattern failed to recognize most of the other nsLTP sequences we collected. We therefore set out to define new Prosite-styled patterns able to identify a broad range of nsLTPs. We had previously observed in our multiple sequence alignments that several positions in nsLTP sequences are moderately or even highly conserved. Here we computed the occurrence of each of the 20 amino acids at every aligned position. Notably, in both nsLTP1 and nsLTP2, some positions are always occupied by the same amino acid or by amino acids with the same physicochemical properties. Two new patterns for Type I and Type II nsLTPs were then defined according to the amino acid occurrence at the various positions; they are shown in Figure 1. Types III, IV and V were omitted from this analysis because too few sequences were available.
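The step from per-position amino acid occurrence to a pattern element can be sketched as follows. This is a minimal Python illustration, not the authors' procedure: the cutoffs (a single residue at 90% or more becomes a fixed position; up to three residues jointly reaching 80% become a bracketed set; otherwise a wildcard) and the gap handling are hypothetical choices, and the output uses standard Prosite syntax.

```python
from collections import Counter
from typing import List, Optional

# Hypothetical thresholds; the text does not state the cutoffs used for Figure 1.
SINGLE_CUTOFF = 0.9   # one residue this frequent becomes a fixed position
SET_CUTOFF = 0.8      # up to MAX_SET residues jointly this frequent become [..]
MAX_SET = 3
MAX_GAP_FRACTION = 0.5


def column_element(column: List[str]) -> Optional[str]:
    """Turn one alignment column into a Prosite element, or None to skip it."""
    if column.count("-") / len(column) > MAX_GAP_FRACTION:
        return None  # mostly gaps: treat as an insertion and skip
    residues = [aa for aa in column if aa != "-"]
    ranked = Counter(residues).most_common()
    n = len(residues)
    if ranked[0][1] / n >= SINGLE_CUTOFF:
        return ranked[0][0]
    chosen, covered = [], 0
    for aa, count in ranked[:MAX_SET]:
        chosen.append(aa)
        covered += count
        if covered / n >= SET_CUTOFF:
            return "[" + "".join(sorted(chosen)) + "]"
    return "x"


def alignment_to_pattern(aligned: List[str]) -> str:
    """Build a Prosite-style pattern from equal-length aligned sequences."""
    elements = []
    for i in range(len(aligned[0])):
        element = column_element([seq[i] for seq in aligned])
        if element is not None:
            elements.append(element)

    out, run = [], 0
    for element in elements:
        if element == "x":
            run += 1  # collapse consecutive wildcards into x(n)
            continue
        if run:
            out.append("x" if run == 1 else f"x({run})")
            run = 0
        out.append(element)
    if run:
        out.append("x" if run == 1 else f"x({run})")
    return "-".join(out)


print(alignment_to_pattern(["CAVLC", "CGVIC", "CSVVC", "CTVLC"]))  # C-x-V-[ILV]-C
```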

The mungbean nsLTP1
To evaluate our Prosite-styled pattern for Type I nsLTPs, we referred to mutagenesis data and computational results. Using alanine scanning, we previously found that Asn9, Leu10, Cys13, Leu17, Leu35, Arg44, Val47, Ala66, Leu69 and Tyr79 are important to the lipid transfer activity of mungbean LTP1, because alanine substitutions at these residues increased its lipid transfer activity (Figure 2) [48]. For Leu10, Val31, Ile34, Arg44, Leu51, Leu69 and Val75, which are located in the hydrophobic cavity, this may be because the substituted alanine reduced the hydrophobic packing of the cavity and thus slightly loosened the structure, creating more space to accommodate lipid molecules (Figure 2A). Note that most of the conserved residues are hydrophobic. We suggest that these highly conserved residues may play important roles in nsLTPs.
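To make the Type I and Type II patterns stated in the abstract machine-checkable, they can be rewritten in standard Prosite syntax and translated into regular expressions. The Python sketch below does this; the two pattern strings are our transcription of the abstract (reading X as "any residue" and Xm-n as x(m,n)) and should be checked against Figure 1, and the converter handles only the Prosite elements used here (no < or > anchors).

```python
import re
from typing import Optional

# Our transcription of the Type I and Type II patterns into standard Prosite syntax
# (x = any residue, x(m,n) = m to n residues); verify against Figure 1 before use.
TYPE_I = ("C-x(2)-V-x(5,7)-C-[VLI]-x-Y-[LAV]-x(8,13)-C-C-x-G-x(12)-D-x-[QKR]-"
          "x(2)-C-x-C-x(16,21)-P-x(2)-C-x(13,15)-C")
TYPE_II = ("C-x(4)-L-x(2)-C-x(9,11)-P-[ST]-x(2)-C-C-x(5)-Q-x(2,4)-C-[LF]-C-x(2)-"
           "[ALI]-x-[DN]-P-x(10,12)-[KR]-x(4,5)-C-x(3,4)-P-x(0,2)-C")

ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a subset of Prosite syntax (no < or > anchors) into a Python regex."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        match = ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported Prosite element: {element!r}")
        core, low, high = match.groups()
        if core == "x":
            core = "."
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # {P} means "anything except P"
        if low is not None:
            core += "{" + low + ("," + high if high is not None else "") + "}"
        parts.append(core)
    return "".join(parts)


def nsltp_type(mature_seq: str) -> Optional[str]:
    """Return 'I' or 'II' if the sequence matches that pattern, else None."""
    for label, pattern in (("I", TYPE_I), ("II", TYPE_II)):
        if re.search(prosite_to_regex(pattern), mature_seq):
            return label
    return None


print(prosite_to_regex("C-x(2)-V-[VLI]-x(0,2)-{P}"))  # C.{2}V[VLI].{0,2}[^P]
```

Running real mature sequences, such as mungbean LTP1 or rice nsLTP2, through `nsltp_type` would be a quick consistency check of the transcription.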

The rice nsLTP2
We previously used molecular docking (AutoDock) [49] to investigate the importance of the conserved hydrophobic residues around the binding cavity of rice nsLTP2, e.g., Leu8, Phe36, Phe39, Tyr45, Tyr48 and Val49 (Figure 3A) [50]. The results indicated that changing a single residue, Leu8, Phe36 or Val49, to alanine was sufficient to destroy the integrity of the cavity. The other mutant proteins (F39A, Y45A and Y48A) had native-like structures but were less stable than the wild-type nsLTP2 (Figure 3B and 3C). According to our Prosite-styled pattern for Type II nsLTPs, to which rice nsLTP2 belongs, these structurally important residues are highly conserved. Leu] at position 45 is also high (75%). Thus, residues with higher conservation may play an important role in structural stability or lipid binding ability. These results indicate that our Prosite-styled patterns can point to potential residues that are important to the structural and functional properties of plant nsLTPs. Interestingly, we also noticed that in the structures of Type I and Type II nsLTPs, several highly conserved positions (occurrence frequency of the major amino acid > 90%) have never been studied (see Figure 4), and most of these residues are located in alpha helices or close to the lipid-binding cavity. We suggest that these residues may be good targets for future studies of nsLTPs.

The nsLTPDB
The database provides a web-based user interface that collects all nsLTP sequences and related information (Figure 5). It consists of five parts: Homepage, Species Browsing, Structure Browsing, Related References, and useful tools. The database currently contains 1,395 putative non-specific lipid transfer protein sequences and 32 PDB structures. Each part is accessible through the hyperlinks at the left side of the browser window. In addition, the web-based molecular viewer Chime (http://accelrys.com/products/informatics/cheminformatics/chime/no-fee.php) is provided to display protein conformations, allowing users to view and manipulate molecular structures in three dimensions.
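The five-part organization suggests a simple relational layout. The Python sketch below uses the standard-library `sqlite3` module with a hypothetical schema (table and column names are ours; the text does not describe nsLTPDB's actual storage) to show how a Species Browsing query could be served.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical schema for illustration; nsLTPDB's real storage is not described in the text.
SCHEMA = """
CREATE TABLE species (
    id   INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL
);
CREATE TABLE nsltp (
    accession  TEXT PRIMARY KEY,
    species_id INTEGER NOT NULL REFERENCES species(id),
    nsltp_type TEXT,          -- e.g. 'I' to 'V'
    precursor  TEXT NOT NULL,
    mature     TEXT NOT NULL
);
CREATE TABLE pdb_structure (
    pdb_id    TEXT PRIMARY KEY,
    accession TEXT REFERENCES nsltp(accession)
);
CREATE TABLE literature (
    citation  TEXT NOT NULL,
    accession TEXT REFERENCES nsltp(accession)
);
"""


def open_demo_db() -> sqlite3.Connection:
    """Create an empty in-memory database with the hypothetical schema."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    return con


def browse_species(con: sqlite3.Connection, species: str) -> List[Tuple[str, str]]:
    """Return (accession, type) pairs for one species, as a browsing page might list them."""
    rows = con.execute(
        "SELECT n.accession, n.nsltp_type FROM nsltp AS n "
        "JOIN species AS s ON n.species_id = s.id "
        "WHERE s.name = ? ORDER BY n.accession",
        (species,),
    )
    return rows.fetchall()
```

Keeping structures and literature in separate tables keyed by accession lets one entry link to several PDB files or papers, which matches the separate Structure Browsing and Related References pages.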

Conclusions
1. We have constructed an nsLTP database (http://nsltpdb.life.nthu.edu.tw/) that provides information on the sequences, structures, literature and biological data of all plant nsLTPs. It contains 1,395 putative nsLTP sequences, of which 595 are non-redundant.
2. Phylogenetic trees of the identified nsLTPs were constructed using the UPGMA and neighbor-joining clustering algorithms.
3. The 595 nsLTPs were classified into five types based on their sequence similarity matrix and the properties of their 8-cysteine motifs.
4. We compared our Prosite-styled patterns with experimental mutagenesis data and found that highly conserved residues in nsLTP sequences may play an important role in the structural stability and/or lipid binding ability of nsLTPs.
5. We created Prosite-styled patterns for nsLTPs, which we expect to be useful for the future identification and study of nsLTPs.