Protein intrinsic disorder toolbox for comparative analysis of viral proteins
© Goh et al; licensee BioMed Central Ltd. 2008
Published: 16 September 2008
To examine the usefulness of protein disorder predictions as a tool for the comparative analysis of viral proteins, a relational database has been constructed. The database includes proteins from influenza A and HIV-related viruses. Annotations include viral protein sequence, disorder prediction, structure, and function. Location of each protein within a virion, if known, is also denoted. Our analysis reveals a clear relationship between proximity to the RNA core and the percentage of predicted disordered residues for a set of influenza A virus proteins.
Neuraminidases (NA) and hemagglutinin (HA) of major influenza A pandemics tend to pair in such a way that both proteins tend to be either ordered-ordered or disordered-disordered by prediction. This may be the result of these proteins evolving from being lipid-associated. High abundance of intrinsic disorder in envelope and matrix proteins from HIV-related viruses likely represents a mechanism where HIV virions can escape immune response despite the availability of antibodies for the HIV-related proteins. This exercise provides an example showing how the combined use of intrinsic disorder predictions and relational databases provides an improved understanding of the functional and structural behaviour of viral proteins.
The structures and functions of a large number of viral proteins are not yet fully understood [1–5]. This may account for the continuing need to develop novel computational and experimental tools suitable for viral protein analysis. Although experimental techniques remain the major providers of structural and functional knowledge, the experiments are often expensive or difficult to the point of infeasibility. The use of various bioinformatics tools to predict structure and function represents an alternative approach that is gaining significant attention. Comparative computational studies have opened a new way for easier benchmarking and functional analysis of proteins. Here we examine the usefulness of intrinsic disorder predictions for studying viral proteins. To this end, a set of biocomputing tools that includes relational database design and the use of disorder prediction algorithms was developed.
Two families of RNA viruses, the Lentivirinae (HIV) and the Orthomyxoviridae (Influenza), were used in this comparative study. These viral families were selected because they are widely studied due to their involvement in major outbreaks during the last century [5, 6]. The lentiviruses include HIV and SIV, among others, whereas the orthomyxoviruses encompass mainly the various influenza viruses.
HIV is also an enveloped virus. Figure 1B represents a model of its virion. The surface of the HIV virion is the viral envelope made of the cellular membrane, which is acquired when the virus leaves the host cell. Protruding from the envelope is the viral glycoprotein, gp160, which is made up of two component parts, the structural unit (SU), gp120, and the transmembrane (TM), gp41. These two surface proteins play important roles in attachment and penetration of HIV into target cells. Inside the lipid envelope, there is a matrix formed by Gag protein p17, which holds the RNA-containing core in place. This cylindrical core is a proteinaceous capsid made of p24 protein. The capsid contains two copies of the single-stranded RNA genome and three key enzymes: protease, PR (p11); integrase, IN (p32); and reverse transcriptase RT (p66), as well as some other proteins.
Major HIV proteins
Function of the Protein
Binding of host's CD4 to itself
Involved in fusion with host
Transmembrane (TM) at the Virion Envelope
Beneath the Envelope
Nucleus transportation of viral proteins
Within capsid and beneath matrix
Main core protein
Within the Core
Within the Core
Within the Core
Major influenza A virus proteins
Hemagglutinin allows the attachment of host's CD4 to itself
Neuraminidase cleaves sialic acid group to allow virion release into the extracellular region.
Transmembrane (TM) at the Virion Envelope
MA (M1, M2)
Protein assembly with membrane binding and disassociation
Matrix: Beneath the Envelope
Non-Structural Protein. Inhibits RNA splicing. RNA binding
Within capsid and beneath matrix
Transportation of RNPb to cytoplasm
Main core protein
Binds to Nucleocapsid
Table 1 shows that proteins similarly located within the virions of different viral types possess significant functional similarities. For example, similar functions can be seen in the surface proteins (gp120, HA, NA) in both influenza A and HIV viruses. Although Table 1 lists major functions for several proteins, it is important to remember that some of the functions are not fully understood or are not known at all. Multi-functionality of a protein is, of course, also possible.
Many proteins are intrinsically disordered; i.e., they lack rigid 3-D structure under physiological conditions in vitro, existing instead as dynamic ensembles of interconverting structures. Intrinsically disordered proteins are also known by several other names, including "intrinsically unstructured" and "natively unfolded" [14–16]. While the function of a given protein is often determined by its unique structure, comparative studies on several exceptions to the structure-to-function mechanism led to the realization that intrinsically disordered proteins share many sequence characteristics and so comprise a distinct cohort. These intrinsically unstructured proteins and regions differ from structured globular proteins and domains with regard to many attributes, including amino acid composition, sequence complexity, hydrophobicity, charge, flexibility [12, 15], and type and rate of amino acid substitutions over evolutionary time. Many of these differences between ordered and intrinsically disordered proteins were utilized to develop numerous disorder predictors. The disorder predictors used in this paper are the PONDR® predictors (Predictors of Naturally Disordered Regions) VLXT and VL3 [18–21]. We utilized these predictors to address the following question: Can disorder prediction be used to determine or map at least some of the functions of viral proteins?
Abundance of intrinsic disorder in various datasets.
% Predicted Disordered X-raya
% Predicted Disordered NMR
% Predicted Disordered
24 ± 2 (14 ± 2)
34 ± 2 (32 ± 1)
24 ± 2 (15 ± 1)
27 ± 2 (16 ± 1)
50 ± 3 (41 ± 3)
34 ± 2 (19 ± 2)
21 ± 2 (10 ± 2)
34 ± 3 (40 ± 3)
21 ± 2 (11 ± 2)
Summary of the predicted disorder rates in HIV proteins.
2i60.pdb, Subunit G
2cmr.pdb, Subunit: S
1jek.pdb, Subunit: A
1hiw.pdb, Subunit: A
1ceu.pdb, Subunit: A
2goh.pdb, Subunit: A
1tcw.pdb, Subunit A
2hnz.pdb, Subunit: A
1k6y.pdb, Subunit: A
1avz.pdb, Subunit: A
1jfw.pdb, Subunit: A
Summary of the predicted disorder rates in influenza viral protein.
1ruz.pdb, Subunit H
1mqn.pdb, Subunit: A
1mqn.pdb, Subunit: B
1ea3.pdb, Subunit: A
1xeq.pdb, Subunit: A
1pd3.pdb, Subunit: A
2hn8.pdb, Subunit: A
2iqh.pdb, Subunit: A
Predicted intrinsic disorder in surface proteins of influenza virus.
Disorder prediction in various NA subtypes
2hu0.pdb, Subunit A
2f10.pdb, Subunit: A
2htw.pdb, Subunit A
2w20.pdb, Subunit: A
2htr.pdb, Subunit: A
1jsn.pdb, Subunit S
Disorder predictions in various HA subtypes
1ruz.pdb, Subunit H
1mqn.pdb, Subunit: A
1mqn.pdb, Subunit: B
2ibx.pdb, Subunit: A
2ibx.pdb, Subunit: B
1ti8.pdb, Subunit: A
1ti8.pdb, Subunit: B
RNA Binding Proteins
DNA Binding Protein
Enzymes (e.g. Proteases, Ribonucleases)
Non-Multiply Spanning Membrane Proteins
Transmembrane Proteins (e.g Pores)
There is an interesting correlation between the percentage of predicted disorder and protein localization within the virion. This phenomenon is especially clear for influenza virus (see Figure 2), where the closer the protein is to the core, the higher the level of predicted disorder. This can be explained by the increased likelihood of colocalization of the RNA-binding proteins and the genomic RNA in the viral core. This trend is also seen for HIV proteins, with the exception of several enzymes that are located in close proximity to the core (see Figure 2). Enzymes, however, need to be structured so that the active site can provide a catalytic surface, and so this result is consistent with previous work [24–26].
Data in Tables 3 and 4 and Figure 2 show that the surface proteins are generally predicted to be ordered in both the influenza and HIV viruses. HA is crucial for the entry of the virus into the host. A cleavage at the disulfide bond between subunits, HA1 and HA2, has to occur before the viral entry can take place. The function of HA can be compared to that of gp120, as gp120 is also known to play a crucial role in mediating the entry of the virus into the host cell [1, 27]. Neuraminidases, on the other hand, play an enzymatic role at the other end of the viral process. This protein cleaves the sialic acid group from the oligosaccharide portion. This cleavage step is needed for the virion to be released into the extracellular region.
According to our analysis, gp120, like HA, is predicted to be quite ordered. There are, however, differences in the prediction results for these two proteins: gp120 was consistently predicted to be ordered, whereas the levels of predicted order in influenza HA varied with the viral subtype. A summary of this can be found in Table 5. It should also be noted that both HA (HA2) and NA are transmembrane proteins [28, 29].
In HIV, a protein that spans the lipid membrane is the transmembrane (TM) protein, gp41. This protein acts as a fusion protein and functions in membrane interactions. This integral membrane protein contains a TM anchor domain that holds this envelope protein in association with the lipid bilayer. This TM protein is responsible for the fusion of the viral and cellular membranes via its fusion peptide located in its extracellular, N-terminal domain. Previously, transmembrane fragments of channels and pores were predicted to be highly ordered [24–26]. However, the situation might be quite different for membrane proteins with relatively large extra- and intracellular domains, e.g., for fusion proteins. We now have an opportunity to analyse the predicted disorder rate of transmembrane proteins that are involved in membrane fusion. Table 3 shows that the predicted disorder level for gp41 is quite sizeable (34% for PONDR® VLXT). In fact, the amount of disorder in this protein is significantly higher than that of the transmembrane proteins (HA, NA) of the influenza A virus. This might also be correlated with the high level of predicted disorder in the HIV matrix.
Like gp41, HA is a transmembrane glycoprotein. Both gp41 and HA are members of the class I viral fusion proteins that mediate viral entry into cells. Class I viral fusion proteins are thought to fold into a prefusion, metastable conformation, which is then activated to undergo a large conformational rearrangement to a lower energy state, thereby providing the energy needed to accomplish membrane fusion [33–35]. The role of HA as a fusion protein and the associated large-scale conformational changes may help to account for the slightly higher predicted disorder rates observed for HA in many subtypes (see Table 5).
Analysis of the data in Table 3 revealed that transmembrane viral proteins are, in general, characterized by relatively low predicted disorder rates. For comparison, the Vpr protein, which is present in HIV but is not expressed in SIV and whose structure was determined by NMR, is predicted to be rather disordered (39% by PONDR® VLXT and 64% by PONDR® VL3, Table 3). Vpu, in contrast, was predicted to be more ordered (26% by PONDR® VLXT), in agreement with the fact that this protein has a transmembrane domain.
The matrix proteins, which form a layer below the lipid envelope, produced interesting data for both families of viruses. The matrices of both influenza and HIV viruses are relatively disordered (see Tables 3 and 4). The HIV matrix protein is predicted to be highly disordered, whereas the influenza virus matrix protein is predicted to be only moderately disordered (or somewhat ordered) by PONDR® VLXT. This peculiarity may highlight an important difference between the two families of viruses and may have important medical implications. For the influenza virus, the proteins that are even closer to the core include NS1, NP and PB1. All these proteins are predicted to be highly disordered.
While M1 is known to bind RNA, proteins that are located closer to the core are even more likely to interact with the viral RNA. This may account for the trend that, for proteins that are closer to the core, the amount of predicted disorder increases. This trend is clearly highlighted in Figure 2. Interestingly, a number of RNA-binding proteins that are not viral have also been predicted to be disordered (Table 6). This trend can also be seen for HIV proteins, with the exception that enzymes are usually predicted to be ordered. Integrase (IN), reverse transcriptase (RT), and to some extent protease (PR) exemplify the observation that enzymes are exceptions to the general trend. That is, all three of these proteins are enzymes, and they are all predicted to be relatively ordered. Reverse transcriptase and integrase, however, are also RNA-binding proteins, with RT having an additional ability to bind DNA. In such cases, the proteins typically have nucleic acid-binding regions that are somewhat disordered and catalytic regions that are highly ordered. Such trends have been observed for non-viral proteins in general [24–26], and the viral proteins seem to follow this rule of thumb as well. Consistent with these trends, the HIV protease, which binds neither RNA nor DNA, is, not surprisingly, predicted to be the most ordered of all.
The attachment and membrane fusion of the influenza virus and the host cell are mediated by its hemagglutinin (HA). HA is a homotrimer, and each monomer comprises an ectodomain with about 510 amino acid residues, a transmembrane domain with 27 residues, and a cytoplasmic domain with 10 to 11 residues. The HA monomer is synthesized as a single polypeptide chain and cleaved into two subunits, HA1 and HA2, by proteolytic enzymes after virus budding or during intracellular transport. The HA1 and HA2 subunits are functionally specialized. HA1 carries receptor-binding activity, and HA2 mediates membrane fusion. As discussed briefly above, the amount of intrinsic disorder in HA1 and HA2 varies with the viral subtype (Table 5). Analysis of the past experimental data in comparison with the disorder predictions in Table 5 suggests that variations in the infectivity of the virus, variations in the assembly of the HA proteins, and variations in the correlation between protein-membrane interaction and the lipid raft motion may all be related to the amount of predicted disorder. For example, one of the HA functions is to assemble proteins, including those involved in the formation of pores. Acetylation of the HA molecules often affects this function, which is crucial for the infectivity of the virus. However, it has been shown that H1, H3, and H7 behave differently when the sites that are normally palmitylated are mutated. Viral subtypes with H1 proteins were most affected by the mutations, whereas the virions did not lose much of their infectivity in the case of H3 and H7 [40, 41]. Table 5 shows that, among the viral subtypes analyzed, H1 proteins possess the least amount of predicted disorder, whereas H3 and H7 proteins were predicted to be considerably more disordered. We showed elsewhere that enzyme-mediated posttranslational modifications usually occur within disordered regions [24–26, 42].
The increased predictions of intrinsic disorder in HA are associated with increased infectivity of influenza virus, perhaps via changes in posttranslational modification, the ease of which may depend on the tendency to be disordered.
For all enveloped viruses, the envelope is derived from the host cell during the process of virus budding. In the case of influenza virus, budding takes place at the apical plasma membrane and is heavily dependent on the presence of lipid microdomains, or "rafts" [43–45]. Lipid rafts, also known as detergent-insoluble glycosphingolipid-enriched domains, are specific domains on plasma membranes that are enriched in detergent-insoluble glycolipids (DIGs), cholesterol and sphingolipids [46–48]. Levels of cholesterol and sphingolipids can vary amongst individuals, which alters the extent of raft formation. Lipid rafts play an important role in several biological processes, including signal transduction, T-cell activation, protein sorting, and virus assembly and budding. Such enveloped viruses incorporate some integral membrane proteins; among the best studied are the influenza virus hemagglutinin (HA) and neuraminidase (NA). Acetylation of the envelope proteins and also palmitoylation are important for these viral proteins to be targeted to the lipid raft microdomains on the cell surface. C-terminal domains of both HA and NA of influenza virus are crucial for association with rafts, and this interaction constitutes part of the signaling machinery necessary for apical targeting in polarized cells. In fact, the cytoplasmic tails of HA and NA are so important for assembly that the information contained in these tails is partially redundant. For example, the removal of the cytoplasmic tail or mutation of the three palmitoylated cysteine residues in the transmembrane (TM) domain and the cytoplasmic tail of influenza virus hemagglutinin (HA) was shown to decrease the association of HA with lipid rafts, decrease the incorporation of HA into virions, and modulate the incorporation of cholesterol into the viral envelope.
The level of the envelope cholesterol has been shown to play a crucial role in the HA-mediated fusion of the influenza virus with the host cell. These data were obtained for the WSN (H1N1) strain of influenza virus, and the authors proposed that differences may exist with other virus strains. Perhaps the virion cholesterol is important for the organization of influenza virus HA trimers into fusion-competent domains, and perhaps also the depletion of cholesterol inhibits virus infectivity due to inefficient fusion. Here we suggest that variations in intrinsic disorder in the surface proteins may play a similar role. In fact, Table 5 shows that H1 is predicted to be ordered, whereas H3 and H7 are predicted to be more disordered. This increased level of disorder might offer a mechanism for proteins to bypass the lipid raft requirement. Studies on chimeric proteins with specific swapping of regions predicted to be ordered or disordered could be used to test this proposed mechanism.
Observed pairing of predicted disorder of HA-NA in subtypes involved in major epidemics. Quantitative details of the respective predicted disorder values can be found in Table 5.
HA (Predicted Disorder/Order)
NA (Predicted Disorder/Order)
"Spanish Flu" (1918)
"Asian Flu" (1957)
"Hong Kong Flu" (1968)
"Avian Flu" (1997)
An understanding of viral surface proteins is crucial for developing appropriate vaccination strategies and for improving the understanding of immune responses. The comparative analysis of intrinsic disorder distribution in the HIV and influenza virions uncovers specific patterns that could provide some useful insight into these problems. Above we showed that the level of predicted disorder varies among the HA and NA subtypes. This observation might be used for tuning vaccination strategies. However, the data in Table 5 show that the variations in the predicted disorder are not large. Furthermore, in general, HA and NA can be described as highly ordered or moderately disordered (see Tables 3 and 5, and Figure 2). This may account for the observations that the anti-influenza antibodies recognize and bind various HA and NA subtypes, providing the grounds for the development of an effective immune response and therefore for the elaboration of appropriate vaccination strategies. The situation with the immunogenicity of HIV is very different. Although antibodies were found to bind to several HIV proteins, these HIV-binding antibodies do not lead to an effective immune response. The reason for this is not yet known. However, we believe that a comparative analysis of disorder distribution in proteins from both orthomyxoviruses and lentiviruses might provide greater insight into this problem.
The first step in HIV infection is the binding of the envelope glycoprotein gp120 to the host cell receptor CD4 [53, 54]. CD4 binding induces extensive structural rearrangements in gp120, resulting in the exposure of a binding surface for the second host cell chemokine receptor, CCR5 or CXCR4 [55, 56]. The interface between gp120 and CD4 is highly conserved among different HIV-1 isolates. In gp120-CD4 complexes, CD4 was shown to interact with all three domains of gp120, including the inner domain, the outer domain, and the bridging β-sheet. Furthermore, in all structures of various gp120-CD4 complexes analyzed by X-ray crystallography, a deep hydrophobic cavity enclosed by conserved gp120 residues was detected. CD4 residue Phe43 is the only cavity-interacting residue in CD4. It fits into the opening of this cavity and was shown to contribute about 23% of the total interaction surface. According to our analysis, the surface protein of the HIV virion, gp120, has a consistently low predicted disorder value across various strains of lentiviruses (data not shown). This feature may have functional implications, since a rigid structure of gp120 might be necessary for the formation of a stable complex with the host protein, CD4.
The analysis of Figure 2 reveals that, unlike the proteins of influenza A virus, HIV proteins do not follow the trend of having an increasing amount of predicted disorder as the locations of the proteins become closer to the core. In part, this distinct trend can be attributed to the presence of several enzymes in the HIV virion capsule. However, even the HIV matrix proteins and some of the surface proteins have quite high percentages of residues predicted to be disordered. For example, Tables 3 and 4 show that the matrix protein of influenza virus (M1) is predicted to be only somewhat disordered (25% by PONDR® VLXT and 0% by PONDR® VL3), whereas the matrix protein of HIV, MA (p17), has a very high percentage of disordered residues (61% by PONDR® VLXT and 48% by PONDR® VL3). Furthermore, Table 3 also shows that this very high abundance of intrinsic disorder extends to areas above the matrix and is observed in the HIV envelope. The gp41 (TM), which is a transmembrane protein involved in fusion, has a high predicted disorder rate, in striking contrast to the fusion protein of the influenza virus (HA2, Table 5).
An interesting possibility is that the high prevalence of intrinsic disorder in proteins located in close proximity to the surface of HIV-related viruses provides a mechanism for avoiding the induction of an immune response. In fact, the antigenicity of a given protein is known to reside in a restricted number of antigenic determinants (sites or epitopes) located on its surface. As antigenic determinants of several proteins have been shown to correspond to the surface regions with high segmental mobility (high B-factor values), the high mobility of an antigenic determinant was suggested to help the determinant adjust to a pre-existing antibody site not fashioned to fit the exact geometry of a protein. On the other hand, additional research has revealed that an effective antigenic site, while mobile, should possess an internal propensity to form ordered structure; i.e., it should not be completely disordered. Importantly, some long disordered regions and intrinsically disordered proteins promote weak immune responses or are even completely non-immunogenic [60–62]. This is further illustrated by the analysis of literature data on the immunogenicity of gp120.
Neutralizing antibodies play a significant role in vaccine development. The key HIV targets for neutralizing antibodies are found in the external envelope protein, gp120 [63–65]. The principal neutralizing determinant of HIV-1 was mapped to the third variable (V3) loop region (residues 301–341) of gp120 [66–68]. This V3 loop is also required for viral entry into target T cells and macrophages and interacts with chemokine co-receptors on the surfaces of these cells [55, 56, 70]. The V3 loop is characterized by a highly variable amino acid sequence, which is assumed to contribute to the ability of HIV to escape the host immune response.
Using solid-state NMR spectroscopy, it has been shown that a 24-residue fragment of the V3 loop of HIV-1 strain III (namely residues 308–331) that includes the GPGR motif is conformationally heterogeneous. Furthermore, this fragment was shown to adopt very different conformations when bound to different anti-V3 antibodies [71–73]. The disorder-to-order transition of the V3 loop has been hypothesized to play a crucial role in the function of this protein, determining its potential to interact with a variety of chemokine receptors and thus allowing different avenues into the cell. The same mechanism makes devising vaccines against HIV very difficult, because some V3 loops escape detection by antibodies that specifically recognize a particular conformation but fail to bind other conformations. These observations provide further support for the hypothesis that a high abundance of intrinsic disorder in proteins located in close proximity to the surface of HIV-1 can help this virus avoid the induction of an immune response. Therefore, intrinsic disorder might represent a crucial viral weapon for evading the immune response.
We previously discussed several pathogens that use disordered regions for binding, with these disordered regions being weakly immunogenic, and suggested that using disorder for binding might be a common strategy for avoiding the immune system. For this mechanism, the disordered region needs to have a sufficiently high flexibility. For such flexible disordered regions, we speculate that the relatively small size of the antibody binding site provides insufficient binding energy to fold the flexible disordered region, so that the antibody cannot bind tightly enough for the generation of an immune response. On the other hand, in our proposal the relatively larger size of the receptor binding surface can provide sufficient energy of association to overcome the flexibility and thereby induce binding via a disorder-to-order transition. The flexibility of the key HIV proteins may be slightly less than that for the previously discussed pathogens, and so antibodies are produced and these bind to different conformational states. Yet the ability of the flexible disordered binding region to fold in different ways may lead to confusion of the immune system and may substantially weaken the overall immune response, as discussed above. Thus, antigenic sites may benefit from being somewhat flexible, but probably become less effective as the flexibility increases beyond some useful level.
Results presented in this paper show the usefulness of the intrinsic disorder prediction for the comparative analysis of viral proteins. This approach offers several advantages, including the opportunity to map proteins by functionality, predicted disorder, and locality across viral species, strains and subtypes. Furthermore, it provides useful benchmarks for the evaluation of the intrinsic disorder concept and for the analysis of various disorder predictors. Using this comparative study of predicted disorder, several interesting patterns in the behaviour of viral proteins from HIV-1 and influenza A viruses were uncovered. We have shown that the patterns of predicted disorder can be mapped and related to the functions of the various proteins. There is evidence that the functions and the amount of disorder of the proteins are related to their physical location in the virion. Some of the key findings of this paper are further outlined below.
Intrinsic disorder is unevenly distributed within the virions, especially for influenza, with the least predicted disorder being observed at the surface proteins and the most disorder being characteristic for the proteins at the virion core. While a similar trend is observed for HIV, the disorder changes are much less pronounced.
Proteins near the surface of HIV-related viruses are characterized by higher levels of predicted disorder as compared to influenza. Although the major surface protein, gp120, has been consistently predicted to be ordered, its major neutralizing determinant is highly mobile. These data support a scenario where HIV virions can escape immune response despite the availability of antibodies for the HIV-related proteins.
Significant variations in the amount of predicted disorder among HA subtypes in influenza A virus were observed. This might provide an explanation for the variations in the functionality and infectivity of specific viral subtypes. Furthermore, NA and HA of major influenza A pandemics tend to pair in such a way that both tend to be predicted either ordered-ordered or disordered-disordered. Such behaviour might be linked to the evolutionary advantages of being ordered or disordered, but more experiments are needed to test this conjecture.
The programs were written in C#, JAVA®-JDBC, Microsoft® SQLSERVER, and MySQL. Object-oriented programming in JAVA®-5 was also used. The database was designed using relational database concepts, with normalization to Third Normal Form/Boyce-Codd Normal Form.
The predictors of intrinsic disorder used in this paper are PONDR® VLXT and VL3 [18, 19, 21]. PONDR® VLXT was built using 15 proteins whose structures were elucidated using X-ray diffraction, NMR spectroscopy, circular dichroism spectroscopy, or limited proteolysis. PONDR® VL3, on the other hand, was built using a combination of 30 neural networks and a training set of disordered regions of 150 proteins.
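The percentages of predicted disorder reported in the tables can be derived from per-residue predictor output. The following is a minimal sketch of that summary step, not the authors' code. It assumes that a predictor returns one score between 0 and 1 per residue and that residues scoring at or above 0.5 are counted as disordered; the cutoff, the function names, and the example scores are assumptions made for illustration and do not come from the paper.

```python
from typing import List, Sequence, Tuple

# Assumed cutoff: the paper does not state the threshold it used.
DISORDER_THRESHOLD = 0.5


def percent_disordered(scores: Sequence[float],
                       threshold: float = DISORDER_THRESHOLD) -> float:
    """Return the percentage of residues scoring at or above the threshold."""
    if not scores:
        raise ValueError("scores must not be empty")
    disordered = sum(1 for s in scores if s >= threshold)
    return 100.0 * disordered / len(scores)


def disordered_regions(scores: Sequence[float],
                       threshold: float = DISORDER_THRESHOLD
                       ) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (start, end) runs of disordered residues."""
    regions: List[Tuple[int, int]] = []
    start = None
    for position, score in enumerate(scores, start=1):
        if score >= threshold:
            if start is None:
                start = position  # a new disordered run begins here
        elif start is not None:
            regions.append((start, position - 1))  # run ended at previous residue
            start = None
    if start is not None:
        regions.append((start, len(scores)))  # run extends to the last residue
    return regions


if __name__ == "__main__":
    # Made-up scores for an eight-residue toy sequence.
    toy_scores = [0.1, 0.6, 0.7, 0.2, 0.9, 0.5, 0.4, 0.3]
    print(percent_disordered(toy_scores))
    print(disordered_regions(toy_scores))
```

Reporting the runs as well as the overall percentage makes it possible to distinguish a protein with one long disordered segment from one with many short ones, which the single percentage alone cannot do.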
In order to do a comparative study of the viral proteins, it was necessary to develop a database that would capture the information from the amino acid sequence and the disorder prediction. The list of proteins of interest included viral proteins of lentiviruses and orthomyxoviruses. Searches were done on the list using the Entrez website. Available samples were randomly chosen, with preference given to those with longer chains and those with binding partners. Whenever possible, corresponding viral proteins of different virus strains were included as samples and annotated. The respective FASTA and PDB files were downloaded and stored using a JAVA® program and the list prepared.
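Reading the downloaded FASTA files is the first step before any sequence can be stored or scored. The sketch below shows one way this could look. The paper states that JAVA® programs were used, so this Python version is only an illustration; the function name and the example records are hypothetical.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each FASTA header (without '>') to its concatenated sequence."""
    records: Dict[str, str] = {}
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)  # close the previous record
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequence may be wrapped over several lines
    if header is not None:
        records[header] = "".join(chunks)  # close the final record
    return records


if __name__ == "__main__":
    # Made-up records; the sequences are not real viral proteins.
    example = [
        ">demo_protein_1 hypothetical entry",
        "MKTAYIAK",
        "QRQISFVK",
        "",
        ">demo_protein_2",
        "GSHMLEDP",
    ]
    for name, sequence in parse_fasta(example).items():
        print(name, len(sequence))
```

In practice the same function can be called on an open file handle, since a file object yields its lines one at a time.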
In order to provide a benchmark for predicted disorder, a set of proteins from PDB Select 90 was randomly chosen and downloaded to the database. The mean and standard deviation were calculated using bootstrapping techniques when necessary. PDB Select 90 is defined as a representative, non-redundant subset of the PDB, made up of proteins that have no more than 90% sequence identity.
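The bootstrapping step can be illustrated as follows. This is a minimal sketch of a standard nonparametric bootstrap for the mean, under the assumption that each benchmark protein contributes one percent-disorder value. The paper does not describe its exact resampling procedure, so the number of resamples, the seed, the function name, and the input values are all assumptions.

```python
import random
import statistics
from typing import Sequence, Tuple


def bootstrap_mean(values: Sequence[float],
                   n_resamples: int = 1000,
                   seed: int = 0) -> Tuple[float, float]:
    """Return (mean, standard deviation) of the bootstrap distribution of the mean."""
    if len(values) < 2:
        raise ValueError("need at least two values to resample")
    if n_resamples < 2:
        raise ValueError("need at least two resamples")
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    resampled_means = [
        statistics.mean(rng.choices(values, k=n))  # draw n values with replacement
        for _ in range(n_resamples)
    ]
    return statistics.mean(resampled_means), statistics.stdev(resampled_means)


if __name__ == "__main__":
    # Made-up percent-disorder values for a handful of benchmark proteins.
    demo_values = [12.0, 30.5, 18.0, 41.0, 22.5, 9.0, 27.0]
    mean, spread = bootstrap_mean(demo_values)
    print(f"{mean:.1f} +/- {spread:.1f}")
```

The standard deviation returned here is that of the resampled means, which is the quantity usually quoted as the uncertainty of a mean in a "value ± error" table entry.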
Using a set of programs written in JAVA®, the PDB and FASTA files were searched, and the essential information was placed in the MySQL tables using accessions seq_access and seq_access_atom. The necessary FASTA files were then used to generate PONDR® VLXT and PONDR® VL3 scores via a LINUX BASH shell script. Another JAVA® program was then used to load the predictions into the seq_predn table. Information regarding the virus and its subtype was initially stored in a Microsoft® SQLSERVER database via C# and was later transferred to the MySQL database server.
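The loading and querying steps can be sketched with a small relational schema. The example below uses Python's built-in sqlite3 module with an in-memory database instead of the MySQL server and JAVA®-JDBC programs described in the text, purely so that it is self-contained. The names seq_access and seq_predn come from the text, but treating seq_access as a table, as well as every column name and the accession value, is an assumption; the actual schema is not given in the paper. The two percentages inserted for M1 are the values quoted in the text (25% by PONDR® VLXT and 0% by PONDR® VL3).

```python
import sqlite3
from typing import List, Tuple

# Hypothetical columns; only the table names appear in the paper.
SCHEMA = """
CREATE TABLE seq_access (
    accession    TEXT PRIMARY KEY,
    protein_name TEXT NOT NULL,
    sequence     TEXT NOT NULL
);
CREATE TABLE seq_predn (
    accession          TEXT NOT NULL REFERENCES seq_access(accession),
    predictor          TEXT NOT NULL,
    percent_disordered REAL NOT NULL,
    PRIMARY KEY (accession, predictor)
);
"""


def build_database() -> sqlite3.Connection:
    """Create an empty in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_protein(conn: sqlite3.Connection, accession: str,
                protein_name: str, sequence: str) -> None:
    """Store one sequence record."""
    conn.execute("INSERT INTO seq_access VALUES (?, ?, ?)",
                 (accession, protein_name, sequence))


def add_prediction(conn: sqlite3.Connection, accession: str,
                   predictor: str, percent: float) -> None:
    """Store one predictor summary for an existing sequence record."""
    conn.execute("INSERT INTO seq_predn VALUES (?, ?, ?)",
                 (accession, predictor, percent))


def predictions_for(conn: sqlite3.Connection,
                    protein_name: str) -> List[Tuple[str, float]]:
    """Return (predictor, percent_disordered) rows for a protein, by predictor name."""
    query = """
        SELECT p.predictor, p.percent_disordered
        FROM seq_predn AS p
        JOIN seq_access AS a ON a.accession = p.accession
        WHERE a.protein_name = ?
        ORDER BY p.predictor
    """
    return conn.execute(query, (protein_name,)).fetchall()


if __name__ == "__main__":
    conn = build_database()
    # "demo-M1" and the sequence string are placeholders, not real identifiers.
    add_protein(conn, "demo-M1", "M1", "PLACEHOLDER")
    add_prediction(conn, "demo-M1", "VLXT", 25.0)
    add_prediction(conn, "demo-M1", "VL3", 0.0)
    print(predictions_for(conn, "M1"))
```

Keeping sequences and predictions in separate tables joined on the accession means that a new predictor can be added as extra rows without altering the sequence table, which is in keeping with the normalized design described above.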
This work was supported in part by the grants R01 LM007688-01A1 (to A.K.D and V.N.U.) and GM071714-01A2 (to A.K.D and V.N.U.) from the National Institutes of Health and the Programs of the Russian Academy of Sciences for the "Molecular and cellular biology" and "Fundamental science for medicine" (to V. N. U.). We gratefully acknowledge the support of the IUPUI Signature Centers Initiative.
This article has been published as part of BMC Genomics Volume 9 Supplement 2, 2008: IEEE 7th International Conference on Bioinformatics and Bioengineering at Harvard Medical School. The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2164/9?issue=S2
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.