Discovering putative prion sequences in complete proteomes using probabilistic representations of Q/N-rich domains
© Espinosa Angarica et al.; licensee BioMed Central Ltd. 2013
Received: 8 October 2012
Accepted: 6 May 2013
Published: 10 May 2013
Prion proteins constitute a special class among amyloids due to their ability to transmit aggregative folds. Prions are known to act as infectious agents in neurodegenerative diseases in animals, or as key elements in transcription and translation processes in yeast. It has been suggested that prions contain specific sequence domains with distinctive amino acid composition and physicochemical properties that allow them to control the switch between soluble and β-sheet aggregated states. These prion-forming domains are low-complexity segments enriched in glutamine/asparagine and depleted in charged residues and prolines. Different predictive methods have been developed to discover novel prions, either by assessing the compositional bias of these stretches or by estimating the propensity of protein sequences to form amyloid aggregates. However, the available algorithms still lack a thorough statistical calibration against large sequence databases, which prevents them from accurately predicting prions without retrieving a large number of false positives.
Here we present a computational strategy to predict putative prion-forming proteins in complete proteomes using probabilistic representations of prionogenic glutamine/asparagine-rich regions. After benchmarking our predictive model against large sets of non-prionic sequences, we were able to identify known prions with high precision and accuracy, generating prediction sets with few false positives. The algorithm was used to scan all the proteomes annotated in public databases for the presence of putative prion proteins. We analyzed the presence of putative prion proteins in all taxa, from viruses and archaea to plants and higher eukaryotes, and found that most organisms encode evolutionarily unrelated proteins susceptible to behaving as prions.
To our knowledge, this is the first wide-ranging study aiming to predict prion domains in complete proteomes. Approaches of this kind could be of great importance to identify potential targets for further experimental testing and to try to reach a deeper understanding of prions’ functional and regulatory mechanisms.
Keywords: Prion domain; Protein aggregation; Amyloid fibrils; Prion prediction
The formation of intracellular amyloid fibrils is a widespread phenomenon in eukaryotes [1–4], and it has been linked to a number of beneficial adaptive cellular functions [5–11], to the transmission of protein-encoded heritable information in yeast [12–15], and to a variety of important diseases in mammals [16–20]. Amyloidogenesis is mediated by a diverse group of evolutionarily unrelated proteins from different organisms, all sharing the propensity to form β-sheet aggregates in their complete or fragmented forms. A subset of these aggregation-prone proteins is characterized by the presence of regions that comprise homopolymeric tracts, also named 'single sequence repeats'. It has been reported that the presence of these low-complexity stretches, and more specifically of (Q/N)-rich regions, strongly influences the aggregation potential of eukaryotic proteins [22–24]. In several neurodegenerative disorders, such as spinocerebellar ataxias and Huntington's disease, long pure glutamine repeats are generated by the instability of CAG codon repeats [25–27] and cause the abnormal proteins to form intracellular inclusions in specific neuron types. However, prionogenic Q/N-rich regions usually contain additional amino acids and form sequentially heterogeneous domains responsible for the main properties of prions, including self-propagating amyloid aggregation.
Much research has been devoted to determining the structural and sequence basis of prion formation and the compositional determinants of prionogenic domains. Studies from different groups have concluded that both amino acid composition and the length of such regions play important roles in prion induction [28–30]. Additional sequence requirements, such as the number and distribution of prolines and charged residues, have recently been found to be relevant in the formation of prionic aggregates. Mutational studies, in which the sequences of the yeast prions Ure2p and Sup35p were randomly shuffled, showed that the [PSI+] phenotype is mainly determined by the amino acid composition of the domain, independently of the primary sequence, as most of the shuffled species generated were able to form prions in vivo [28, 29]. This knowledge has been used to try to predict putative prions in biological sequence databases, though only a few methodologies are available for this task. A first group of algorithms aims to estimate the propensity of peptides of a given length to form amyloid aggregates based on their primary sequence [31–34]. These methods, based on more or less complex models of parallel β-sheets, have proven quite ineffective for Q/N-rich stretches, since these domains do not share the common characteristics of β-sheet amyloid-forming peptides (e.g., high hydrophobicity).
A second group of methodologies attempts to predict Q/N-rich domains from the primary sequence based on the strong amino acid compositional bias of these segments. Proteome-wide identification of Q/N-rich regions was successfully achieved in 30 proteomes from eukarya, archaea and eubacteria using a relatively simple algorithm that estimates the statistical significance of regions with a high proportion of Q and N. A similar methodology for assessing compositional bias in biological sequences was also tested to find proteins enriched in Q and N. However, these two algorithms only take into consideration the frequency of a specific group of biased amino acids in a given sequence segment (i.e., Q/N, hydrophobic or charged amino acids), instead of considering the relative contribution of all the residues present in the segment to the prionogenicity of the domain. Furthermore, they do not provide a statistical model and a scoring function that would allow the systematic evaluation of protein segments and the ranking of the predicted domains according to their prionogenicity. A recent report has proposed an interesting alternative procedure to build a bioinformatics model to predict prions at genomic scale. Starting from the sequences of four known yeast prions, a hidden Markov model (HMM) was generated to assess the compositional similarity of proteins from the yeast proteome to the model. This yielded up to 200 proteins with candidate prionogenic domains (PrD), of which the top-scoring 100 were tested experimentally in vitro and in vivo. Finally, a total of 19 new proteins that showed switching behavior and amyloid formation were identified, adding to the four prions previously described in this organism.
Notwithstanding the remarkable outcomes of this work, the inherent bias of the predictive model, generated from just a few sequences, apparently hampers its ability to correctly score protein sequences, as roughly half of the high-scoring predictions were false positives exhibiting no prion-like behavior.
A complementary strategy went further in an attempt to define the compositional features that influence prion formation. Libraries of Sup35p mutants expressed in vivo were used to comprehensively analyze the compositional determinants of prions. This study ultimately produced an experimental technique to measure the prion propensities of individual amino acids, showing a strong bias against prolines and charged residues, a strong bias favoring hydrophobic residues and no significant bias for or against Q/N residues. With this methodology, the scoring of the putative prions reported by Alberti et al. could be improved. A recent follow-up by the same group used this methodology to design de novo synthetic prionogenic sequences capable not only of forming amyloids but also of propagating stably over many generations. However, this and the other approaches available to date for identifying and predicting Q/N-rich segments with prionogenic activity lack a detailed statistical benchmarking of their performance at a genomic scale. Thus, a methodology able not only to identify putative prion domains in large databases of protein sequences, but also to classify the predictions correctly in terms of precision and accuracy, would be of high interest.
Here we present a bioinformatics approach to create a statistical representation of prion domains that allows scoring protein sequences according to their likelihood of being prions. Starting from a list of 29 proteins reported experimentally to exhibit conformational conversion and amyloid formation in yeast, we have developed a probabilistic model of PrDs to discover Q/N-rich prionogenic proteins in complete proteomes. The independent probabilities of occurrence of all amino acids in prion domains were estimated, and a log-likelihood model was built to assign uncalibrated scores to sequence fragments of variable length. We first benchmarked our model against a list of 18 proteins that were tested under the same experimental conditions and showed no SUP35C activity in vivo. From this assay we obtained the predictive cutoff to be used and the confidence intervals of the predictions. Our classifier separated prions from proteins with no prionogenicity fairly well, with an accuracy higher than 0.83 and a precision of 80% at the chosen predictive cutoff. Under these conditions the fraction of false positives was rather low, corresponding to less than 16% of the total predictions. We also tested the ability of our model to scan large sequence datasets from UniProt, the PDB and intrinsically disordered proteins (IDPs) annotated in DisProt. Our results show that the model is well suited to handle datasets with a high proportion of negative instances without recovering an excessive number of false positives, which is important for predictive assays in complete proteomes. Our scoring model almost completely separated the score distribution of real prion domains from those of the UniProt and PDB datasets, while the sequences of some IDPs proved more similar to Q/N-rich prion-forming domains.
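The log-likelihood scoring described above can be illustrated with a short sketch. It assumes that a per-residue log-odds table in bits, log2(p_PrD(a) / p_background(a)), has already been estimated, and it scores variable-length fragments by finding the highest-scoring contiguous segment with Kadane's maximum-subarray algorithm. The segment search, the background frequencies and the toy four-letter tables are illustrative assumptions, not the authors' implementation, and the proline correction is omitted.

```python
import math

# Hypothetical toy frequencies over a reduced alphabet, for illustration only;
# a real model would use all 20 amino acids estimated from training PrDs and
# from a background set.
TOY_PRD_FREQS = {"Q": 0.4, "N": 0.4, "E": 0.1, "K": 0.1}
TOY_BACKGROUND_FREQS = {"Q": 0.1, "N": 0.1, "E": 0.4, "K": 0.4}


def log_odds_table(prd_freqs, background_freqs):
    """Per-residue log-odds in bits: log2(p_PrD(a) / p_background(a))."""
    return {aa: math.log2(prd_freqs[aa] / background_freqs[aa]) for aa in prd_freqs}


def best_segment(seq, table):
    """Highest-scoring contiguous segment of seq under a position-independent model.

    Uses Kadane's maximum-subarray algorithm; residues missing from the table
    contribute 0 bits. Returns (score_in_bits, start, end) with end exclusive.
    """
    best = (0.0, 0, 0)
    running, start = 0.0, 0
    for i, aa in enumerate(seq):
        if running <= 0.0:
            # A non-positive prefix can only lower the score: restart here.
            running, start = 0.0, i
        running += table.get(aa, 0.0)
        if running > best[0]:
            best = (running, start, i + 1)
    return best


if __name__ == "__main__":
    table = log_odds_table(TOY_PRD_FREQS, TOY_BACKGROUND_FREQS)
    print(best_segment("EEQNQNKEQ", table))
```

With the toy tables, Q and N contribute +2 bits each and E and K contribute -2 bits, so in `EEQNQNKEQ` the best fragment is `QNQN` (positions 2 to 6) with a score of 8 bits.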
We have used this methodology to scan all the known proteomes annotated in public databases, which yielded 20540 predictions in 1536 different organisms from all taxa. This is to our knowledge the most extensive effort to predict PrD sequences performed so far, reporting putative prions in the proteomes of a diverse group of organisms, most of which have been poorly studied. We also inspected the predictions obtained and observed some interesting trends in the distribution of PrDs in different protein functional families. The predicted prionogenic domains appear to be associated with different cellular components and to function in different biological processes depending on the taxon and organism group. The present predictive approach uncovers a large set of putative prionogenic proteins whose further experimental characterization might contribute significantly to understanding prion biology.
Amino acid composition of prion-forming domains
Amino acid propensities in PrD and PrD-cores
Prion domain (Library 1)
The analysis of the ratios reported in a previous work, resulting from a random mutagenesis assay of two specific segments of the Sup35p protein, reveals significant differences from our results (Table 1). These include differences in the relative log-odds of some important residues such as E, which is 3.8 times less frequent in PrDs according to our results, and P, which is 3.9 times more likely to be found in these domains according to our model. The most remarkable differences are obtained for key residues such as Q and N, for which we found a marked favorable bias. For other residues such as K, Y, S and D, no significant differences were found between our model and the results of Toombs et al.
Using compositional bias to assess the prionogenicity of protein sequences
We also decided to test the robustness of the amino acid propensities calculated in our model and to check whether there is a high rate of redundancy within the training set, which could hamper the predictive potential of the model. We therefore performed a thorough bootstrap assay in which we randomly resampled 10⁶ training sets from the 18 sequences that are positive in all the experimental assays, leaving out 9 PrDs each time (see Methods for details). In each case we recalculated the propensities and used the excluded PrDs as the positive test set in the ROC analysis, maintaining the same negative set. The results of this experiment are also shown in Figure 2, which depicts the average ROC curve calculated from the million plots generated. As expected, the AUC decreases, but only to 0.87, which still corresponds to fairly good classifier performance and indicates that the deviation from the typical classification behavior is marginal. This finding suggests that the propensities estimated from the training set are unbiased and informative enough to correctly separate the populations of positive and negative instances.
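The resampling experiment can be sketched as repeated random leave-k-out evaluation: draw k positives as a test set, refit the propensities on the remaining positives, and average the ROC AUC against a fixed negative set. The sketch computes the AUC as the Mann-Whitney probability that a held-out positive outscores a negative. `fit` and `score` are hypothetical callables standing in for the propensity estimation and scoring steps. Because the held-out sets are drawn without replacement, this is closer to Monte Carlo cross-validation than to a classical bootstrap.

```python
import random


def auc(pos_scores, neg_scores):
    """ROC AUC as the Mann-Whitney probability P(pos > neg); ties count 1/2."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos_scores) * len(neg_scores))


def resampled_auc(positives, negatives, fit, score, n_rounds=1000, held_out=9, seed=0):
    """Average AUC over random leave-`held_out`-out splits of the positives.

    fit(train_positives) -> model and score(model, seq) -> float are
    hypothetical callables. The negative set stays fixed in every round.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_rounds):
        test_idx = rng.sample(range(len(positives)), held_out)
        test_set = set(test_idx)
        train = [s for i, s in enumerate(positives) if i not in test_set]
        model = fit(train)  # re-estimate propensities without the held-out PrDs
        pos = [score(model, positives[i]) for i in test_idx]
        neg = [score(model, s) for s in negatives]
        total += auc(pos, neg)
    return total / n_rounds
```

For example, `auc([1, 2], [0, 2])` is 0.625: three of the four positive/negative pairs are won outright or tied (1 > 0, 2 > 0, 2 = 2 counted as 1/2).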
Testing the suitability of our algorithm to process large sequence databases
Selection of a cutoff value for predicting in complete proteomes
Proteome-wide predictions of proteins bearing putative PrDs
Summary of the prion predictions in different taxa
Ratio of prion domains in the proteomes of representative organisms
% of the proteome
Listeria monocytogenes 1
Bacillus cereus 1
Staphylococcus aureus 1
Cryptosporidium parvum 2
Dictyostelium discoideum 2
Dictyostelium purpureum 2
Plasmodium falciparum 2
Theileria parva 2
Trypanosoma brucei 2
Candida albicans 3
Saccharomyces cerevisiae 3
Lodderomyces elongisporus 3
Arabidopsis thaliana 4
Oryza sativa 4
Drosophila melanogaster 5
Drosophila mojavensis 5
Anopheles gambiae 5
Caenorhabditis elegans 6
Homo sapiens 7
From amino acid composition to a comprehensive model of prion-forming domains
Great effort has been devoted in recent years to the experimental characterization of prion proteins, with special interest in defining the sequence and structural determinants of aggregate formation and prion transmission. To date, the number of prions studied is still limited, and little is known regarding the approximate number of prion-like proteins in complete proteomes or the cellular processes in which they might be involved. Nevertheless, several studies have shed some light on the general characteristics of prions [1, 16, 50–52] and on how this information can be used to try to identify novel Q/N-rich candidates in protein databases [30, 36–38]. Only recently have the availability of high-throughput experimental procedures to study prions in vitro and in vivo [38, 53–55] and the feasibility of extensive mutational studies [28–30, 56] provided deeper insights into the characteristics of protein domains that mediate aggregation and prion induction. It is now clear that methodologies relying on approximating the likelihood of contiguous protein stretches to form parallel β-sheets [31–34] cannot be successfully used to predict Q/N-rich prion domains. Among other examples, these methods are unable to predict β-aggregation nuclei in known yeast prions such as Ure2p and Sup35p. Instead, predicting PrDs from the distinctive amino acid composition of these domains [30, 36, 37], assuming that prion formation is independent of primary sequence [28, 29, 39, 56], appears more promising. A recent comparison of most of the methods currently used to predict prion propensity has shown that approaches that focus largely on composition (e.g., PAPA and Zyggregator) show far more predictive accuracy than those focusing on primary sequence.
Following this idea, we have generated here a reliable model that uses the compositional bias of PrDs, taking special care to thoroughly benchmark the algorithm in order to establish realistic confidence intervals for predicting in large biological sequence databases. The results from the work by Alberti et al. were very valuable in providing a sufficiently large training set, from which we obtained the statistical potentials summarized in Table 1. The odds ratios we calculated embody the previously described bias observed in prion-forming domains [30, 36, 37] and enable the inspection of protein sequences to find putative PrDs. Our method relies solely on amino acid propensities calculated from compositional bias, plus a correction to the score that accounts for the unfavorable presence of certain proline patterns in the sequences analyzed (see Figure 1). The variance of the score distribution of candidate prions for which there is strong experimental evidence reflects the high sequence variability that aggregation-prone domains can accommodate. In their work, Alberti and coworkers did not statistically evaluate the predictive power of the model used. Instead, they relied on the large-scale experimental assays performed to classify the predictions. They acknowledge the bias of the hidden Markov model they built, which might be related to the poor scoring capability of the method, which ranked highest a number of sequences that showed no aggregation propensity. The training stage is very important in the construction of HMMs, and this is probably why this model, generated from just a few examples, is able to identify probable candidates but is unable to score them correctly. We believe our model improves the scoring of these sequences, as can be inferred from the scores of known PrDs in the complete yeast genome (Figure 3).
Another recent study aimed at modeling and predicting prions has produced interesting results. The authors carried out random mutagenesis assays of the Sup35p sequence at specific locations and tested for amyloidogenesis in the expressed cultures, resulting in estimates of the propensities of amino acids in PrDs. A two-dimensional analysis, complementing the prion propensity estimates with calculations of intrinsic disorder, was also used to improve the classification method. This methodology has been successfully used to generate synthetic prion-like sequences that were able to form aggregates and propagate in in vivo experiments. As stated above, this methodology by Toombs et al., although displaying fairly high classification accuracy compared with other available methodologies, relies on the random mutation of just two short segments, of 19 and 7 amino acids, of Sup35, a domain of almost 100 residues with long glutamine- and asparagine-rich stretches. As a consequence, the mutational space may not be completely explored, which could result in a model that is not well suited to scanning large sets of protein sequences. In contrast, our model is based on the sequences of almost all the known proteins displaying prion-like behavior, and we have demonstrated that our method can perform as well as PAPA in differentiating real from false prions. The bootstrapping assay (Figure 2) also suggests that the propensities obtained are unbiased.
Putting the algorithm in context: analyzing real-size sequence databases
Most of the algorithms used to predict Q/N-rich prion candidates [30, 36–38] share a common downside: they lack a proper statistical calibration and thus an estimation of the predictive capability of the model when scanning sequence databases. In some cases, protein sequences have been modeled with a Poisson or a binomial distribution to calculate the probability of occurrence of glutamine and asparagine in a peptide, and its statistical significance. These approximations have two main problems. The first is that they exclude the positive or negative contributions of all other residues to the prionogenicity of the domain. The second is that not even a normalized probability of occurrence for the Q/N composition of a stretch guarantees good classification performance in terms of the number of false positive prions returned in order to recover a desired number of true prions. Our position-independent model accounts for the positive contribution of Q and N to prion induction, but also for the favorable contribution of S and Y and the unfavorable contribution of C, E or W, among others (see Table 1). Our model corresponds to an unsupervised learning classifier that represents almost all the rules describing real prion-forming domains, and it also incorporates the negative contribution of non-contiguous prolines. An increase in the number of PrD sequences available for training, as well as the inclusion of supervised training to add biologically relevant information to the model, such as organism-specific information on the distribution of prolines in the domains or the intrinsic β-aggregation propensity of the sequence, might improve the predictive potential of our model.
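For comparison, the binomial approach discussed above can be written in a few lines: the p-value of observing at least k Q/N residues in a window of length L, given a background Q/N frequency p. The background frequency is a hypothetical parameter here, and this is a generic sketch of the idea rather than a reimplementation of any cited method. It also makes the first limitation concrete: any two windows with the same Q/N count receive the same p-value, whatever their other residues are.

```python
import math


def qn_binomial_pvalue(window, p_qn):
    """Upper-tail binomial p-value for the Q+N count of a sequence window.

    Models each residue as independently being Q or N with probability p_qn
    (a hypothetical background frequency) and returns P(X >= k), where
    X ~ Binomial(len(window), p_qn) and k is the observed Q+N count.
    """
    length = len(window)
    k = sum(1 for aa in window if aa in "QN")
    return sum(
        math.comb(length, i) * p_qn**i * (1 - p_qn) ** (length - i)
        for i in range(k, length + 1)
    )
```

For instance, `qn_binomial_pvalue("QNPPEE", 0.1)` and `qn_binomial_pvalue("QNSSYY", 0.1)` are identical, even though prolines and glutamates disfavor prion formation while serines and tyrosines favor it. A per-residue log-odds model distinguishes the two windows. `math.comb` requires Python 3.8 or later.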
We have confirmed here that our strategy performs reasonably well at recovering known prions from large datasets of protein sequences, which makes it well suited for predictions at genome scale. The method shows consistent performance even for 500-fold skews towards the negative class (Figure 4), suggesting that the compositional information embodied in the model can efficiently discriminate between prions and non-prions in protein sequence databases of variable size. This is important if the goal is to predict Q/N-rich domains both in small genomes of just a few hundred proteins and in larger eukaryotic genomes.
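Why robustness to class imbalance matters can be shown with simple arithmetic. For a fixed true positive rate (TPR) and false positive rate (FPR), the expected precision is TPR·P / (TPR·P + FPR·N), so it falls as the number of negatives N grows relative to the number of positives P. The rates used below are hypothetical and are not values taken from Figure 4.

```python
def expected_precision(tpr, fpr, n_pos, n_neg):
    """Expected precision TP / (TP + FP) for n_pos positives hidden among n_neg negatives."""
    tp = tpr * n_pos  # expected true positives passing the cutoff
    fp = fpr * n_neg  # expected false positives passing the cutoff
    return tp / (tp + fp) if tp + fp > 0 else 0.0


if __name__ == "__main__":
    for n_neg in (10, 100, 1000, 5000):  # from a 1:1 ratio up to a 500-fold skew
        print(n_neg, round(expected_precision(0.8, 0.01, 10, n_neg), 3))
```

With TPR = 0.8 and FPR = 0.01, precision is about 0.99 for 10 positives among 10 negatives but drops to about 0.14 among 5000 negatives, which is why a very low false positive rate is needed for proteome-scale scans.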
The benchmarking of our algorithm also allows us to estimate statistically the confidence intervals within which we can predict prions in complete proteomes. The choice of a classification cutoff score is always subjective, but an analytical approach makes it possible to ascertain the composition of the recovered sets when searching a database, and also to control the inherent tradeoff between precision and recall. Here we decided to set a high cutoff of 50 bits, as depicted in Figure 5, in accordance with the maximum prediction accuracy and to reduce the rate of false positives included in the predictions as much as possible. We were primarily concerned about obtaining a high fall-out (false positive rate), which could mislead the implications of our work. The false discovery rates obtained support the fairly good classification ability of the algorithm, which reduces the proportion of non-prions passing the cutoff to 16%.
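Choosing a cutoff from benchmark scores can be sketched as a scan over candidate thresholds that keeps the one with maximum accuracy and reports the false discovery rate (FDR) of the predictions passing it. Ties are broken in favor of the higher threshold, mirroring the stated preference for fewer false positives. This is an illustrative procedure under that assumption, not necessarily the exact one used to derive the 50-bit cutoff, and the scores in the example are hypothetical.

```python
def choose_cutoff(pos_scores, neg_scores):
    """Return (cutoff, accuracy, fdr) for the threshold that maximizes accuracy.

    A sequence is predicted positive when score >= cutoff. On ties in accuracy,
    the higher cutoff is kept, since it admits fewer false positives.
    """
    candidates = sorted(set(pos_scores) | set(neg_scores))
    total = len(pos_scores) + len(neg_scores)
    best = None
    for c in candidates:
        tp = sum(s >= c for s in pos_scores)
        fp = sum(s >= c for s in neg_scores)
        tn = len(neg_scores) - fp
        acc = (tp + tn) / total
        fdr = fp / (tp + fp) if tp + fp else 0.0
        if best is None or acc >= best[1]:
            best = (c, acc, fdr)
    return best


if __name__ == "__main__":
    print(choose_cutoff([60, 55, 52, 40], [10, 20, 30, 45, 51]))
```

For positive scores [60, 55, 52, 40] and negative scores [10, 20, 30, 45, 51], the scan returns a cutoff of 52 with accuracy 8/9 and an FDR of 0.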
It is also interesting that, with our scoring model, we found compositional similarities between some IDPs [60–62] and prions. Amino acid composition has been used in the past to predict IDPs [60, 63–65], and those studies concluded that such domains are enriched in K, E, P, S and Q, and depleted in W, C, Y, G and N. The propensities calculated in this study in some cases reflect a compositional bias similar to that found in IDPs, i.e., enrichment in Q and S and depletion in C and W. This might explain the overlap between the right tail of the DisProt score distribution and that of PrDs. Based on these similarities, we can argue that most of the false positive predictions recovered in a predictive tryout would be natively disordered proteins. There is also experimental evidence suggesting that certain intrinsically disordered proteins might in fact propagate like prions [66, 67], including α-synuclein, the Aβ peptide and huntingtin, involved in Parkinson's, Alzheimer's and Huntington's diseases, respectively. Huntingtin is predicted to possess a PrD, whereas Aβ and α-synuclein are not included in our dataset. However, it is still a matter of debate whether these two proteins are disordered or contain a significant α-helical content [71, 72]. Therefore, our method might correctly classify proteins in the overlapping zone between the two distributions, and some of the predictions tagged as false positives could in fact be prions. In general terms, however, the amino acid propensities of the remaining residues are rather different between IDPs and PrDs, which means that, in most cases, our algorithm can accurately discriminate between these domain types.
Discovering putative prion-like domains in complete proteomes
Although generally thought of as linked to disease, prions are also associated with central cellular functions and have been well studied in fungi and some microorganisms, where they play important roles as epigenetic elements [73, 74], evolutionary capacitors [13, 75] and bet-hedging devices [76, 77] in adaptation to environmental fluctuations. There is also evidence suggesting that, even in invertebrates, prions take part in mechanisms crucial for maintaining long-term physiological states [78–80]. However, our knowledge of prions in higher organisms is limited to a handful of examples associated with serious illnesses, hence the need for strategies that can point out new putative candidates that might be coupled to other cellular functions. The decisive step of a predictive methodology is always the discovery of new instances resembling a given model under some statistical restrictions. Our model, and most importantly the outcome of the calibration process showing that our methodology can be used to scan large databases without losing accuracy, gave us the opportunity to scan all the available proteomes. This distinguishes our work from previous attempts restricted to a few specific organisms. The 20540 predictions in 1536 different organisms from all evolutionary classes represent, to the best of our knowledge, the most extensive set of PrD predictions obtained so far, which will help attain a global view of the distribution of prion domains in the proteomes of organisms and unravel the cellular processes in which proteins containing different prion-forming domains might be involved.
Our results show that, in general terms, the number of prions per genome is low, though there are organisms in which prion-like self-assembly might play important functions, as can be inferred from the rather high number of prions in their genomes. It is important to bear in mind that these estimations could be significantly biased by annotation problems in some genomes. The analysis of the incompletely sequenced genomes of some members of the genus Plasmodium showed that they contain abundant hydrophilic low-complexity segments, which correspond to species-specific, rapidly diverging regions that might form non-globular domains that help the parasites evade the host's immune response. Here we confirm this trend by analyzing the complete proteomes of various members of this genus, and propose that most of these stretches may correspond to PrDs. We also found a similar tendency in the genome of Dictyostelium discoideum, by far the organism with the most predicted prions in its proteome, which suggests that most of the low-complexity stretches found in the sequencing of the genome of this organism could be prions, though the functional implications of such a large number of aggregation-prone proteins are unclear. Having a high number of low-complexity stretches appears to be characteristic of these organisms. Accordingly, although less represented than in Dictyostelium discoideum, the number of PrDs in the Dictyostelium purpureum genome is fairly high in comparison with that in other organisms. It is known that Plasmodium is able to survive with an aggregation-prone proteome even under the periodic heat shock stress that characterizes malaria, in which patients suffer recurrent episodes of fever exceeding 40°C. This is possible thanks to the presence of specialized chaperones, which are essential for parasite survival within red cells.
So far only one of our Plasmodium PrD candidates has been characterized experimentally: PFI115w (Q8I2S1_PLAF7). In agreement with our prediction, the protein aggregates intracellularly when expressed in human cells. Plasmodium chaperones act as cellular capacitors allowing the accumulation of potentially deleterious PrDs, whose presence should therefore provide some advantage to the organism. It remains to be determined whether Dictyostelium exploits a similar strategy to cope with the high aggregation load of its proteome.
Saccharomyces cerevisiae is the most studied organism regarding amyloid formation, and various predictive strategies have reported putative PrDs in its complete proteome [30, 38, 83]. Here we have not only improved the scoring capability of previous methodologies but have also provided an ample list of PrD predictions, including more than 500 completely new predictions in the yeast proteome. The molecular chaperone Hsp104 is essential for the propagation of known yeast prions, which cannot be propagated in cells devoid of the chaperone. The current model of amyloid propagation suggests that prion fibrils need to be shortened or cleaved by Hsp104 in order to be transmitted to the progeny during cell division. Therefore, one should expect a certain correlation between the ability of Hsp104 to propagate prionogenic species and the number of PrDs in the proteome of an organism. Despite its homology with the S. cerevisiae chaperone, the Schizosaccharomyces pombe Hsp104 has been shown to be unable to propagate the [PSI+] prion. Interestingly, only 3 putative PrDs were identified in the genome of S. pombe. This is in contrast with Candida albicans, the yeast with the largest number of predicted PrDs after S. cerevisiae (169 domains), whose Hsp104 chaperone supports [PSI+] prion propagation.
Association between proteins bearing PrD predictions and diseases in humans
Spinocerebellar Ataxia Type 8
Premature ovarian failure
Peripheral primitive neuroectodermal tumor
Amyotrophic lateral sclerosis
Prion-like domains are associated with specific protein functions, processes and locations in different organisms
The analysis of the predictions in the different proteomes using Gene Ontology annotations allows classification of proteins into functional classes, processes and cellular locations, uncovering similarities and differences in PrDs distribution between taxa or evolutionary related organisms (Additional files 11, 12, 13). A first surprising observation is that the predicted PrDs appear to be associated with different cellular components and to work in different biological processes in different taxa and organism groups. These data are consistent with the view that the common switching mechanism underlying prion behavior can be exploited for different physiological purposes .
In bacteria, PrDs are depleted in the intracellular space and significantly enriched at the cell wall. Accordingly, bacterial PrDs appear to be essentially involved in metabolic and catabolic processes resulting in construction and disassembly of the cell wall. No prion protein has been characterized yet in bacteria. However, many bacterial species form extracellular biofilms, which are constituted, among other components, by proteins assembled into amyloid structures identical to those in neurodegenerative disorders. Amyloidogenic proteins in biofilms are constituents or interact with the bacterial cell wall. Biofilms are important virulence factors for bacteria favoring the attachment to eukaryotic cells. Importantly, biofilm forming pathogens such as Staphylococcus aureus present the highest content in PrDs, suggesting that the identified proteins might contribute to form or sustain the network of amyloid contacts that stabilize the biofilm. Preliminary experimental data support this view since the predicted S. aureus PrD (SSAA2) forms bona fide amyloid fibrils in vitro (S.V. unpublished results). Bacterial amyloids can initiate the formation of pathogenic or misfolded amyloid upon interaction with diverse host proteins . This template-directed process resembles prion transmission and brings up a possible relationship between bacterial infections and neurodegenerative diseases. Accordingly, bacterial amyloids cause the development of amyloidosis when they are injected in susceptible mice .
In eukaryotes, PrDs are intracellular and preferentially localized in the nucleus, as previously suggested. In yeast and plants, PrDs are found associated with the transcription factor II D (TFIID) complex, which is composed of the TATA-binding protein (TBP) and a set of TBP-associated factors (TAFs) and is well conserved across species. Binding of TFIID to DNA is necessary for transcription initiation from most RNA polymerase II promoters. Accordingly, in both taxa a large number of PrDs are linked to transcriptional function. In fungi, 86 PrDs are involved in catalyzing the release of nascent polypeptide chains from the ribosome, a function similar to that exerted by SUP35. Overall, in both fungi and plants, PrDs are enriched in DNA- and RNA-binding proteins that control apparently unrelated processes, such as nitrogen utilization in fungi and hormone (auxin and ethylene) signaling pathways in plants.
In animals, PrDs are also mainly nuclear and depleted in both the mitochondrial and plasma membranes, consistent with a soluble nature under physiological conditions. They are also underrepresented in mitochondria, which is consistent with the bacterial origin of this organelle and with the observation that bacteria contain a reduced number of PrDs. In animals, too, the majority of PrDs correspond to DNA- and RNA-binding proteins. In vertebrates, PrDs are overrepresented in two important functional components: the mediator and the histone acetyltransferase complexes. Mediator is a multiprotein complex that functions as a transcriptional coactivator in all eukaryotes; in fact, we also find PrDs linked to mediator in yeast. The mediator complex is required for the activation of transcription of most protein-coding genes, but can also act as a transcriptional co-repressor. In humans, it includes proteins such as MED12 and MED15, which, as discussed previously, are linked to debilitating disorders. Histone acetylation is also linked to transcriptional activation and associated with euchromatin. Histone acetyltransferases can also acetylate non-histone proteins, such as transcription factors and nuclear receptors, to facilitate gene expression. In line with their DNA/RNA-binding properties, most mammalian PrDs act in the control of transcriptional and translational processes. In humans, these proteins include transcription factors (PAX-interacting protein 1, TOX3), tumor suppressor proteins (MN1), histone methyl/acetyltransferases (histone-lysine N-methyltransferase MLL2, E1A-binding protein p400) and nuclear receptor coactivators (NCOA3), and they function in essential pathways such as beta-catenin-mediated Wnt signaling and the estrogen response.
Overall, in animals, PrDs appear to work in the upstream regulation of central biological processes, and more specifically in development. In vertebrates, PrDs act in the development of neural structures such as the putamen, the caudate nucleus and the neural crest. This role in neuronal development is conserved between mammals and humans, where PrDs additionally participate in cerebellum and cerebral cortex development. It is therefore plausible that PrD malfunction is intimately linked to the onset of neurodegenerative diseases, as previously discussed (Table 4). Mammalian and human PrDs are also involved in embryonic development and, more generally, in cell differentiation, which might explain the association of PrDs with different types of cancer (Table 4).
Interestingly, 30% of the predictions in humans were found in proteins of unknown function. When all the predictions obtained in this study for all the analyzed organisms are combined, the percentage of PrD predictions in proteins of unknown function rises to 56.4%. Our results could therefore help to uncover new potential targets for experimental analysis and to unravel the still unknown functional implications of these proteins.
In this work, we have developed a probabilistic model to predict prion domains from the primary sequence of proteins. By combining this model with thorough benchmarking and calibration to handle genome-size sequence databases, we have predicted putative prions in all the available proteomes, which, to our knowledge, constitutes the most extensive study of this kind performed so far. We have identified an extensive list of proteins containing stretches with a fairly high compositional similarity to those of known prions, including proteins from almost all evolutionary classifications and taxa, from archaea and viruses to mammals and humans. Our results also show that such domains occur in a large and diverse group of evolutionarily unrelated proteins. In fact, our predictions highlight some interesting trends in the distribution of prion domains across protein functional families, cellular compartments and biological processes, which differ depending on the taxonomic classification. At a time when prion biology remains a largely unexplored field and the number of experimentally confirmed prion proteins is scarce, predictive approaches such as ours could be of great help to pinpoint putative prionogenic proteins for further experimental characterization. Thus, the free distribution of these predictions, together with the continuous updating and improvement of the predictive models based on new experimental evidence, might contribute significantly to a deeper understanding of prion biology and of prions’ functional and regulatory mechanisms.
A group of 29 proteins that showed a heritable switch and significant in vivo amyloid formation in yeast was used as the training set for deriving the amino acid propensities of prion domains. We calculated the propensities from the complete sequences that were cloned and tested experimentally in that study, which we believe is more reliable than using the predicted PrD cores, since these are inferred solely on statistical grounds. Another set of 18 high-scoring prion predictions, all of which had also been tested experimentally and showed no prion-forming propensity in any of the four assays, was used as the negative evaluation set in the benchmarking of the methodology (the sequences of the proteins and PrDs are given in Additional file 14). The positive evaluation set for the ROC analysis consisted of the 18 of the 29 training prions that were positive in all four assays described by Alberti et al. To avoid artifacts arising from the overlap between the positive instances used for training and for testing, we also performed an exhaustive jackknife bootstrap assay to estimate the significance of the amino acid propensities obtained. In this assay we sampled, with replacement, one million random subsets of the positive set of 18 prion proteins, each time randomly excluding half of the prions. We then regenerated the model with the remaining 9 prions and used the excluded instances as the positive test set for ROC construction, while the negative set was the same 18 negative sequences in all cases. One million ROC curves were thus built and processed to obtain the average curve and the error associated with each point of the curve.
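The resampling procedure can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' Perl/R implementation: `pairwise_auc`, `half_split_auc` and the caller-supplied `fit`/`score` callables are hypothetical names, and, for brevity, the sketch averages the AUC over iterations instead of averaging the ROC curves point by point. Note that 18 positives admit only 48,620 distinct 9/9 splits, so one million independent draws necessarily revisit splits, which is consistent with sampling with replacement.

```python
import random
import statistics


def pairwise_auc(pos_scores, neg_scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair; this equals the trapezoidal AUC of the
    empirical ROC curve.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def half_split_auc(positives, negatives, fit, score, n_iter=1000, seed=0):
    """Repeatedly rebuild the model on a random half of the positives.

    fit(train_sequences) -> model and score(model, sequence) -> float are
    supplied by the caller (hypothetical interface). Each iteration draws an
    independent random split, so identical splits may recur. The negative set
    is the same in every iteration. Returns mean and SD of the AUC.
    """
    rng = random.Random(seed)
    n_test = len(positives) // 2
    aucs = []
    for _ in range(n_iter):
        shuffled = list(positives)
        rng.shuffle(shuffled)
        test, train = shuffled[:n_test], shuffled[n_test:]
        model = fit(train)
        pos = [score(model, s) for s in test]
        neg = [score(model, s) for s in negatives]
        aucs.append(pairwise_auc(pos, neg))
    return statistics.fmean(aucs), statistics.stdev(aucs)
```

With the data sets described above, one would pass the 18 positive and 18 negative sequences and `n_iter=1_000_000`; the pairwise AUC stays cheap because each iteration compares only 9 × 18 score pairs.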
We also defined three additional evaluation datasets: the Uniprot/Swissprot database (release of February 2012); a culled list of proteins with solved three-dimensional structures annotated in SCOP (version 1.75), obtained from the ASTRAL compendium (proteins sharing less than 95% sequence identity); and all the intrinsically disordered proteins annotated in Disprot (version 5.7). For the Uniprot/Swissprot dataset we randomly generated one million sets for benchmarking, whereas for the other two databases we used all the annotated protein sequences. In all cases, known prions were removed from these negative datasets. The three test sets were used to measure the ability of the model to handle sequence datasets with a large number of negative instances, as is the case when scanning complete proteome databases.
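Because a genome-wide scan involves vastly more negatives than positives, the score cutoff has to be calibrated on large negative sets such as these. The following Python sketch shows one way to do this, assuming that the acceptable false positive rate (`max_fpr`) is chosen by the user and that almost every protein in a scanned proteome is a true negative; the function names are hypothetical and this is not necessarily the calibration rule used by the authors.

```python
import math
from bisect import bisect_left


def empirical_fpr(negative_scores, cutoff):
    """Fraction of negative sequences scoring at or above the cutoff."""
    if not negative_scores:
        raise ValueError("negative set is empty")
    return sum(s >= cutoff for s in negative_scores) / len(negative_scores)


def calibrate_cutoff(negative_scores, max_fpr):
    """Smallest cutoff whose empirical FPR on the negatives is <= max_fpr."""
    ascending = sorted(negative_scores)
    n = len(ascending)
    if n == 0:
        raise ValueError("negative set is empty")
    for cutoff in sorted(set(ascending)):
        # Negatives at or above this cutoff start at the first index >= cutoff.
        if (n - bisect_left(ascending, cutoff)) / n <= max_fpr:
            return cutoff
    # Even the top negative score admits too many false positives,
    # so place the cutoff just above it.
    return math.nextafter(ascending[-1], math.inf)


def expected_false_positives(fpr, proteome_size):
    """Expected false hits if essentially every scanned protein is a negative."""
    return fpr * proteome_size
```

For example, an empirical FPR of 0.1% on the negative sets would still translate into about six false positives in a proteome of 6,000 proteins, which illustrates why benchmarking against datasets with many negative instances matters.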
Construction of the probabilistic model
where the Score of a protein sequence segment of length L is obtained by accounting for the relative support of each amino acid independently.
in which the second addend accounts for the significance of non-contiguous prolines in the sequence. The resulting corrected scores were used in the benchmarking and predictive stages of our methodology.
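Because the scoring equations are not reproduced in this section, the following Python sketch only illustrates one common way to build a score that treats each amino acid independently: per-residue log-odds propensities in bits (consistent with prediction scores being reported in bits), estimated from training prion sequences against a background set with an additive pseudocount. The pseudocount value, the choice of background set and the function names are assumptions, and the proline correction term is deliberately omitted because its exact form is not given here.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequences, pseudocount=1.0):
    """Amino acid frequencies with an additive pseudocount (assumed value)."""
    counts = Counter(aa for seq in sequences for aa in seq.upper() if aa in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}


def log_odds_table(prion_sequences, background_sequences, pseudocount=1.0):
    """Per-residue propensity in bits: log2(f_prion / f_background)."""
    fg = composition(prion_sequences, pseudocount)
    bg = composition(background_sequences, pseudocount)
    return {aa: math.log2(fg[aa] / bg[aa]) for aa in AMINO_ACIDS}


def segment_score(segment, table):
    """Sum of independent per-residue contributions; unknown residues score 0.

    The proline correction described in the text is not included.
    """
    return sum(table.get(aa, 0.0) for aa in segment.upper())
```

Summing log-odds terms is what "accounting for each amino acid independently" amounts to under this assumption: the segment score is the log-likelihood ratio of a position-independent prion composition model against the background.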
Benchmarking of the classification methodology
where TP, FN, FP and TN stand for true positives, false negatives, false positives and true negatives, respectively. These quantities were used to calculate the false positive rate (FPR) and true positive rate (TPR) needed to construct the receiver operating characteristic (ROC) curves. Accuracy, precision and the false discovery rate (FDR) were also calculated. The areas under the ROC curves (AUC) were calculated non-parametrically using the trapezoidal rule. All statistical analyses were performed with the R suite and a library of ad hoc Perl scripts developed in-house.
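These quantities are assumed here to follow their standard definitions: TPR = TP/(TP+FN), FPR = FP/(FP+TN), precision = TP/(TP+FP), accuracy = (TP+TN)/(TP+FN+FP+TN) and FDR = FP/(TP+FP). The Python sketch below computes them and builds an empirical ROC curve by sweeping the cutoff over every observed score, integrating it with the trapezoidal rule. Function names are illustrative, and undefined ratios are returned as NaN.

```python
import math


def _ratio(num, den):
    """Return num/den, or NaN when the ratio is undefined."""
    return num / den if den else math.nan


def classification_metrics(tp, fn, fp, tn):
    """Standard confusion-matrix summaries."""
    return {
        "TPR": _ratio(tp, tp + fn),
        "FPR": _ratio(fp, fp + tn),
        "Precision": _ratio(tp, tp + fp),
        "Accuracy": _ratio(tp + tn, tp + fn + fp + tn),
        "FDR": _ratio(fp, tp + fp),
    }


def roc_curve_auc(pos_scores, neg_scores):
    """Empirical ROC points (FPR, TPR) and their trapezoidal area.

    A sequence is called positive when its score is >= the cutoff; the cutoff
    is swept from the highest to the lowest observed score, so the curve runs
    from (0, 0) to (1, 1).
    """
    n_pos, n_neg = len(pos_scores), len(neg_scores)
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative score")
    points = [(0.0, 0.0)]
    for cutoff in sorted(set(pos_scores) | set(neg_scores), reverse=True):
        tp = sum(s >= cutoff for s in pos_scores)
        fp = sum(s >= cutoff for s in neg_scores)
        points.append((fp / n_neg, tp / n_pos))
    auc = sum(
        (x1 - x0) * (y0 + y1) / 2.0
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )
    return points, auc
```

Using every distinct score as a cutoff yields the full step-wise empirical curve, so the trapezoidal area is exact for the data and needs no distributional assumption, in line with the non-parametric AUC described above.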
Predicting Q/N-rich putative PrDs in complete proteomes
We downloaded the complete proteomes of all the organisms sequenced to date from the Uniprot Knowledgebase to identify novel proteins containing prion-forming domains. These repositories are updated every four weeks with proteins resulting from genome sequencing and annotation projects, and are subdivided into two complementary, non-redundant datasets: a) Swissprot, with fully annotated, curated entries, and b) TrEMBL, formed by computer-generated entries enriched with automated classification and annotation. This part of Uniprot is organized in separate files for different taxonomic divisions, which gave us the opportunity to study the compositional characteristics of our predictions in each evolutionary clade. In this dataset there is one file per taxon, containing all the proteins of the organisms belonging to that taxon, except that rodents, mammals and human are each distributed in a separate file. These files were processed with an ad hoc Perl script included in Additional file 15. Proteins containing a continuous stretch of sixty residues (the typical length proposed for PrD cores) whose amino acid composition passed the cutoff defined in the predictive methodology were accepted as predictions. All the predictions, organized in one file per taxon, can be found in Additional files 1, 2, 3, 4, 5, 6, 7, 8, 9, 10. The predictions were analyzed to estimate, for each taxon studied, the number of proteins with PrDs belonging to the different Gene Ontology classifications in the following sub-categories: Molecular Function, Biological Process and Cellular Component. In addition, to estimate the significance of the number of predictions in a given classification, we set up a randomization test in which we calculated the expected count of each GO term by randomizing the selection 10^6 times and then estimated the z-score of each GO term parametrically. These results are included in Additional files 11, 12, 13.
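The window scan and the GO randomization test can be sketched in Python as follows. This is a minimal illustration, not the authors' Perl script: it assumes a per-residue score table given as a plain dictionary, a 1-based window start position, and a cutoff supplied by the caller, since its value is not stated in this section. `best_window`, `predict` and `go_zscores` are hypothetical names; the z-scores compare the observed count of each GO term among the predictions with the mean and standard deviation of its count in random protein sets of the same size.

```python
import random
import statistics
from collections import Counter

WINDOW = 60  # length of the scanned stretch, as in the text


def best_window(seq, table, window=WINDOW):
    """Highest-scoring window as (1-based start, score), or None if too short.

    Residues missing from `table` contribute 0. A running sum keeps the scan
    linear in the sequence length.
    """
    if len(seq) < window:
        return None
    scores = [table.get(aa, 0.0) for aa in seq.upper()]
    current = sum(scores[:window])
    best_start, best = 0, current
    for start in range(1, len(scores) - window + 1):
        current += scores[start + window - 1] - scores[start - 1]
        if current > best:
            best_start, best = start, current
    return best_start + 1, best


def predict(proteome, table, cutoff, window=WINDOW):
    """Return {protein_id: (start, score)} for proteins whose best window passes the cutoff."""
    hits = {}
    for protein_id, seq in proteome.items():
        found = best_window(seq, table, window)
        if found is not None and found[1] >= cutoff:
            hits[protein_id] = found
    return hits


def go_zscores(annotations, predicted, n_rand=1000, seed=0):
    """z-score of each GO term's count among `predicted` versus random sets.

    annotations: {protein_id: set of GO terms}; predicted: list of protein ids.
    Random sets have the same size as `predicted` and are drawn without
    replacement from all annotated proteins.
    """
    rng = random.Random(seed)
    ids = list(annotations)
    terms = set().union(*annotations.values())
    observed = Counter(t for p in predicted for t in annotations[p])
    null_counts = {t: [] for t in terms}
    for _ in range(n_rand):
        drawn = Counter(t for p in rng.sample(ids, len(predicted)) for t in annotations[p])
        for t in terms:
            null_counts[t].append(drawn[t])
    zscores = {}
    for t in terms:
        mean = statistics.fmean(null_counts[t])
        sd = statistics.pstdev(null_counts[t])
        zscores[t] = (observed[t] - mean) / sd if sd > 0 else 0.0
    return zscores
```

In the setting described above, `n_rand` would be 10^6; positive z-scores indicate GO terms overrepresented among the predicted PrD proteins and negative ones terms that are depleted.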
The following additional data are available with the online version of this paper. We provide ten PDF files (one for each taxon: Archaea, Bacteria, Viruses, Fungi, Invertebrates, Vertebrates, Plants, Rodents, Mammals and Human) including all the prion-forming domain predictions obtained using our methodology. Each file is organized by organism: the organism line starts with the ‘>’ symbol, followed by the name of the organism, a colon and the number of predictions in that organism. After the organism line there is one description line per prediction, organized as follows: the Uniprot ID of the protein bearing the prediction, a tab, the position of the first residue of the sixty-residue window used by our algorithm as described in the Methods section, a semicolon, the score of the prediction in bits, and finally a vertical bar followed by the sequence of the ‘Prion Domain’ predicted in this protein. The head of each file also includes a summary section, named after the taxon, with the information on all the predictions obtained in that taxon.
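A minimal Python reader for this format is sketched below. It assumes that the PDF files have already been converted to plain text with tab characters preserved and that the organism line ends in a bare integer count; lines before the first ‘>’ (the summary section, whose layout is not specified) are skipped. `Prediction` and `parse_prediction_file` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    uniprot_id: str
    start: int          # first residue of the sixty-residue window
    score_bits: float   # prediction score in bits
    sequence: str       # predicted 'Prion Domain' segment


def parse_prediction_file(lines):
    """Parse text extracted from one per-taxon predictions file.

    Returns {organism: (declared_count, [Prediction, ...])}.
    """
    result = {}
    organism = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # rpartition keeps any colons inside the organism name intact
            name, _, count = line[1:].rpartition(":")
            organism = name.strip()
            result[organism] = (int(count), [])
        elif organism is not None:
            # Layout: ID <tab> start ; score | sequence
            uniprot_id, rest = line.split("\t", 1)
            start, rest = rest.split(";", 1)
            score, sequence = rest.split("|", 1)
            result[organism][1].append(
                Prediction(uniprot_id.strip(), int(start), float(score), sequence.strip())
            )
    return result
```

Keeping the declared count next to the parsed list makes it easy to check that no prediction lines were lost during PDF-to-text conversion.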
IDPs: intrinsically disordered proteins
AUC: area under the curve
HMMs: hidden Markov models
ROC: receiver operating characteristic curve
VEA was funded by Banco Santander Central Hispano, Fundación Carolina and Universidad de Zaragoza and is now the recipient of a doctoral fellowship awarded by Consejo Superior de Investigaciones Científicas, JAE program. SV would like to acknowledge financial support from grants BFU2010-14901 from Ministerio de Ciencia e Innovación (Spain), 2009-SGR-760 and 2009-CTP-00004 from AGAUR (Generalitat de Catalunya). SV has been granted an ICREA Academia award (ICREA). JS would like to acknowledge financial support from grants BFU2010-16297 [Ministerio de Ciencia e Innovación Spain] and PI078/08 and CTPR02/09 [DGA, Spain]. We also thank the HPC group from the BIFI for technical assistance in the running of parallel jobs in the BIFI/UNIZAR computer cluster.
- Inge-Vechtomov SG, Zhouravleva GA, Chernoff YO: Biological roles of prion domains. Prion. 2007, 1 (4): 228-235. 10.4161/pri.1.4.5059.
- Chiti F, Dobson CM: Protein misfolding, functional amyloid, and human disease. Annu Rev Biochem. 2006, 75: 333-366. 10.1146/annurev.biochem.75.101304.123901.
- Selkoe DJ: Folding proteins in fatal ways. Nature. 2003, 426 (6968): 900-904. 10.1038/nature02264.
- Jahn TR, Radford SE: The Yin and Yang of protein folding. FEBS J. 2005, 272 (23): 5962-5970. 10.1111/j.1742-4658.2005.05021.x.
- Coustou V, Deleu C, Saupe S, Begueret J: The protein product of the het-s heterokaryon incompatibility gene of the fungus Podospora anserina behaves as a prion analog. Proc Natl Acad Sci U S A. 1997, 94 (18): 9773-9778. 10.1073/pnas.94.18.9773.
- Iconomidou VA, Vriend G, Hamodrakas SJ: Amyloids protect the silkmoth oocyte and embryo. FEBS Lett. 2000, 479 (3): 141-145. 10.1016/S0014-5793(00)01888-3.
- Podrabsky JE, Carpenter JF, Hand SC: Survival of water stress in annual fish embryos: dehydration avoidance and egg envelope amyloid fibers. Am J Physiol Regul Integr Comp Physiol. 2001, 280 (1): R123-R131.
- Chapman MR, Robinson LS, Pinkner JS, Roth R, Heuser J, Hammar M, Normark S, Hultgren SJ: Role of Escherichia coli curli operons in directing amyloid fiber formation. Science. 2002, 295 (5556): 851-855. 10.1126/science.1067484.
- Graether SP, Slupsky CM, Sykes BD: Freezing of a fish antifreeze protein results in amyloid fibril formation. Biophys J. 2003, 84 (1): 552-557. 10.1016/S0006-3495(03)74874-7.
- Fowler DM, Koulov AV, Alory-Jost C, Marks MS, Balch WE, Kelly JW: Functional amyloid formation within mammalian tissue. PLoS Biol. 2006, 4 (1): e6. 10.1371/journal.pbio.0040006.
- Maji SK, Perrin MH, Sawaya MR, Jessberger S, Vadodaria K, Rissman RA, Singru PS, Nilsson KP, Simon R, Schubert D: Functional amyloids as natural storage of peptide hormones in pituitary secretory granules. Science. 2009, 325 (5938): 328-332. 10.1126/science.1173155.
- Chien P, Weissman JS: Conformational diversity in a yeast prion dictates its seeding specificity. Nature. 2001, 410 (6825): 223-227. 10.1038/35065632.
- Shorter J, Lindquist S: Prions as adaptive conduits of memory and inheritance. Nat Rev Genet. 2005, 6 (6): 435-450.
- Liebman SW, Chernoff YO: Prions in yeast. Genetics. 2012, 191 (4): 1041-1072. 10.1534/genetics.111.137760.
- Staniforth GL, Tuite MF: Fungal prions. Prog Mol Biol Transl Sci. 2012, 107: 417-456.
- Aguzzi A, Calella AM: Prions: protein aggregation and infectious diseases. Physiol Rev. 2009, 89 (4): 1105-1152. 10.1152/physrev.00006.2009.
- Bellotti V, Chiti F: Amyloidogenesis in its biological environment: challenging a fundamental issue in protein misfolding diseases. Curr Opin Struct Biol. 2008, 18 (6): 771-779. 10.1016/j.sbi.2008.10.001.
- Prusiner SB, Scott MR, DeArmond SJ, Cohen FE: Prion protein biology. Cell. 1998, 93: 337-348. 10.1016/S0092-8674(00)81163-0.
- Karran E, Mercken M, De Strooper B: The amyloid cascade hypothesis for Alzheimer’s disease: an appraisal for the development of therapeutics. Nat Rev Drug Discov. 2011, 10 (9): 698-712. 10.1038/nrd3505.
- Ross CA, Poirier MA: Protein aggregation and neurodegenerative disease. Nat Med. 2004, 10 (Suppl): S10-S17.
- Faux NG, Bottomley SP, Lesk AM, Irving JA, Morrison JR, de la Banda MG, Whisstock JC: Functional insights from the distribution and role of homopeptide repeat-containing proteins. Genome Res. 2005, 15 (4): 537-551. 10.1101/gr.3096505.
- Dorsman JC, Pepers B, Langenberg D, Kerkdijk H, Ijszenga M, den Dunnen JT, Roos RA, van Ommen GJ: Strong aggregation and increased toxicity of polyleucine over polyglutamine stretches in mammalian cells. Hum Mol Genet. 2002, 11 (13): 1487-1496. 10.1093/hmg/11.13.1487.
- Fandrich M, Dobson CM: The behaviour of polyamino acids reveals an inverse side chain effect in amyloid structure formation. EMBO J. 2002, 21 (21): 5682-5690. 10.1093/emboj/cdf573.
- Halfmann R, Alberti S, Krishnan R, Lyle N, O’Donnell CW, King OD, Berger B, Pappu RV, Lindquist S: Opposing effects of glutamine and asparagine govern prion formation by intrinsically disordered proteins. Mol Cell. 2011, 43 (1): 72-84. 10.1016/j.molcel.2011.05.013.
- Andresen JM, Gayan J, Djousse L, Roberts S, Brocklebank D, Cherny SS, Cardon LR, Gusella JF, MacDonald ME, Myers RH: The relationship between CAG repeat length and age of onset differs for Huntington’s disease patients with juvenile onset or adult onset. Ann Hum Genet. 2007, 71 (Pt 3): 295-301.
- Choudhry S, Mukerji M, Srivastava AK, Jain S, Brahmachari SK: CAG repeat instability at SCA2 locus: anchoring CAA interruptions and linked single nucleotide polymorphisms. Hum Mol Genet. 2001, 10 (21): 2437-2446. 10.1093/hmg/10.21.2437.
- Saunders HM, Bottomley SP: Multi-domain misfolding: understanding the aggregation pathway of polyglutamine proteins. Protein Eng Des Sel. 2009, 22 (8): 447-451. 10.1093/protein/gzp033.
- Ross ED, Baxa U, Wickner RB: Scrambled prion domains form prions and amyloid. Mol Cell Biol. 2004, 24 (16): 7206-7213. 10.1128/MCB.24.16.7206-7213.2004.
- Ross ED, Edskes HK, Terry MJ, Wickner RB: Primary sequence independence for prion formation. Proc Natl Acad Sci U S A. 2005, 102 (36): 12825-12830. 10.1073/pnas.0506136102.
- Toombs JA, McCarty BR, Ross ED: Compositional determinants of prion formation in yeast. Mol Cell Biol. 2010, 30 (1): 319-332. 10.1128/MCB.01140-09.
- Bryan AW, Menke M, Cowen LJ, Lindquist SL, Berger B: BETASCAN: probable beta-amyloids identified by pairwise probabilistic analysis. PLoS Comput Biol. 2009, 5 (3): e1000333. 10.1371/journal.pcbi.1000333.
- Fernandez-Escamilla AM, Rousseau F, Schymkowitz J, Serrano L: Prediction of sequence-dependent and mutational effects on the aggregation of peptides and proteins. Nat Biotechnol. 2004, 22 (10): 1302-1306. 10.1038/nbt1012.
- Trovato A, Seno F, Tosatto SC: The PASTA server for protein aggregation prediction. Protein Eng Des Sel. 2007, 20 (10): 521-523. 10.1093/protein/gzm042.
- Zibaee S, Makin OS, Goedert M, Serpell LC: A simple algorithm locates beta-strands in the amyloid fibril core of alpha-synuclein, Abeta, and tau using the amino acid sequence alone. Protein Sci. 2007, 16 (5): 906-918. 10.1110/ps.062624507.
- Pawar AP, Dubay KF, Zurdo J, Chiti F, Vendruscolo M, Dobson CM: Prediction of “aggregation-prone” and “aggregation-susceptible” regions in proteins associated with neurodegenerative diseases. J Mol Biol. 2005, 350 (2): 379-392. 10.1016/j.jmb.2005.04.016.
- Michelitsch MD, Weissman JS: A census of glutamine/asparagine-rich regions: implications for their conserved function and the prediction of novel prions. Proc Natl Acad Sci U S A. 2000, 97 (22): 11910-11915. 10.1073/pnas.97.22.11910.
- Harrison PM, Gerstein M: A method to assess compositional bias in biological sequences and its application to prion-like glutamine/asparagine-rich domains in eukaryotic proteomes. Genome Biol. 2003, 4 (6): R40. 10.1186/gb-2003-4-6-r40.
- Alberti S, Halfmann R, King O, Kapila A, Lindquist S: A systematic survey identifies prions and illuminates sequence features of prionogenic proteins. Cell. 2009, 137 (1): 146-158. 10.1016/j.cell.2009.02.044.
- Toombs JA, Petri M, Paul KR, Kan GY, Ben-Hur A, Ross ED: De novo design of synthetic prion domains. Proc Natl Acad Sci U S A. 2012, 109 (17): 6519-6524. 10.1073/pnas.1119366109.
- The UniProt Consortium: The Universal Protein Resource (UniProt) in 2010. Nucleic Acids Res. 2010, 38 (Database issue): D142-D148.
- Rose PW, Beran B, Bi C, Bluhm WF, Dimitropoulos D, Goodsell DS, Prlic A, Quesada M, Quinn GB, Westbrook JD: The RCSB Protein Data Bank: redesigned web site and web services. Nucleic Acids Res. 2011, 39 (Database issue): D392-D401.
- Sickmeier M, Hamilton JA, LeGall T, Vacic V, Cortese MS, Tantos A, Szabo B, Tompa P, Chen J, Uversky VN: DisProt: the Database of Disordered Proteins. Nucleic Acids Res. 2007, 35 (Database issue): D786-D793.
- Mason SJ, Graham NE: Areas beneath the relative operating characteristics (ROC) and relative operating levels (ROL) curves: Statistical significance and interpretation. Q J Roy Meteorol Soc. 2002, 128: 2145-2166. 10.1256/003590002320603584.
- Storey J: The positive false discovery rate: A Bayesian interpretation and the q-value. Ann Stat. 2003, 31 (6): 2013-2035. 10.1214/aos/1074290335.
- Benjamini Y, Yekutieli D: The control of the false discovery rate in multiple testing under dependency. Ann Stat. 2001, 29 (4): 1165-1188. 10.1214/aos/1013699998.
- Eichinger L, Pachebat JA, Glockner G, Rajandream MA, Sucgang R, Berriman M, Song J, Olsen R, Szafranski K, Xu Q: The genome of the social amoeba Dictyostelium discoideum. Nature. 2005, 435 (7038): 43-57. 10.1038/nature03481.
- Pizzi E, Frontali C: Low-complexity regions in Plasmodium falciparum proteins. Genome Res. 2001, 11 (2): 218-229. 10.1101/gr.GR-1522R.
- Nishizawa M, Nishizawa K: Local-scale repetitiveness in amino acid use in eukaryote protein sequences: a genomic factor in protein evolution. Proteins. 1999, 37 (2): 284-292. 10.1002/(SICI)1097-0134(19991101)37:2<284::AID-PROT13>3.0.CO;2-4.
- Golding GB: Simple sequence is abundant in eukaryotic proteins. Protein Sci. 1999, 8 (6): 1358-1361. 10.1110/ps.8.6.1358.
- Moore RA, Taubner LM, Priola SA: Prion protein misfolding and disease. Curr Opin Struct Biol. 2009, 19 (1): 14-22. 10.1016/j.sbi.2008.12.007.
- Pastore A, Zagari A: A structural overview of the vertebrate prion proteins. Prion. 2007, 1 (3): 185-197. 10.4161/pri.1.3.5281.
- Eisenberg D, Nelson R, Sawaya MR, Balbirnie M, Sambashivan S, Ivanova MI, Madsen AO, Riekel C: The structural biology of protein aggregation diseases: Fundamental questions and some answers. Acc Chem Res. 2006, 39 (9): 568-575. 10.1021/ar0500618.
- Halfmann R, Lindquist S: Screening for amyloid aggregation by semi-denaturing detergent-agarose gel electrophoresis. J Vis Exp. 2008, 17: e838. 10.3791/838.
- Tanaka M, Collins SR, Toyama BH, Weissman JS: The physical basis of how prion conformations determine strain phenotypes. Nature. 2006, 442 (7102): 585-589. 10.1038/nature04922.
- Sondheimer N, Lindquist S: Rnq1: an epigenetic modifier of protein function in yeast. Mol Cell. 2000, 5 (1): 163-172. 10.1016/S1097-2765(00)80412-8.
- Ross ED, Minton A, Wickner RB: Prion domains: sequences, structures and interactions. Nat Cell Biol. 2005, 7 (11): 1039-1044. 10.1038/ncb1105-1039.
- Linding R, Schymkowitz J, Rousseau F, Diella F, Serrano L: A comparative study of the relationship between protein structure and beta-aggregation in globular and intrinsically disordered proteins. J Mol Biol. 2004, 342 (1): 345-353. 10.1016/j.jmb.2004.06.088.
- Eddy SR: Hidden Markov models. Curr Opin Struct Biol. 1996, 6 (3): 361-365. 10.1016/S0959-440X(96)80056-X.
- Buckland M, Gey F: The relationship between recall and precision. J Am Soc Inf Sci Technol. 1994, 45 (1): 12-19. 10.1002/(SICI)1097-4571(199401)45:1<12::AID-ASI2>3.0.CO;2-L.
- He B, Wang K, Liu Y, Xue B, Uversky VN, Dunker AK: Predicting intrinsic disorder in proteins: an overview. Cell Res. 2009, 19 (8): 929-949. 10.1038/cr.2009.87.
- Eliezer D: Biophysical characterization of intrinsically disordered proteins. Curr Opin Struct Biol. 2009, 19 (1): 23-30. 10.1016/j.sbi.2008.12.004.
- Dunker AK, Silman I, Uversky VN, Sussman JL: Function and structure of inherently disordered proteins. Curr Opin Struct Biol. 2008, 18 (6): 756-764. 10.1016/j.sbi.2008.10.002.
- Peng K, Radivojac P, Vucetic S, Dunker AK, Obradovic Z: Length-dependent prediction of protein intrinsic disorder. BMC Bioinforma. 2006, 7: 208. 10.1186/1471-2105-7-208.
- Romero P, Obradovic Z, Li X, Garner EC, Brown CJ, Dunker AK: Sequence complexity of disordered protein. Proteins. 2001, 42 (1): 38-48. 10.1002/1097-0134(20010101)42:1<38::AID-PROT50>3.0.CO;2-3.
- Romero P, Obradovic Z, Kissinger C, Villafranca JE, Dunker AK: Identifying disordered regions in proteins from amino acid sequence. Neural Netw. 1997, 1: 90-95.
- Munch C, Bertolotti A: Propagation of the prion phenomenon: beyond the seeding principle. J Mol Biol. 2012, 421 (4–5): 491-498.
- Brundin P, Melki R, Kopito R: Prion-like transmission of protein aggregates in neurodegenerative diseases. Nat Rev Mol Cell Biol. 2010, 11 (4): 301-307. 10.1038/nrm2873.
- Hansen C, Angot E, Bergstrom AL, Steiner JA, Pieri L, Paul G, Outeiro TF, Melki R, Kallunki P, Fog K: alpha-Synuclein propagates from mouse brain to grafted dopaminergic neurons and seeds aggregation in cultured human cells. J Clin Invest. 2011, 121 (2): 715-725. 10.1172/JCI43366.
- Meyer-Luehmann M, Coomaraswamy J, Bolmont T, Kaeser S, Schaefer C, Kilger E, Neuenschwander A, Abramowski D, Frey P, Jaton AL: Exogenous induction of cerebral beta-amyloidogenesis is governed by agent and host. Science. 2006, 313 (5794): 1781-1784. 10.1126/science.1131864.
- Ren PH, Lauckner JE, Kachirskaia I, Heuser JE, Melki R, Kopito RR: Cytoplasmic penetration and persistent infection of mammalian cells by polyglutamine aggregates. Nat Cell Biol. 2009, 11 (2): 219-225. 10.1038/ncb1830.
- Nerelius C, Sandegren A, Sargsyan H, Raunak R, Leijonmarck H, Chatterjee U, Fisahn A, Imarisio S, Lomas DA, Crowther DC: Alpha-helix targeting reduces amyloid-beta peptide toxicity. Proc Natl Acad Sci U S A. 2009, 106 (23): 9191-9196. 10.1073/pnas.0810364106.
- Bartels T, Choi JG, Selkoe DJ: alpha-Synuclein occurs physiologically as a helically folded tetramer that resists aggregation. Nature. 2011, 477 (7362): 107-110. 10.1038/nature10324.
- True HL, Berlin I, Lindquist SL: Epigenetic regulation of translation reveals hidden genetic variation to produce complex traits. Nature. 2004, 431 (7005): 184-187. 10.1038/nature02885.
- True HL, Lindquist SL: A yeast prion provides a mechanism for genetic variation and phenotypic diversity. Nature. 2000, 407 (6803): 477-483. 10.1038/35035005.
- Masel J, Siegal ML: Robustness: mechanisms and consequences. Trends Genet. 2009, 25 (9): 395-403. 10.1016/j.tig.2009.07.005.
- Namy O, Galopier A, Martini C, Matsufuji S, Fabret C, Rousset JP: Epigenetic control of polyamines by the prion [PSI+]. Nat Cell Biol. 2008, 10 (9): 1069-1075. 10.1038/ncb1766.
- Patino MM, Liu JJ, Glover JR, Lindquist S: Support for the prion hypothesis for inheritance of a phenotypic trait in yeast. Science. 1996, 273 (5275): 622-626. 10.1126/science.273.5275.622.
- Si K, Lindquist S, Kandel ER: A neuronal isoform of the aplysia CPEB has prion-like properties. Cell. 2003, 115 (7): 879-891. 10.1016/S0092-8674(03)01020-1.
- Heinrich SU, Lindquist S: Protein-only mechanism induces self-perpetuating changes in the activity of neuronal Aplysia cytoplasmic polyadenylation element binding protein (CPEB). Proc Natl Acad Sci U S A. 2011, 108 (7): 2999-3004. 10.1073/pnas.1019368108.
- Banerjee P, Schoenfeld BP, Bell AJ, Choi CH, Bradley MP, Hinchey P, Kollaros M, Park JH, McBride SM, Dockendorff TC: Short- and long-term memory are modulated by multiple isoforms of the fragile X mental retardation protein. J Neurosci. 2010, 30 (19): 6782-6792. 10.1523/JNEUROSCI.6369-09.2010.
- Sucgang R, Kuo A, Tian X, Salerno W, Parikh A, Feasley CL, Dalin E, Tu H, Huang E, Barry K: Comparative genomics of the social amoebae Dictyostelium discoideum and Dictyostelium purpureum. Genome Biol. 2011, 12 (2): R20. 10.1186/gb-2011-12-2-r20.
- Muralidharan V, Oksman A, Pal P, Lindquist S, Goldberg DE: Plasmodium falciparum heat shock protein 110 stabilizes the asparagine repeat-rich parasite proteome during malarial fevers. Nat Commun. 2012, 3: 1310.
- Halfmann R, Alberti S, Lindquist S: Prions, protein homeostasis, and phenotypic diversity. Trends Cell Biol. 2010, 20 (3): 125-133. 10.1016/j.tcb.2009.12.003.
- Shorter J, Lindquist S: Hsp104 catalyzes formation and elimination of self-replicating Sup35 prion conformers. Science. 2004, 304 (5678): 1793-1797. 10.1126/science.1098007.
- Senechal P, Arseneault G, Leroux A, Lindquist S, Rokeach LA: The Schizosaccharomyces pombe Hsp104 disaggregase is unable to propagate the [PSI] prion. PLoS One. 2009, 4 (9): e6939. 10.1371/journal.pone.0006939.
- Zenthon JF, Ness F, Cox B, Tuite MF: The [PSI+] prion of Saccharomyces cerevisiae can be propagated by an Hsp104 orthologue from Candida albicans. Eukaryot Cell. 2006, 5 (2): 217-225. 10.1128/EC.5.2.217-225.2006.
- Stelzer G, Dalah I, Stein TI, Satanower Y, Rosen N, Nativ N, Oz-Levi D, Olender T, Belinky F, Bahir I: In-silico human genomics with GeneCards. Hum Genomics. 2011, 5 (6): 709-717. 10.1186/1479-7364-5-6-709.
- Otzen D, Nielsen PH: We find them here, we find them there: functional bacterial amyloid. Cell Mol Life Sci. 2008, 65 (6): 910-927. 10.1007/s00018-007-7404-4.
- Lundmark K, Westermark GT, Olsen A, Westermark P: Protein fibrils in nature can enhance amyloid protein A amyloidosis in mice: Cross-seeding as a disease mechanism. Proc Natl Acad Sci U S A. 2005, 102 (17): 6098-6102. 10.1073/pnas.0501814102.
- Couthouis J, Hart MP, Shorter J, DeJesus-Hernandez M, Erion R, Oristano R, Liu AX, Ramos D, Jethava N, Hosangadi D: A yeast functional screen predicts new candidate ALS disease genes. Proc Natl Acad Sci U S A. 2011, 108 (52): 20881-20890. 10.1073/pnas.1109434108.
- Chandonia JM, Hon G, Walker NS, Lo Conte L, Koehl P, Levitt M, Brenner SE: The ASTRAL Compendium in 2004. Nucleic Acids Res. 2004, 32 (Database issue): D189-D192.
- R Development Core Team: R: A language and environment for statistical computing. 2008, Vienna, Austria: R Foundation for Statistical Computing, http://www.r-project.org/.
- Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT: Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 2000, 25 (1): 25-29. 10.1038/75556.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.