Origins of amyloid-β
© Tharp and Sarkar; licensee BioMed Central Ltd. 2013
Received: 16 November 2012
Accepted: 24 April 2013
Published: 30 April 2013
Amyloid-β plaques are a defining characteristic of Alzheimer Disease. However, Amyloid-β deposition is also found in other forms of dementia and in non-pathological contexts. Amyloid-β deposition is variable among vertebrate species and the evolutionary emergence of the amyloidogenic property is currently unknown. Evolutionary persistence of a pathological peptide sequence may depend on the functions of the precursor gene, conservation or mutation of nucleotides or peptide domains within the precursor gene, or a species-specific physiological environment.
In this study, we asked when amyloidogenic Amyloid-β first arose using phylogenetic trees constructed for the Amyloid-β Precursor Protein gene family and by modeling the potential for Amyloid-β aggregation across species in silico. We collected the most comprehensive set of sequences for the Amyloid-β Precursor Protein family using an automated, iterative meta-database search and constructed a highly resolved phylogeny. The analysis revealed that the ancestral gene for invertebrate and vertebrate Amyloid-β Precursor Protein gene families arose around metazoic speciation during the Ediacaran period. Synapomorphy frequencies revealed domain-specific conservation of sequence. Analyses of aggregation potential showed that potentially amyloidogenic sequences are a ubiquitous feature of vertebrate Amyloid-β Precursor Protein but are also found in echinoderm, nematode, cephalochordate, and hymenoptera species homologues.
The Amyloid-β Precursor Protein gene is ancient and highly conserved. The amyloid-forming Amyloid-β domains may have been present in early deuterostomes, but more recent mutations appear to have resulted in potentially unrelated amyloid-forming sequences. Our results further highlight that the species-specific physiological environment is as critical to Amyloid-β formation as the peptide sequence.
The Amyloid-β Precursor Protein (AβPP, APP) has been intensively studied due to its role in the generation of pathogenic cortical plaques in Alzheimer Disease. It belongs to a gene family with deep evolutionary origins and is a member of a highly conserved protein family of type-1 transmembrane proteins [2–4]. The AβPP family consists of up to three homologues in vertebrate species: AβPP, amyloid precursor like protein 1 (APLP-1), and amyloid precursor like protein 2 (APLP-2). Invertebrate species genomes encode a single homologue referred to as either amyloid precursor like 1 protein (APL-1) or AβPP-like 1 protein (APPL-1). Vertebrates and flatworms exhibit ubiquitous expression of at least one member of the AβPP family, while fruit flies express APPL-1 only in neurons. In all species, AβPP proteins are cleaved into multiple peptides and fragments by a series of proteases, but only vertebrate AβPP contains the sequence coding the pathological Amyloid-β (Aβ) peptide fragment.
The β-fold intrinsic to amyloid formation is a commonly observed biochemical property [5–7]. Amyloid formation is observed in non-pathological contexts ranging from an efficient steric mechanism for storage of small peptide hormones to rudimentary forms of biological compartmentalization [5, 8]. The neuropathological changes observed in the brains of patients with Alzheimer Disease led to the formation of the Amyloid Hypothesis, which implicates both extracellular deposits of Aβ fibrils and low-order intracellular Aβ oligomers in the disruption of neuronal function, distortion of neural architecture, and induction of inflammation.
Mutations in the AβPP sequence and in associated proteases have been independently associated with familial early onset Alzheimer Disease characterized by rapidly progressive dementia and heavy Aβ plaque burden. Recently, a protective mutation in AβPP reducing the formation of Aβ was identified. However, >95% of sporadic Alzheimer Disease exhibits no mutation in the AβPP gene sequence. Further, deposition of Aβ is not limited to Alzheimer Disease. Aβ plaques have been observed in vascular dementias, Lewy body dementia, and Parkinson Disease with dementia, as well as in the brains of aged individuals without any cognitive deficits [11–14]. Together, these studies indicate that while the sequence of Aβ can contribute to the progression and severity of disease, factors regulating the production of Aβ by proteolysis and its degradation and clearance also play a critical role in the generation of Aβ pathology.
Beyond the eponymous production of Aβ, AβPP processing produces other active peptides with functions ranging from hemostatic modulators to trophic factors to pro-apoptotic proteins [15–18]. There is a substantial body of knowledge focusing on the neural impacts of AβPP and Aβ. However, this family of proteins is also widely expressed in peripheral tissues of vertebrate species including skin, skeletal muscle, leukocytes, platelets, intestinal epithelia, pancreas, and adipose tissue. The function and regulation of non-neuronal AβPP are not fully understood [19–26].
The AβPP family is variably essential for viability among species. Experimental data show that the N-terminus of APL-1 is necessary for progression through molting stages by nematodes. The C-terminus of at least one member of the AβPP family is necessary for viability in early parturition of knockout mouse models [28–30]. Drosophila models without APPL-1 show subtle neuronal patterning defects but remain viable and able to reproduce. Zebrafish knockout models have impaired body development and synaptogenesis [32, 33]. Each of these models can be rescued by expression of truncated portions of AβPPs, indicating that absences of different domains are responsible for the observed lethality or defects in each model. Thus, the persistence of this protein family appears domain-dependent among species despite high evolutionary conservation of the entire gene.
Previous phylogenetic studies showed that this ancient protein family has been widely distributed among multicellular eukaryotes since at least the divergence of protostomia and deuterostomia. These studies and corresponding conclusions are based on at most ten sequences that were trimmed and concatenated, focusing solely on the major conserved domains (D1, D2, and D3 only). Use of trimmed sequences does yield cleaner sequence alignments and better branch supports on the phylogenetic tree, but ignores potentially valuable evolutionary data encoded in adjoining regions. For the AβPP gene family in particular, the omission of Aβ from the analyses occludes an understanding of the evolution of the pathological eponymous domain. Despite wide distribution of the AβPP family across species, it is not known when amyloidogenic Aβ peptides first evolved. This study uses the full complement of available molecular sequence data to provide an in silico model of the evolutionary history of this essential gene family and the origin of the Aβ peptide.
Amino acid and nucleotide sequences were collected using an automated, iterative search method from Entrez Protein (GenPept) and Entrez Nucleotide (GenBank) (see Methods and Additional file 1: Table S1). Character matrices were generated in Mesquite 2.75 and aligned using Muscle 3.8.31, and the longest sequence for each species’ homologue(s) was retained [37, 38]. Amino acid and nucleotide trees were generated under maximum parsimony using TNT 1.1 and Bayesian inference using MrBayes 3.2 [39, 40].
The presence of an AβPP-like sequence in hydra (Hydra magnipapillata) and sea anemone (Nematostella vectensis) genomes suggests that the ancestral gene arose around metazoic divergence in the Ediacaran period, between 630–540 million years ago (Mya). No related sequences from single-celled organisms were found. A single member of the gene family has persisted across invertebrate species with a major divergence around the evolution of arthropods during the Cambrian period giving rise to APPL-1 (~500 Mya). Two gene duplication events occurred during the evolution of vertebrate species. Our search recovered a single AβPP-like gene for the cephalochordate lancelet (Branchiostoma floridae) genome that was more closely related to mollusk and cnidarian sequences than to vertebrate sequences. The cartilaginous ray (Narke japonica) genome contains a single AβPP gene with high homology to human AβPP. The results indicate that AβPP and APLP-2 genes are present in the zebrafish (Danio rerio) but only AβPP was recovered for other members of class Osteichthyes (Takifugu rubripes, Tetraodon fluviatilis, and Perca flavescens). The majority of tetrapod genomes in this study contained all three members of the vertebrate AβPP gene family. Sequences for all three genes were found for Xenopus species, but APLP-1 sequences were not found for any members of class Aves or Reptilia.
Within the gene family, the nucleotide sequence phylogenetic trees (Figure 2a) indicate that AβPP and APLP-2 are more closely related than APLP-1 and APLP-2. Furthermore, the placement in the nucleotide phylogenetic tree suggests that APLP-1 may be the original vertebrate sequence. However, placement of the AβPP branch on the nucleotide tree is weakly supported under both maximum parsimony (47% resampling support, 12% relative Bremer; see Additional file 1: Figure S2a, b) and Bayesian inference (60% posterior probability; see Additional file 1: Figure S1a). In the amino acid sequence phylogenetic trees, APLP-1 and APLP-2 are more closely related and AβPP appears to be the original vertebrate peptide (Figure 2b). This arrangement has higher support for the placement of AβPP (65% resampling support, 100% relative Bremer support [maximum parsimony] Additional file 1: Figure S2c, d; 100% posterior probability [Bayesian inference]; Additional file 1: Figure S1b).
The variability in the essential nature of the AβPP gene family can be observed by analyzing the evolutionary differences between related genes and shared residues according to specific functional domains. This was accomplished using synapomorphy frequency histograms. A synapomorphy is a trait or character shared by sister taxa of a clade that was derived from a previous common ancestor but not shared by taxa from another clade. Thus, synapomorphies contribute to the topology of a phylogenetic tree as factors in defining nodes on the tree. Using the TNT program we collected synapomorphies present at each node of the consensus amino acid tree and examined the frequency of synapomorphy for each character across the sequence matrix. High frequencies of synapomorphy indicate that residue changes at a given position make large contributions to the topology of the phylogenetic tree (conversely, low frequencies on the plots are associated with highly conserved domains/characters present in all terminal taxa groupings on the tree).
Synapomorphic frequencies within conserved domains
Synapomorphic frequencies of domains within each branch
Deposition of Aβ has been well documented in mammals; the sequence is generally >95% identical across mammals and all vertebrates express β- and γ-secretases [42–44]. The guinea pig (Cavia porcellus) and the European rabbit (Oryctolagus cuniculus) have been shown to generate Aβ plaques, but neither of the rodents Mus musculus and Rattus norvegicus naturally produces Aβ plaques [45–48]. Evidence of Aβ accumulation in other vertebrate species is sparse. Deposition of extracellular Aβ has only been documented in one member of class Osteichthyes, the sockeye salmon (Oncorhynchus nerka); the sockeye AβPP gene has not been sequenced. While some species of birds may generate Aβ plaques or vascular amyloid deposition, there is no evidence of plaque formation or extracellular deposition in reptiles and amphibians despite >90% sequence homology [47, 50]. No natural invertebrate amyloid-β plaques have been documented. Recently it was shown that the corresponding peptide from Drosophila can form an amyloid in vivo when co-expressed at high levels with the endogenous β-secretase gene.
This study provides the most comprehensive phylogeny of the AβPP gene family based on available data to date. The analysis reveals that the ancestral sequence evolved during metazoic divergence, which is much earlier than previously thought. The results further suggest that AβPP itself was the first vertebrate sequence and that APLP-1 and 2 are likely derived from gene duplication of AβPP.
It is possible that the vertebrate gene family arose as a duplication of APLP-1 followed by a second duplication to form APLP-2 and AβPP. However, it is also possible that the original duplication gave rise to APLP-2 and AβPP after which a duplication of APLP-2 gave rise to APLP-1. The search strategy used in this study found APLP-1 sequences only in tetrapods, AβPP in both cartilaginous and bony fish, and APLP-2 in one bony fish and most tetrapods.
Because the data used in this study were based on in silico search strategies from deposited sequences in public repositories (GenBank and GenPept), it cannot be assumed that these data are necessarily complete for each species (i.e., a de novo sequencing was not performed for each species studied). Nonetheless, these data support the hypotheses that AβPP is the ancestral sequence for vertebrates, gene duplication after the speciation of cartilaginous and bony fish gave rise to APLP-2, and a subsequent partial or degenerate duplication of APLP-2 following the speciation of tetrapods gave rise to APLP-1. Some species may have subsequently lost either APLP-1 or APLP-2 genes.
The sequence difference in Mus musculus and Rattus norvegicus results from three amino acid substitutions arising from three single nucleotide changes. Whether the lack of amyloidogenesis in these particular rodents comes from these three changes or from other physiological considerations is unclear, but the presence of an identical sequence in other rodents and in mammals in general suggests that the ancestor of mice and rats carried an amyloidogenic Aβ sequence. The lack of data on Aβ deposition in fish, birds, reptiles, and amphibians also suggests that unknown physiological adaptations may limit Aβ production or deposition. Recently a mutation encoding a change from alanine to threonine at position 673 of AβPP was found to be protective against developing Alzheimer Disease, likely through reduction of β-secretase processing at that site. It is interesting to note that all fish sequences in this study, with the exception of Danio rerio, have a threonine at this position, suggesting β-secretase processing may be reduced in these animals. In addition to processes that may increase or decrease Aβ production by regulating secretase efficiency or transcription, the presence of a β-secretase in the gene repertoire is an important consideration.
A whole genome assembly for Nematostella vectensis indicates the presence of the secretases, but no studies have examined amyloid formation. A genome for Hydra magnipapillata predicted the presence of a γ-secretase, but not a β-secretase (REFSEQ NW_002165109). Experimental evidence suggests that the nematode Caenorhabditis elegans does not express a β-secretase, although both α- and γ-secretases have been identified. A search of Entrez Nucleotide returned no β-secretase sequences for other nematodes, crustaceans, hymenoptera, or lepidoptera in our dataset.
The increased understanding of disease genetics and increasing availability of molecular sequence data provide an opportunity to harness evolutionary approaches to provide deep insights pertaining to the etiology of disease. Using this approach we found the AβPP family to have origins in the speciation of the metazoic lineage and propose that ancestral Aβ may have arisen as deuterostomia and protostomia diverged. However, other mutations may continue to produce amyloidogenic sequences in this domain, as seen with Drosophila, or unknown physiological factors may play a role in preventing Aβ formation, as in mice and rats. The approach developed here may be widely applicable to the study of other critical disease genes and builds a foundation for further studies on the co-evolution of Alzheimer Disease associated proteins (e.g., co-evolution of ApoE or β-secretase with AβPP) that may yield novel approaches to treating or preventing Aβ formation.
Amino acid sequences were collected through Entrez Protein using a combination of search terms and sequence similarity searches. First, based on previous studies of sequences from the Amyloid-β Precursor Protein family [2–4, 36], five sets of metadata-based search terms were developed and used to identify sequences from across the Amyloid-β Precursor Protein family: (1) "App"[gene name] AND "animals"[porgn:__txid33208]; (2) "aplp1"[gene name] and "animals"[porgn:__txid33208]; (3) "aplp2"[gene name] and "animals"[porgn:__txid33208]; (4) "apl-1"[gene name] and "nematodes"[porgn:__txid6231]; and (5) "app_amyloid". Sequences for which the organism was either “Unknown” or listed as a “synthetic construct” were removed. Next, a stringent (E-value = 0.0) blastp (BLAST+ v.2.2.26) search of the non-redundant protein database in Entrez Protein was used to find potential orthologous amino acid sequences for each of the sequences identified in the metadata-based search. An additional stringent blastp search was then done iteratively for each new sequence identified, until no additional sequences were found. The resulting dataset (which contained 435 sequences) was then subjected to multiple sequence alignment using MUSCLE v.3.8.31. The multiple sequence alignment was manually inspected (by viewing the data in Mesquite 2.75) to identify the one longest representative sequence per taxon (e.g., only the sequence for human AβPP770, which contains all transcribed and translated exons, was kept). As sequences were removed from the dataset, the multiple sequence alignment was redone. The resulting dataset reflected 103 taxa corresponding to 67 species. Based on identifiers within GenPept records, corresponding nucleic acid sequences were then collected for each amino acid sequence. These nucleotide sequences were also subjected to multiple sequence alignment using MUSCLE. Character maps were generated using the Mesquite character matrices.
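The iterative part of this search can be read as repeating a similarity search on every newly found sequence until a round returns nothing new, followed by keeping the longest sequence per taxon. The sketch below illustrates only that control flow. It is not the authors' pipeline: `find_hits` is a hypothetical stand-in for a stringent blastp query, the accession labels are invented, and `TOY_HITS` is toy data.

```python
from typing import Callable, Dict, Iterable, Set, Tuple


def iterative_search(seeds: Iterable[str],
                     find_hits: Callable[[str], Set[str]]) -> Set[str]:
    """Expand a seed set until no search returns a new identifier."""
    found = set(seeds)
    queue = list(found)
    while queue:
        query = queue.pop()
        for hit in find_hits(query):
            if hit not in found:      # only new sequences are searched again
                found.add(hit)
                queue.append(hit)
    return found


def longest_per_taxon(
        records: Iterable[Tuple[str, str, str]]) -> Dict[str, Tuple[str, str]]:
    """Keep the longest (accession, sequence) for each taxon label."""
    best: Dict[str, Tuple[str, str]] = {}
    for taxon, accession, sequence in records:
        if taxon not in best or len(sequence) > len(best[taxon][1]):
            best[taxon] = (accession, sequence)
    return best


# Toy data (hypothetical): each key lists the hits a search on it would return.
TOY_HITS = {"seq_A": {"seq_B"}, "seq_B": {"seq_A", "seq_C"}, "seq_C": set()}


def toy_find_hits(query: str) -> Set[str]:
    """Hypothetical stand-in for a stringent similarity search."""
    return TOY_HITS.get(query, set())


if __name__ == "__main__":
    print(sorted(iterative_search(["seq_A"], toy_find_hits)))
```

The loop terminates because each identifier enters the queue at most once, which mirrors the stopping rule "until no additional sequences were found".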
Trees were obtained by maximum parsimony using TNT 1.1 and Bayesian inference using MrBayes 3.2.0 [39, 40]. For analyses in TNT, the ‘aquickie.run’ script was used to guide the search, which aimed to find the optimal score 20 times independently, using defaults of "xmult" plus 10 cycles of tree-drifting. This resulted in 131 nucleotide trees from more than 8 × 10^8 rearrangements and 103714 amino acid trees from more than 7 × 10^8 rearrangements. For consensus tree calculation, trees were TBR-collapsed, Bremer group supports calculated by TBR-swapping, and bootstrap resampling by 100 replications of symmetric resampling with a single random addition (see Additional file 1).
For MrBayes, the Metropolis-coupled Markov chain Monte Carlo analysis was set for 2 runs with 4 chains each with a temperature of 0.2 degrees. A General Time Reversible (GTR) model with a Dirichlet (flat) probability distribution of nucleotide rate change parameters, stationary nucleotide frequencies, no specified shape parameter for the gamma distribution of rate variation, and no invariable sites was used for the nucleotide analyses; this is the default prior model for nucleotide matrices in MrBayes. All runs favored the WAG rate matrix as the prior model for the amino acid analyses.
Markov Chain analysis was continued until the runs converged, when the standard deviation of the split frequencies remained <0.01 and likelihood analysis found the potential scale reduction factor approached 1.0. For the nucleotide modeling this took more than 3 × 10^6 generations; for the protein analysis this took more than 2 × 10^6 generations. Consensus trees were constructed using the 50% majority rule with 95% cumulative posterior probability from 925 nucleotide trees and 1,591 amino acid trees (see Additional file 1). All tree diagrams were generated in either Dendroscope 3.1.0 or FigTree 1.3.1 [59, 60].
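The two convergence criteria named above can be illustrated with a short calculation. This is a minimal sketch under stated assumptions, not MrBayes code: the function names are hypothetical, the sample (n − 1) standard deviation across runs is an assumption, and the potential scale reduction factor follows the standard Gelman-Rubin form for equal-length chains.

```python
from statistics import mean, stdev, variance
from typing import Dict, List, Sequence


def average_sd_split_frequencies(runs: List[Dict[str, float]]) -> float:
    """Mean, over all splits, of the SD of each split's frequency across runs.

    `runs` holds one dict per independent run, mapping a split label to the
    fraction of sampled trees containing it. A split absent from a run counts
    as frequency 0.0. Uses the sample SD (an assumption of this sketch).
    """
    splits = set().union(*runs)
    return mean(stdev([run.get(s, 0.0) for run in runs]) for s in splits)


def potential_scale_reduction(chains: Sequence[Sequence[float]]) -> float:
    """Gelman-Rubin PSRF for one parameter sampled in equal-length chains."""
    n = len(chains[0])
    within = mean(variance(c) for c in chains)          # W
    between_over_n = variance([mean(c) for c in chains])  # B / n
    pooled = (n - 1) / n * within + between_over_n
    return (pooled / within) ** 0.5


if __name__ == "__main__":
    toy_runs = [{"AB|CD": 1.0, "AC|BD": 0.6}, {"AB|CD": 1.0, "AC|BD": 0.4}]
    print(average_sd_split_frequencies(toy_runs) < 0.01)  # convergence check
    print(potential_scale_reduction([[1.0, 2.0, 3.0, 4.0], [1.5, 2.5, 2.0, 3.5]]))
```

In this toy example the two runs disagree on one split, so the average standard deviation is well above the 0.01 threshold and the runs would be continued.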
Unambiguous synapomorphies at each node were generated in TNT for the maximum parsimony consensus trees. The frequency of a given character being synapomorphic at a given node was examined for the entire amino acid tree and for each of the five major branches. Probabilistic models of synapomorphy have been developed to address the confounding of homoplasy and lend statistical support to defining a character as synapomorphic as opposed to homoplasious. While these are important considerations for higher resolution analysis of a gene family, use of simple statistical analysis for such a large and diverse dataset is a reasonable approach to defining areas of conservation or change, accepting internal error for random mutation producing homoplasy or loss of an actual synapomorphy.
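A synapomorphy frequency of this kind can be tallied by counting, for each alignment column, the nodes at which that column is listed as a synapomorphy. The sketch below assumes the per-node lists have already been exported from the tree software into a plain mapping; the node labels, column indices, and domain boundaries are hypothetical, and normalizing by the number of nodes is an assumption about how "frequency" is defined.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def synapomorphy_frequency(node_synapomorphies: Dict[str, Iterable[int]],
                           n_characters: int) -> List[float]:
    """Fraction of nodes at which each character (column) is synapomorphic."""
    counts: Counter = Counter()
    for characters in node_synapomorphies.values():
        counts.update(set(characters))   # count a column once per node
    n_nodes = len(node_synapomorphies)
    if n_nodes == 0:
        return [0.0] * n_characters
    return [counts[i] / n_nodes for i in range(n_characters)]


def domain_means(frequencies: List[float],
                 domains: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """Average frequency over each domain given as a half-open [start, end)."""
    return {name: sum(frequencies[start:end]) / (end - start)
            for name, (start, end) in domains.items()}


if __name__ == "__main__":
    # Hypothetical toy input: three nodes, four alignment columns.
    toy_nodes = {"node_1": [0, 2], "node_2": [2], "node_3": []}
    freqs = synapomorphy_frequency(toy_nodes, n_characters=4)
    print(freqs)
    print(domain_means(freqs, {"domain_x": (0, 2), "domain_y": (2, 4)}))
```

Columns with low values in such a tally correspond to the conserved positions described above, and averaging over column ranges gives a per-domain summary.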
There are a number of programs available for modeling β-folding and aggregation of amyloidogenic peptides. AmylPred is a consensus tool that predicts β-folding and aggregation based on a set of five published methods and uses agreement of 2 or more methods for determining consensus. PASTA predicts stabilizing sequences in β-fibrillar structures using a calculation of the change of energy from pairing between amino acid sequences. Regions that are known to form ordered β-fibril structures have a PASTA energy less than −4. Using aligned amino acid sequences coded by Homo sapiens AβPP exons 16 and 17, we examined the corresponding βA4 region across all taxa and used known secretase cleavage sites to determine the aligned sequences for submission to AmylPred and PASTA [62–64]. Where cleavage sites are not known from previous studies, boundaries were chosen based on similar species and sequences. In cases where there was no clear similarity, boundaries were extended to correspond with Homo sapiens Aβ42. PASTA energies were collected until greater than −2 by sequential truncation of the C-terminus for each sequence.
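The two decision rules in this paragraph, consensus by agreement of at least two methods and C-terminal truncation until the energy rises above −2, can be sketched as follows. This is an illustration of the logic only: it does not call AmylPred or PASTA, `toy_energy` is an invented scoring function with no physical meaning, and the one-residue truncation step is an assumption.

```python
from collections import Counter
from typing import Callable, Iterable, List, Set, Tuple


def consensus_positions(predictions: Iterable[Set[int]],
                        min_agree: int = 2) -> List[int]:
    """Residue positions flagged by at least `min_agree` methods."""
    votes: Counter = Counter()
    for positions in predictions:
        votes.update(positions)
    return sorted(p for p, n in votes.items() if n >= min_agree)


def truncation_scan(sequence: str,
                    energy: Callable[[str], float],
                    threshold: float = -2.0) -> List[Tuple[str, float]]:
    """Remove one C-terminal residue at a time, recording the energy of each
    fragment, and stop after the first fragment whose energy exceeds
    `threshold`."""
    results: List[Tuple[str, float]] = []
    for end in range(len(sequence), 0, -1):
        fragment = sequence[:end]
        value = energy(fragment)
        results.append((fragment, value))
        if value > threshold:
            break
    return results


def toy_energy(fragment: str) -> float:
    """Hypothetical stand-in score: -1 per valine or isoleucine residue."""
    return -1.0 * sum(fragment.count(residue) for residue in "VI")


if __name__ == "__main__":
    print(consensus_positions([{1, 2, 3}, {2, 3}, {3, 9}]))
    for fragment, value in truncation_scan("AAVVII", toy_energy):
        print(fragment, value)
```

Passing the energy function in as an argument keeps the scan independent of how the energies are actually obtained, for example from values collected manually from a prediction server.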
AβPP, APP: Amyloid-β Precursor Protein
APPL-1: Amyloid-β Precursor Protein-like 1 protein
APLP-1: Amyloid precursor like protein 1
APLP-2: Amyloid precursor like protein 2
APL-1: Amyloid precursor like 1 protein
Basolateral sorting signal
Mya: Million years ago.
This work was supported in part by a grant to I.N.S. from the National Library of Medicine (R01 LM009725). W.G.T. is supported by an individual fellowship award from the National Institute of Diabetes and Digestive and Kidney Diseases (F30 DK084605).
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.