Evolutionary and structural annotation of disease-associated mutations in human aminoacyl-tRNA synthetases

Background Mutation(s) in proteins are a natural byproduct of evolution but can also cause serious diseases. Aminoacyl-tRNA synthetases (aaRSs) are indispensable components of all cellular protein translational machineries, and in humans they drive translation in both cytoplasm and mitochondria. Mutations in aaRSs have been implicated in a plethora of diseases including neurological conditions, metabolic disorders and cancer. Results We have developed an algorithmic approach for genome-wide analyses of sequence substitutions that combines evolutionary, structural and functional information. This pipeline enabled us to super-annotate human aaRS mutations and analyze their linkage to health disorders. Our data suggest that in some but not all cases, aaRS mutations occur in functional and structural sectors where they can manifest their pathological effects by altering enzyme activity or causing structural instability. Further, mutations appear in both solvent exposed and buried regions of aaRSs indicating that these alterations could lead to dysfunctional enzymes resulting in abnormal protein translation routines by affecting inter-molecular interactions or by disruption of non-bonded interactions. Overall, the prevalence of mutations is much higher in mitochondrial aaRSs, and the two most often mutated aaRSs are mitochondrial glutamyl-tRNA synthetase and dual localized glycyl-tRNA synthetase. Out of 63 mutations annotated in this work, only 12 (~20%) were observed in regions that could directly affect aminoacylation activity via either binding to ATP/amino-acid, tRNA or by involvement in dimerization. Mutations in structural cores or at potential biomolecular interfaces account for ~55% mutations while remaining mutations (~25%) remain structurally un-annotated. Conclusion This work provides a comprehensive structural framework within which most defective human aaRSs have been structurally analyzed. 
The methodology described here could be employed to annotate mutations in other protein families in a high-throughput manner. Electronic supplementary material The online version of this article (doi:10.1186/1471-2164-15-1063) contains supplementary material, which is available to authorized users.


Background
Mutations in housekeeping proteins often lead to serious ailments in humans [1]. Analysis of the molecular bases of mutations that lead to dysfunctional proteins is an important step towards acquiring a detailed understanding of genetic disorders. Several studies have annotated disease-causing mutations in the human genome [2,3]. The knowledgebase developed through these studies will be useful in guiding and orienting translational therapeutic research [4]. Aminoacyl-tRNA synthetases (aaRSs) drive cellular protein translation by catalyzing ligation of cognate tRNA with amino acid for use in ribosomal protein synthesis [5]. The catalytic reaction follows a two-step process:
$$\mathrm{AA} + \mathrm{ATP} \rightarrow \mathrm{AMP\text{-}AA} + \mathrm{PP_i}$$
$$\mathrm{AMP\text{-}AA} + \mathrm{tRNA} \rightarrow \mathrm{tRNA^{AA}} + \mathrm{AMP}$$
In the first step, the amino acid (AA) is charged with ATP and pyrophosphate (PPi) is released. The second step involves charging of cognate tRNA with the amino acid and release of AMP. Evolution of a dedicated editing domain in some aaRSs highlights the stringent requirement for fidelity of these reactions [6,7]. In addition to these translational functions, aaRSs also participate in many other important physiological activities such as translational and transcriptional regulation, signal transduction, cell migration, angiogenesis, inflammation, and tumourigenesis [8][9][10]. Indeed, in pathogenic systems, the numerous attributes of aaRSs are just being uncovered [11,12]. Hence, aaRSs with their exquisite range of canonical and noncanonical functions constitute an important subset of proteomes. These enzymes have a modular architecture with separate domains for catalysis, tRNA binding and editing [13]. Based on the domain architecture and tRNA binding modes, aaRSs have been classified into two groups: Class I and Class II [14]. Class I aaRSs in humans are monomeric except for YRS and WRS, which are dimeric. Class II aaRSs in humans are dimeric enzymes except for ARS, which is monomeric.
A penta-motif ('KMSKS') and a tetra-motif ('HIGH') are the two evolutionarily conserved motifs that mediate ATP binding in Class I enzymes [15][16][17]. Class II aaRSs have three conserved motifs (motif 1, motif 2 and motif 3) which facilitate ATP and amino-acid binding to the active site of the enzyme [18].
In humans, except for GRS and KRS, separate sets of genes within the nuclear genome encode cytoplasmic and mitochondrial aaRSs [19]. Cytoplasmic and mitochondrial GRSs are generated from distinct translation initiation sites on the same gene (GARS) [20], while for KRS alternately spliced products of the same gene (KARS) undergo differential sub-cellular localization [21]. The human genome lacks a gene for mitochondrial QRS, and it has been hypothesized that mitochondrial Gln-tRNA^Gln is synthesized in two steps: first, tRNA^Gln is misacylated to Glu-tRNA^Gln, and second, Gln-tRNA^Gln is generated by the action of glutaminyl amidotransferases [22]. Thus, the human genome in total has 37 genes coding for aaRSs. Human mitochondrial aaRSs are encoded by the nuclear genome and are trafficked to the organelle. Ten mitochondrial and four cytoplasmic aaRSs have been implicated in human diseases so far [23]. The Protein Data Bank (PDB) has structural representatives for all 20 members of the aaRS family. For 11 aaRSs, crystallographic structures of the human proteins are also available. Further, crystal structures have been solved to elucidate the mechanism of interaction of these proteins with other biomolecules like ATP [24], tRNA [25] and other proteins [26]. Owing to the multitude of functions that aaRSs perform, it is no surprise that mutations in these proteins often prove to be deleterious and lead to diseases [27,28].
Over the years, many mutations have been identified in different aaRSs [28]. These mutations result in dysfunctional aaRSs, leading to various neurological and metabolic disorders such as Charcot-Marie-Tooth (CMT) disease, Amyotrophic Lateral Sclerosis, cancer, and diabetes [9,28].
The repertoire of sequence and structural information for aaRSs provides an opportunity to investigate structural distribution and functional relevance for mutations in these proteins leading to diseases in humans. In this study we have systematically analyzed all aaRS mutations in cytoplasmic and mitochondrial enzyme copies. We have evaluated disease-associated mutations within aaRSs in context of their structural features and sequence conservation. Properties such as local secondary structure at the site of mutations along with solvent accessibility profiles have been investigated to evaluate potential perturbations caused by mutations. In addition, evolutionary sequence conservation information has been used to annotate all mutant sites. The possibility of mutations to influence intra-and inter-molecular interactions mediated by aaRSs has also been examined. Our methodology can be effectively used to predict whether a particular mutation in an aaRS could directly affect aminoacylation activity or alter some other attribute such as interaction with biomolecules. The results presented advance our understanding of mutation driven pathologies in humans. Further, this study offers a mutation annotation pipeline which is available for academic groups in the form of python scripts. We believe that the methodology outlined here would prove useful for examining mutations within different protein families in a high-throughput manner wherever sequence and structural information is available.

Methods
Throughout the manuscript aaRSs are referred to as 'XRS' where X is the single letter code for corresponding amino acid e.g. alanyl-tRNA synthetase is mentioned as ARS. The gene name for an aaRS is referred to as 'XARS' where X is the single letter code for corresponding amino acid e.g. the gene for alanyl-tRNA synthetase is mentioned as AARS. The gene name for mitochondrial proteins has '2' as suffix e.g. gene names for cytoplasmic and mitochondrial ARS are AARS and AARS2 respectively. Mutations in GARS and KARS genes are discussed under the cytoplasmic aaRSs although these two genes encode for both cytoplasmic and mitochondrial copies of GRS and KRS respectively. The aaRS mutations annotated in this manuscript were retrieved from the literature published till first quarter of 2014. The mutational annotation pipeline ran as follows (Additional file 1: Figure S1): these analyses required sequences of human aaRSs and a list of substitution mutations in each enzyme. Structural homologues for human aaRSs were identified using BLAST searches against PDB. These structures were then used to identify residues participating in inter-molecular interactions such as binding with ATP, amino acid, tRNA and oligomer formation. In addition, secondary structure and solvent accessibility for residues in structural homologues corresponding to mutated residue were calculated. Finally, an annotated pairwise sequence alignment between human aaRS and structural homologue was constructed. Evolutionary conservation at mutational sites was also calculated. 
In general, we have annotated aaRS mutations into four categories: (a) those likely to abrogate or disturb ligand or tRNA binding due to direct contacts with substrates/products, (b) those that are part of the aaRS structural core and where a change may directly affect enzyme folding/stability, (c) those that occur at protein surfaces which will end up being interfaces during assembly of oligomers, and (d) those that do not fall into any of the above. In the latter case, experimentation and validation are required to understand the mechanistic basis of mutational effects.
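
The four categories above can be pictured as a simple decision rule over per-site annotations. The following is a minimal sketch only, not the authors' pipeline: the `SiteAnnotation` fields, the `categorize` function and, in particular, the order in which the criteria are tested are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class SiteAnnotation:
    """Hypothetical per-site flags; field names are illustrative assumptions."""
    contacts_substrate: bool     # direct contact with ATP, amino acid or tRNA
    at_oligomer_interface: bool  # contact with the oligomeric (dimeric) partner
    buried: bool                 # part of the structural core
    in_structure: bool           # residue is resolved in the structure used


def categorize(site: SiteAnnotation) -> str:
    """Return category 'a', 'b', 'c' or 'd' for a mutation site.

    The precedence (a, then c, then b) is an assumption of this sketch.
    """
    if not site.in_structure:
        return "d"  # no structural information, so the site stays un-annotated
    if site.contacts_substrate:
        return "a"
    if site.at_oligomer_interface:
        return "c"
    if site.buried:
        return "b"
    return "d"
```

Under these assumptions, a buried site that also lies at the dimer interface is reported as category (c) rather than (b); a different precedence would be equally easy to encode.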

Calculation of intermolecular contacts
Inter-molecular contacts between aaRSs and their binding partners (e.g. ATP, amino acid, tRNA, homo-dimeric partner) were calculated using a distance-based approach. Any aaRS residue within 4 Å of the binding partner was considered to be participating in intermolecular contacts. These interacting residues were then used to annotate mutation sites.
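
The distance criterion described above can be sketched as follows. This is an illustrative example rather than the in-house script: the function name `contact_residues` and the input layout (residue number paired with atom coordinates) are assumptions, and the sketch treats a residue as contacting when any of its atoms lies at or within the cutoff of any partner atom.

```python
import math


def contact_residues(protein_atoms, partner_atoms, cutoff=4.0):
    """Return sorted residue numbers with any atom within `cutoff` Å of the partner.

    protein_atoms: iterable of (residue_number, (x, y, z)) tuples (assumed layout).
    partner_atoms: iterable of (x, y, z) coordinates of the ligand, tRNA or partner chain.
    """
    partner_atoms = list(partner_atoms)
    contacts = set()
    for res_id, xyz in protein_atoms:
        # One atom close enough to any partner atom is sufficient.
        if any(math.dist(xyz, p) <= cutoff for p in partner_atoms):
            contacts.add(res_id)
    return sorted(contacts)


# Toy coordinates, not taken from any PDB entry.
protein = [(41, (0.0, 0.0, 0.0)), (41, (1.0, 0.0, 0.0)), (196, (10.0, 0.0, 0.0))]
ligand = [(3.0, 0.0, 0.0)]
print(contact_residues(protein, ligand))
```

In practice the coordinates would come from a parsed PDB file; the all-pairs loop is adequate for a single enzyme-ligand pair but would need a spatial index for very large complexes.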

Multiple Sequence Alignment (MSA)

MSA for each aaRS
Mammalian homologues for each human aaRS were identified in the non-redundant (NR) database using BLAST. Top 100 BLAST hits were selected to generate MSA specific for each aaRS. These alignments were used to evaluate the phylogenetic distribution of residues at mutation location.
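
The residue distribution at a mutation site can be read directly from one column of such an alignment. A minimal sketch, assuming the MSA is already available as a list of equal-length aligned strings and that the alignment column for the site is known (the function name and the 0-based column convention are assumptions):

```python
from collections import Counter


def residue_distribution(aligned_seqs, column):
    """Return the fraction of each character (including '-' gaps) in one MSA column."""
    counts = Counter(seq[column] for seq in aligned_seqs)
    total = sum(counts.values())
    return {residue: n / total for residue, n in counts.items()}


# Toy alignment for illustration only.
toy_msa = ["MDA", "MNA", "MDA", "M-A"]
dist = residue_distribution(toy_msa, 1)
```

Gaps are counted as their own symbol here, so the frequencies in a column always sum to one.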

MSA for each class of aaRS
Non-redundant datasets of class I and II aaRS crystal structures were prepared with a total of 57 and 39 structures respectively. These were prepared with a sequence similarity cutoff of ~90% using tools available at the Protein Data Bank. Using these datasets, a structure-based multiple sequence alignment was generated for each enzyme class separately. Structural alignment was calculated using the TMalign program [29] as implemented in the T-Coffee package [30]. Class I and class II aaRSs each have very similar folds within their classes, which allowed generation of a structure-guided sequence profile for each. These structure-based MSAs were used to calculate residue conservation scores at different positions in the sequence using the BLOSUM62 matrix. Positional conservation in the structure-based MSA was divided into four groups: <40%, 40%-60%, 60%-80%, and >80%. Alignment positions with scores that could fall in two intervals were assigned to the interval with the identical lower limit, e.g. a score of 60% would be classified under the 60-80% interval.
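
The binning rule, including the treatment of boundary scores, can be written compactly. This sketch covers only the interval assignment; it assumes a conservation score already expressed as a percentage, and the function name `conservation_bin` is illustrative.

```python
def conservation_bin(score):
    """Assign a percentage conservation score to one of four intervals.

    A score on a boundary goes to the interval whose lower limit it equals,
    so 60 falls in '60-80%' and 80 falls in '>80%'.
    """
    if score < 40:
        return "<40%"
    if score < 60:
        return "40-60%"
    if score < 80:
        return "60-80%"
    return ">80%"
```

For example, `conservation_bin(60)` returns `'60-80%'`, matching the boundary case described in the text.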

Pair-wise sequence alignment
Human aaRS sequences were aligned with sequences of 3D structures using T-Coffee [30]. Sites of mutations were highlighted in the query sequences using in-house python scripts. The interacting residues in aaRS identified above were highlighted in the sequence data. Residues that emanate from PDB file were color-coded based on conservation scores from the MSA calculated above. Distribution of residues at the site of specific mutation was also calculated from MSA. Human aaRSs lacking crystal structure information were not used for structure-based MSA, and instead the location of corresponding residue from homologous structure in pairwise alignment was used to calculate structural conservation.

Calculations of structural features from tertiary structure

Secondary structure
Formatted files for human aaRSs were retrieved from PDB, and the DSSP program was used to assign secondary structures, which were classified as α-helix, 3-10 helix, β-strand, turn or unassigned.

Solvent accessibility calculations
Residue solvent accessibility (SA) was calculated using DSSP. Relative solvent accessibility (RSA) for a residue in a protein structure is defined as:
$$\mathrm{RSA}\,(\%) = \frac{\text{SA of the residue in the structure}}{\text{maximum SA of that residue type}} \times 100$$
Based on RSA values, residues were classified as solvent accessible or buried: a residue with RSA >20% was considered solvent accessible, otherwise it was considered buried. PyMol was used for visual analyses of 3D structures [31].
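
The 20% criterion can be sketched in a few lines. The ratio used for RSA below (observed SA divided by a per-residue-type maximum SA) is the conventional definition and is an assumption of this sketch, as are the function names; the reference maximum values must be supplied by the user and none are hard-coded here.

```python
def relative_solvent_accessibility(sa, max_sa):
    """Return RSA as a percentage of an assumed reference maximum SA."""
    return 100.0 * sa / max_sa


def classify_exposure(rsa, cutoff=20.0):
    """Classify a residue as 'accessible' when RSA exceeds the cutoff, else 'buried'."""
    return "accessible" if rsa > cutoff else "buried"


# Illustrative numbers only; they do not come from DSSP output.
rsa = relative_solvent_accessibility(30.0, 200.0)
print(rsa, classify_exposure(rsa))
```

Because the comparison is strict, a residue with RSA of exactly 20% is reported as buried, in line with the criterion stated in the text.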

Mutational landscape of cytoplasmic aaRSs
Disease-associated mutations have been identified in four cytoplasmic aaRS enzymes so far [28]. Mutations in LRS/YRS within class I and ARS/GRS/KRS within class II lead to various human diseases (Table 1 and Figure 1A). Each point mutation in a given cytoplasmic aaRS has been independently associated with disease. Figure 2A shows schematic representations of domain architectures for these enzymes. The mutations in each enzyme are briefly described below.

Mutations in glycyl-tRNA synthetase that cause neurological disorders
A total of 14 substitution mutations have been identified by exome sequencing of the GARS gene (Table 1 and Figure 2B). Our phylogenetic and sequence conservation analyses suggest that wild-type residues at mutation sites are highly conserved (Table 2 and Figure 3A). The in-house analysis program output (Figure 3B) displays an annotated pairwise sequence alignment between the primary sequence and tertiary structure of human GRS, where mutations are highlighted in boxes. Juxtaposition of sites in the 3D structure of human GRS shows that mutations are distributed throughout GRS (Figure 3C). For the Asp554Asn change, the corresponding residue is naturally present in mammalian GRS from C. cristata (star-nosed mole). A total of five (of 14) mutations (Pro152Leu, Leu183Pro, Cys211Arg, Pro288Lys and Ile334Phe) are directly involved in non-bonded interactions at the GRS dimeric interface. Four of these are solvent accessible while Ile334Phe is buried. These observations suggest that the Pro152Leu, Leu183Pro, Cys211Arg, Pro288Lys and Ile334Phe substitutions likely affect dimer assembly. Other mutations like Glu125Gly reside in an α-helix (H1 in PDB ID 2ZT7) within the catalytic domain of GRS and likely destabilize local secondary structure because of the introduction of glycine within an α-helical structure. Similarly, the Gly294Arg mutation occurs in a buried location within a β-strand that forms part of the GRS catalytic domain. Substitution of the smaller glycine residue with a bulky and positively charged arginine may affect GRS folding and stability.
Disordered regions within protein structures have been shown to be critical for mediating interactions with other proteins [32]. The Asp554Asn alteration maps to a disordered region of human GRS but its biochemical effect remains unexplored. Two substitution mutations (Ser635Leu and Gly652Ala) were observed to be part of α-helix and bend structures within the C-terminal domain of GRS, where these sites are solvent accessible. It is likely that alterations at these two positions could impair or enhance the ability of GRS to interact with other proteins, leading to pathological phenotypes. For the Ala111Val mutation located in the GRS N-terminal region, there is no corresponding residue in the crystal structure, so its significance remains unexplained. Interestingly, amongst all aaRSs, GRS has the maximum number of disease-associated mutations reported in the literature to date. Our data suggest that pathology emanating from the aforementioned GRS mutations may be due to a variety of biochemical reasons, e.g. alterations/interference in dimer formation and structural instability within the GRS core (Table 3).

Mutations in leucyl-tRNA synthetase that cause infantile hepatopathies
Mutations in the LARS gene identified using whole genome sequencing have been associated with life threatening hepatopathies in newborn babies [40]. Symptoms include anemia, impaired liver function and overall poor infant development [40]. Two substitution mutations (Lys82Arg and Tyr373Cys) within the LRS catalytic domain have so far been implicated (Table 1 and Figure 2B).
We show that the wild-type residues at these two sites are conserved in homologous mammalian LRSs (Table 2). Our analyses suggest a lack of direct participation by residues at these mutation sites in inter-molecular contacts with ATP, amino acid or tRNA. The Tyr373Cys position lies partially buried in a β-strand within the LRS catalytic domain (Figure 2B). In addition, this site has ~80% structural conservation based on the non-redundant dataset of 57 Class I aaRSs (Table 2). These results indicate an important contribution of site 373 in LRS. It is likely that this mutation affects aaRS conformation, as substitution of a bulky buried hydrophobic residue (tyrosine) with the smaller cysteine in the β-strand could destabilize the local network of non-bonded interactions within the protein structure. The second mutation, Lys82Arg, occurs within the catalytic domain in a buried environment but without a clear hint of its possible structural effects on LRS (Figure 2B). Based on our structural analysis, the Lys82Arg and Tyr373Cys mutations fall under categories (d) and (b) respectively (Table 3).

Mutations in tyrosyl-tRNA synthetase cause Charcot-Marie-Tooth (CMT) disease
Two substitution mutations and one deletion mutation in cytoplasmic YRS (Table 1 and Figure 2B) result in a dysfunctional enzyme and Charcot-Marie-Tooth (CMT) disease [41]. Amongst heritable disorders of the peripheral nervous system, CMT is the most prevalent [47,48]. CMT can be of two types: type I is induced by axonal demyelination whereas type II results from decreased amplitudes of evoked motor and sensory nerve responses [49]. Symptoms of CMT include muscular weakness, steppage gait, high arched foot, reduced or absent deep-tendon reflexes, and impaired sensation [47,49]. Two substitution mutations in YRS have been identified using genome-wide SNP analysis followed by PCR-RFLP (Polymerase Chain Reaction-Restriction Fragment Length Polymorphism) of selected candidate genes. We observed that the wild-type YRS residues at these sites are highly conserved in mammalian YRS sequences (Table 2). The Gly41Arg change occurs in a sector responsible for ATP recognition and hence potentially disrupts ATP binding to YRS (Figure 2B). This drastic mutation occurs at a solvent accessible site where it forms part of a β-strand structure within the catalytic domain (Figure 2B). In contrast, the Glu196Lys mutation site is buried as part of a catalytic domain α-helix in the protein core (Figure 2B and Table 2).
Reversal of charge coupled with the larger size of the mutated residue (Glu196Lys) may alter the local physicochemical environment, thereby altering YRS structural stability. Finally, the four-residue deletion (Val153-Val156) occurs in a solvent exposed sector of α-helix H7 within the Rossmann fold domain and thereby likely affects the [...] (Figure 4). Based on our annotation criteria, the Gly41Arg and Glu196Lys mutations belong to categories (a) and (b), respectively, whereas the deletion mutation falls under category (c) (Table 3).

Mutations in alanyl-tRNA synthetase that cause CMT and muscular neuropathy
Four ARS substitution mutations (Asn71Tyr, Arg329His, Glu778Ala and Asp893Asn) identified using exome sequencing are associated with CMT disease [42] (Table 1 and Figure 2B). Interestingly, for Glu778Ala and Asp893Asn the corresponding mutated residue is naturally present in other mammalian ARS sequences, e.g. B. taurus ARS has Ala778 and Mustela putorius furo (domestic ferret) ARS has Asn893 (Table 2). Our results suggest that none of the four substitution mutations directly participate in intermolecular interactions with ARS substrates (Figure 2B and Table 2). The buried site Asn71Tyr lies in a catalytic domain β-strand structure where it may affect enzyme stability due to the introduction of a large hydrophobic residue in place of a smaller hydrophilic one (Figure 2B). The Arg329His mutation occurs in a region of high sequence and structural conservation at a solvent inaccessible site within the ARS catalytic domain, suggesting structural perturbation (Figure 2B). Previous biochemical studies have shown that ARSs with the Asn71Tyr or Arg329His mutation are defective in aminoacylation activity [42]. The Glu778Ala and Asp893Asn sites reside in the C-terminal domain and there are no corresponding regions in the homologous crystal structure. Overall, the four ARS mutations were observed in different sub-domains of ARS; the two annotated mutations (Asn71Tyr and Arg329His) belong to category (b) whereas the other two mutations fall under category (d) (Table 3).
Table 3 Potential defects identified using our mutational annotation pipeline in aaRSs

Mutations in lysyl-tRNA synthetase cause CMT disease
Three KRS substitution mutations (Leu133His, Ile302Met, and Thr623Ser) identified using whole exome sequencing have been associated with CMT disease (Table 4 and Figure 5B) [45,46]. KARS gene encodes two alternately spliced isoforms that catalyze aminoacylation in cytoplasm and mitochondria. Our sequence analyses suggest that the corresponding wild-type residues are fully conserved in mammalian KRSs (Table 5). We observe that the Leu133His mutation does not participate in any direct contacts with ATP, amino acid, tRNA or dimeric partner ( Figure 5B). Interestingly, this mutation lies at a position with high sequence and structural conservation (>80%) and is part of a buried α-helix within the anticodon binding domain suggesting that it might affect KRS structural stability (Table 5). Previous reports on functional aspects of the Leu133His mutation show that the mutant protein has severely compromised aminoacylation activity [45]. The Ile302Met adopts β-strand conformation within the catalytic domain and does not participate in inter-molecular interactions with any of the components of aminoacylation reaction ( Figure 5B). It is likely that the introduction of this mutation causes steric incompatibility within the protein structure. For the third mutation (Thr623Ser) that lies towards C-terminal there is no corresponding residue in structural homologues. Overall, KRS mutations appear to affect structural stability of the enzyme leading to CMT and therefore these mutations fall in category (b) and (d). Finally, a frame shift mutation (Tyr173SerfsX7) within the anti-codon binding domain results in premature termination of the transcript, again linked to CMT [45].

Mutational landscape of mitochondrial aaRSs
Mitochondria are ATP synthesizing organelles in eukaryotic cells. The 16,569 base pair long closed circular human mitochondrial DNA encodes proteins, rRNA and tRNA [64]. Additional nuclear encoded proteins required for protein synthesis, such as aaRSs and transcription factors, are imported into mitochondria [64]. Dysfunctional mitochondria, due to mutations in mitochondrial or nuclear DNA (including in tRNA genes and those that encode mitochondrial proteins), have been implicated in numerous human diseases [28,65-67]. Remarkably, the pathologies of dysfunctional mitochondrial aaRSs are not restricted to systematic impairment of ATP synthesis but rather seem to have tissue-specific phenotypes [23]. Mutations have been identified in both Class I and II enzymes. Leukoencephalopathy with brainstem and spinal cord involvement and elevated lactate/brain [...]
Note: SS and RSA could not be assigned for residues that were disordered in the crystal structure. Also see Figure 5.
Most human diseases attributable to mutations in aaRSs are due to alterations in the mitochondrial copies of aaRSs. We show that ~55% of mitochondrial aaRSs have been implicated in disease-associated mutations compared with ~20% for cytoplasmic aaRSs (Table 4 and Figure 1B). Each point mutation in a given mitochondrial aaRS has been independently associated with a given disease, except in cases of RRS and HRS where more than one substitution has been observed in a single patient (see Table 5). Figure 5A shows schematic representations of mitochondrial aaRS domain architectures. Specific analyses of mutations follow here:

Mutations in mito-leucyl-tRNA synthetase cause ovarian failure and diabetes
Three substitution mutations (His324Gln, Thr522Asn and Thr629Met) in the mitochondrial LRS have been associated with ovarian failure, hearing loss and type-2 diabetes (Table 4 and Figure 5B). Multiple sequence alignment of homologous LRS sequences reveals that the wild-type residues for these are fully conserved across mammalian LRSs (Table 5). We observed that none of these mutations are likely to directly interact with aminoacylation reaction substrates (Figure 5B and Table 5). The position of the His324Gln mutation lies buried as part of an α-helical region (H12 in PDB ID 4AQ7) in the catalytic domain and shows ~60% structural conservation within class I aaRSs. Thr522Asn and Thr629Met are located in buried sites within the LRS catalytic domain and adopt turn and β-strand conformations respectively. In summary, our structural analysis suggests that all three LRS mutations fall into category (b).

Mutations in mito-tyrosyl-tRNA synthetase cause MLASA
Two substitution mutations (Gly46Asp and Phe52Leu) in the YRS Rossmann fold domain have been linked to myopathy, lactic acidosis and sideroblastic anemia (MLASA) [53,54]. We observed that in each of these cases, the corresponding wild-type residues are fully conserved in YRSs (Table 5). Further investigation into the location of the mutations suggests a lack of direct interactions with YRS substrates or participation in enzyme dimerization (Figure 5B). The Gly46Asp and Phe52Leu mutations are solvent accessible and occur in β-strand and turn regions respectively within the N-terminal region (and distal from the catalytic sector). It therefore remains unclear from our analyses how these mutations in non-functional YRS regions lead to disease states in humans (category d). Previous biochemical studies have shown that the mutant YRSs have abnormal kinetics vis-a-vis the wild-type enzyme [53].

Mutations in mito-alanyl-tRNA synthetase cause cardiomyopathy
Two substitution mutations (Leu155Arg and Arg592Trp) in ARS identified using whole exome sequencing are associated with cardiomyopathies in infants [59] (Table 4 and Figure 5B). Comparison of the mammalian ARS sequences suggests very high conservation of these two ARS sites. The Leu155Arg change is part of α-helix (H4 in Rossmann fold domain), remains buried and given distance considerations it is unlikely to affect intermolecular substrate contacts ( Figure 5B). The drastic alteration in physicochemical environment (replacement of leucine with positively charged arginine) may affect ARS folding and stability. The other ARS mutation (Arg592Trp) occurs within the editing domain and is severe given the conversion of arginine into a large aromatic residue. Hence, both the mutations are likely to reduce structural stability of ARS and therefore fall in category (b).

Mutations in mito-glutamyl-tRNA synthetase cause leukoencephalopathy
ERS mutations are associated with multiple pathologies including myopathy, respiratory failure and retinitis pigmentosa [52]. A total of 14 substitution mutations and one insertion mutation (Thr426_Arg427insL) have been reported for ERS (Table 4). For only one of the substitution mutations, Arg107His, is the mutated residue naturally present in a mammalian homologue, namely mitochondrial ERS from Myotis davidii (mouse-eared bat) (Table 5 and Figure 6A). All the other mutations map to evolutionarily conserved positions within the ERS family (Figure 6B). Four substitution mutations (Arg107His, Arg108Trp, Gly110Ser, and Arg516Gln) are observed at positions with very high structural conservation (>80%, residues are red in Figure 6B) and therefore may affect the enzyme structural core. In addition, the loss of positive charge at three out of these four positions would alter the electrostatic properties of ERS. The Gly317Cys mutation occurs in the tRNA binding sector and could potentially affect tRNA binding to ERS (Figure 6C). Four additional mutations, namely Arg55His, Glu96Lys (catalytic domain), Arg516Gln (C-terminal domain), and a deletion mutation at position 398 (catalytic domain), occur in α-helical structures of ERS. Amongst these substitution mutations, Arg55His and Arg516Gln are buried and therefore could potentially alter enzyme stability. In the cases of the Arg107His and Arg108Trp mutations, the substitution of exposed arginines will likely alter ERS electrostatic properties. Taken together, the ERS mutations fall within two different categories, (a) and (b) (Table 3).

Mutations in mito-arginyl-tRNA synthetase cause pontocerebellar hypoplasia
Five substitution mutations and one frame shift mutation have been identified in human RRS (Table 4 and Figure 5B) [55]. Our annotation analysis suggests that none of the five substitutions are likely to directly contact substrate atoms (Figure 5B). Of the five, sequence comparisons show that the Gln12Arg mutation is naturally present in the rabbit RRS sequence (Table 5), suggesting that this mutation may not alter enzyme activity as it is tolerated in other mammals. However, for three others (Trp241Arg, Arg245Gln and Arg469His) the wild-type residues are highly conserved and form part of an α-helix. The Trp241Arg and Arg245Gln substitutions occur at buried sites that show high structural conservation (>80%) within the Rossmann fold domain; hence these are likely to cause structural perturbation of RRS (category b) (Figure 5B). Further, Arg469His lies in the tRNA binding region of the RRS editing domain and hence falls in category (a). For the remaining two N-terminal mutations (Ile9Val and Gln12Arg) our analyses suggest structural disorder in homologs and hence these remain un-annotated (category d). Amongst the five disease-associated mutations in RRS, the Arg245Gln and Arg469His substitutions were observed in a single patient family; similarly, the Gln12Arg and Trp241Arg substitutions were identified within one disease affected family [55].

Mutations in mito-seryl-tRNA synthetase cause several disorders
The Asp390Gly and Arg402His mutations in SRS are associated with fatalities in newborns [56] (Table 4 and Figure 5B). Patients with the HUPRA (Hyperuricemia, pulmonary hypertension, renal failure, and alkalosis) syndrome generally die because of multi-organ failure, respiratory insufficiency or lung hypertension. These SRS mutations were identified using SNP microarray analysis [56]. Our analysis suggests that the mutation sites occur at residues of high conservation across mammalian SRSs (Table 5). The Asp390Gly and Arg402His changes occur in buried sites within the catalytic domain of SRS, although they likely do not contact enzyme substrates (Figure 5B). Interestingly, it has been reported that the Asp390Gly mutant protein exhibits a lower level of aminoacylation activity for one of the two isoacceptor tRNA^Ser species [56]. However, it seems that the Asp390Gly mutation does not affect charging of the other isoacceptor, tRNA^Ser(UCN) [56]. These data present an enigma as they suggest differential recognition of cognate tRNAs by the Asp390Gly mutant protein. Nonetheless, the Asp390Gly mutation leads to loss of function, leading to the pathological condition. From our analysis, it is evident that both these mutations are in category (b).
Mutations in mito-histidyl-tRNA synthetase cause Perrault syndrome
HRS mutations identified using genomic sequencing are associated with ovarian dysgenesis and Perrault syndrome [57]. Two substitution mutations (Leu200Val and Val368Leu) and one deletion mutation in HRS have been implicated (Table 4 and Figure 5B). The affected individual was observed to harbor both substitutions in HRS simultaneously. Sequence analyses within mammalian HRSs reveal high conservation of wild-type residues at these positions (Table 5), suggesting evolutionarily important functional roles. Both Leu200Val and Val368Leu form part of an α-helix: the former is located in the HRS catalytic domain, while the latter is at the junction of the Rossmann fold domain and the loop connecting the C-terminal domain (Figure 5B). Our analysis shows that both these mutation sites are buried, and hence it is unlikely that they participate in direct contacts with either the enzyme substrates or the dimeric partner. These observations suggest that both these mutations fall under category (b). Finally, a deletion mutant in which residues Leu200 to Lys211 are missing is associated with Perrault syndrome [57]. These residues (200 to 211) constitute a turn and a β-strand within the dimeric interface of HRS (Figure 7A). It is likely that the deletion mutant protein has a severely compromised ability to form dimers (category c) and hence impaired enzymatic activity.
Mutations in mito-phenylalanyl-tRNA synthetase cause muscular and neurological disorders
Three FRS substitution mutations identified using whole exome sequencing have been implicated in various neurological and metabolic diseases in humans [58] (Table 4 and Figure 5B). Wild-type positions at mutant sites display very high sequence conservation (>90%) within mammalian FRSs (Table 5). Our structural evaluations suggest that the Ile329Thr mutation within the catalytic domain is likely to make direct contacts with ATP (Figure 5B). The Ile329Thr and Asp391Val (in the C-terminal domain) sites are present in 3₁₀ helical regions of FRS (Figure 5B). The third substitution mutation (Tyr144Cys) is buried and is not part of a well-defined secondary structure element within the protein. Interestingly, the site of the Tyr144Cys mutation shows high structural conservation (>80%). Overall, our results are consistent with previous reports that correlate these FRS mutations to disease states by affecting aminoacylation activity as well as destabilizing the structure of FRS [58]. Hence, the three FRS mutations fall within categories (a) and (b).

Figure 6 Annotation for mutations in human mitochondrial ERS. (A) Frequency distribution of residues at sites of ERS mutations. The distribution was calculated based on the top 100 mammalian homologous sequences identified using BLAST. A '-' represents the frequency of gaps in the alignment. (B) Sequence alignment of mitochondrial ERS (UniProt ID Q5JPH6) and T. maritima ERS where substitutions in the human protein are shown in boxes. Residues in T. maritima ERS that interact with ATP/analogue and tRNA are underlined in black and red respectively. Structural conservation for T. maritima ERS is graded in four categories (<40%, 40-60%, 60-80%, and >80%), which are highlighted in magenta, blue, green and red respectively. (C) Mutations (in parentheses) in human ERS mapped onto the crystal structure of T. maritima ERS, where the mutation in the tRNA-binding interface is highlighted with boxed labels.
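The per-site residue frequency distribution and the four-level structural conservation grading described in the Figure 6 legend can be illustrated with a minimal Python sketch. This is not the pipeline used in this work: the function names, the toy alignment, and the handling of values that fall exactly on a bin boundary (40, 60, 80) are assumptions made only for illustration.

```python
from collections import Counter

def column_frequencies(aligned_seqs, column):
    """Return the percentage of each symbol (residues and '-' gaps)
    observed at one column of a multiple sequence alignment."""
    symbols = [seq[column] for seq in aligned_seqs]
    counts = Counter(symbols)
    total = len(symbols)
    return {sym: 100.0 * n / total for sym, n in counts.items()}

def conservation_grade(percent):
    """Map a conservation percentage onto the four grades named in the
    figure legend. Boundary handling is an assumption of this sketch."""
    if percent < 40:
        return "<40%"
    if percent < 60:
        return "40-60%"
    if percent <= 80:
        return "60-80%"
    return ">80%"

# Hypothetical toy alignment (not real aaRS sequences); all rows have equal length.
toy_alignment = ["MKRLA", "MKRLA", "MKQLA", "MK-LA"]

# Frequencies at the third column (0-based index 2), including the gap symbol.
print(column_frequencies(toy_alignment, 2))
print(conservation_grade(85.0))
```

In practice the input would be the aligned homolog sequences (for example, the top mammalian BLAST hits) rather than a hand-written list, and the gap symbol is counted like any other character so that poorly aligned sites are visible in the distribution.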

Mutations in mito-aspartyl-tRNA synthetase cause leukoencephalopathy
Autosomal recessive mutations in mitochondrial DRS induce LBSL (leukoencephalopathy with brainstem and spinal cord involvement and elevated lactate), which is characterized by abnormal muscle stiffness (spasticity) and difficulty with coordinating movements (ataxia) [68,69]. Ten substitution mutations, one frame shift mutation, and one deletion mutation have been identified in DRS (Table 4 and Figure 5B). These sites are highly conserved across mammalian DRSs (Table 5). Four substitution mutations (Ser45Gly, Cys152Phe, Arg179His, and Arg263Gln) are at the dimeric interface of DRS (Figure 5B). It is likely that these mutations disrupt the network of non-bonded interactions at the dimeric interface, leading to dysfunctional DRS. The Leu613Phe mutation occurs in the tRNA binding region; however, this mutation has been shown to have no effect on enzyme activity [70]. The site of the Leu626Gln mutation has very high structural conservation (>80%), indicating that substitution at this site might perturb DRS structure. Two nonsense mutations at positions Arg263 and Glu425 result in the introduction of a stop codon, leading to inactive protein. One deletion within the anticodon-binding domain ranging from Met134 to Lys165 has also been associated with LBSL in humans (Figure 7B) [69]. Overall, our mutational annotations suggest that the pathological consequences of dysfunctional DRS could stem from various molecular abnormalities including defective dimerization, incompetent tRNA binding and structural instability. The DRS mutations hence fall under categories (a), (b), and (c) (Table 3).

Conclusions
We have developed a mutational annotation pipeline that functionally categorizes mutations in proteins. The pairwise sequence alignment, which is the primary output from our analysis pipeline, pictorially provides valuable structural and evolutionary sequence conservation information. Our approach also offers rapid structural annotation for mutations in proteins compared with methods that first build a 3D model of a sequence and then map the mutations onto it. Our methodology can be automated to annotate and classify mutations in any protein family. In addition to the annotations adopted in this study, customized annotations can also be included in the output; for example, the pairwise sequence alignment can carry annotations for residues interacting with biomolecular partner(s). The approach presented here is built on an open-source platform that is freely available for academic research, and it provides a facile tool to decipher molecular effects of mutations in proteins.
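As an illustration of how such a categorization step could be automated, the following Python sketch assigns a site to one of the four categories as they are used in the text: (a) substrate or tRNA binding, (b) buried and structurally conserved, (c) dimer interface, and (d) un-annotated. It is a hedged sketch rather than the implementation used in this work: the `SiteAnnotation` fields, the precedence of the rules, and the numeric values in the examples are all assumptions; only the >80% structural conservation cut-off is taken from the text.

```python
from dataclasses import dataclass

@dataclass
class SiteAnnotation:
    """Hypothetical per-site features derived from a homolog structure."""
    contacts_substrate: bool = False       # near ATP, amino acid, or tRNA
    at_dimer_interface: bool = False       # part of the dimerization surface
    buried: bool = False                   # not solvent exposed
    structural_conservation: float = 0.0   # percent, 0-100
    resolved_in_homolog: bool = True       # False if the region is disordered

def categorize(site: SiteAnnotation) -> str:
    """Return 'a', 'b', 'c' or 'd'. The rule order is an assumption."""
    if not site.resolved_in_homolog:
        return "d"  # no structural information, so left un-annotated
    if site.contacts_substrate:
        return "a"  # may directly affect aminoacylation
    if site.at_dimer_interface:
        return "c"  # may compromise dimer formation
    if site.buried and site.structural_conservation > 80.0:
        return "b"  # likely structural perturbation
    return "d"

# Examples patterned on sites discussed in the text; 85.0 is a placeholder for ">80%".
print(categorize(SiteAnnotation(contacts_substrate=True)))                     # tRNA-binding site
print(categorize(SiteAnnotation(buried=True, structural_conservation=85.0)))   # buried, conserved site
print(categorize(SiteAnnotation(resolved_in_homolog=False)))                   # disordered N-terminal site
```

A site can satisfy more than one rule (for example, buried and at the dimer interface), so a production version would need an explicit policy for such cases; the fixed precedence here is only one possible choice.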
Mutations in proteins can cause pathological effects through a variety of molecular mechanisms. Integrated computational and structural biology offers an opportunity to analyze molecular phenotypes for disease-associated mutations in a high-throughput manner within the structural contexts of mutant proteins. The aaRSs are ubiquitous enzymes that drive protein translation in cells [5]. These enzymes accurately catalyze charging of tRNAs with their cognate amino acids and therefore effectively decode the genetic code [5,7]. Mutations in aaRSs lead to a variety of diseases in humans [28]. Considering the diverse roles of aaRSs, it is likely that mutations within these proteins could manifest their pathological phenotypes by affecting either the canonical activities (aminoacylation and/or editing) or non-canonical functions (gain- or loss-of-function). The etiology of these mutant phenotypes could involve a multitude of facets such as altered inter-molecular interactions, abnormal sub-cellular localization, changes in oligomeric states, and variations in structural stability or protein aggregation tendencies.
In this study, we have performed systematic and rigorous computational analyses of disease-associated mutations in human aaRSs to evaluate their evolutionary, structural and functional characteristics. Our results show that aaRS mutations can occur in structurally conserved regions within both cytoplasmic and mitochondrial aaRSs. Four mutations in cytoplasmic and seven in mitochondrial aaRSs were observed to participate in inter-molecular interactions with substrates of aminoacylation reactions such as ATP, amino acid or tRNA (Figure 6). Such mutations could therefore directly affect the ability of aaRSs to accurately execute enzymatic activities, potentially leading to slower kinetics or even loss of activity. In addition, mutational alterations distal from the active site regions could affect non-canonical functions that are often associated with this protein family (Table 3). Evidence supporting pathological effects of mutations due to anomalous non-canonical aaRS functions comes from a mouse model for CMT in which a mutant GRS retains wild-type aminoacylation activity [71]. Apart from annotating substitution mutations, we also show that deletion mutations in YRS and HRS likely disrupt their dimeric assemblies, which in turn would lead to stalling of new protein translation (Figures 4 and 8A). Similarly, the deletion mutations in ERS and DRS could compromise ERS-tRNA interactions and editing activity respectively (Figures 6C and 8B).
In humans, 9 aaRSs and 3 auxiliary proteins assemble to form the multisynthetase complex (MSC) within the cytoplasm [72]. These MSCs play a critical role in various non-translational activities within the cell. Cytoplasmic LRS is the only component of the MSC for which disease-associated mutations have been reported [40]. It has been suggested that the C-terminal domain of LRS interacts with RRS within the MSC [73]. Our analysis shows that disease-associated mutations in LRS localize within the N-terminal catalytic domain, and thus may not have a direct effect on the assembly of the MSC. Further crystallographic studies on the MSC would provide necessary insights into the assembly of different synthetases within these complexes, as has been achieved for other cellular assemblies [74-76].
Out of 63 mutations annotated in this work, only 12 (~20%) were observed in regions that could directly affect aminoacylation activity via either binding to ATP/amino acid, tRNA or by involvement in dimerization. Mutations in structural cores or at potential biomolecular interfaces account for ~55% of mutations, while the remaining mutations (~25%) remain structurally un-annotated (Figure 8). These observations call for further experimental investigations to understand the molecular effects caused by mutations in aaRSs. Overall, the landscape for mutated aaRSs highlights that no particular site within aaRSs is specifically prone to mutations or is mutated often. Amongst cytoplasmic and mitochondrial aaRSs, the most frequently mutated residues were glycine (smallest residue, 6 mutations) and arginine (largest polar residue, 10 mutations) respectively. Further, Gly to Arg, Arg to His, and Arg to Gln were the three most frequent substitution mutations in aaRSs (Tables 1 and 4). Intriguingly, only ~20% of cytoplasmic aaRSs have been reported to harbor disease-associated mutations compared to ~55% of mitochondrial aaRSs (Figure 1A and 1B). Understanding the determinants for the higher propensity of mutations in mitochondrial aaRSs within the nuclear genome requires additional functional and genetic data. Such efforts would advance our understanding of heritable disorders since mitochondria play a critical role in maternal inheritance and have been implicated in numerous human diseases.
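Tallies of the most frequently mutated residues and the most frequent substitution types, such as those quoted above, can be obtained by parsing three-letter mutation labels. The Python sketch below shows one way to do this under stated assumptions: the function names and the regular expression are illustrative, and the input list contains only the mitochondrial substitutions named in the text above (RRS, SRS, HRS, FRS and the six named DRS substitutions), not the full set of 63 mutations, so its counts are not expected to match the totals in Tables 1 and 4.

```python
import re
from collections import Counter

# Three-letter wild-type residue, position, three-letter mutant residue (e.g. Arg245Gln).
MUTATION_RE = re.compile(r"^([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})$")

def parse_substitution(label):
    """Split a label such as 'Arg245Gln' into ('Arg', 245, 'Gln')."""
    match = MUTATION_RE.match(label)
    if match is None:
        raise ValueError(f"not a three-letter substitution label: {label!r}")
    wild_type, position, mutant = match.groups()
    return wild_type, int(position), mutant

def tally(labels):
    """Count wild-type residues and (wild-type, mutant) substitution pairs."""
    wild_counts, pair_counts = Counter(), Counter()
    for label in labels:
        wild_type, _, mutant = parse_substitution(label)
        wild_counts[wild_type] += 1
        pair_counts[(wild_type, mutant)] += 1
    return wild_counts, pair_counts

# Subset only: substitutions named in the mitochondrial sections of the text.
SUBSET = [
    "Ile9Val", "Gln12Arg", "Trp241Arg", "Arg245Gln", "Arg469His",             # RRS
    "Asp390Gly", "Arg402His",                                                 # SRS
    "Leu200Val", "Val368Leu",                                                 # HRS
    "Ile329Thr", "Asp391Val", "Tyr144Cys",                                    # FRS
    "Ser45Gly", "Cys152Phe", "Arg179His", "Arg263Gln", "Leu613Phe", "Leu626Gln",  # DRS
]

wild_counts, pair_counts = tally(SUBSET)
print(wild_counts.most_common(3))
print(pair_counts.most_common(3))
```

Even on this subset, arginine is the most frequently substituted wild-type residue and Arg to His is the most frequent pair, which is in line with the trend reported for mitochondrial aaRSs.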

Additional file
Additional file 1: Figure S1. Outline for the mutational annotation pipeline.