Has BLASTp vs SP results? | Has BLASTp vs. internal DB results? | Has HHSearch Results? | Has RPS-BLAST Results? | Has BLASTp vs. nr results? | Are the results significant? | Are Results Consistent Across Databases? | Are Results Consistent for a Specific Database? | Is the length of the result protein / domain similar to the query? | How much of the result sequence does the alignment cover? | How much of the query sequence does the alignment cover? | Procedure |
---|---|---|---|---|---|---|---|---|---|---|---|
No | No | No | No | No | Label as ‘hypothetical protein’. | ||||||
No | |||||||||||
No | No | No | No | Yes | Yes | No | Results can typically be disregarded and the protein can be labelled as ‘hypothetical protein’. If there is a particular interest in the query protein, examine results in greater detail to determine if they are reliable. | ||||
No | No | No | No | Yes | Yes | No | |||||
No | No | No | No | Yes | Yes | Yes | Yes | Look further into nr results to determine if they are reliable. Check whether the results are supported by literature or whether another tool/database was used. | |||
No | No | No | Yes | Yes | Yes | Yes | All/Most | All/Most | If the result describes a singular function, such as the “Phage terminase large subunit (GpA)” domain included in CDD, label the protein according to the function. If the result does not describe a meaningful function, such as one of CDD’s domains of unknown function (DUF), mark the protein as ‘XXX domain-containing protein’, where XXX is replaced by the name of the domain. | ||
No | No | No | Yes | Yes | Yes | No | All/Most | Some | Mark the protein as ‘XXX domain-containing protein’, where XXX is replaced by the name of the domain. | ||
No | No | No | Yes | Yes | No | No | All/Most | Some | If the inconsistent domain hits align to the same location in the query, mark the protein as ‘XXX domain-containing protein’, where XXX is replaced by the name of the most significant domain. If inconsistent domain hits align to separate locations in the query, the query likely contains multiple domains. If a name can describe the function of all or a majority of the domains, that name should be used. Otherwise, consider the functional importance of the domains and statistical significance of the results to select a single domain to name the annotation with. | ||
Yes | No | No | No | Yes | Yes | Yes | All/Most | All/Most | SwissProt results are generally trustworthy and can be used directly to annotate your query. | ||
Yes | No | No | No | Yes | No | Confirm accuracy of the SwissProt annotations by using the link in the results table to view the entry within SwissProt. | |||||
Yes | Yes | Yes | No | All/Most | Some | If the result protein is known to have homologs which are functionally similar but differ in length, the result may still be trustworthy. If the length of the result and query differ by more than an expected amount, examine the position of the alignment in the query. If the result aligns to the C-terminus of the query, the start codon may be incorrect. If you are annotating a viral protein, consider if this is a polyprotein. | |||||
Yes | |||||||||||
Yes | |||||||||||
Yes | Yes | Yes | No | Some | All/Most | If the result protein is known to have homologs which are functionally similar but differ in length, the result may still be trustworthy. If the length of the result and query differ by more than an expected amount, examine the position of the alignment in the result protein. If the query aligns to the C-terminus of the result, the start codon in the query may be incorrect and the true start codon may be upstream. If the query aligns to the N-terminus of the result, the query may be truncated. This can be caused by a mutation or a transposable element. If the start codon is correct, examine the result protein to see if the location of the alignment represents a functional domain. | |||||
Yes | |||||||||||
Yes | |||||||||||
Yes | Yes | Some | Some | Examine the location of the alignment within the result protein and query. The result protein and query may share a functional domain. | |||||||
Yes | |||||||||||
Yes | |||||||||||
Yes | No | No | Moderate Significance | Yes | Yes | All/Most | All/Most | When the results show only a moderate level of significance and are not confirmed by multiple databases and tools, consider prepending the annotation label with “putative”. | |||
No | Yes | No | |||||||||
No | No | Yes | |||||||||
No | Yes | No | No | Yes | Yes | Yes | All/Most | All/Most | Review the result protein in MAS to confirm the accuracy of the annotation. | ||
No | No | Yes | No | Yes | Yes | All/Most | All/Most | Since HHSearch is more sensitive than the other tools, extra care needs to be taken not to overannotate the query protein with a highly specific function based on its results. For example, a result protein may be labelled as a specific type of nuclease, such as a cell death-related nuclease. If you only have HHSearch results to base your annotation on, it is safer to label the query annotation in a more general manner, such as ‘nuclease’. When it is not clear how to generalize the annotation, consider prepending the annotation label with “putative”. | |||
Yes | Yes | Yes | Yes | Yes | Yes | Base the labelling of the annotation on the results from the internal database so naming conventions remain consistent. | |||||
Yes | Yes | ||||||||||
Yes | Yes | ||||||||||
Any combination of 2 or more | Yes | No | Investigate the cause of the inconsistency. Consider the evidence for competing results. Which result has a lower e-value? Which database is more highly curated? View the result protein within its native database to see what evidence is given to support the given annotation. |
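
A few rows of the table above can be sketched as code. This is a minimal illustration only, not part of the procedure itself: the `Evidence` class, its field names, the tool keys, and the `suggest_label` function are all hypothetical, and the sketch assumes the hits are consistent and cover all or most of both query and result. It covers three cases: no results, RPS-BLAST-only domain hits, and a single tool with moderate significance.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Evidence:
    """Hypothetical summary of the search evidence for one query protein."""
    tools_with_hits: frozenset           # e.g. {"swissprot", "internal", "hhsearch", "rpsblast", "nr"}
    best_hit_name: Optional[str] = None  # name of the best result protein or domain
    significance: str = "none"           # "none", "moderate", or "high" (thresholds are left to the annotator)
    describes_function: bool = True      # False for a domain of unknown function (DUF)


def suggest_label(ev: Evidence) -> str:
    """Suggest an annotation label for a few simple rows of the table."""
    # No results, or no significant results: label as hypothetical.
    if not ev.tools_with_hits or ev.significance == "none" or not ev.best_hit_name:
        return "hypothetical protein"

    # RPS-BLAST only: use the function if the domain describes one,
    # otherwise fall back to 'XXX domain-containing protein'.
    if ev.tools_with_hits == frozenset({"rpsblast"}):
        if ev.describes_function:
            return ev.best_hit_name
        return f"{ev.best_hit_name} domain-containing protein"

    # A single tool with only moderate significance: prepend "putative".
    if ev.significance == "moderate" and len(ev.tools_with_hits) == 1:
        return f"putative {ev.best_hit_name}"

    return ev.best_hit_name


print(suggest_label(Evidence(frozenset())))
print(suggest_label(Evidence(frozenset({"rpsblast"}), "DUF1234", "high", False)))
print(suggest_label(Evidence(frozenset({"hhsearch"}), "nuclease", "moderate")))
```

The remaining rows (length mismatches, partial coverage, inconsistent results) call for manual inspection and are deliberately left out of this sketch.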