
Table 1 A general guide for using results in MAS to determine an annotation label. This guide is meant to serve as an example of how a user could interrogate results and proceed in a variety of situations. Each row represents a particular situation, which leads to a procedure. Depending on the results, one or more procedures may be applicable. Grey cells in a row mean the question is not applicable or not considered.

From: Manual Annotation Studio (MAS): a collaborative platform for manual functional annotation of viral and microbial genomes

Has BLASTp vs. SwissProt (SP) results?

Has BLASTp vs. internal DB results?

Has HHSearch results?

Has RPS-BLAST results?

Has BLASTp vs. nr results?

Are the results significant?

Are results consistent across databases?

Are results consistent for a specific database?

Is the length of the result protein / domain similar to the query?

How much of the result sequence does the alignment cover?

How much of the query sequence does the alignment cover?

Procedure

No

No

No

No

No

      

Label as ‘hypothetical protein’.

     

No

     

No

No

No

No

Yes

Yes

 

No

   

Results can typically be disregarded and the protein can be labelled as ‘hypothetical protein’. If there is a particular interest in the query protein, examine results in greater detail to determine if they are reliable.

No

No

No

No

Yes

Yes

  

No

  

No

No

No

No

Yes

Yes

 

Yes

Yes

  

Look further into nr results to determine if they are reliable. Look to see if the results are supported by literature or if there was another tool/database used.

No

No

No

Yes

 

Yes

 

Yes

Yes

All/Most

All/Most

If the result describes a single function, such as the “Phage terminase large subunit (GpA)” domain included in CDD, label the protein according to that function. If the result does not describe a meaningful function, such as one of CDD’s domains of unknown function (DUF), mark the protein as ‘XXX domain-containing protein’, where XXX is replaced by the name of the domain.

No

No

No

Yes

 

Yes

 

Yes

No

All/Most

Some

Mark the protein as ‘XXX domain-containing protein’, where XXX is replaced by the name of the domain.

No

No

No

Yes

 

Yes

 

No

No

All/Most

Some

If the inconsistent domain hits align to the same location in the query, mark the protein as ‘XXX domain-containing protein’, where XXX is replaced by the name of the most significant domain. If the inconsistent domain hits align to separate locations in the query, the query likely contains multiple domains. If a name can describe the function of all or a majority of the domains, that name should be used. Otherwise, weigh the functional importance of the domains and the statistical significance of the results to select a single domain on which to base the annotation name.

Yes

No

No

No

 

Yes

 

Yes

Yes

All/Most

All/Most

SwissProt results are generally trustworthy and can be used directly to annotate your query.

Yes

No

No

No

 

Yes

 

No

   

Confirm accuracy of the SwissProt annotations by using the link in the results table to view the entry within SwissProt.

Yes

   

Yes

 

Yes

No

All/Most

Some

If the result protein is known to have homologs which are functionally similar but differ in length, the result may still be trustworthy. If the length of the result and query differ by more than an expected amount, examine the position of the alignment in the query. If the result aligns to the C-terminus of the query, the start codon may be incorrect. If you are annotating a viral protein, consider if this is a polyprotein.

 

Yes

 
  

Yes

Yes

    

Yes

 

Yes

No

Some

All/Most

If the result protein is known to have homologs which are functionally similar but differ in length, the result may still be trustworthy. If the length of the result and query differ by more than an expected amount, examine the position of the alignment in the result protein. If the query aligns to the C-terminus of the result, the start codon in the query may be incorrect and the true start codon may be upstream. If the query aligns to the N-terminus of the result, the query may be truncated. This can be caused by a mutation or a transposable element. If the start codon is correct, examine the result protein to see if the location of the alignment represents a functional domain.

 

Yes

    
  

Yes

   

Yes

    

Yes

   

Some

Some

Examine the location of the alignment within the result protein and query. The result protein and query may share a functional domain.

 

Yes

     
  

Yes

    

Yes

No

No

  

Moderate Significance

 

Yes

Yes

All/Most

All/Most

When the results show only a moderate level of significance and are not confirmed by multiple databases and tools, consider prepending the annotation label with “putative”.

No

Yes

No

   

No

No

Yes

   

No

Yes

No

No

 

Yes

 

Yes

Yes

All/Most

All/Most

Review the result protein in MAS to confirm the accuracy of the annotation.

No

No

Yes

No

 

Yes

  

Yes

All/Most

All/Most

Since HHSearch is more sensitive than the other tools, extra care needs to be taken not to overannotate the query protein with a highly specific function based on its results. For example, a result protein may be labelled as a specific type of nuclease, such as a cell death-related nuclease. If you only have HHSearch results to base your annotation on, it is safer to label the query in a more general manner, e.g. ‘nuclease’. When it is not clear how to generalize the annotation, consider prepending the annotation label with “putative”.

Yes

Yes

   

Yes

Yes

Yes

Yes

  

Base the labelling of the annotation on the results from the internal database so naming conventions remain consistent.

 

Yes

Yes

    
 

Yes

 

Yes

   

Any combination of 2 or more

 

Yes

No

    

Investigate the cause of the inconsistency. Consider the evidence for the competing results. Which result has a lower e-value? Which database is more highly curated? View the result protein within its native database to see what evidence is given to support the given annotation.
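
Several of the procedures above depend on quantities that can be computed from an alignment: how much of the query and result the alignment covers, whether two domain hits fall on the same region of the query, and which competing result has the lower e-value. The following is a minimal Python sketch of those checks, not part of MAS. The `Hit` fields, the helper names, and the 0.8 cutoff separating "All/Most" from "Some" are all assumptions made for illustration; the table itself gives no numeric thresholds.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One search result. Field names are hypothetical, not MAS's schema."""
    name: str
    evalue: float
    query_start: int   # 1-based, inclusive position on the query
    query_end: int
    query_len: int
    result_start: int  # 1-based, inclusive position on the result protein/domain
    result_end: int
    result_len: int


def coverage(start: int, end: int, length: int) -> float:
    """Fraction of a sequence spanned by the alignment."""
    return (end - start + 1) / length


def coverage_class(fraction: float, threshold: float = 0.8) -> str:
    """Map a coverage fraction to the table's 'All/Most' or 'Some'.

    The 0.8 threshold is an assumed value chosen for this sketch.
    """
    return "All/Most" if fraction >= threshold else "Some"


def overlaps_on_query(a: Hit, b: Hit) -> bool:
    """True if two hits align to overlapping regions of the query."""
    return a.query_start <= b.query_end and b.query_start <= a.query_end


def best_by_evalue(hits: list[Hit]) -> Hit:
    """Return the most significant hit (lowest e-value)."""
    return min(hits, key=lambda h: h.evalue)


def domain_label(domain_name: str, describes_function: bool) -> str:
    """Name by function if the domain describes one; otherwise use the
    'XXX domain-containing protein' form from the table."""
    if describes_function:
        return domain_name
    return f"{domain_name} domain-containing protein"
```

For two inconsistent domain hits, `overlaps_on_query` separates the "same location" case (label with the hit returned by `best_by_evalue`) from the "separate locations" case, where the query likely contains multiple domains and the choice of name still requires the annotator's judgment.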