Manual Annotation Studio (MAS): a collaborative platform for manual functional annotation of viral and microbial genomes

Lueder, Matthew R.; Cer, Regina Z.; Patrick, Miles; Voegtly, Logan J.; Long, Kyle A.; Rice, Gregory K.; Bishop-Lilly, Kimberly A.

doi:10.1186/s12864-021-08029-8

Table 1 A general guide for using results in MAS to determine an annotation label. This guide is meant as an example of how a user could interrogate results and proceed in a variety of situations. Each row represents a particular situation, which leads to a procedure. Depending on the results, one or more procedures may be applicable. Grey cells in a row (rendered here as empty cells) mean the question is not applicable or not considered.

Has BLASTp vs. SwissProt (SP) results?	Has BLASTp vs. internal DB results?	Has HHSearch results?	Has RPS-BLAST results?	Has BLASTp vs. nr results?	Are the results significant?	Are results consistent across databases?	Are results consistent for a specific database?	Is the length of the result protein/domain similar to the query?	How much of the result sequence does the alignment cover?	How much of the query sequence does the alignment cover?	Procedure
No	No	No	No	No							Label as ‘hypothetical protein’.
					No						Label as ‘hypothetical protein’.
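No	No	No	No	Yes	Yes		No				Results can typically be disregarded and the protein can be labelled as ‘hypothetical protein’. If there is a particular interest in the query protein, examine results in greater detail to determine if they are reliable.
No	No	No	No	Yes	Yes			No			Results can typically be disregarded and the protein can be labelled as ‘hypothetical protein’. If there is a particular interest in the query protein, examine results in greater detail to determine if they are reliable.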
No	No	No	No	Yes	Yes		Yes	Yes			Look further into nr results to determine if they are reliable. Look to see if the results are supported by literature or if there was another tool/database used.
No	No	No	Yes		Yes		Yes	Yes	All/Most	All/Most	If the result describes a singular function, such as the “Phage terminase large subunit (GpA)” domain included in CDD, label the protein according to the function. If the result does not describe a meaningful function, such as one of CDD’s domains of unknown function (DUF), mark the protein as ‘XXX domain-containing protein’, where XXX is replaced by the name of the domain.
No	No	No	Yes		Yes		Yes	No	All/Most	Some	Mark the protein as ‘XXX domain-containing protein’, where XXX is replaced by the name of the domain.
No	No	No	Yes		Yes		No	No	All/Most	Some	If the inconsistent domain hits align to the same location in the query, mark the protein as ‘XXX domain-containing protein’, where XXX is replaced by the name of the most significant domain. If inconsistent domain hits align to separate locations in the query, the query likely contains multiple domains. If a name can describe the function of all or a majority of the domains, that name should be used. Otherwise, consider the functional importance of the domains and statistical significance of the results to select a single domain to name the annotation with.
Yes	No	No	No		Yes		Yes	Yes	All/Most	All/Most	SwissProt results are generally trustworthy and can be used directly to annotate your query.
Yes	No	No	No		Yes		No				Confirm accuracy of the SwissProt annotations by using the link in the results table to view the entry within SwissProt.
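Yes					Yes		Yes	No	All/Most	Some	If the result protein is known to have homologs which are functionally similar but differ in length, the result may still be trustworthy. If the length of the result and query differ by more than an expected amount, examine the position of the alignment in the query. If the result aligns to the C-terminus of the query, the start codon may be incorrect. If you are annotating a viral protein, consider if this is a polyprotein.
	Yes				Yes		Yes	No	All/Most	Some	If the result protein is known to have homologs which are functionally similar but differ in length, the result may still be trustworthy. If the length of the result and query differ by more than an expected amount, examine the position of the alignment in the query. If the result aligns to the C-terminus of the query, the start codon may be incorrect. If you are annotating a viral protein, consider if this is a polyprotein.
		Yes			Yes		Yes	No	All/Most	Some	If the result protein is known to have homologs which are functionally similar but differ in length, the result may still be trustworthy. If the length of the result and query differ by more than an expected amount, examine the position of the alignment in the query. If the result aligns to the C-terminus of the query, the start codon may be incorrect. If you are annotating a viral protein, consider if this is a polyprotein.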
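Yes					Yes		Yes	No	Some	All/Most	If the result protein is known to have homologs which are functionally similar but differ in length, the result may still be trustworthy. If the length of the result and query differ by more than an expected amount, examine the position of the alignment in the result protein. If the query aligns to the C-terminus of the result, the start codon in the query may be incorrect and the true start codon may be upstream. If the query aligns to the N-terminus of the result, the query may be truncated. This can be caused by a mutation or a transposable element. If the start codon is correct, examine the result protein to see if the location of the alignment represents a functional domain.
	Yes				Yes		Yes	No	Some	All/Most	If the result protein is known to have homologs which are functionally similar but differ in length, the result may still be trustworthy. If the length of the result and query differ by more than an expected amount, examine the position of the alignment in the result protein. If the query aligns to the C-terminus of the result, the start codon in the query may be incorrect and the true start codon may be upstream. If the query aligns to the N-terminus of the result, the query may be truncated. This can be caused by a mutation or a transposable element. If the start codon is correct, examine the result protein to see if the location of the alignment represents a functional domain.
		Yes			Yes		Yes	No	Some	All/Most	If the result protein is known to have homologs which are functionally similar but differ in length, the result may still be trustworthy. If the length of the result and query differ by more than an expected amount, examine the position of the alignment in the result protein. If the query aligns to the C-terminus of the result, the start codon in the query may be incorrect and the true start codon may be upstream. If the query aligns to the N-terminus of the result, the query may be truncated. This can be caused by a mutation or a transposable element. If the start codon is correct, examine the result protein to see if the location of the alignment represents a functional domain.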
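Yes					Yes				Some	Some	Examine the location of the alignment within the result protein and query. The result protein and query may share a functional domain.
	Yes				Yes				Some	Some	Examine the location of the alignment within the result protein and query. The result protein and query may share a functional domain.
		Yes			Yes				Some	Some	Examine the location of the alignment within the result protein and query. The result protein and query may share a functional domain.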
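Yes	No	No			Moderate significance		Yes	Yes	All/Most	All/Most	When the results show only a moderate level of significance and are not confirmed by multiple databases and tools, consider prepending the annotation label with “putative”.
No	Yes	No			Moderate significance		Yes	Yes	All/Most	All/Most	When the results show only a moderate level of significance and are not confirmed by multiple databases and tools, consider prepending the annotation label with “putative”.
No	No	Yes			Moderate significance		Yes	Yes	All/Most	All/Most	When the results show only a moderate level of significance and are not confirmed by multiple databases and tools, consider prepending the annotation label with “putative”.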
No	Yes	No	No		Yes		Yes	Yes	All/Most	All/Most	Review the result protein in MAS to confirm the accuracy of the annotation.
No	No	Yes	No		Yes			Yes	All/Most	All/Most	Since HHSearch is more sensitive than the other tools, extra care needs to be taken not to overannotate the query protein with a highly specific function based on its results. For example, a result protein may be labelled as a specific type of nuclease, such as a cell death-related nuclease. If you only have HHSearch results to base your annotation on, it is safer to label the query annotation in a more general manner, e.g. ‘nuclease’. When it is not clear how to generalize the annotation, consider prepending the annotation label with “putative”.
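Yes	Yes				Yes	Yes	Yes	Yes			Base the labelling of the annotation on the results from the internal database so naming conventions remain consistent.
	Yes	Yes			Yes	Yes	Yes	Yes			Base the labelling of the annotation on the results from the internal database so naming conventions remain consistent.
	Yes		Yes		Yes	Yes	Yes	Yes			Base the labelling of the annotation on the results from the internal database so naming conventions remain consistent.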
Any combination of 2 or more					Yes	No					Investigate the cause of the inconsistency. Consider the evidence for competing results. Which result has a lower e-value? Which database is more highly curated? View the result protein within its native database to see what evidence is given to support the given annotation.
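
As an illustration only, part of the guide above could be written down as a small rule-based function. The sketch below is not part of MAS: the `Evidence` fields, the `suggest_label` function, the `REVIEW` marker and the source names are all hypothetical, and several rows (length comparison, start-codon checks, multi-domain hits, preferring internal database names) are deliberately collapsed into "manual review" because the table leaves them to the curator's judgement.

```python
from dataclasses import dataclass

REVIEW = "MANUAL REVIEW"  # hypothetical marker for cases needing curator judgement


@dataclass
class Evidence:
    """Hypothetical summary of the search results for one query protein."""
    sources: frozenset               # tools with hits: "swissprot", "internal", "hhsearch", "rpsblast", "nr"
    significance: str = "none"       # "none", "moderate" or "high"; cut-offs are the curator's choice
    consistent: bool = True          # do the hits agree with one another?
    full_coverage: bool = False      # alignment covers all/most of both query and result
    top_hit_name: str = ""           # name of the best hit or domain
    describes_function: bool = True  # False for e.g. a domain of unknown function (DUF)


def suggest_label(ev: Evidence) -> str:
    """Return a suggested label, or REVIEW when the guide calls for closer inspection."""
    # No results at all, or no significant results.
    if not ev.sources or ev.significance == "none":
        return "hypothetical protein"
    # Only nr hits: disregard them if inconsistent, otherwise check them by hand.
    if ev.sources == {"nr"}:
        return REVIEW if ev.consistent else "hypothetical protein"
    # Conflicting hits always need investigation (simplification of the last row).
    if not ev.consistent:
        return REVIEW
    # Domain hits only: name the function if the domain describes one and the
    # alignment is complete, otherwise fall back to a domain-containing label.
    if ev.sources == {"rpsblast"}:
        if ev.full_coverage and ev.describes_function:
            return ev.top_hit_name
        return f"{ev.top_hit_name} domain-containing protein"
    # Partial alignments from protein searches: inspect alignment positions by hand.
    if not ev.full_coverage:
        return REVIEW
    # A single tool with moderate significance, or HHSearch alone: hedge the label.
    if len(ev.sources) == 1 and (ev.significance == "moderate" or ev.sources == {"hhsearch"}):
        return f"putative {ev.top_hit_name}"
    return ev.top_hit_name


if __name__ == "__main__":
    duf_hit = Evidence(frozenset({"rpsblast"}), "high", True, True, "DUF1234", False)
    print(suggest_label(duf_hit))
```

A function like this can only pre-sort proteins into easy and hard cases. Anything it returns as `MANUAL REVIEW`, and any HHSearch-only label, would still need a curator to inspect the underlying alignments as the table describes.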

ISSN: 1471-2164

BMC Genomics
