Table 2. Definition of the level of curation terms

From: The grapevine gene nomenclature system

| Value | Definition |
| --- | --- |
| Hypothetical protein | Allocated to each locus at the beginning of the process. It means that the gene codes for a protein about whose function or actual existence no information is available. It should be removed only when the existence of a transcript is proven. |
| Expressed | Replaces “hypothetical” if the existence of transcripts has been proven through expression data (proof of existence of RNA(s): RT-PCR, EST, RNA-seq, Northern blots, microarrays, etc.). The next step is to determine whether similarity with sequences in other species can be observed. |
| ZZZ domain containing | Allocated if, by comparison with other sequences or by domain analysis, the highest level of information available on the encoded protein is the presence of a given domain ZZZ. |
| Similar to | Indicates that the existence of a protein is probable because a minimal level of similarity with a protein from a plant species was met. A reasonable cut-off is considered to be an e-value of e-20, or at least 30% identity over at least 80 contiguous amino acids, which places the match in the “safe zone” as defined by [32]. The gene is labelled here as “similar to XXX”, with “XXX” being the homologous protein from another species. |
| YYY | If the gene has been experimentally characterized and named YYY, or if it shares >95% identical amino acids over the whole sequence with a grapevine protein YYY of known function, the label should be the value “YYY”, which corresponds to a gene whose function has been discovered and characterized in the genus Vitis. |
| Putative | Derived from in silico evidence on function; indicates that there is some logical or conclusive evidence that the given annotation could apply. This non-experimental qualifier is often used to present results from protein sequence analysis software, which are annotated only if the result makes sense in the biological context of a given protein. A typical example is the annotation of N-glycosylation sites in secreted proteins. |
| Probable | Indicates stronger evidence on function than the qualifier “putative”. This qualifier implies that there must be at least some experimental evidence indicating that the information is expected to be found in the natural environment of a protein. |
| Uncertain | Indicates that the existence of the protein is uncertain and that there is evidence that the sequence corresponds to a pseudogene. |
| Translated | Acquired when experimental evidence at the protein level provides clear proof of the existence of the protein. The criteria include partial or complete Edman sequencing, clear identification by mass spectrometry, X-ray or NMR structure, good-quality protein-protein interaction data, or detection of the protein by antibodies. |
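
The numeric thresholds quoted for the “Similar to” and “YYY” values can be read as simple predicates. The following is a minimal sketch under stated assumptions, not the authors' implementation: the `Hit` record, its field names, and both function names are hypothetical; “e-20” is interpreted as 1e-20; and the length of the contiguous stretch is assumed to be supplied by whatever alignment tool produced the hit.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical summary of one similarity hit against a plant protein."""
    evalue: float             # e-value reported by the alignment tool
    percent_identity: float   # identity (%) over the contiguous stretch
    contiguous_length: int    # length of that stretch, in amino acids


def meets_similar_to(hit: Hit,
                     max_evalue: float = 1e-20,    # assumed reading of "e-20"
                     min_identity: float = 30.0,
                     min_length: int = 80) -> bool:
    """True if the hit passes either cut-off given for "Similar to"."""
    passes_evalue = hit.evalue <= max_evalue
    passes_identity = (hit.percent_identity >= min_identity
                       and hit.contiguous_length >= min_length)
    return passes_evalue or passes_identity


def meets_named_identity(full_length_identity: float,
                         min_identity: float = 95.0) -> bool:
    """True if whole-sequence identity to a characterized grapevine
    protein is strictly above 95%, the cut-off given for "YYY"."""
    return full_length_identity > min_identity


if __name__ == "__main__":
    print(meets_similar_to(Hit(1e-5, 35.0, 120)))
    print(meets_named_identity(97.2))
```

The two “Similar to” cut-offs are combined with a logical OR because the definition presents them as alternatives; the table does not say how ties or conflicting hits should be resolved, so that is left out of the sketch.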