Table 3 SGP automatic annotation statistics.

From: Annotated Expressed Sequence Tags (ESTs) from pre-smolt Atlantic salmon (Salmo salar) in a searchable data resource

Database                                    % Annotated (automatic annotation)
                                            Contigs and singlets   Contigs   Singlets
GO-GOA                                      30.5                   48.5      22.0
pdb or swiss-prot                           32.6                   51.1      23.9
pdb                                         17.3                   31.4      10.7
swiss-prot                                  31.7                   50.1      23
nr                                          41.1                   60.3      32.1
nt                                          36.7                   53.7      28.6
Distribution between databases
pdb                                         17.3                   31.4      10.7
swiss-prot                                  15.3                   19.7      13.2
nr                                          8.9                    9.6       8.6
nt                                          8.8                    7.7       9.3
any database (pdb + swiss-prot + nr + nt)   50.3                   68.4      41.8
no hits                                     49.7                   31.6      58.2
  1. Detailed SGP dataset annotation statistics are available at SGP data resource > Data and results > Annotations > SGP full annotation, statistics.
  2. Databases. NCBI databases – pdb: RCSB-PDB; swiss-prot: SWISS-PROT protein sequence database; nr: all non-redundant GenBank CDS translations + RefSeq Proteins + PDB + SwissProt + PIR + PRF; nt (nucleotide sequences): all GenBank + RefSeq Nucleotides + EMBL + DDBJ + PDB sequences (excluding HTGS0,1,2, EST, GSS, STS, PAT, WGS), no longer "non-redundant". GO-GOA: Gene Ontology assignments for the UNIPROT database produced by the GOA project.
  3. % Annotated. Calculated as the percentage of sequences with a successful annotation in a given database for a given subset (i.e. contigs and singlets, contigs, singlets), relative to the total number of sequences of that subset for which annotation was attempted; GO annotation statistics were calculated separately from the other databases, as GO hits/GO no-hits. BLAST threshold E-values of 10^-10 for PDB and 10^-15 for the other databases were used.
  4. Distribution between databases. Only one successful annotation per sequence was counted, in the following ranking order: pdb OR swiss-prot OR nr OR nt.
  5. SGP dataset. 20019 contig consensus and singlet sequences.
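
The two calculations described in notes 3 and 4 can be sketched in a few lines of Python. This is only an illustrative sketch, not the pipeline used to produce the table: the input format (a mapping from sequence ID to the best BLAST E-value per database), the function names, and the toy data are all hypothetical. It also assumes that an E-value at or below the threshold counts as a successful annotation, and that annotation was attempted for every sequence in the input.

```python
from collections import Counter

# Ranking order from note 4: pdb, then swiss-prot, then nr, then nt.
RANKING = ("pdb", "swiss-prot", "nr", "nt")

# E-value thresholds from note 3 (1e-10 for pdb, 1e-15 for the others).
THRESHOLDS = {"pdb": 1e-10, "swiss-prot": 1e-15, "nr": 1e-15, "nt": 1e-15}


def annotated_databases(best_evalues):
    """Return the databases whose best hit passes the threshold.

    `best_evalues` maps database name -> best E-value (hypothetical format).
    Treating "E-value <= threshold" as a pass is an assumption.
    """
    return {db for db, e in best_evalues.items() if e <= THRESHOLDS[db]}


def percent_annotated(sequences, database):
    """Note 3: percentage of sequences with a passing hit in `database`."""
    hits = sum(database in annotated_databases(ev) for ev in sequences.values())
    return 100.0 * hits / len(sequences)


def distribution(sequences):
    """Note 4: count each sequence once, under the first database in RANKING."""
    counts = Counter()
    for ev in sequences.values():
        passed = annotated_databases(ev)
        label = next((db for db in RANKING if db in passed), "no hits")
        counts[label] += 1
    total = len(sequences)
    return {label: 100.0 * n / total for label, n in counts.items()}


# Toy data, invented for illustration (not taken from the SGP dataset).
toy = {
    "seq1": {"pdb": 1e-30, "swiss-prot": 1e-40, "nr": 1e-40},
    "seq2": {"swiss-prot": 1e-20, "nr": 1e-25},
    "seq3": {"nt": 1e-18},
    "seq4": {"pdb": 1e-5},  # fails the pdb threshold, so it counts as "no hits"
}

print(percent_annotated(toy, "swiss-prot"))  # 50.0
print(distribution(toy))
```

Because each sequence is counted only once in the distribution, its rows add up to the "any database" row, as they do in the table for contigs and singlets combined (17.3 + 15.3 + 8.9 + 8.8 = 50.3), and "any database" plus "no hits" gives 100%.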