Table 3 SGP automatic annotation statistics.

From: Annotated Expressed Sequence Tags (ESTs) from pre-smolt Atlantic salmon (Salmo salar) in a searchable data resource

| Database | Contigs and singlets, % annotated (automatic annotation) | Contigs, % annotated | Singlets, % annotated |
|---|---|---|---|
| GO-GOA | 30.5 | 48.5 | 22.0 |
| pdb or swiss-prot | 32.6 | 51.1 | 23.9 |
| pdb | 17.3 | 31.4 | 10.7 |
| swiss-prot | 31.7 | 50.1 | 23 |
| nr | 41.1 | 60.3 | 32.1 |
| nt | 36.7 | 53.7 | 28.6 |
| Distribution between databases | | | |
| pdb | 17.3 | 31.4 | 10.7 |
| swiss-prot | 15.3 | 19.7 | 13.2 |
| nr | 8.9 | 9.6 | 8.6 |
| nt | 8.8 | 7.7 | 9.3 |
| any database (pdb + swiss-prot + nr + nt) | 50.3 | 68.4 | 41.8 |
| no hits | 49.7 | 31.6 | 58.2 |

  1. Detailed SGP dataset annotation statistics are available at SGP data resource > Data and results > Annotations > SGP full annotation, statistics.
  2. Databases. NCBI databases – pdb: RCSB-PDB; swiss-prot: SWISS-PROT protein sequence database; nr: all non-redundant GenBank CDS translations + RefSeq Proteins + PDB + SwissProt + PIR + PRF; nt (nucleotide sequences): all GenBank + RefSeq Nucleotides + EMBL + DDBJ + PDB sequences (excluding HTGS0,1,2, EST, GSS, STS, PAT, WGS), no longer "non-redundant". GO-GOA: Gene Ontology assignments for the UNIPROT database produced by the GOA project.
  3. % Annotated. Calculated as the ratio of the number of sequences with successful annotations for a given subset (i.e. contigs and singlets, contigs, singlets) in a given database to the total number of sequences of this subset for which annotation was attempted, expressed as a percentage; GO annotation statistics were calculated separately from other databases as GO hits/GO no-hits. BLAST threshold E-values of 10⁻¹⁰ for PDB and 10⁻¹⁵ for other databases were used.
  4. Distribution between databases. Only one successful annotation per sequence was counted, in the following ranking order: pdb OR swiss-prot OR nr OR nt.
  5. SGP dataset. 20019 contig consensus and singlet sequences.
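
The two calculations described in notes 3 and 4 can be illustrated with a minimal Python sketch. It is not the pipeline used to produce the table: the function names, the input layout (a mapping from sequence ID to the set of databases with a hit passing the E-value threshold) and the toy data are assumptions made only for illustration.

```python
from collections import Counter

# Ranking order taken from note 4: the first database with a hit wins.
PRIORITY = ("pdb", "swiss-prot", "nr", "nt")


def percent_annotated(hits, database):
    """Percentage of sequences that have a hit in `database` (note 3)."""
    if not hits:
        return 0.0
    annotated = sum(1 for dbs in hits.values() if database in dbs)
    return 100.0 * annotated / len(hits)


def distribution(hits):
    """Count each sequence once, under its highest-ranked database (note 4)."""
    counts = Counter()
    for dbs in hits.values():
        # Sequences with no hit in any ranked database fall into "no hits".
        label = next((db for db in PRIORITY if db in dbs), "no hits")
        counts[label] += 1
    total = len(hits)
    return {label: 100.0 * n / total for label, n in counts.items()}


# Hypothetical toy input, not data from the SGP dataset.
toy_hits = {
    "seq1": {"pdb", "swiss-prot", "nr"},
    "seq2": {"swiss-prot", "nr"},
    "seq3": {"nt"},
    "seq4": set(),
}

print(percent_annotated(toy_hits, "nr"))
print(distribution(toy_hits))
```

Because every sequence is counted exactly once in the distribution, the per-database shares and "no hits" add up to 100%. The table follows the same pattern: for contigs and singlets, 17.3 + 15.3 + 8.9 + 8.8 = 50.3 (any database), and 50.3 + 49.7 = 100.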