
Table 2 GenBank data sets

From: An overabundance of phase 0 introns immediately after the start codon in eukaryotic genes

| Organism group | Vertebrata | Arthropoda | Fungi | Magnoliophyta |
| --- | ---: | ---: | ---: | ---: |
| Total CDS with introns | 54729 | 34336 | 31441 | 95711 |
| pseudogene | 1899 | 90 | 101 | 789 |
| not experimental | 1204 | 515 | 504 | 9150 |
| incomplete 5' end (<) | 15622 | 10583 | 11143 | 11659 |
| incomplete 3' end (>) | 5417 | 1664 | 569 | 1561 |
| cross-reference | 10445 | 231 | 2 | 60 |
| join (complement) | 0 | 16 | 0 | 34 |
| contains 'X' | 106 | 120 | 71 | 100 |
| contains 'U' | 26 | 4 | 0 | 0 |
| no initial 'M' | 222 | 51 | 9 | 34 |
| zero or negative length | 36 | 7 | 17 | 35 |
| annotated gap | 480 | 6 | 0 | 25 |
| length mismatch | 466 | 19 | 11 | 18 |
| Used for length statistics | 18807 | 21030 | 19014 | 72247 |
| non-gt...ag | 1734 | 818 | 1159 | 3368 |
| intron too short | 550 | 1244 | 2354 | 12125 |
| CDS accepted | 16523 | 18968 | 15501 | 56754 |
| After homology reduction | 3542 | 4179 | 4525 | 12751 |
| With signal peptides | 755 | 769 | 431 | 1051 |
| Without signal peptides | 2552 | 3202 | 3814 | 10370 |
The number of genes (CDS features) found in GenBank within the four organism groups studied, the number of genes discarded for various reasons, the number kept after homology reduction, and the numbers predicted to contain or not to contain a signal peptide.
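
The counts in the table can be cross-checked with a few lines of arithmetic. The sketch below is not from the article. It copies four rows of the table and tests one reading of them: that "CDS accepted" is what remains of "Used for length statistics" once the "non-gt...ag" and "intron too short" entries are removed, assuming those two filters discard disjoint sets of CDS. The names `GROUPS` and `remaining_after_intron_filters` are made up for illustration.

```python
# Counts copied from Table 2, in column order.
GROUPS = ("Vertebrata", "Arthropoda", "Fungi", "Magnoliophyta")

used_for_length_stats = dict(zip(GROUPS, (18807, 21030, 19014, 72247)))
non_gt_ag = dict(zip(GROUPS, (1734, 818, 1159, 3368)))
intron_too_short = dict(zip(GROUPS, (550, 1244, 2354, 12125)))
cds_accepted = dict(zip(GROUPS, (16523, 18968, 15501, 56754)))


def remaining_after_intron_filters(group):
    """CDS left after subtracting both intron-based discards.

    Assumption (not stated in the source): the two discard
    categories do not overlap, so the counts can simply be subtracted.
    """
    return (
        used_for_length_stats[group]
        - non_gt_ag[group]
        - intron_too_short[group]
    )


for group in GROUPS:
    remaining = remaining_after_intron_filters(group)
    matches = remaining == cds_accepted[group]
    print(f"{group}: {remaining} remaining, matches 'CDS accepted': {matches}")
```

Under that assumption the subtraction reproduces the "CDS accepted" row for all four organism groups.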