
Table 1 EST sequence datasets

From: Improved coverage of cDNA-AFLP by sequential digestion of immobilized cDNA

| Organism | EST database | FASTA file size (MB) | No. ESTs | Nucleotide letters | Ø EST length (nucleotides) | Ambiguous bases/1000 nucleotides |
| --- | --- | --- | --- | --- | --- | --- |
| Arabidopsis thaliana | Complete UniGene set | 48.1 | 29215 | 42.9 E06 | 1467 | 0.9 E-01 |
| | NM_RefSeq sequences | 28.9 | 16710 | 26.8 E06 | 1606 | 2.6 E-04 |
| Mus musculus | Complete UniGene set | 108.0 | 66691 | 93.9 E06 | 1439 | 0.94 |
| | NM_RefSeq sequences | 14.4 | 4044 | 14.1 E06 | 3067 | 1.3 E-02 |
| Homo sapiens | Complete UniGene set | 131.1 | 85967 | 114.5 E06 | 1335 | 0.73 |
| | NM_RefSeq sequences | 24.9 | 6537 | 24.5 E06 | 3746 | 0.4 E-02 |
  1. UniGene ESTs of Arabidopsis (Build #58), mouse (Build #162) and human (Build #201), all from the NCBI library collection, were chosen as test sequences and used to simulate different cDNA-AFLP protocols. RefSeq sequences with an NM_ identifier provided a subset of high-quality ESTs in terms of length and undefined nucleotides.
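
The per-dataset columns of Table 1 (number of ESTs, nucleotide letters, mean EST length and ambiguous bases per 1000 nucleotides) can in principle be derived from a FASTA file. The following is a minimal sketch of such a calculation, not the procedure used by the authors. The function name `fasta_stats` is hypothetical, and counting every letter other than A, C, G or T as ambiguous is an assumption; the source does not state how ambiguous bases were defined.

```python
import io


def fasta_stats(handle):
    """Summarise FASTA records read from a text handle.

    Assumption: any letter other than A, C, G or T is treated as ambiguous.
    """
    n_seqs = 0
    n_letters = 0
    n_ambiguous = 0
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Each header line starts a new sequence record.
            n_seqs += 1
            continue
        seq = line.upper()
        n_letters += len(seq)
        n_ambiguous += sum(1 for base in seq if base not in "ACGT")
    mean_length = n_letters / n_seqs if n_seqs else 0.0
    ambiguous_per_1000 = 1000 * n_ambiguous / n_letters if n_letters else 0.0
    return {
        "n_seqs": n_seqs,
        "letters": n_letters,
        "mean_length": mean_length,
        "ambiguous_per_1000": ambiguous_per_1000,
    }


# Toy input for illustration only; these are not sequences from the datasets.
demo = io.StringIO(">est1\nACGTNACGTA\n>est2\nACGTACGTACGTACGTACGT\n")
stats = fasta_stats(demo)
print(stats)
```

For a real dataset, an opened file object can be passed in place of the `io.StringIO` toy input. The function reads line by line, so files of the sizes listed in Table 1 would not need to be loaded into memory at once.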