Table 1 EST sequence datasets

From: Improved coverage of cDNA-AFLP by sequential digestion of immobilized cDNA

| Organism | EST database | FASTA file size (MB) | No. ESTs | Nucleotide letters | Mean EST length (nucleotides) | Ambiguous bases/1000 nucleotides |
|---|---|---|---|---|---|---|
| Arabidopsis thaliana | Complete UniGene set | 48.1 | 29215 | 42.9 E06 | 1467 | 0.9 E-01 |
| | NM_RefSeq sequences | 28.9 | 16710 | 26.8 E06 | 1606 | 2.6 E-04 |
| Mus musculus | Complete UniGene set | 108.0 | 66691 | 93.9 E06 | 1439 | 0.94 |
| | NM_RefSeq sequences | 14.4 | 4044 | 14.1 E06 | 3067 | 1.3 E-02 |
| Homo sapiens | Complete UniGene set | 131.1 | 85967 | 114.5 E06 | 1335 | 0.73 |
| | NM_RefSeq sequences | 24.9 | 6537 | 24.5 E06 | 3746 | 0.4 E-02 |

  1. UniGene ESTs of Arabidopsis (Build #58), mouse (Build #162) and human (Build #201), all from the NCBI library collection, were chosen as test sequences and used to simulate different cDNA-AFLP protocols. RefSeq sequences with an NM_ identifier provided a subset of high-quality ESTs in terms of length and undefined nucleotides.
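
The per-dataset figures in Table 1 (number of ESTs, nucleotide letters, mean EST length, ambiguous bases per 1000 nucleotides) are the kind of summary that can be derived directly from a FASTA file. The following is a minimal sketch of such a calculation, not the procedure used by the authors. The function name `fasta_stats` and the example records are hypothetical, and it assumes that every letter other than A, C, G or T counts as an ambiguous (undefined) base.

```python
from io import StringIO


def fasta_stats(handle, defined="ACGT"):
    """Summarise a FASTA stream: record count, letters, mean length, ambiguity rate.

    Assumption: any letter not in `defined` is counted as ambiguous.
    """
    n_seqs = 0
    n_letters = 0
    n_ambiguous = 0
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            n_seqs += 1  # each header line starts a new record
            continue
        seq = line.upper()
        n_letters += len(seq)
        n_ambiguous += sum(1 for base in seq if base not in defined)
    mean_length = n_letters / n_seqs if n_seqs else 0.0
    per_1000 = 1000 * n_ambiguous / n_letters if n_letters else 0.0
    return {
        "n_seqs": n_seqs,
        "n_letters": n_letters,
        "mean_length": mean_length,
        "ambiguous_per_1000": per_1000,
    }


# Two made-up records (not taken from UniGene or RefSeq), one spanning two lines.
example = StringIO(">est1\nACGTNACGTA\n>est2\nACGTACGTAC\nGGNTT\n")
stats = fasta_stats(example)
print(stats)
```

For a real dataset, the same function could be given an open file object in place of the `StringIO` example. Reading line by line keeps memory use low for files of the sizes listed in the table.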