Table 1 Comparison of the evidence-based (EB) protein coding gene model sets predicted here for the nuclear genomes of Micromonas pusilla (CCMP1545) and Micromonas commoda (RCC299) and the original model sets (“Catalog”) published in Worden et al. 2009 [6]

From: Evidence-based green algal genomics reveals marine diversity and ancestral characteristics of land plants

 

|  | CCMP1545 EB set | Catalog CCMP1545 | RCC299 EB set | Catalog RCC299 |
| --- | --- | --- | --- | --- |
| Protein coding genes (number) | 9895 | 10575 | 10307 | 10056 |
| Average transcript length (nt) | 1799 | 1390 | 1808 | 1497 |
| Average coding length (nt) | 1653 | 1317 | 1663 | 1419 |
| Introns (number) | 11229 | 9531 | 5418 | 5650 |
| Average intron length (nt) | 180 | 187 | 152 | 163 |
| Exons (number) | 21060 | 20106 | 15651 | 15706 |
| Average exon length (nt) | 840 | 731 | 1182 | 958 |
| Spliced genes (number) | 5666 | 5311 | 3584 | 3688 |
| Exons per multiple-exon gene | 2.98 | 2.79 | 2.51 | 2.54 |
| Total intergenic bases (Mb) | 2.3 | n.c. | 1.7 | n.c. |
| Total exonic bases (Mb) | 17.6 | n.c. | 17.4 | n.c. |
| Total intronic bases (Mb) | 0.20 | n.c. | 0.78 | n.c. |
| GC splice donors (%) | 25.60 | 5.90 | 1.50 | 0.70 |
Note: Average transcript and coding lengths were calculated with introns removed. For the EB sets, gene characteristics were computed on 9826 (CCMP1545) and 10233 (RCC299) proteins; 69 and 74 models, respectively, were added later because of new RNA-seq support. Abbreviation: n.c., not computed.
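
The arithmetic in the footnote can be checked directly against the table. The sketch below uses only values transcribed from the EB set columns. It confirms that the proteins used for the statistics plus the later additions give the reported gene counts. It also estimates exons per multiple-exon gene as introns divided by spliced genes, plus one. That estimate rests on an assumption that is not stated in the source: every intron belongs to a spliced gene with a single model, so each such gene has one more exon than introns. The dictionary keys and function names are illustrative only.

```python
# Values transcribed from Table 1 (EB set columns only).
eb_sets = {
    "CCMP1545": {
        "genes": 9895,
        "computed_on": 9826,   # proteins used for the gene characteristics
        "added_later": 69,     # models added after new RNA-seq support
        "introns": 11229,
        "spliced_genes": 5666,
        "exons_per_multi_exon_gene": 2.98,
    },
    "RCC299": {
        "genes": 10307,
        "computed_on": 10233,
        "added_later": 74,
        "introns": 5418,
        "spliced_genes": 3584,
        "exons_per_multi_exon_gene": 2.51,
    },
}


def footnote_total(entry):
    """Proteins used for the statistics plus the models added later."""
    return entry["computed_on"] + entry["added_later"]


def estimated_exons_per_spliced_gene(entry):
    """Assumption: each spliced gene has (number of introns + 1) exons."""
    return entry["introns"] / entry["spliced_genes"] + 1


for strain, entry in eb_sets.items():
    total = footnote_total(entry)
    estimate = estimated_exons_per_spliced_gene(entry)
    print(
        f"{strain}: {total} genes (table: {entry['genes']}), "
        f"about {estimate:.2f} exons per spliced gene "
        f"(table: {entry['exons_per_multi_exon_gene']})"
    )
```

For both EB sets the footnote totals match the gene counts in the first row, and the estimate agrees with the reported exons per multiple-exon gene to two decimal places.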