
Table 1 The different sequence sets used in the validation experiment.

From: A new method to compute K-mer frequencies and its application to annotate large repetitive plant genomes

class   fraction                                                abbrev.    size (Mbp)  source
WGS     Pilot Bacterial Artificial Chromosomes                  BAC              14.8  [35]
WGS     BAC End Sequences                                       BES               8.0  AGI [50]
Repeat  Transposable Elements                                   RepI              8.0  MIPS [28]
Repeat  DNA transposons                                         RepII             0.1  MIPS [28]
GE      Expressed Sequence Clusters                             EST              56.3  PlantGDB [51]
GE      AZM4 High-C0t                                           AZM4 HC         188.8  TIGR [52]
GE      AZM4 Methyl-filtered                                    AZM4 MF         156.8  TIGR [52]
GA      TIGR4 assembly of rice (Oryza sativa L. ssp. japonica)  Osj:TIGR4       420.0  TIGR [37]
GA      TAIR7 assembly of Arabidopsis thaliana                  At:TAIR7        115.4  TAIR [42]
GA      JGI1.1 assembly of poplar (Populus trichocarpa)         Pt:JGI1.1       410.0  [43]
GA      Genoscope1 assembly of grapevine (Vitis vinifera)       Vc:GEN1         487.0  [44]

Note: WGS denotes surveys of whole genome sequences, GE denotes gene enrichment, and GA denotes genome assemblies. Sequence sizes are given in million base pairs (Mbp).
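Because all sizes are in Mbp, the table lends itself to simple bookkeeping, for example checking how much sequence each class contributes. The following is a minimal sketch, assuming the table is transcribed into (class, abbreviation, size) records; the names `SEQUENCE_SETS`, `mbp_to_bp` and `total_size_by_class` are hypothetical and not part of the authors' software. The per-class totals are plain sums of the listed sizes (e.g. the four GA assemblies add up to 1432.4 Mbp), not figures reported in the paper.

```python
from collections import defaultdict

# Rows transcribed from Table 1 as (class, abbreviation, size in Mbp).
# The record layout and all names here are illustrative, not from the paper.
SEQUENCE_SETS = [
    ("WGS", "BAC", 14.8),
    ("WGS", "BES", 8.0),
    ("Repeat", "RepI", 8.0),
    ("Repeat", "RepII", 0.1),
    ("GE", "EST", 56.3),
    ("GE", "AZM4 HC", 188.8),
    ("GE", "AZM4 MF", 156.8),
    ("GA", "Osj:TIGR4", 420.0),
    ("GA", "At:TAIR7", 115.4),
    ("GA", "Pt:JGI1.1", 410.0),
    ("GA", "Vc:GEN1", 487.0),
]


def mbp_to_bp(size_mbp: float) -> int:
    """Convert a size in million base pairs to an integer number of base pairs."""
    return round(size_mbp * 1_000_000)


def total_size_by_class(rows):
    """Sum the set sizes (Mbp) per class, rounded to one decimal as in the table."""
    totals = defaultdict(float)
    for cls, _abbrev, size_mbp in rows:
        totals[cls] += size_mbp
    # Rounding removes floating-point noise from repeated additions.
    return {cls: round(total, 1) for cls, total in totals.items()}


if __name__ == "__main__":
    for cls, total in total_size_by_class(SEQUENCE_SETS).items():
        print(f"{cls}: {total} Mbp ({mbp_to_bp(total):,} bp)")
```

Rounding the sums to one decimal mirrors the precision used in the table, so the totals read the same way as the individual entries.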