Table 1 The different sequence sets used in the validation experiment.

From: A new method to compute K-mer frequencies and its application to annotate large repetitive plant genomes

| class | fraction | abbrev. | size (Mbp) | source |
|---|---|---|---|---|
| WGS | Pilot Bacterial Artificial Chromosomes | BAC | 14.8 | [35] |
| WGS | BAC End Sequences | BES | 8.0 | AGI [50] |
| Repeat | Transposable Elements | RepI | 8.0 | MIPS [28] |
| Repeat | DNA transposons | RepII | 0.1 | MIPS [28] |
| GE | Expressed Sequence Clusters | EST | 56.3 | PlantGDB [51] |
| GE | AZM4 High-C0t | AZM4 HC | 188.8 | TIGR [52] |
| GE | AZM4 Methyl-filtered | AZM4 MF | 156.8 | TIGR [52] |
| GA | TIGR4 assembly of rice (Oryza sativa L. ssp. japonica) | Osj:TIGR4 | 420.0 | TIGR [37] |
| GA | TAIR7 assembly of Arabidopsis thaliana | At:TAIR7 | 115.4 | TAIR [42] |
| GA | JGI1.1 assembly of poplar (Populus trichocarpa) | Pt:JGI1.1 | 410.0 | [43] |
| GA | Genoscope1 assembly of grapevine (Vitis vinifera) | Vc:GEN1 | 487.0 | [44] |
WGS stands for a survey of 'whole genome sequences', GE for 'gene enrichment', and GA for 'genome assemblies'. Sequence sizes are given in million base pairs (Mbp).