
Table 1. Resource requirements for running the GDA feature extraction pipeline on a range of genomes. The pipeline was run on four genomes of different sizes. De novo repeat detection (RepeatModeler) had a large effect on run time, while larger genomes increased both run time and memory usage.

From: Characterising genome architectures using genome decomposition analysis

| Feature set | Metric | P. falciparum | C. elegans | S. mansoni | H. sapiens |
|---|---|---|---|---|---|
| | Assembly size (Mbp) | 23.33 | 100.29 | 409.57 | 3272.09 |
| Seq + gene + orth (without RepeatModeler) | Run time | 17 min | 1 h 1 min | 1 h 37 min | 11 h 42 min |
| | CPU time (s) | 2475.22 | 13,444.56 | 16,204 | 126,110 |
| | Max memory use (MB) | 4573 | 8878 | 11,738 | 145,277 |
| Seq + gene + rep + orth (with RepeatModeler) | Run time | 11 h 16 min | 8 h 59 min | 41 h 13 min | 86 h 6 min |
| | CPU time (s) | 408,912 | 238,074 | 1,184,172 | 1,862,049.88 |
| | Max memory use (MB) | 4278 | 9326 | 11,730 | 128,683 |
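The caption's claim about RepeatModeler can be checked directly against the table. The Python sketch below is an illustrative calculation, not part of the GDA pipeline. It transcribes the run-time and CPU-time figures, converts wall-clock times to minutes, and computes two derived quantities. The first is the slowdown from adding RepeatModeler. The second is CPU time divided by wall-clock time, a rough average of how many cores were kept busy. That second metric is an assumption-laden reading of the numbers; the table itself does not report it. All function and variable names are hypothetical.

```python
# Figures transcribed from Table 1 (order: P. falciparum, C. elegans,
# S. mansoni, H. sapiens). Illustrative only; not part of GDA itself.
GENOMES = ["P. falciparum", "C. elegans", "S. mansoni", "H. sapiens"]


def to_minutes(hours: int, minutes: int) -> int:
    """Convert an 'X h Y min' wall-clock entry to whole minutes."""
    return 60 * hours + minutes


# Wall-clock run time in minutes, without and with RepeatModeler.
RUN_NO_RM = [to_minutes(0, 17), to_minutes(1, 1), to_minutes(1, 37), to_minutes(11, 42)]
RUN_RM = [to_minutes(11, 16), to_minutes(8, 59), to_minutes(41, 13), to_minutes(86, 6)]

# Total CPU time in seconds for the same runs.
CPU_NO_RM = [2475.22, 13444.56, 16204, 126110]
CPU_RM = [408912, 238074, 1184172, 1862049.88]


def slowdown(with_rm: list, without_rm: list) -> list:
    """Ratio of wall-clock time with RepeatModeler to time without it."""
    return [w / wo for w, wo in zip(with_rm, without_rm)]


def mean_busy_cores(cpu_seconds: list, wall_minutes: list) -> list:
    """CPU time / wall-clock time: a rough average of cores kept busy (assumed metric)."""
    return [c / (60 * w) for c, w in zip(cpu_seconds, wall_minutes)]


if __name__ == "__main__":
    rows = zip(
        GENOMES,
        slowdown(RUN_RM, RUN_NO_RM),
        mean_busy_cores(CPU_NO_RM, RUN_NO_RM),
        mean_busy_cores(CPU_RM, RUN_RM),
    )
    for name, factor, cores_no_rm, cores_rm in rows:
        print(f"{name:14s} slowdown x{factor:5.1f}  busy cores {cores_no_rm:4.1f} -> {cores_rm:4.1f}")
```

On these figures, adding RepeatModeler lengthens wall-clock time by roughly 7 times (H. sapiens) to 40 times (P. falciparum). The slowdown is therefore not proportional to genome size, which fits the caption's point that de novo repeat detection, not assembly size alone, dominates run time. Peak memory, by contrast, changes little between the two feature sets for any genome.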