Table 3 Comparisons of the Drosophila transcriptome assemblies of our postprocessing algorithm, Oases and Trans-ABySS using six publicly available libraries over different values of k-mer coverage cutoff c.

From: A memory-efficient algorithm to obtain splicing graphs and de novo expression estimates from de Bruijn graphs of RNA-Seq data

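postprocess

| k_c | initial nodes | largest tangle | largest SCC | splicing graphs | max length | N50 | >1-node graphs | max nodes | avg nodes | SNPs | total hits | unique hits | >1-hit graphs | max hits | time (mins) | memory (GB) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 35_3 | 227614 | 178545 | 88094 | 75367 | 10539 | 544 | 2048 | 124 | 6 | 16703 | 38448 | 10719 | 392 | 5 | 86,18 | 22,2 |
| 35_5 | 125414 | 87895 | 41654 | 47958 | 8678 | 705 | 1720 | 93 | 6 | 11334 | 27010 | 9889 | 429 | 13 | 86,17 | 22,2 |
| 35_10 | 57978 | 31785 | 12695 | 27695 | 6383 | 705 | 1020 | 63 | 6 | 5034 | 17271 | 8070 | 308 | 5 | 86,16 | 22,2 |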

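Oases

| k_c | locus | max length | N50 | >1-trans locus | max trans | avg trans | total hits | unique hits | >1-hit locus | max hits | time (mins) | memory (GB) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 35_3 | 39584 | 15586 | 801 | 3824 | 13 | 3 | 29928 | 10898 | 256 | 4 | 94,28 | 29,32 |
| 35_5 | 28537 | 15586 | 936 | 2616 | 16 | 3 | 22460 | 10103 | 245 | 4 | 94,26 | 29,30 |
| 35_10 | 17075 | 11104 | 982 | 1377 | 14 | 3 | 13800 | 8201 | 185 | 5 | 94,24 | 29,26 |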

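Trans-ABySS

| k_c | trans | max length | N50 | >1-node trans | max nodes | avg nodes | total hits | unique hits | time (mins) | memory (GB) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 35_3 | 91365 | 15586 | 898 | 50467 | 60 | 8 | 33600 | 10639 | 205,1 | 4,1 |
| 35_5 | 55164 | 10582 | 997 | 27763 | 46 | 7 | 25779 | 9944 | 195,1 | 4,1 |
| 35_10 | 28455 | 8865 | 929 | 13665 | 43 | 6 | 16032 | 8154 | 178,1 | 4,1 |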

  1. The k-mer length is fixed at 35 because, on machines with 32 GB of physical memory, Oases can assemble these libraries only when k is large. For our postprocessing algorithm, the notations are the same as in Table 1.
For Oases, locus denotes the number of predicted loci; max length denotes the length of the longest predicted transcript; N50 denotes the N50 value of the longest transcript length in each predicted locus; >1-trans locus denotes the number of predicted loci with more than one transcript; max trans denotes the maximum number of transcripts in a predicted locus; and avg trans denotes the average number of transcripts in predicted loci with more than one transcript. Total hits denotes the total number of hits from a translated BLAST search of each predicted transcript against Drosophila. Isoforms are considered the same gene, only the top hit with E-value below 10^-7 is considered for each transcript in a predicted locus, and hits from transcripts within the same predicted locus to the same gene are counted once. Unique hits denotes the number of unique hits to different genes; >1-hit locus denotes the number of predicted loci that have BLAST hits to more than one gene; and max hits denotes the maximum number of different genes that have BLAST hits to a predicted locus. Time (mins) denotes the computational time in minutes, with the values to the left and right of "," giving the running times of Velvet (without setting cov_cutoff) and Oases respectively. Memory (GB) denotes the memory requirement in gigabytes, with the values to the left and right of "," giving the memory requirements of Velvet (without setting cov_cutoff) and Oases respectively.
For Trans-ABySS, trans denotes the total number of predicted transcripts; max length denotes the length of the longest predicted transcript; N50 denotes the N50 value of the lengths of predicted transcripts; >1-node trans denotes the number of predicted transcripts that are the concatenation of more than one node; max nodes denotes the maximum number of nodes in a predicted transcript; and avg nodes denotes the average number of nodes in predicted transcripts with more than one node. Total hits denotes the total number of predicted transcripts that have BLAST hits, and unique hits denotes the number of unique hits to different genes. Time (mins) denotes the computational time in minutes, with the values to the left and right of "," giving the running times of ABySS and Trans-ABySS respectively. Memory (GB) denotes the memory requirement in gigabytes, with the values to the left and right of "," giving the memory requirements of ABySS and Trans-ABySS respectively.
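
The hit-counting rules described for Oases can be illustrated with a minimal Python sketch. It is not the authors' code. The function name `summarize_hits`, the input tuple layout `(locus_id, transcript_id, gene_id, e_value)`, and the sample rows are all hypothetical. It also assumes that isoforms have already been mapped to a single gene identifier, and that total hits is the sum over loci of the distinct genes hit, which is one reading of the footnote.

```python
from collections import defaultdict

E_VALUE_CUTOFF = 1e-7  # threshold stated in the footnote


def summarize_hits(blast_rows, cutoff=E_VALUE_CUTOFF):
    """Summarize BLAST hits per predicted locus.

    blast_rows: iterable of (locus_id, transcript_id, gene_id, e_value).
    This layout is an assumption made for illustration only.
    """
    # Keep only the top (lowest E-value) hit below the cutoff per transcript.
    best = {}
    for locus, transcript, gene, e_value in blast_rows:
        if e_value >= cutoff:
            continue
        key = (locus, transcript)
        if key not in best or e_value < best[key][1]:
            best[key] = (gene, e_value)

    # Hits from transcripts in the same locus to the same gene count once.
    genes_per_locus = defaultdict(set)
    for (locus, _transcript), (gene, _e_value) in best.items():
        genes_per_locus[locus].add(gene)

    all_genes = set().union(*genes_per_locus.values())
    return {
        "total_hits": sum(len(g) for g in genes_per_locus.values()),
        "unique_hits": len(all_genes),
        "multi_hit_loci": sum(1 for g in genes_per_locus.values() if len(g) > 1),
        "max_hits": max((len(g) for g in genes_per_locus.values()), default=0),
    }


# Hypothetical example data, not taken from the paper.
rows = [
    ("L1", "t1", "geneA", 1e-30),
    ("L1", "t2", "geneA", 1e-20),  # same gene, same locus: counted once
    ("L1", "t3", "geneB", 1e-10),
    ("L2", "t1", "geneA", 1e-12),
    ("L2", "t1", "geneC", 1e-9),   # not the top hit for this transcript
    ("L3", "t1", "geneD", 1e-3),   # fails the E-value cutoff
]
summary = summarize_hits(rows)
print(summary)
```

In this toy input, locus L1 hits two genes and L2 hits one, so the sketch reports three total hits, two unique genes, one locus with more than one hit, and a maximum of two hits for a single locus.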