
Table 1 Statistics of the simulated transcriptome assemblies of Drosophila, using its known complete genome, over different values of k-mer size k and k-mer coverage cutoff c, with 0.1% mismatches in the reads.

From: A memory-efficient algorithm to obtain splicing graphs and de novo expression estimates from de Bruijn graphs of RNA-Seq data

| k_c | initial nodes | largest tangle | largest SCC | splicing graphs | max length | N50 | >1-node graphs | max nodes | avg nodes | SNPs | total hits | unique hits | >1-hit graphs | max hits | time (mins) | memory (GB) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 25_3 | 38884 | 17900 | 9937 | 15713 | 37380 | 2366 | 1361 | 3106 | 10 | 883 | 12731 | 10162 | 643 | 27 | 80,3 | 21,2 |
| 25_5 | 34822 | 16979 | 9255 | 15521 | 37380 | 2374 | 1351 | 266 | 7 | 517 | 12708 | 10160 | 643 | 27 | 80,3 | 21,2 |
| 25_10 | 34494 | 16712 | 9057 | 15486 | 37380 | 2373 | 1345 | 194 | 7 | 481 | 12699 | 10158 | 639 | 27 | 80,3 | 21,2 |
| 31_3 | 28342 | 5037 | 2080 | 13819 | 45158 | 2704 | 1719 | 1007 | 7 | 496 | 12523 | 11112 | 546 | 12 | 76,3 | 18,2 |
| 31_5 | 27307 | 4971 | 1898 | 13740 | 45158 | 2714 | 1717 | 167 | 6 | 381 | 12494 | 11110 | 552 | 13 | 76,3 | 18,2 |
| 31_10 | 27265 | 4947 | 1885 | 13829 | 45158 | 2704 | 1698 | 161 | 6 | 377 | 12536 | 11109 | 542 | 13 | 76,3 | 18,2 |

Column definitions:
- k_c: the k-mer size k and the k-mer coverage cutoff c used for the assembly (e.g., 25_3 means k = 25 and c = 3).
- Initial nodes: the number of nodes in the initial assembly.
- Largest tangle: the number of nodes in the largest connected component.
- Largest SCC: the number of nodes in the largest strongly connected component.
- Splicing graphs: the number of splicing graphs.
- Max length: the length (in nucleotides) of the longest path over all splicing graphs.
- N50: the N50 value (in nucleotides) of the longest-path lengths, taking the longest path in each splicing graph.
- >1-node graphs: the number of splicing graphs with more than one node.
- Max nodes: the maximum number of nodes in these non-linear graphs.
- Avg nodes: the average number of nodes in these non-linear graphs.
- SNPs: the number of SNPs recovered.
- Total hits: the total number of hits from a translated BLAST search of each node against Drosophila. Isoforms are treated as the same gene; for each node in a splicing graph, only the top hit with E-value below 10⁻⁷ is included; and hits from nodes within the same splicing graph to the same gene are counted once.
- Unique hits: the number of unique hits to different genes.
- >1-hit graphs: the number of splicing graphs with BLAST hits to more than one gene.
- Max hits: the maximum number of different genes with BLAST hits to a single splicing graph.
- Time (mins): the computational time in minutes; the values to the left and right of "," are the running times of Velvet and of our postprocessing algorithm, respectively.
- Memory (GB): the memory requirement in gigabytes; the values to the left and right of "," are the memory requirements of Velvet and of our postprocessing algorithm, respectively.
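The N50 and BLAST-hit counting rules in the column definitions can be made concrete with a short Python sketch. It is an illustration under stated assumptions, not the authors' code: it assumes the longest-path length of each splicing graph is already known, that BLAST results arrive as (graph, node, gene, E-value) tuples with isoforms already collapsed to one gene name, and it reads "unique hits" as the number of distinct genes hit. The function names `n50` and `hit_statistics`, the tuple layout, and the toy data are hypothetical.

```python
from collections import defaultdict


def n50(lengths):
    """Return the N50 of a list of lengths: the largest length L such that
    items of length >= L together cover at least half of the total length."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty input


def hit_statistics(hits, e_cutoff=1e-7):
    """Summarize BLAST hits per splicing graph.

    `hits` is an iterable of (graph_id, node_id, gene, e_value) tuples,
    possibly several per node, with isoforms already collapsed to one gene
    name (an assumed input layout, not the authors' file format).
    """
    # Keep only the top hit (lowest E-value) for each node.
    best = {}
    for graph_id, node_id, gene, e_value in hits:
        key = (graph_id, node_id)
        if key not in best or e_value < best[key][0]:
            best[key] = (e_value, gene)

    # Collect distinct genes per graph, so several nodes of the same graph
    # hitting the same gene count once; drop top hits above the cutoff.
    genes_per_graph = defaultdict(set)
    for (graph_id, _node_id), (e_value, gene) in best.items():
        if e_value < e_cutoff:
            genes_per_graph[graph_id].add(gene)

    gene_sets = list(genes_per_graph.values())
    return {
        "total_hits": sum(len(genes) for genes in gene_sets),
        "unique_hits": len(set().union(*gene_sets)),
        "multi_hit_graphs": sum(1 for genes in gene_sets if len(genes) > 1),
        "max_hits": max((len(genes) for genes in gene_sets), default=0),
    }


# Hypothetical toy input: three splicing graphs (g1..g3) with made-up hits.
example_hits = [
    ("g1", "n1", "geneA", 1e-20),
    ("g1", "n1", "geneB", 1e-10),  # not the top hit for n1, ignored
    ("g1", "n2", "geneA", 1e-30),  # same gene as n1 in g1, counted once
    ("g1", "n3", "geneC", 1e-9),
    ("g2", "n4", "geneA", 1e-12),
    ("g3", "n5", "geneD", 1e-3),   # above the 1e-7 cutoff, dropped
]

if __name__ == "__main__":
    print(n50([100, 200, 300, 400]))  # prints 300
    print(hit_statistics(example_hits))
```

On the toy input, graph g1 contributes two genes and g2 one, giving 3 total hits, 2 unique genes, one graph with hits to more than one gene, and a maximum of 2 genes per graph. Picking the top hit before applying the E-value cutoff gives the same result as filtering first, because if a node's best hit fails the cutoff, all of its other hits fail it too.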