Table 1 Validation of final assembly and comparison of long read assembler performance

From: Telomere length de novo assembly of all 7 chromosomes and mitogenome sequencing of the model entomopathogenic fungus, Metarhizium brunneum, by means of a novel assembly pipeline

 

| | Raw reads | Raw reads | Raw reads | Raw reads | Raw reads | Raw reads | Raw reads | FMLRC/Canu trimmed corrected reads | FMLRC/Canu trimmed corrected reads | FMLRC/Canu trimmed corrected reads | FMLRC/Canu trimmed corrected reads | Flye assemblies of corrected reads | Flye assemblies of corrected reads | Flye assemblies of corrected reads |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Assembler/read correction | Canu | Flye | Miniasm/Minipolish | NECAT | Raven | Shasta | wtdbg2 | Canu | Raven | Shasta | wtdbg2 | Ratatosk corrected | NECAT corrected | Canu corrected |
| # contigs | 101 | 37 | 30 | 11 | 23 | 158 | 36 | 5,184 | 33 | 29 | 39 | 12 | 12 | 20 |
| # telomere length contigs | 0 | 0 | 1 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 3 | 1 |
| # telomere ends | 13 | 10 | 10 | 14 | 2 | 1 | 3 | 63 | 1 | 0 | 4 | 12 (1 internal) | 12 | 12 |
| Largest contig | 7,044,699 | 7,995,000 | 7,469,658 | 7,468,681 | 7,206,934 | 4,617,081 | 5,443,409 | 272,978 | 5,081,705 | 11,177 | 7,124,112 | 9,037,030 | 9,016,449 | 8,996,444 |
| Total length | 38,376,550 | 37,703,727 | 37,958,957 | 37,735,059 | 37,936,851 | 37,183,651 | 37,096,535 | 103,756,147 | 37,639,251 | 104,717 | 37,144,653 | 37,798,138 | 37,733,634 | 37,685,204 |
| N50 | 1,881,118 | 3,318,636 | 3,131,456 | 4,285,476 | 3,481,404 | 1,876,944 | 3,054,299 | 24,679 | 2,314,652 | 5,391 | 2,990,362 | 5,751,808 | 4,624,294 | 4,146,824 |
| N75 | 767,395 | 2,040,015 | 2,377,107 | 3,139,426 | 2,013,237 | 930,220 | 1,521,719 | 14,748 | 1,029,583 | 3,459 | 1,523,795 | 4,158,166 | 4,289,507 | 1,999,248 |
| # misassembled contigs | 0 | 1 | 0 | 0 | 2 | 0 | 0 | – | 2 | – | 2 | 1 | 0 | 0 |
| Genome fraction (%) | 99.478 | 99.56 | 99.711 | 99.972 | 99.836 | 97.935 | 98.252 | 99.898 | 99.257 | 0.269 | 98.296 | 99.883 | 99.916 | 99.824 |
| Duplication ratio | 1.021 | 1.002 | 1.008 | 0.999 | 1.005 | 1.005 | 0.998 | 2.744 | 1.004 | 1.03 | 1 | 1.001 | 1 | 0.999 |
| # predicted genes | 10,951 | 11,041 | 11,116 | 11,040 | 11,171 | 10,897 | 10,418 | 31,240 | 11,219 | – | 11,163 | 11,289 | 11,073 | 10,855 |
| Total aligned length | 38,336,586 | 37,639,589 | 37,919,257 | 37,732,682 | 37,899,111 | 37,159,837 | 36,997,473 | 103,544,164 | 37,609,029 | 104,319 | 37,120,414 | 37,774,636 | 37,704,114 | 37,680,427 |

  1. The final polished assembly was compared to assemblies produced by alternative long-read assemblers. Misassemblies were detected by BLAST searches of the final chromosomes against each assembly in Bandage, and telomere presence was assessed by BLAST searches for the telomere sequence (TTAGGG)n with n = 5.
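
The telomere counts in the table can be approximated without BLAST. The following is a minimal sketch, not the authors' procedure: it swaps the BLAST search for a regular-expression scan of contig ends. It assumes the telomere query means at least five tandem copies of TTAGGG, and the names `END_WINDOW`, `telomere_ends`, and `summarize` are hypothetical, as is the 200-base window.

```python
import re

# Assumption: the telomere query is read as >= 5 tandem copies of TTAGGG.
TELOMERE_UNIT = "TTAGGG"
MIN_REPEATS = 5
# Hypothetical parameter: number of bases inspected at each contig end.
END_WINDOW = 200

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def _repeat_pattern(unit, min_repeats):
    """Regex matching at least `min_repeats` tandem copies of `unit`."""
    return re.compile("(?:%s){%d,}" % (unit, min_repeats))


# Either strand of the repeat may appear, depending on contig orientation.
_PATTERNS = (
    _repeat_pattern(TELOMERE_UNIT, MIN_REPEATS),
    _repeat_pattern(reverse_complement(TELOMERE_UNIT), MIN_REPEATS),
)


def _has_telomere(fragment):
    return any(p.search(fragment) for p in _PATTERNS)


def telomere_ends(seq, window=END_WINDOW):
    """Return (start_has_telomere, end_has_telomere) for one contig."""
    seq = seq.upper()
    return _has_telomere(seq[:window]), _has_telomere(seq[-window:])


def summarize(contigs):
    """Count telomere ends and contigs with telomeres at both ends.

    `contigs` maps contig names to sequences.
    """
    ends = 0
    both = 0
    for seq in contigs.values():
        start, end = telomere_ends(seq)
        ends += int(start) + int(end)
        if start and end:
            both += 1
    return {"telomere_ends": ends, "telomere_to_telomere_contigs": both}
```

Repeats outside the end windows are ignored, so an internal telomere-like array, such as the one noted in the "12 (1 internal)" entry, would need a separate whole-contig scan.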