Characterization of Pacific Biosciences data. a) Error rate by error mode: deletions, insertions and mismatches. b) Length distribution of reads in the Pacific Biosciences discovery dataset (some raw reads are as long as 5,000 bases). c) Pacific Biosciences error rate by base position. All errors (mismatch, insertion and deletion) are shown, and every sequenced base is counted, including bases at sites of previously known variation, which is why the average is slightly higher than 15%. Because the number of reads with bases beyond position 1,000 diminishes, only positions up to 1,000 are plotted. d-f) GC bias of the Pacific Biosciences instrument, assessed on the genomes of P. falciparum (low GC), E. coli (average GC) and R. sphaeroides (high GC). Coverage is well balanced across GC content wherever the genome provides sufficient data.
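The per-position error rate in panel c can be illustrated with a minimal sketch. It is not the pipeline used to produce the figure. It assumes each aligned read has already been reduced to a string with one code per read position, using a hypothetical alphabet (`=` match, `X` mismatch, `I` insertion, `D` deletion assigned to the adjacent read base); the function name and the `max_position` parameter are likewise illustrative.

```python
from collections import Counter

# Hypothetical per-base codes: mismatch, insertion, deletion.
ERROR_CODES = {"X", "I", "D"}


def error_rate_by_position(reads, max_position=1000):
    """Return {position: error rate} for 1-based read positions.

    `reads` is an iterable of strings, one code per read position.
    Positions beyond `max_position` are ignored, mirroring the cutoff
    applied where few reads remain.
    """
    errors = Counter()  # number of erroneous bases seen at each position
    depth = Counter()   # number of reads that cover each position
    for codes in reads:
        for pos, code in enumerate(codes[:max_position], start=1):
            depth[pos] += 1
            if code in ERROR_CODES:
                errors[pos] += 1
    # Every covered position has depth >= 1, so the division is safe.
    return {pos: errors[pos] / depth[pos] for pos in sorted(depth)}
```

Because the denominator is the number of reads covering each position, the estimate becomes noisy at late positions where few reads remain, which is the reason for truncating the plot.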