Multiple contigs aligning to the same protein sequence.
A. Histogram (log scale) of the number of contigs aligning to the same D. rerio protein sequence (with > 80% sequence coverage of the D. rerio sequence). Most D. rerio proteins (7,692 sequences) have alignments to a single contig, whereas 491 D. rerio proteins have alignments to exactly two contigs in the assembly and 34 have alignments to exactly three contigs (Additional file 7: Table S5). In addition, five D. rerio proteins have alignments to exactly four contigs, two align to five contigs, and two align to either six or seven contigs (Table 4). B. Example reference D. rerio protein sequence, ubiquitin-conjugating enzyme E2 D2 (UBC4/5 homolog, yeast) (Q6PBX6_DANRE), which had four contigs in the assembly with alignments covering ≥80% of the Q6PBX6_DANRE sequence. For each assembly contig, the ORF of the transcript with the longest alignment is shown. Blue and green boxes indicate α-helices and β-strands, respectively, as determined from the X-ray crystal structure of the highly homologous human ubiquitin-conjugating enzyme E2 D2 (PDB ID: 2C4O). The active-site cysteine is shown in red. All assembly ORFs contained a stop codon aligned with the C terminus of Q6PBX6_DANRE, except comp170839_c0_seq6, whose earlier stop codon eliminates the final helix. The ORF for comp105613_c0_seq1, as determined by TransDecoder, contained additional N-terminal sequence; this region may not be translated, since there was no evidence of an upstream start codon and a methionine in this ORF aligned in agreement with the other sequences. Alignment performed with ClustalW2: * (asterisk) = fully conserved residue; : (colon) = conservation between groups of strongly similar properties (score > 0.5 in the Gonnet PAM 250 matrix); . (period) = conservation between groups of weakly similar properties (score ≤ 0.5).
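The quantity plotted in panel A can be illustrated with a minimal sketch. It assumes a hypothetical input format, a list of `(contig_id, protein_id, ref_coverage)` tuples in which `ref_coverage` is the fraction of the reference D. rerio protein covered by the contig's alignment. The functions `contigs_per_protein` and `histogram` are illustrative names, not part of the authors' pipeline. Each contig is counted at most once per protein, and only alignments meeting the coverage threshold are kept.

```python
from collections import Counter, defaultdict


def contigs_per_protein(hits, min_coverage=0.8):
    """Count distinct contigs per reference protein.

    hits: iterable of (contig_id, protein_id, ref_coverage) tuples
    (hypothetical format). Only alignments covering at least
    min_coverage of the reference protein are counted.
    """
    contigs = defaultdict(set)
    for contig_id, protein_id, ref_coverage in hits:
        if ref_coverage >= min_coverage:
            # A set ensures repeated hits from one contig count once.
            contigs[protein_id].add(contig_id)
    return {protein: len(ids) for protein, ids in contigs.items()}


def histogram(counts):
    """Map 'contigs per protein' -> 'number of proteins' (panel A's axes)."""
    return dict(sorted(Counter(counts.values()).items()))


if __name__ == "__main__":
    # Toy data, not values from the assembly.
    hits = [
        ("contig_a", "protein_1", 0.95),
        ("contig_b", "protein_2", 0.91),
        ("contig_c", "protein_2", 0.85),
        ("contig_d", "protein_3", 0.60),  # below threshold: ignored
        ("contig_e", "protein_3", 0.88),
    ]
    print(histogram(contigs_per_protein(hits)))  # {1: 2, 2: 1}
```

Counting distinct contig IDs, rather than raw alignment records, keeps a contig with several overlapping hits to the same protein from inflating the bins.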
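The ClustalW2 conservation symbols in panel B follow a per-column rule, which can be sketched as follows. The residue groups below are the "strong" and "weak" groups as we understand them from the Clustal documentation; treat them as an assumption. This is an approximation for illustration, not ClustalW2's own code. Columns containing a gap are left blank here, which is also an assumption of the sketch. `column_symbol` and `conservation_line` are hypothetical names.

```python
# Residue groups as we understand them from Clustal documentation (assumption).
STRONG_GROUPS = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK_GROUPS = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND",
               "SNDEQK", "NDEQHK", "NEQHRK", "FVLIM", "HFY"]


def column_symbol(column):
    """Return '*', ':', '.', or ' ' for one alignment column (a string)."""
    residues = set(column.upper())
    if "-" in residues:
        return " "  # assumption: gapped columns get no symbol
    if len(residues) == 1:
        return "*"  # fully conserved residue
    if any(residues <= set(group) for group in STRONG_GROUPS):
        return ":"  # all residues fall in one strongly similar group
    if any(residues <= set(group) for group in WEAK_GROUPS):
        return "."  # all residues fall in one weakly similar group
    return " "


def conservation_line(sequences):
    """Build a Clustal-style conservation line for equal-length aligned sequences."""
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    return "".join(column_symbol("".join(col)) for col in zip(*sequences))


if __name__ == "__main__":
    # Toy alignment, not the Q6PBX6_DANRE alignment.
    print(repr(conservation_line(["MALK", "MAIK", "MSVR"])))  # '*:::'
```

Because a column qualifies for ':' or '.' only when every residue fits inside a single group, a mixed column such as {W, D} receives no symbol.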