Comparison of protein length and codon usage between the unknown and known regions. (a) The length distributions of annotated proteins are visualized as Gaussian kernel density estimates. The distributions give no reason to suspect that the protein sequences from the two unknown regions derive from random ORFs. (b) No differences in the relative usage of synonymous codons for each amino acid are observed between CDSs annotated in either of the two unknown regions and CDSs annotated in the known regions. This strongly indicates that the majority of the novel genes annotated in the unknown regions are true protein-coding genes.
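
The relative codon usage compared in panel (b) can be illustrated with a minimal sketch. The caption does not state how the values were computed, so the following is an assumption: for each amino acid, the count of each codon is divided by the total count of all codons encoding that amino acid, using the standard genetic code. The function name `relative_codon_usage` and the example sequences are hypothetical and serve illustration only.

```python
from collections import Counter, defaultdict

# Standard genetic code, built from the conventional TCAG ordering.
# Assumption: the caption does not say which translation table was used.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def relative_codon_usage(cds_sequences):
    """Return {codon: fraction} where each fraction is the codon's share
    among all codons encoding the same amino acid (or stop)."""
    counts = Counter()
    for seq in cds_sequences:
        seq = seq.upper()
        # Read in frame from the first base; ignore a trailing partial codon.
        for i in range(0, len(seq) - len(seq) % 3, 3):
            codon = seq[i:i + 3]
            if codon in CODON_TABLE:  # skips codons with ambiguous bases
                counts[codon] += 1

    # Total codon count per amino acid, used as the denominator.
    totals = defaultdict(int)
    for codon, n in counts.items():
        totals[CODON_TABLE[codon]] += n

    return {codon: n / totals[CODON_TABLE[codon]] for codon, n in counts.items()}


# Hypothetical toy input; real input would be the annotated CDSs of a region.
usage = relative_codon_usage(["ATGGCTGCTGCCTAA"])
```

Computing this table separately for the CDSs of each unknown region and for the known regions, then comparing the fractions codon by codon, gives the kind of comparison summarized in panel (b).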