Venn diagrams showing the distribution of similarity search results. (a) The number of unique sequence-based annotations is the total of unique best BLASTX hits against the NR, UniProt and KEGG databases (E-value ≤ 1.0e-5); the NR count includes the unique BLASTX hits against the plant proteins and Arabidopsis proteins. The overlapping regions of the three circles give the number of unigenes that share BLASTX similarity with the respective databases. (b) The number of unique domain-based annotations integrates the unique similarity search results against the InterPro, Pfam and COGs databases (E-value ≤ 1.0e-5). (c) The number of all annotated C. sinensis unigenes is calculated by combining the unique sequence-based annotations and the unique domain-based annotations. Circles "a" and "b" indicate the two subsets of C. sinensis unigenes with sequence-based annotations (53,966 counts in Figure 3a) and domain-based annotations (44,705 counts in Figure 3b), respectively.
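
As an illustration of how region counts for a three-circle Venn diagram of this kind could be derived, the following is a minimal sketch that assumes each database's hits are available as a set of unigene identifiers. The function name `venn3_regions` and the toy identifiers are hypothetical and are not taken from the study; the real per-database hit lists are not reproduced here.

```python
def venn3_regions(a, b, c):
    """Return the size of each of the seven regions of a three-set Venn diagram."""
    a, b, c = set(a), set(b), set(c)
    return {
        "a_only": len(a - b - c),
        "b_only": len(b - a - c),
        "c_only": len(c - a - b),
        "a_b": len((a & b) - c),   # shared by a and b, absent from c
        "a_c": len((a & c) - b),   # shared by a and c, absent from b
        "b_c": len((b & c) - a),   # shared by b and c, absent from a
        "a_b_c": len(a & b & c),   # shared by all three
    }


# Toy, made-up unigene IDs standing in for hits against three databases
# (assumption: one set of unigene IDs per database, already E-value filtered).
nr = {"u1", "u2", "u3", "u4"}
uniprot = {"u2", "u3", "u5"}
kegg = {"u3", "u4", "u6"}

regions = venn3_regions(nr, uniprot, kegg)

# The number of unique annotated unigenes is the size of the union,
# which equals the sum of the seven disjoint regions.
total_unique = len(nr | uniprot | kegg)
```

Because the seven regions are disjoint, summing them counts each unigene once even when it has hits in several databases, which is why the union rather than the plain sum of per-database counts gives the number of unique annotations.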