Maximum likelihood tree of the CYP97 sequences in our database together with CYP97s from higher plants. Sequence information for the higher-plant CYP97s was downloaded from the NCBI database and is summarized as follows: Arabidopsis thaliana [GenBank: CYP97A3, gb|AEE31394.1; CYP97B3, gb|AEE83557.1; CYP97C1, sp|Q6TBX7.1; and CYP86A, AED97111.1], Zea mays [GenBank: CYP97A16, ACG28871.1], Glycine max [GenBank: carotene epsilon-monooxygenase, XP_003537025.1], Solanum lycopersicum [GenBank: CYP97C11, NP_001234058.1] and Oryza sativa Japonica Group [GenBank: carotene epsilon-monooxygenase, AAK20054.1]. A partial protein sequence region (positions 272–926) was selected for phylogenetic analysis. The maximum likelihood phylogenetic tree (log likelihood = −27808.68723) was inferred from amino acid sequences (655 amino acid characters) of CYP97 proteins using the LG model of amino acid substitution (selected with ProtTest) with a discrete gamma distribution in four categories. All parameters (gamma shape = 1.924; proportion of invariant sites = 0.011; number of categories = 4) were estimated from the dataset. Numbers above branches indicate ML bootstrap support values, computed from 300 replicates under the same model. The arrow indicates an ancient gene duplication event that gave rise to CYP97A and CYP97C. Stars indicate later gene duplications that produced paralogous genes within a single species. Black circles indicate two genes that belong to none of the CYP97 subfamilies. Major groups of organisms are labeled to allow comparison between the phylogeny of CYP97A/B/C and the evolution of algae.
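The region selection described in the caption can be illustrated with a minimal sketch. It assumes that positions 272–926 are 1-based, inclusive column positions in a protein alignment, which is consistent with the stated 655 characters (926 − 272 + 1 = 655) but is not spelled out in the caption. The function name `slice_alignment` and the toy sequences are hypothetical and are not taken from the original analysis.

```python
def slice_alignment(aligned, start=272, end=926):
    """Return columns start..end (1-based, inclusive) of each aligned sequence.

    `aligned` maps a sequence name to its aligned string. The default bounds
    follow the caption; treating them as 1-based inclusive is an assumption.
    """
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    trimmed = {}
    for name, seq in aligned.items():
        if len(seq) < end:
            raise ValueError(f"{name} is shorter than {end} columns")
        # Convert the 1-based inclusive range to a Python slice.
        trimmed[name] = seq[start - 1:end]
    return trimmed


# Toy input (placeholder residues only, not real CYP97 data).
toy = {"seq_a": "A" * 1000, "seq_b": "C" * 1000}
trimmed = slice_alignment(toy)
region_lengths = {name: len(seq) for name, seq in trimmed.items()}
```

With the default bounds, every trimmed sequence has 655 columns, matching the character count reported for the tree inference.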