Zmat2 in mammals: conservation and diversification among genes and pseudogenes

Background: Recent advances in genetics and genomics present unique opportunities for enhancing our understanding of mammalian biology and evolution through detailed multi-species comparative analysis of gene organization and expression. Yet, of the more than 20,000 protein coding genes found in mammalian genomes, fewer than 10% have been examined in any detail. Here we illustrate the power of data available in publicly-accessible genomic and genetic resources by querying them to evaluate Zmat2, a minimally studied gene whose human ortholog has been implicated in spliceosome function and in keratinocyte differentiation.
Results: We find extensive conservation in coding regions and overall structure of Zmat2 in 18 mammals representing 13 orders and spanning ~165 million years of evolutionary development, and in their encoded proteins. We identify a tandem duplication in the Zmat2 gene and locus in opossum, but not in other marsupials, monotremes, or other mammals, indicating that this event occurred subsequent to the divergence of these species from one another. We also define a collection of Zmat2 pseudogenes in half of the mammals studied, and suggest based on phylogenetic analysis that they each arose independently in the recent evolutionary past.
Conclusions: Mammalian Zmat2 genes and ZMAT2 proteins illustrate conservation of structure and sequence, along with the development and diversification of pseudogenes in a large fraction of species. Collectively, these observations also illustrate how the focused identification and interpretation of data found in public genomic and gene expression resources can be leveraged to reveal new insights of potentially high biological significance.


Background
Of the more than 20,000 protein coding genes found in human and in other mammalian genomes, fewer than 10% have been studied in any detail [1][2][3]. This is true despite the fact that ready access to public genomic and gene-expression databases [4] means that nearly any gene is available for intensive analysis from the molecular and cellular to the individual and population levels [5][6][7][8][9][10]. Part of this disparity may reflect social or historical reasons, but it also is likely that direct association with human diseases and the ready availability of experimental models influence decisions to gravitate toward scientific areas that appear more amenable to higher profile publications or grant funding [2,3].
ZMAT2 is an excellent example of a gene that had essentially been unstudied until late 2018 [11]. ZMAT2, which encodes a protein that contains a zinc finger domain, is part of a 5-gene family of limited intra-familial amino acid similarity except for the zinc finger region. The lack of interest in this gene is potentially surprising, since it is the ortholog of Snu23, a yeast protein that plays an important role in the spliceosome [12], an essential molecular machine in eukaryotes that removes introns from primary gene transcripts [13]. Although human ZMAT2 also has been mapped to the spliceosome in structural biological studies [14], even this observation has not generated much interest in the protein.
Here, by using information extracted from public repositories, we have studied Zmat2 genes and proteins from a broad group of 18 mammalian species comprising 13 orders, and representing ~165 million years (Myr) of evolutionary diversification [15][16][17][18]. Our results show extensive conservation in coding regions of these genes and in their encoded proteins, define a collection of Zmat2 pseudogenes in half of the mammals studied, and identify one mammal in which Zmat2 has undergone a tandem duplication. Our observations provide an illustration of how the focused application and analysis of data found in publicly-available genomic and gene expression resources can be leveraged to reveal new insights of potentially high biological significance.

Mammalian ZMAT2/Zmat2 genes are poorly annotated in genomic databases
Human ZMAT2 is an ortholog of yeast Snu23, a zinc finger-containing protein that is a key component of the spliceosome [12], the molecular machine responsible for the removal of introns from primary gene transcripts [13]. The human ZMAT2 gene has been incompletely characterized in the Ensembl and UCSC genomic repositories. We thus mapped the gene and its transcripts and protein (Fig. 1, Baral K, Rotwein P: The story of ZMAT2: a highly conserved and understudied human gene, manuscript submitted). Based on these results, which also revealed that 6-exon human ZMAT2 and its encoded 199-residue protein were highly conserved among non-human primates (Baral K, Rotwein P: The story of ZMAT2: a highly conserved and understudied human gene, manuscript submitted), we now sought to extend knowledge about Zmat2 by defining it in other mammalian species.
Fig. 1 Organization of the human ZMAT2 locus and gene. a. Diagram of the human HARS-HARS2-ZMAT2 locus on chromosome 5. Exons are depicted by lines and boxes (red for HARS, blue for HARS2, black for ZMAT2), with coding regions solid and non-coding regions white. The direction of transcription for each gene is indicated and a scale bar is shown. b. Map of the human ZMAT2 gene. Coding regions are in black and noncoding segments in white. A scale bar is shown. c. Diagrams of human ZMAT2 mRNA, as characterized in (Baral K, Rotwein P: The story of ZMAT2: a highly conserved and understudied human gene, manuscript submitted). Coding regions are labeled in black and non-coding segments in white. The length is indicated in nucleotides (nt) as are the number of codons in the open reading frame. d. Schematic of human 199-residue ZMAT2 protein, with NH2 (N) and COOH (C) terminal (term), and zinc finger (ZnF) regions labeled and color-coded.
A preliminary examination within Ensembl revealed that the assignments of mammalian Zmat2 genes were even more incomplete than was observed for human ZMAT2, not only for the 18 species chosen here to cover a range of mammalian orders, but also for most of the mammalian and non-mammalian vertebrates in which Zmat2 has been identified in their genomes in Ensembl. For example, 5′ untranslated regions (UTRs) in exon 1 were described in only 6 of 18 species, and 3′ UTRs in exon 6 in only 7 of 18 species (Table 1). We thus developed an iterative strategy to define these genes, in which mouse Zmat2 was initially characterized in detail. Its exons then were used to perform homology searches in other mammalian genomes. As needed, these queries were supplemented by individual comparisons with Zmat2 cDNAs when available in the National Center for Biotechnology Information (NCBI) nucleotide database (cDNAs were listed in this resource for only 6 different species; see Methods), and by secondary searches using Zmat2 gene segments from species that were evolutionarily more similar to specific target species (e.g., using koala exon 1 to identify opossum exon 1). Most importantly, a final series of studies used the resources of the NCBI Sequence Read Archive (SRA) to map the putative 5′ and 3′ ends of each gene by analysis of expressed transcripts [19,20]. As described below, results revealed substantially higher levels of gene complexity and completeness than had been found in the data curated by Ensembl.

The mouse Zmat2 gene
A search of Ensembl revealed that mouse Zmat2 appeared to be a 6-exon gene on chromosome 18, and like human ZMAT2 was located adjacent to Hars2 in the same transcriptional orientation (compare Fig. 2a and Fig. 1a). Of two proposed mouse Zmat2 transcripts in Ensembl, only one was stated to include all 6 exons (Fig. 2b) and to encode a protein of 199 amino acids, while the other was thought to include parts of 3 exons and a retained intron (see: https://useast.ensembl.org/Mus_musculus/Gene/Summary?db=core;g=ENSMUSG00000001383;r=18:36793876-36799666). Inspection of the presumptive full-length Zmat2 transcript revealed a proposed 5′ UTR of 66 base pairs (Table 1) that could not be extended by comparison with Zmat2 cDNA NM_025594 from the NCBI nucleotide database (5′ UTR of 19 base pairs).
Fig. 2 … Table S1), using 60 base pair genomic segments a-e, and a-f, respectively, as probes. The DNA sequence below the left graph depicts the putative 5′ end of exon 1, with the locations of the 5′ end of the longest RNA-sequencing clone indicated by a vertical arrow. Shown below the right graph is the DNA sequence of the putative 3′ end of exon 6. A potential polyadenylation signal (AATAAA) is underlined and a vertical arrow denotes the possible 3′ end of Zmat2 transcripts. d. Diagram of the mouse Zmat2 mRNA. Coding regions are in black and non-coding segments in white. The length is indicated in nucleotides (nt), as are the number of codons in the open reading frame. e. Schematic of the mouse ZMAT2 protein, with NH2 (N) and COOH (C) terminal (term), and zinc finger (ZnF) regions labeled and color-coded.
Direct analysis of mouse Zmat2 gene expression using RNA-sequencing libraries from liver and keratinocytes (Additional file 1: Table S1) revealed that transcripts containing Zmat2 exon 1 were expressed at low levels (read counts of no more than 2 sequences per probe, Fig. 2c). Nevertheless, examination of these libraries revealed that exon 1 was at least 96 nucleotides in length (Fig. 2c). However, no potential TATA boxes, which position RNA polymerase II at the start of transcription [21], or initiator elements, which function similarly [22], were found adjacent to this transcript. Thus, the 5′ end of the mouse Zmat2 gene remains tentatively mapped.
Similar studies using probes from different parts of exon 6 showed that this exon was 1774 nucleotides in length, and thus was ~14 nucleotides shorter than stated in Ensembl. The 3′ end of exon 6 contained an 'AATAAA' presumptive poly A recognition sequence, and a poly A addition site [23] was mapped 7 base pairs further 3′ (Fig. 2c), thus supporting our analysis. Taken together, these results describe a 6-exon mouse Zmat2 gene of 5786 base pairs in length (Table 2) that is transcribed and processed into an mRNA of 2306 nucleotides (Fig. 2d), and that encodes a 199-amino acid ZMAT2 (Fig. 2e).
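The relationship between a poly A recognition hexamer and a downstream poly A addition site can be checked with a few lines of code. The following is a minimal sketch only, not the procedure used in this study: the function names, the toy sequence, and the convention of counting the bases that lie between the end of the hexamer and the last transcribed base are all assumptions made for illustration.

```python
def find_polya_signals(seq, signal="AATAAA"):
    """Return 0-based start positions of every occurrence of `signal` in `seq`."""
    seq = seq.upper()
    positions = []
    start = seq.find(signal)
    while start != -1:
        positions.append(start)
        # Step forward by one base so overlapping occurrences are also reported.
        start = seq.find(signal, start + 1)
    return positions


def distance_to_cleavage(seq, cleavage_index, signal="AATAAA"):
    """Count the bases between the nearest upstream signal and the cleavage site.

    `cleavage_index` is the 0-based index of the last transcribed base
    (an assumed convention). Returns None if no signal ends at or before it.
    """
    upstream = [p for p in find_polya_signals(seq, signal)
                if p + len(signal) <= cleavage_index]
    if not upstream:
        return None
    return cleavage_index - (upstream[-1] + len(signal))


# Hypothetical 3' end sequence, not taken from any Zmat2 gene.
toy_three_prime_end = "GGCCAATAAAGTTTCATTGCC"
print(find_polya_signals(toy_three_prime_end))
print(distance_to_cleavage(toy_three_prime_end, 17))
```

Other hexamer variants can be examined by passing a different `signal` argument.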

The Zmat2 gene in other mammals
By searching genome databases with mouse exons, the few homologous cDNAs, and in selected cases, exons from closely related species, Zmat2 was characterized in 17 other mammals representing 9 different orders, and spanning ~165 Myr of evolutionary history. These other mammalian Zmat2 genes also all appeared to consist of 6 exons (Fig. 3, Table 2), and when their 5′ and 3′ ends were mapped using species-homologous RNA-sequencing libraries (Additional file 1: Table S1, Additional file 2: Table S2, Additional file 3: Figure S1, Additional file 4: Figure S2 and Additional file 5: Figure S3), their overall structures closely resembled mouse Zmat2 (Fig. 3, Table 2). In particular, there was perfect congruence in the lengths of coding exons 2-5 (Table 2), and high levels of DNA sequence identity (84.3 to 97.8%, Table 3). Total gene sizes varied over a 2-fold range, from 5477 base pairs in megabat to >10,457 base pairs in dog, with most of the differences attributable to longer or shorter 3′ UTRs in exon 6 and to some variation in intron lengths (Table 2).
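Percent identity between two exons, as reported in Table 3, depends on how the sequences are aligned and on which alignment columns are counted. The sketch below shows one common convention and is offered only as an illustration: it assumes the two sequences have already been aligned to equal length with '-' as the gap character, counts a gap opposite a base as a mismatch, and ignores columns that are gaps in both sequences. The function name and the toy sequences are hypothetical, and the exact convention used to generate Table 3 is not stated here.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of compared alignment columns in which both sequences agree.

    Assumes the inputs are pre-aligned strings of equal length.
    Columns that are gaps in both sequences are skipped; a gap opposite
    a base counts as a mismatch (an assumed convention).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap and b == gap:
            continue  # column carries no information
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Toy aligned sequences, not real Zmat2 exons: 9 of 10 columns match.
print(percent_identity("ACGTACGTAC", "ACGTACGAAC"))
```

Because coding exons 2-5 have identical lengths across species, gap handling matters mainly for exon 1 and exon 6 comparisons.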
DNA conservation also was relatively high for Zmat2 exon 1 among the mammals studied (87.1 to 96.8% identity, Table 3), even though it is composed primarily of 5′ UTR. The exception here is opossum (55.8 and 56.8% identity, Table 3 and see below). Exon 6 was more dissimilar among the different species (Table 3), particularly in the noncoding segments (e.g., no identity in Tasmanian devil or koala).

The opossum genome contains tandem Zmat2 genes
Initial screening of the opossum genome revealed several sets of DNA sequences with comparable levels of identity with mouse Zmat2 exons 2-5 (84.9 to 94.3%, Table 3). Two of these groups of DNA segments were distributed to adjacent locations in the opossum genome, and when compiled and evaluated in detail (including identifying exon 1 by using koala Zmat2 exon 1) consisted of tandem genes that were oriented 'head-to-head' in divergent transcriptional direction (Fig. 4a). Further analysis showed that the 5′ ends of exon 1 of both genes potentially overlapped (Fig. 4a, b), that exons 1 through 5 were 99.73% identical, that the lengths of exon 6 matched each other and that they were 99.9% identical in DNA sequence (Fig. 4b and not shown). By using probes that differed by a single nucleotide (Additional file 2: Table S2) to screen an RNA-sequencing library, we found that both opossum Zmat2 genes were expressed, at least in liver, with transcripts for gene 1 being more abundant than those for gene 2 (Fig. 4c). Moreover, both opossum Zmat2 mRNAs were the same length (Fig. 4d), and they encoded proteins that varied by a single amino acid (valine at position 128 in protein 1, and methionine in protein 2) (Fig. 4e).
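The logic of using probes that differ by a single nucleotide to tell apart transcripts from two nearly identical genes can be sketched as follows. This is an illustration under stated assumptions rather than the SRA search used in this study: it requires an exact match of the full probe on either strand, the probes are shortened to 12 nucleotides for readability (the probes described here are 60 nucleotides), and all sequences, names, and reads are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    table = str.maketrans("ACGTN", "TGCAN")
    return seq.upper().translate(table)[::-1]


def count_probe_hits(reads, probes):
    """Count reads that contain each probe exactly, on either strand.

    `probes` maps a gene label to its probe sequence. Because an exact
    match is required, a single-nucleotide difference between two probes
    is enough to assign a read to one gene and not the other.
    """
    counts = {name: 0 for name in probes}
    for read in reads:
        read = read.upper()
        for name, probe in probes.items():
            probe = probe.upper()
            if probe in read or reverse_complement(probe) in read:
                counts[name] += 1
    return counts


# Hypothetical probes differing at one position (G versus A).
toy_probes = {"Zmat2-1": "ACGTTGCAGTCA", "Zmat2-2": "ACGTTGCAATCA"}
# Hypothetical reads: three from gene 1 (one on the opposite strand),
# one from gene 2, and one unrelated read.
toy_reads = [
    "TTACGTTGCAGTCAGG",
    "CCACGTTGCAATCATT",
    "TTACGTTGCAGTCAAA",
    reverse_complement("GGACGTTGCAGTCACC"),
    "GGGGGGGGGGGG",
]
toy_counts = count_probe_hits(toy_reads, toy_probes)
print(toy_counts)
```

Requiring an exact match is a deliberately strict choice: it avoids cross-assignment between the two genes at the cost of discarding reads that carry sequencing errors within the probe region.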

Multiple Zmat2 pseudogenes arose independently in different mammalian genomes
Screening of different mammalian genomes with individual mouse Zmat2 exons led to the identification of additional related DNA sequences in nine species (rat, guinea pig, rabbit, dog, dolphin, microbat, megabat, opossum, and platypus; Table 4). The levels of similarity with mouse Zmat2 exons ranged from 80.1 to 93.4% identity (Table 4). In rat, rabbit, dog, dolphin, megabat, microbat, and opossum, paralogs of all 6 Zmat2 exons were detected, and except for rabbit, were composed of continuous DNA sequences (Table 4, Fig. 5). In rabbit, an unreadable DNA segment of ~406 nucleotides separated 'exons' 2 and 3. These 'full-length' DNAs thus appeared to be pseudogenes that resembled processed mRNAs, and that presumably were retro-transposed as DNA copies back into the respective genomes [24]. In guinea pig, paralogs of only 'exons' 4 through 6 could be found; in platypus, individual representations of 'exon 2' and 'exon 3' mapped to different locations in the genome; and in rat, two copies of 461 base pairs of 'exon 6' were found in different parts of the X chromosome (87.4% identity with the corresponding portions of the mouse exon, Table 4). The two putative Zmat2 … Fig. 5b). In dolphin, in which two of the four pseudogenes encoded 199-codon open reading frames (Fig. 5c), one was predicted to be identical to authentic ZMAT2, while the other matched it in 185/199 residues (Fig. 5d).
Fig. 4 … Table S1), using 60 base pair genomic segments a-e, and a-f, respectively, as probes. Shown below the right graph is the DNA sequence of the putative 3′ end of exon 6. A potential polyadenylation signal (AATAAT) is underlined and a vertical arrow denotes the possible 3′ end of Zmat2 transcripts. c. Gene expression data from SRX3040092 for each opossum Zmat2 gene, using probes for exons 1 + 2, and exon 6 (Additional file 2: Table S2) that discriminate between transcripts from Zmat2-1 and Zmat2-2.
Previous studies have shown that some potential pseudogenes for the human protein phosphatase 1 regulatory subunit (PP1R2) are transcribed and thus are not actually pseudogenes since they are expressed as RNAs [25]. To determine whether or not any mammalian Zmat2 pseudogenes are functional, their gene expression was examined by querying RNA-sequencing libraries. As shown for rat, rabbit, guinea pig, dog, dolphin, megabat, and opossum, no transcripts could be detected in these libraries even though in all cases authentic Zmat2 mRNA was readily expressed (Fig. 6a-g; no microbat RNA sequencing library was available in the NCBI SRA).
Phylogenetic analysis of all 13 'full-length' Zmat2 pseudogenes from 7 different mammals (including marmoset (Baral K, Rotwein P: The story of ZMAT2: a highly conserved and understudied human gene, manuscript submitted), Table 4) demonstrated that the DNA sequence of each pseudogene was more closely related to the paralog or paralogs from the homologous species than to other Zmat2 pseudogenes (Fig. 5e), suggesting that these retro-transposition events each arose independently after the divergence of each species from their nearest mammalian ancestors.

ZMAT2 protein sequences are highly-conserved among mammals
ZMAT2 was identical to the mouse and human protein in ten species studied here (Table 5, Fig. 7a, b). In each of the other 8 species, only one or two amino acid substitutions were found, except for platypus, in which the NH2-terminus of the protein could not be established because of incomplete genomic sequence (Fig. 7). Phylogenetic mapping further showed that marsupial ZMAT2 proteins clustered together, as all were identical except for opossum 2 (Fig. 7b). Of note for all variant ZMAT2 proteins, the altered amino acids were located throughout the protein, but none were found in the zinc finger domain (Fig. 7a).

Discussion
The focus of this study was to characterize Zmat2 genes in mammals by analyzing data available in genomic and gene expression repositories, and to place these findings in an evolutionary context. Prior to this and to our recent report (Baral K, Rotwein P: The story of ZMAT2: a highly conserved and understudied human gene, manuscript submitted), there had been no publications on ZMAT2/Zmat2 genes from any species, despite the significance of the protein in the fundamentals of eukaryotic pre-RNA splicing [12,14]. Our main observations here have included, first, demonstrating that 6-exon Zmat2 is a single-copy gene in all mammals studied, except for opossum, in which a gene duplication event occurring after the divergence of opossum from other marsupials ~80 Myr ago [15,26] has led to paired tandem Zmat2 genes (Fig. 4). Second, we have elucidated the presence of Zmat2 pseudogenes in at least ten different mammalian species, have … ) and have shown that they appear to have arisen recently in these genomes (Fig. 5e); and third, we have found that the ZMAT2 protein is highly conserved among mammals (Table 5, Fig. 7). Importantly, our data demonstrate that a strategy involving the focused and complementary examination of genomic and gene expression databases can lead to new insights about mammalian biology and gene evolution, and illustrate how investigating unstudied genes can lead to the development of new experimentally-testable hypotheses.

The Zmat2 gene and pseudogenes in mammals
The data described and examined here define Zmat2 as a 6-exon gene in 18 different mammalian species representing 9 orders (Tables 2, 3, Figs. 3, 4). They are thus very similar to their human and non-human primate orthologs in terms of both gene organization and the encoded ZMAT2 protein (Baral K, Rotwein P: The story of ZMAT2: a highly conserved and understudied human gene, manuscript submitted), supporting the idea that the protein plays a conserved and potentially essential role in pre-RNA splicing and possibly in keratinocyte differentiation (see below). Pseudogenes have been described in both prokaryotes and eukaryotes, and are fairly common in the human and in other mammalian genomes [27]. Preliminary analysis of data generated by ENCODE, performed nearly a decade ago, had suggested that there are more than 10,000 pseudogenes in the human genome, comprising ~0.7% of the DNA sequence [28]. Among these pseudogenes, ~77.5% were thought to represent processed mRNAs that had been retro-transposed as individual DNA copies into the genome, and the other ~22.5% were thought to be the result of gene duplication events [28].
Zmat2 pseudogenes could be identified in about half of the mammals studied here, and in all evaluable cases were not expressed in organs or tissues in which authentic Zmat2 could be detected readily (Fig. 6), thus marking them as 'real' pseudogenes, unlike what was shown recently for human PP1R2, in which at least four previously identified pseudogenes were transcribed, and thus should be considered as genes [25]. Remarkably, the number of Zmat2 pseudogenes varied among these species, ranging from 1 to 4 per mammal (Table 4, Fig. 5). In addition, although most Zmat2 pseudogenes contained components of all 6 Zmat2 exons, in the guinea pig genome, the pseudogene was composed of exons 4-6, and in platypus, copies of exon 2 and exon 3 were located on different genome segments (Table 4). In the rat genome, two partial copies of 461 nucleotides of Zmat2 exon 6 were found in different locations on the X chromosome, but these were not detected in any of the other mammals studied (Table 4). While the full-length pseudogenes seem likely to have arisen via retro-transposition of mRNAs as DNA copies back into the respective genomes [24], the origins of the partial Zmat2 gene sequences in guinea pig, platypus, and rat are unclear. Since Zmat2 pseudogenes were not identified in half of the mammals analyzed here, and since phylogenetic analysis of the 'full-length' pseudogenes indicated that they were more similar to their paralogs than to any orthologous DNA sequences in other mammals (Fig. 5e), it seems likely that they arose independently in each species subsequent to its evolutionary divergence from its closest ancestors.

ZMAT2 proteins
ZMAT2 proteins are remarkably similar to one another in the mammalian species examined in this manuscript. Only 7 amino acid substitution variants were detected, with none found in the zinc finger domain. Including human and non-human primate ZMAT2, the protein was identical in 18/27 different mammals, and at most a variant protein in a given species contained 2 amino acid differences (Table 5, Fig. 7, and (Baral K, Rotwein P: The story of ZMAT2: a highly conserved and understudied human gene, manuscript submitted)), although, in platypus, the NH2-terminus of the protein could not be characterized because of poor quality genomic DNA sequence. In addition, we had shown recently that ZMAT2 is remarkably non-polymorphic in humans (Baral K, Rotwein P: The story of ZMAT2: a highly conserved and understudied human gene, manuscript submitted), with only 41 different potential codon changes identified that predicted amino acid substitutions in over 280,000 alleles found in the gnomAD project [29], corresponding to just 0.014% of the alleles in the entire study population (Baral K, Rotwein P: The story of ZMAT2: a highly conserved and understudied human gene, manuscript submitted). This level of variation in the human population is 6- to 90-fold lower than detected previously for at least 19 other human genes [30][31][32]. Moreover, and unlike these other genes [30][31][32], no frame shift or splicing site alterations were found in human ZMAT2 (Baral K, Rotwein P: The story of ZMAT2: a highly conserved and understudied human gene, manuscript submitted). One possibility for the high level of conservation of ZMAT2 among mammals is that the protein plays a key role in pre-mRNA splicing.
ZMAT2 and its yeast homolog Snu23 have been found in the spliceosome [12,14], and based on structural data, the protein has been postulated to facilitate activation of the U6 snRNP at the 5′ splice site of the intron [14]. Human ZMAT2 also may have a more specialized function, as it was described as a negative regulator of human keratinocyte differentiation, potentially by blocking the splicing of selected primary gene transcripts [11]. Defining the specific functions of ZMAT2 by genetic or other approaches in one or more tractable organisms will be an important topic for future study.
Fig. 5 Mammalian genomes contain multiple Zmat2 pseudogenes. a. Schematic of the two Zmat2 pseudogenes in the microbat genome. The color-coding indicates regions of each pseudogene that are similar in DNA sequence to individual coding segments of authentic Zmat2 (red: exon 2; blue: exon 3; yellow: exon 4; green: exon 5; pink: coding region of exon 6). The white areas depict segments similar to the 3′ UTR of authentic Zmat2 exon 6 in each pseudogene. A scale bar is shown. b. Alignment of amino acid sequences of microbat ZMAT2 and the predicted pseudogene protein (Z1). Similarities and differences are shown, with identities being indicated by asterisks. Differences are marked in red text. The blue text denotes the two amino acids that are different from mouse or human ZMAT2 (also see Fig. 7). c. Schematic of the four Zmat2 pseudogenes in the dolphin genome. The color-coding indicates regions of each pseudogene that are similar in DNA sequence to individual exons of authentic Zmat2, as per part a above, and the white areas depict segments similar to the 3′ UTR of authentic Zmat2 exon 6 in each pseudogene. A scale bar is shown. d. Alignment of amino acid sequences of dolphin ZMAT2 and predicted pseudogene proteins (Z1 and Z3). Similarities and differences are shown, with identities being indicated by asterisks. Differences also are marked in red text. e. Phylogenetic tree of mammalian Zmat2 pseudogenes. The data on marmoset are from (Baral K, Rotwein P: The story of ZMAT2: a highly conserved and understudied human gene, manuscript submitted). The scale bar indicates 0.01 substitutions per site and the length of each branch approximates the evolutionary distance.
Fig. 6 Lack of expression of mammalian Zmat2 pseudogenes using analysis of RNA-sequencing libraries. Gene expression data were obtained by querying species-specific RNA-sequencing libraries with DNA probes that detect differences between mammalian Zmat2 genes and potential pseudogenes. See Additional file 1: Table S1 for the list of RNA-sequencing libraries and Additional file 2: Table S2 for the DNA probes. a. Rat; b. Guinea pig; c. Rabbit; d. Dog; e. Dolphin; f. Megabat; g. Opossum.

Conclusions
Stitching together genes in pieces: improving the quality of genome resources
Publicly available genomic databases contain extensive information on genes from many species, and are valuable resources for the entire scientific community. Unfortunately, as shown here, the quality of available information in certain circumstances is very poor. In nearly two-thirds of the species studied here, the annotated Zmat2 gene in Ensembl lacked either 5′ or 3′ UTRs, or both (Table 1), and in some cases could be identified only by screening with exons from other mammals. These types of problems may be quite common, and appear to be the norm for Zmat2 genes from other mammalian and non-mammalian vertebrates in Ensembl. Poor annotation also has been described for several other genes in multiple species [19,33]. Ideally, the data quality in these genomic repositories should be nearly perfect, not only to enhance the opportunity for future discoveries, but also to minimize the propagation of false information in scientific publications.

Final comments
It has been estimated that only a tiny fraction of the 20,000 human protein coding genes has been evaluated [1][2][3]. In fact, a recent report has suggested that ~90% of human genes are understudied [3], including several, such as ZMAT2, that have been the main topic of only a single publication [11]. It is likely that these statistics are more dismal for genes in other mammals and in non-mammalian vertebrates, even including species such as mouse and zebrafish that are favorites of experimentalists [34,35]. Certainly, a concerted effort to broaden discovery horizons by focusing on understudied and unstudied genes could lead to new insights of potentially high biological and biomedical significance.
Fig. 7 Mammalian ZMAT2 proteins. a. Alignments of amino acid sequences of ZMAT2 proteins from selected mammalian species are shown in single letter code. Identities and differences among species are indicated, with identities labeled by asterisks. Dashes indicating no residue have been placed to maximize alignments. The red text depicts differences from the mouse protein. The zinc finger region is highlighted. b. Phylogenetic tree of ZMAT2 in mammals. The protein sequences not shown in a are identical to mouse ZMAT2, as can be seen in the tree. The scale bar indicates 0.01 substitutions per site and the length of each branch approximates the evolutionary distance.

Mapping the 5′ and 3′ ends of Zmat2 genes
Inspection of ZMAT2 and its proposed mRNAs in the Ensembl genome database revealed for most species