Genome-wide identification and analysis of class III peroxidases in Betula pendula

Class III peroxidases (POD) proteins are widely present in the plant kingdom that are involved in a broad range of physiological processes including stress responses and lignin polymerization throughout the plant life cycle. At present, POD genes have been studied in Arabidopsis, rice, poplar, maize and Chinese pear, but there are no reports on the identification and function of POD gene family in Betula pendula. We identified 90 nonredundant POD genes in Betula pendula. (designated BpPODs). According to phylogenetic relationships, these POD genes were classified into 12 groups. The BpPODs are distributed in different numbers on the 14 chromosomes, and some BpPODs were located sequentially in tandem on chromosomes. In addition, we analyzed the conserved domains of BpPOD proteins and found that they contain highly conserved motifs. We also investigated their expression patterns in different tissues, the results showed that some BpPODs might play an important role in xylem, leaf, root and flower. Furthermore, under low temperature conditions, some BpPODs showed different expression patterns at different times. The research on the structure and function of the POD genes in Betula pendula plays a very important role in understanding the growth and development process and the molecular mechanism of stress resistance. These results lay the theoretical foundation for the genetic improvement of Betula pendula.


Background
Peroxidases or peroxide reductases (POD, EC number 1.11.1.x) are a large group of oxidases existing in animals, plants and microorganisms, which catalyzes the oxidation of a particular substrate by hydrogen peroxide [1]. Among them, class III peroxidases are plant specific oxidoreductases, which are extremely widespread presence in the plant kingdom [2]. The Class III peroxidase in plants are also reported as POX [3,4], GPX [5], Prx [6], ClassIII PRX [7], and POD [8,9]. Most plant species contain dozens of Class III peroxidases, for example, switchgrass [7] genome contains more than 200 POD coding genes, and Populus [10], rice and Arabidopsis contain 93, 138 and 73 members of POD family, respectively [6,11].
POD are secreted peroxidase derived from higher plants, participate in a variety of physiological processes in the whole plant life cycle [12]. Recent studies indicate that POD has two most important functions in plants: on the one hand, it is related to the normal morphogenesis of plants and plays a role in the growth and development of plants. On the other hand, it is related to the resistance of plants, including disease resistance, cold resistance, drought resistance, etc., and it is one of the important protective enzymes in plants [13,14]. Although it is known that POD play a key role in cell growth and response to abiotic stress, the specific function of each member of the family is still elusive. Therefore, it is very important to study the molecular mechanisms of POD in plant development and stress resistance [15].
Gene family is a group of genes derived from the same ancestor, which are composed of two or more copies of a gene through gene doubling or duplication [16]. During the last decade, several molecular biology approaches have been developed to isolate, characterize and study the expression of POD gene family in plants [6]. Betula pendula is a pioneer boreal tree that can be induced to flower within 1 year [17], it plays an important role in people life [18,19]. However, so far, there has been no report about the POD gene family in B. pendula. It has been shown that POD is related to the synthesis of lignin [20] and cork [21,22], and lignin is considered as an important defense means against invasion and expansion of pathogens [23,24]. At the same time, a large number of experimental evidences of stress treatment showed that under the stress of drought and low temperature, the expression of POD increased significantly [25,26].
Since Betula pendula is a widespread species and has many applications in the pulp and paper industry, it is necessary to study its development and physiology [27]. Understanding the role of POD family in lignin synthesis and resistance to biotic and abiotic stresses in B. pendula, it will contribute to its application in industrial production [28]. Fortunately, with the completion of the whole genome sequencing of B. pendula [29,30], bioinformatics analysis of the POD gene family in B. pendula at the genome level has become possible.
In the study, we used bioinformatics methods to identify POD gene family members in B. pendula from the genomic level, and analyzed their protein physical and chemical properties, subcellular localization, evolutionary relationship, conserved motifs and other information [31]. Our study provides important insights for further study of the potential role of POD gene family in B. pendula growth and development.

Identification of POD genes
To identify members of POD family in B. pendula, we used the 73 POD genes of Arabidopsis to obtain the best hits in the B. pendula genome by BLASTP. A total of 90 putative PODs were identified in the B. pendula genome. We further examined the conserved domains of proteins encoded by these genes using Pfam [32] and SMART [33] databases. The results revealed that all the genes have classical POD domain structures, which demonstrate the reliability of the results. The B. pendula genome contains more PODs than Arabidopsis (73) [6], but fewer than Populus trichocarpa (93) [34], Pyrus bretschneideri (94) [31], and rice (138) [11]. We defined the BpPODs as BpPOD1 to BpPOD90. The isoelectric points (PI) ranged from 4.28 to 9.6, and 46 POD proteins were greater than 7.5. In addition, subcellular locations of these BpPODs are mainly in the cytoplasm, cell membrane, vacuole, chloroplast and nucleus. The subcellular location, molecular weight (MW) and other information of each BpPOD genes was listed in Table 1.

Phylogenetic analyses of POD gene family in B. pendula
To investigate the evolutionary relationships, we performed multiple sequence alignment of POD family genes in B. pendula and Arabidopsis, and constructed the phylogenetic tree by MEGA 7.0 software (Fig. 1). The BpPOD proteins were classified into 12 groups with high bootstrap probabilities, designated group I to group XII. The POD genes of each subgroup is unevenly distributed, with the number of members varies from 4 to 15. Subgroup VIII contains the most members (15), subgroup X, XI, XII contains the least number of members, with only 4 members.

Gene structures
To understand the structural diversity of the POD genes, exon-intron analysis was performed in BpPODs (Fig. 2). The result reveals several variations, in terms of the number of introns, BpPODs contains one to six introns, and some members contain three introns. Noteworthy, there were no introns in five BpPODs (BpPOD9, BpPOD11, BpPOD16, BpPOD57 and BpPOD61). In addition, BpPOD76 and BpPOD87 have the most introns (6), followed by BpPOD24 and BpPOD51 (5). Moreover, we found that the genes of the same group are similar in gene structure. For example, BpPOD20, BpPOD22 and BpPOD82 have three exons and two intron, both of which belong to Group V; BpPOD73 and BpPOD74 have two exons and one intron, both of which belong to Group XI [35].

Analysis of conserved amino acid motifs
To understand the functional regions of BpPODs, conserved amino acid motifs analyses of BpPOD proteins were performed. A total of eight conserved amino acid motifs were identified in the BpPOD proteins (Fig. 3). All BpPOD proteins contain at least one conserved amino acid motif. For example, BpPOD55 only contains motif 8, BpPOD83 contains motif 1 and 7, while BpPOD10 proteins contain all the eight conserved amino acid motifs.
The conserved motifs of POD proteins clustered in the same group are similar in composition, indicating that these members have close evolutionary relationships [36]. In addition, most members of BpPOD proteins contain motif 1, motif 2, motif 3, motif 4 and other conserved motifs, these motifs might play an important role in BpPOD proteins.

Chromosomal location and evolution analysis of BpPODs
Based on the genomic information of B. pendula, we analyzed the chromosomal distribution of 90 BpPODs.
There are eight BpPODs on chromosome 5 and chromosome 7, and only one BpPODs on chromosome 14. Noteworthy, there is no POD gene distribution on chromosome 11. We also found that the relatively high density of BpPODs on chromosome 13 and chromosome 8. Gene duplication, including segmental and tandem duplication, is considered to be one of the primary driving forces in the evolution of genomes [37,38]. In this study, among the 90 BpPODs identified, a large number of BpPODs have the same duplicated regions (Fig. 5). In general, gene tandem duplication is one of the basic reasons for the formation of gene clusters [39]. In this study, we found that some BpPODs were adjacent to each other (Fig. 4). For instance, BpPOD17-20 on  [34]. However, in previous studies, many species also have produced some different results. For example, in the report on the POD gene family of pear, it was found that segmental duplication was the main reason for the extension of the POD family [31]. In the maize, segmental and tandem duplication affect the extension of maize POD gene family [36]. These results indicate that there are significant differences in the POD genes expansion pattern in B. pendula, maize and Chinese pear, which suggested that POD gene family have different expansion patterns among different species.
Considering the selection pressures of the BpPOD duplicated genes, Ka, Ks, and Ka/Ks ratios were calculated for the 23gene pairs (Table 2). In the process of evolution, Ka/Ks > 1 represents positive selection, Ka/Ks = 1 represents neutral selection and Ka/Ks < 1 represents negative selection [35]. Ka/Ks analysis showed that the Ka/Ks value of most BpPOD gene pairs were less than 1, indicating that these genes underwent negative selection and were relatively conservative in evolution, with relatively stable structure and consistent function.
To analyze the evolution of BpPODs family, we created the comparative syntenic diagram of the birch and three representative species (Fig. 6; Tables 3, 4 and 5). The results showed that the number of orthologous pairs between B. pendula and Arabidopsis thaliana, Populus trichocarpa and Vitis vinifera were 17, 49 and 43, respectively. In these gene pairs, some BpPOD genes (BpPOD3, BpPOD7, BpPOD16, BpPOD21, BpPOD40, BpPOD4, BpPOD48, BpPOD52, BpPOD57 and BpPOD84) were indicated to have collinear relationships with three species. Interestingly, two or more POD genes from Arabidopsis thaliana, Populus trichocarpa and Vitis vinifera matched one birch POD gene, these genes may play a more important role than other genes in  (Tables 3, 4

Tissue-specific expression of BpPODs
To explore the functions of POD genes in Betula platyphylla × Betula pendula, the expression profiles in different tissues (including root, xylem, young leaf and flower) were investigated with available experimental data. Of the 90 BpPODs, 69 genes were expressed in one or more birch tissues, while 21 BpPOD genes were not expressed in different tissues (Relative expression value > 0 as basal expression) [41]. As shown in Fig. 7, most BpPODs were expressed preferentially in different tissues. For example, BpPOD6, BpPOD21 and BpPOD37 were highly expressed in xylem. Several BpPODs were expressed in root during development, such as BpPOD62, BpPOD63 and BpPOD65. BpPOD78 and BpPOD19 showed higher expression levels in young leaf and flower, respectively. The expression level of BpPOD6 was high in xylem and low in root, leaf and flower. In contrast, BpPOD67, BpPOD68, BpPOD80 and BpPOD81 had no expression in any of the investigated tissues. BpPOD21, BpPOD59 and BpPOD62 were highly expressed in developing xylem, root, leaf and flower. In conclusion, the expression changes of BpPODs in these tissues indicated that POD genes played an important role in the growth and development of B. pendula.

Responses of BpPODs expression to cold treatment
POD participates in a variety of physiological processes in the plant, and especially in resisting various stresses play an important role [42]. In recent years, many scholars have investigated the performance of POD genes in response to abiotic stress [36]. For example, Arabidopsis overexpressing AtPOD3 showed an increase in dehydration and salt tolerance, whereas the antisense suppression of AtPOD3 exhibited dehydration and salt sensitive phenotypes [43]. In this study, we examined the expression levels of the BpPODs in response to low temperature stress. As shown in Fig. 8, the result indicated that the expression of BpPODs was altered under cold treatment, some of BpPODs are induced but most of them not or slightly induced. After cold treatment, the expression levels of BpPOD4, BpPOD13, BpPOD15, BpPOD17 and BpPOD21 were significantly induced at a relatively early stage (0.5 h after treatment), and with the increase of cold treatment time, the relative expression level of these genes was also at a high level. The Fig. 8 shows that the expression levels of BpPOD19, BpPOD21, BpPOD39 and BpPOD47 were increased after 1.5 h treatment of low temperature. BpPOD50 and BpPOD58 did not respond to cold treatment at the beginning (0.5 h), and were slightly increased after 2 h exposure to low temperature. In addition, other genes are also induced by cold stress, such as BpPOD14, BpPOD16, BpPOD59, etc. In general, the BpPODs may play important roles in birch under cold stress.

Validation of transcriptome data by qRT-qPCR analysis
To verify the accuracy of the RNA-Seq data under cold treatment (6°C) in B. pendula, six randomly selected BpPODs were tested for Quantitative real-time PCR (qRT-PCR). The expression pattern of six BpPODs using qRT-qPCR were in accordance with that detected by RNA-seq (Fig. 9). BpPODs including BpPOD15, BpPOD47 and BpPOD49 showed the highest transcript level when exposed to a low temperature for 1.5 h. BpPOD4, BpPOD17 and BpPOD26 showed the highest transcript level at 3 h. In general, all the results indicated that the expression profile results of RNA-seq were reliable.

Discussion
It is reported that Class III Peroxidases participates in a variety of physiological processes in the plant [6,34], and play a important role in biological and abiotic stress responses during plant development [36]. At present, POD gene family have been published for Arabidopsis thaliana [6], Populus trichocarpa [34], Zea mays [36] and Oryza sativa [11], but there are no reports on the identification and function of POD gene family in Betula pendula. Fortunately, with the completion of the complete genome sequence of B. pendula [29,30], bioinformatics analysis of the POD gene family in B. pendula at the genome level has become possible.
In the present study, based on the genomic information of B. pendula, a total of 90 POD gene family members were identified, the number of POD family members was higher than that of Arabidopsis (73), which was similar to that of Populus trichocarpa (93) and Pyrus bretschneideri (94). Subsequently, phylogenetic relationships, subcellular localization, conserved motifs, gene structure and other information were analyzed [44].   In the process of genome evolution, gene duplication was the main factors that led to the expansion of gene family [38]. It has been reported that tandem duplication plays an important role in gene family extension in B. pendula [36]. For example, Chen, et al. found that tandem duplication is the main reason for the expansion of the NAC gene family in B. pendula [16]. However, in pears, segmental duplication is the main driver of gene family expansion [31]. Interestingly, in this study, we found that some BpPODs were adjacent to each other, suggesting tandem duplications play the major role in the evolution of the BpPOD gene family. Noteworthy, segmental and tandem duplication contributed to the evolution of POD gene family in maize [36]. The results may be one of the reasons why the number of POD genes varies among different species. In addition, Ka/ Ks analysis showed that the Ka/Ks value of most BpPOD gene pairs were less than 1, indicating that these genes underwent negative selection. Furthermore, BpPOD5/− 6, BpPOD24/− 25 and BpPOD24/− 27 gene pairs had higher Ka/Ks values than other gene pairs, indicating that these genes evolved rapidly and had relatively stable structures. We also constructed the comparative syntenic maps of birch associated with Arabidopsis thaliana, Populus trichocarpa and Vitis vinifera. The results showed that there are 49 pairs of orthologous gene between birch and Populus trichocarpa, while the number of orthologous gene pairs (17) between Arabidopsis and birch is relatively small, which may be due to the genetic relationship between Populus trichocarpa and birch is close, but the relationship with Arabidopsis is far away.
In the study, the 90 BpPOD proteins possess ten highly conserved motifs. In addition, we found that the number and type of conserved motifs in 90 BpPOD proteins were slightly different. Notably, most BpPOD proteins contain all the conserved motifs, while only a few BpPOD proteins contain one or two motifs, which means that these motifs may be involved in the important basic function of the POD protein. The diversity of gene structure plays an important role in the evolution of gene families [45,46]. In this study, we performed the structure of BpPOD genes. The results showed that 90 BpPOD genes contained different number of exons and introns, and the characteristics of BpPODs from different subgroup were different. These results indicated that the POD gene family of B. pendula has great diversity. In addition, the study of gene structure also found that some BpPODs lack introns, which may be caused by a specific pathway [10,47].
RNA-seq is usually used to study the mRNA expression amount of specific tissue or cells transcribed during a certain period of time, and then to analyze the related genes and phenotypes [48]. In this study, we used the acquired transcriptome data to investigate the function of BpPODs. RNA-Seq analysis of different tissues found that different BpPODs have tissue expression specificity, indicating that BpPODs had diverse functions. We found that of the 90 BpPODs, the most abundant expression was in the root, followed by the xylem. The results showed that most of the expressed POD genes participated in the reproductive growth process. The highest expression levels of BpPOD6, BpPOD21 and BpPOD37 genes were found in xylem. The results implied that three genes may play an important role of xylem synthesis in B. pendula. BpPOD59 is most expressed in flowers and leaves, suggesting that it may be related to leaf spreading and flowering formation in B. pendula. In addition, some BpPODs were expressed in all tissues, implying that they may have important effects on the growth and development process in B. pendula. In conclusion, POD family genes play an important regulatory role in the growth and development of B. pendula.
Abiotic stresses such as drought, low temperature and high salinity are serious natural disasters in plants, which seriously affect the growth and development of plants [46]. Plants have established a series of signal transduction and regulation molecular mechanisms to improve their ability to cope with adversity stress [49,50]. A large number of experimental studies [36] on stress treatment showed that under the stress of low temperature and other conditions, POD genes expression increased significantly [25,26]. However, there are few studies on the response of POD genes to cold stress in B. pendula. Therefore, we studied the expression patterns of the BpPODs under cold treatment. The results suggested  that some of BpPODs are induced but most of them not or slightly induced. A small number of BpPODs were highly expressed at 0.5 h after treatment, and with the extension of time, the expression reached the highest level. This suggested that these genes may be important in the process of resistance to stress in B. pendula. By contrast, the expression level of BpPOD30 and BpPOD8 gradually increased at 2 h after treatment, implying that these genes participated in the late reaction of cold treatment. In addition, the expression of a few BpPODs decreased under cold treatment, we speculate that these genes may also have defense and other specific functions in B. pendula. These results suggested that BpPOD genes play an important regulatory role in the stress response.

Conclusion
In short, we identified 90 POD genes in Betula pendula. According to phylogenetic relationships, these POD genes were classified into 12 groups. The BpPODs are distributed in different numbers on the 14 chromosomes. In addition, we identified eight conserved domains of BpPOD proteins. Finally, expression patterns analysis revealed that some BpPODs might play significant roles in root, xylem, leaf and flower. Furthermore, under low temperature conditions, some BpPODs showed different expression patterns at different times. In this study, a preliminary study was conducted on the POD genes in B. pendula, which laid a foundation for further research on the function of POD gene family in future.
We also downloaded all annotated POD proteins sequences of Arabidopsis from the TAIR database (http:// www.arabidopsis.org/). The POD family protein sequence of Arabidopsis thaliana was used as seed sequence, and the whole genome of B. pendula was searched by BLASTP. To verify the reliability of the results, all the acquired candidate sequences were examined for the presence of the POD domain using PFAM [51] and SMART [52]. Finally, all candidate POD sequences were compared by ClustalW [53] and redundant genes were manually checked and removed, and all non-redundant POD genes were used for further analysis. The theoretical molecular weights (MWs) and isoelectric points (pIs) of the BpPOD protein sequences were analyzed by the ExPASY PROTPARAM tools (http://web.expasy.org/protparam/) [54].

Phylogenetic analyses of peroxidase genes in B. pendula
To investigate the phylogenetic information of the peroxidase genes of B. pendula, an unrooted tree was constructed using amino acid sequences of the peroxidase genes. The MUSCLE with default parameters were used for multi-sequence alignment analysis [55]. Subsequently, The phylogenetic tree was constructed by using MEGA 7.0 software, which was constructed by neighbor-joining method and repeated 1000 times (Bootstrap: 1000). The phylogenetic tree was beautified and annotated by using the online tool ITOL (https:// itol.embl.de/).

Gene structure and conserved motif analysis
The CDS sequences of PODs were extracted from the genomic structure information (GFF) of the genome (https://genomevolution.org/CoGe/GenomeInfo.pl?gid= 35079), and the intron and exon structures were visually analyzed using Gene Structure Display Server [56]. MEME software was used to analyze the conserved motif of BpPOD proteins [57], and TBtools was used to draw the schematic diagram.

Chromosomal localization and gene collinearity analysis
According to BpPODs starting positions on the birch chromosomes, TBtools software was used to determine the chromosome location image of the BpPODs [58]. In addition, the rate of Ka/Ks was calculated for the duplicated gene pairs by using TBtools [58]. For gene collinearity analysis, syntenic maps of birch associated with three representative species were visualized by MCScanX [59].

Differential expression profile of BpPOD gene family
To determine the expression patterns of BpPODs in different tissues in Betula platyphylla × Betula pendula, we downloaded the sequencing data from the NCBI SRA database with an accession number of PRJNA535361 (https://www.ncbi.nlm.nih.gov/sra/?term=PRJNA535361) [16]. To identify the expression of BpPODs during cold treatment in Betula platyphylla × Betula pendula, we designed the experiment including six time points. In this study, two-month-old Betula platyphylla × Betula pendula plants grown in the greenhouse of Northeast Forestry University were exposed to low temperatures (6°C) for 0.5 h, 1 h, 1.5 h, 2 h, 2.5 h, and 3 h, respectively [16]. In addition, plants without cold treatment were used as the control. After cold treatment, all young leaves were harvested at the same time to avoid changes in gene expression due to different harvest times. Total RNA samples were isolated from the leaves using the RNAprep Kit. The constructed cDNA libraries were sequenced using the Illumina HiSeq platform at Biomarker Technologies Corporation (Beijing, China). We can downloaded the sequencing data from the NCBI SRA database with an accession number of PRJNA532995 (https://www. ncbi.nlm.nih.gov/sra/?term=PRJNA532995) [16].

qRT-PCR test
To evaluate the reliability of the RNA-seq data, six randomly BpPODs with cold treatment were selected and