Genome-Wide Identi cation and Analysis of Class III Peroxidases in Betula pendula

Background Class III peroxidases (POD) proteins are widely present in the plant kingdom that are involved in a broad range of physiological processes including stress responses and lignin polymerization throughout the plant life cycle. At present, POD genes have been studied in Arabidopsis, rice, poplar, maize and Chinese pear, but there are no reports on the identification and function of POD gene family in Betula pendula. Results We identified 90 nonredundant POD genes in Betula pendula. (designated BpPODs). According to phylogenetic relationships, these POD genes were classified into 12 groups. The BpPODs are distributed in different numbers on the 14 chromosomes, and some BpPODs were located sequentially in tandem on chromosomes. In addition, we analyzed the conserved domains of BpPOD proteins and found that they contain highly conserved motifs. We also investigated their expression patterns in different tissues, the results showed that some BpPODs might play an important role in xylem, leaf, root and flower. Furthermore, under low temperature conditions, some BpPODs showed different expression patterns at different times. Conclusions The research on the structure and function of the POD genes in Betula pendula plays a very important role in understanding the growth and development process and the molecular mechanism of stress resistance. These results lay the theoretical foundation for the genetic improvement of Betula pendula.

response to abiotic stress, the speci c function of each member of the family is still elusive. The comprehensive researches are necessary to explore the role of POD in plant growth and defense.
The gene family always arose from multiple ways including tandem duplication, duplicative transposition, and whole genome duplication (WGD), which was followed by mutation and divergence [15]. During the last decade, several molecular biology approaches have been developed to isolate, characterize and study the expression of POD gene family in plants [6]. Betula pendula is a pioneer boreal tree that can be induced to ower within one year [16], it plays an important role in people life [17,18]. Up to now, however, no genome-wide characterization of the POD family in B. pendula. It has been shown that POD is related to the synthesis of lignin [19] and cork [20,21], and lignin is considered as an important defense means against invasion and expansion of pathogens [22,23]. At the same time, a large number of experimental evidences of stress treatment showed that under the stress of drought and low temperature, the expression of POD increased signi cantly [24,25].
Since Betula pendula is a widespread species and has many applications in the pulp and paper industry, it is necessary to study its development and physiology [26]. To understand the role of POD family in lignin synthesis and resistance to biotic and abiotic stresses in B. pendula will greatly contribute to its application in industrial production. Fortunately, B. pendula has attracted much attention, particularly by the availability of its genome sequences [27], which gave us an opportunity to carry out a comprehensive genome-wide analysis for exploring the potential functions of the POD gene family in B. pendula.
In the present study, a genome-wide analysis of POD gene family from B. pendula was conducted via genomic sequence, including BpPOD gene models, phylogenetic relationship, conserved motif, chromosome location and other structural features [28]. we performed for the rst time the comprehensive analysis to the POD genes involved in lignin synthesis and abiotic stress response in B. pendula. Our study provides important insights for further study of the potential role of POD gene family in B. pendula growth and development.

Identi cation of POD genes
To identify members of POD family in B. pendula, we used the 73 POD genes of Arabidopsis to obtain the best hits in the B. pendula genome by BLASTP. A total of 90 putative PODs were identi ed in the B. pendula genome. We further examined the conserve domains of proteins encoded by these genes using Pfam [29] and SMART [30] database. The results revealed that all the genes have classical POD domain structures, which demonstrates the reliability of the results. The B. pendula genome contains more PODs than Arabidopsis (73) [6], but fewer than Populus euphratica (93) [31], Pyrus bretschneideri (94) [28], and rice (138) [11]. We de ned the BpPODs as BpPOD1 to BpPOD90. The isoelectric point (PI) varied from 4.28 to 9.6 with a mean of 7.25 and >7.0 of 52.2% POD proteins. In addition, subcellular locations of these BpPODs are mainly in the cytoplasm, cell membrane, vacuole, chloroplast and nucleus. Their detailed information, including chromosome location, gene name, subcellular location and molecular weight (MW) gene size of each BpPOD gene/protein, was listed in Table 1.

Phylogenetic analyses of POD proteins in B. pendula
To investigate the evolutionary history and phylogenetic relationships among the members of POD family in B. pendula, a phylogenetic tree was constructed with the Neighbor-Joining method based on multiple sequence alignment of the 90 BpPODs, with 1000 bootstrap replicates ( Figure 1). The BpPOD proteins were divided into twelve major subgroups with high bootstrap probabilities, designated group I to group XII. The POD genes of each subgroup is unevenly distributed, with the number of members varies from 4 to 15. Subgroup VIII contains the most members (15), subgroup X, XI, XII contains the least number of members, with only 4 members.

Gene structures
To further gain insights into the structural diversity of the POD genes, we subsequently performed exonintron analysis in BpPODs. The result reveals several variations (Figure 2), including ve BpPODs (BpPOD9, BpPOD11, BpPOD16, BpPOD57 and BpPOD61) lacking an intron in their gene structures. In the remaining BpPODs, the number of introns varies from one to six, while the major members have one to three introns. In addition, BpPOD76 and BpPOD87 have the most introns (6), followed by BpPOD24 and BpPOD51 (5). Furthermore, genes in the same subfamily display similar exon/intron structures. For example, BpPOD20, BpPOD22 and BpPOD82 have three exons and two intron, both of which belong to Group V; BpPOD73 and BpPOD74 have two exons and one intron, both of which belong to Group XI. However, in some cases, the number of exons/introns varies among genes clustered together in the phylogenetic tree. For example, BpPOD52 has one more intron than BpPOD42, BpPOD56 has two more introns than BpPOD43. These differences may be derived from a single intron loss or gain events during the long evolutionary period [32].

Analysis of conserved amino acid motifs
To understand the functional regions of BpPODs, conserved amino acid motifs analyses of BpPOD proteins were performed. A total of eight conserved amino acid motifs were identi ed in the BpPOD proteins ( Figure 3). All BpPOD proteins contain at least one conserved amino acid motif. For example, BpPOD55 only contains motif 8, BpPOD83 contains motif 1 and 7, while BpPOD10 proteins contain all the eight conserved amino acid motifs.
Most of the closely related members have the same motif compositions, suggesting that there are functional similarities between POD proteins within the same subgroup [33]. We found that motifs 1, 2, 3, 4, 5 and 7 appeared in nearly all members of BpPOD proteins, these motifs might be important for the functions of BpPOD proteins.

Chromosomal location and evolution analysis of BpPODs
To investigate the genome organization and distribution of BpPODs on different chromosomes of B. pendula, a chromosome map was constructed. The results show that the 90 BpPODs were distributed among 14 chromosomes, as shown in Figure 4, the physical locations of these BpPODs on chromosomes were scattered and uneven. Chromosome 1 and 8 contains the most BpPODs (14), followed by chromosome 13 (10). Eight BpPODs were simultaneously distributed on chromosomes 5 and 7, whereas chromosomes 14 had only one and chromosomes 11 does not include the POD gene. In addition, some chromosomes exhibit a relatively high density of BpPODs, such as the bottoms of chromosomes 13 and the top of chromosome 8.
Gene duplication, including segmental and tandem duplication, is considered to be one of the primary driving forces in the evolution of genomes [34,35]. In this study, among the 90 BpPODs identi ed, a large number of BpPODs have the same duplicated regions ( Figure 5). Generally, a gene cluster is the result of gene tandem duplication [36]. In this study, we found that some BpPODs were adjacent to each other ( Figure 4). For instance, BpPOD17-20, BpPOD22-29 and BpPOD11-15 were located sequentially in tandem on chromosomes 5, 8, and 13, respectively, implying that these genes might arise from recent tandem duplication events [37]. The result indicated that tandem duplications play main contributors in the expansion of the BpPOD gene family. The result was consistent with the previously reported for Populus euphratica POD genes, tandem duplications also contributed signi cantly to the expansion of POD gene family in Populus euphratica [31]. However, in previous studies, many species also have produced some different results. For example, segmental duplication events were the major contributors to the expansion of the pear POD family [28] and segmental duplication and tandem duplication were identi ed in maize POD family [33]. These results indicate that there are signi cant differences in the POD genes expansion pattern in B. pendula, maize and Chinese pear, which strongly implied that POD family members have different expansion patterns among different species. It may be the reason why the POD family members (90) in B. pendula were less than those in the maize (119) [33].
To explore the selection pressures among BpPOD duplicated genes, we calculated the Ka, Ks and Ka/Ks values for the 23 gene pairs ( Table 2). In the general, Ka/Ks > 1 indicates positive selection, Ka/Ks = 1 indicates neutral selection and Ka/Ks < 1 indicates negative or purifying selection [32]. The Ka/Ks ratios of most BpPOD gene pairs were < 1, suggesting that these gene pairs evolved under negative selection in B. pendula. The results of this Ka/Ks analysis suggest that negative selection was vital to the functional divergence of BpPODs.
To further elucidate the evolution mechanisms of BpPODs family, we constructed the comparative syntenic maps of birch associated with representative species containing two dicots (Arabidopsis thaliana and Populus trichocarpa) and one monocots (Vitis vinifera) ( Figure 6; Table3). A total of 17 (Arabidopsis thaliana), 49 (Populus trichocarpa) and 43 (Vitis vinifera) orthologous pairs were found between birch and the other species. Interestingly, among these gene pairs, some BpPOD genes (BpPOD3, BpPOD7, BpPOD16, BpPOD21, BpPOD40, BpPOD4, BpPOD48, BpPOD52, BpPOD57 and BpPOD84) were shown to have collinear relationships with all of the above three species, showing that these orthologous pairs might already have been present before the divergence of dicotyledonous and monocotyledonous plants [38].

Tissue-speci c expression of BpPODs
To better understand the functions of POD genes in the growth and development of Betula Platyphylla × Betula Pendula, their expression pro les in different tissues (including root, xylem, young leaf and ower) were analyzed with publicly available transcriptome data. Of the 90 BpPODs, 69 genes were expressed in one or more birch tissues, while 21 BpPOD genes exhibited no expression in various individual tissues. The heat map (Figure 7) demonstrated that most BpPODs had tissue-speci c or preferential expression patterns. As shown in Figure 7, BpPOD6, BpPOD21 and BpPOD37 were highly expressed in xylem. Several BpPODs were expressed in root during development, such as BpPOD62, BpPOD63 and BpPOD65. BpPOD78 and BpPOD19 showed higher expression levels in young leaf and ower, respectively. The expression level of BpPOD6 was high in xylem and low in root, leaf and ower. In contrast, BpPOD67, BpPOD68, BpPOD80 and BpPOD81 had no expression in any of the investigated tissues. BpPOD21, BpPOD59 and BpPOD62 were highly expressed in developing xylem, root, leaf and ower. In conclusion, the variations in the expression of BpPODs in different tissues revealed that POD genes may be involved in several processes during B. pendula growth and development.

Responses of BpPODs expression to cold treatment
Several roles have been attributed to plant peroxidases in response to biotic and abiotic stresses [39]. In recent years, the number of studies on POD genes response to abiotic stress have been reported [33]. For example, Arabidopsis overexpressing AtPOD3 showed an increase in dehydration and salt tolerance, whereas the antisense suppression of AtPOD3 exhibited dehydration and salt sensitive phenotypes [40]. The expression of POD genes is induced by various environmental stresses, such as metal, pathogens, humidity, temperature, anoxia and potassium de ciency [39], suggesting that POD genes are involved in plant defense. In this study, we examined the expression levels of the BpPODs in response to low temperature stress. As shown in Figure 8, the result indicated that the expression of most BpPODs was altered under cold treatment. After cold treatment, the expression levels of BpPOD4, BpPOD13, BpPOD15, BpPOD17 and BpPOD21 were signi cantly induced at a relatively early stage (0.5 h after treatment), and with the increase of cold treatment time, the relative expression level of these genes was also at a high level. As shown in Figure 8, the expression levels of BpPOD19, BpPOD21, BpPOD39 and BpPOD47 were increased after 1.5 h treatment of low temperature. BpPOD50 and BpPOD58 did not response to cold treatment at the beginning (0.5 h), and were slightly increased after 2 h exposure to low temperature. In general, the low temperature responsive BpPODs may play important roles in birch under cold stress.
Validation of DEGs identi ed by RNA-seq using qRT-PCR We used qRT-PCR (Quantitative Real-time PCR) to validate the expression pro les of BpPODs after low temperature treatment. A total of 6 cold-responsive BpPODs were selected for qRT-PCR analysis. As shown in Figure 9, the qRT-PCR results indicated that all the 6 BpPODs were up-regulated by lowtemperature stress, which was consistent with the results derived from the RNA-seq data. BpPODs including BpPOD15, BpPOD47 and BpPOD49 showed the highest transcript level when exposed to a low temperature for 1.5h. BpPOD4, BpPOD17 and BpPOD26 showed the highest transcript level at 3h. In general, this qRT-PCR result supports the reliability of the RNA-seq analysis.

Discussion
It is reported that members of Class III Peroxidases gene family are involved in the regulation of a variety of processes [6,31], and play a key role in biological and abiotic stress responses during plant growth and development [33]. Systematic and comprehensive analyses of POD gene families have been published for Arabidopsis thaliana [6], Populus trichocarpa [31], Zea mays [33] and Oryza sativa [11], but a genome-wide study of the POD family has not previously been reported in B. pendula. The published genome data of B. pendula [27] provides a useful tool for analysis of the POD gene family in B. pendula.
In the present study, 90 non-redundant POD genes were identi ed in B. pendula, the number is higher than that in Arabidopsis (73), but lower than that Populus euphratica (93) and Pyrus bretschneideri (94), which indicates that the POD genes in B. pendula have expanded compared to those in Arabidopsis. Subsequently, we performed analyses of the phylogenic relationships, gene structures, chromosomal locations, conserved motifs and expression pro les [41].
In the process of genome evolution, tandem duplication and segmental duplication were the main factors that led to the expansion of gene family [35]. Certain studies have shown that tandem duplication was largely responsible for the expansion of birch gene family [33], such as tandem duplication are the main reason for the expansion of B. pendula NAC gene family [15]. By contrast, segmental duplication has contributed signi cantly to the expansion of this gene family in Pyrus bretschneideri [28]. Interestingly, in this study, we found that some BpPODs were adjacent to each other, suggesting tandem duplications play main contributors to the expansion of the BpPOD gene family. However, in maize, the segmental duplication and tandem duplication almost identically contributed to the POD gene family expansion [33]. These results also explain why the number of POD genes in B.pendula (90), pear and populus were less than those in maize (119). According to the above analysis, we speculated that the expansion of the POD gene family differed between different plants. In addition, the Ka/Ks < 1 of the most BpPOD duplicated pairs showed that negative selection may be largely responsible for maintaining the functions of birch POD enzymes. Furthermore, the Ka/Ks ratios for the BpPOD5/-6, BpPOD24/-25 and BpPOD24/-27 gene pairs are relatively high, suggesting that these genes experienced rapid evolutionary diversi cation following duplication. We also constructed the comparative syntenic maps of birch associated with Arabidopsis thaliana, Populus trichocarpa and Vitis vinifera. We identi ed 49 orthologous gene pairs between Populus trichocarpa and birch, while only 17 orthologous gene pairs between Arabidopsis thaliana and birch were found, perhaps due to the closer relationship between Populus trichocarpa and birch versus Arabidopsis thaliana and birch.
The 90 BpPOD proteins possess ten highly conserved motifs. MEME analysis revealed that different conserved motifs are present in each of the BpPOD proteins. Notably, most BpPOD proteins contain all the conserved motifs, while only a few BpPOD proteins contain one or two motifs, which means that these motifs may be involved in the important basic function of the POD protein. However, a few motifs with unknown functions are present in nearly every subgroup, these motifs might play important roles in the BpPOD family. Additionally, gene structures were also investigated in BpPODs. We found that 90 BpPODs contained different numbers of exons or introns, with most BpPODs containing more than two introns, indicating that there is a great diversity in POD gene family of B. pendula. Many studies have shown that structural diversi cation of genes plays an important role in the evolution of multi-gene families [42,43]. In our research, we found that there presented different characteristics of BpPODs from different subfamilies, suggesting that BpPODs members are functionally diversi ed. In addition, many studies have reported that introns could be speci cally inserted into the plants and were retained in the genome during the course of evolution [10,44]. Therefore, we inferred that loss or gain of introns may be caused by speci c approach.
Gene expression patterns are an important aspect of the study of gene function [45]. High-throughput microarray technology provides a good platform for the study of genome-wide gene expression patterns [43]. We used publicly available genome-wide transcript pro ling data from B. pendula tissues as a resource to investigate the expression patterns of BpPODs. Most BpPODs exhibited variable expression patterns, suggesting functional diversi cation of BpPODs. Twenty-one BpPODs exhibited no expression in four individual tissues, indicating that BpPODs are expressed under speci c conditions or at speci c developmental stages. In this study, we found that of the 90 BpPODs, the most abundant expression was in the root, followed by the xylem. The result showed that most highly expressed POD genes might play signi cant roles in root. The highest expression levels of BpPOD6, BpPOD21 and BpPOD37 genes were found in xylem. It was suggested that these three genes were participated in regulation of the xylem synthesis in B. pendula. BpPOD59 is most expressed in owers and leaves, suggesting that it may be related to leaf spreading and owering formation in B. pendula. In addition, several BpPODs were expressed in all tissues, suggesting that they might play basic roles in B. pendula. In conclusion, the expression pro ling in this study provides an important basis for further studying of expression and biological functions of the POD gene family in B. pendula.
The growth and development of plants are usually affected by abiotic stress, such as drought, low temperature and high salinity [43]. A lot of stress-related genes were induced to adapt to these abiotic stresses [46,47]. A large number of experimental studies [33] on stress treatment showed that under the stress of low temperature and other conditions, POD genes expression increased signi cantly [24,25]. However, no POD genes responding to cold treatment have been reported in B. pendula. Thus, we performed a survey of the expression patterns of the POD genes in B. pendula under cold treatment. The results suggested that fty BpPODs were responsive to cold treatment. Most BpPODs were highly expressed at a relatively early stage (0.5 h after treatment), and with the extension of time, the expression reached the highest level. This indicated that these genes might play an important role in B. pendula. By contrast, the expression level of BpPOD30 and BpPOD8 gradually increased at 2 h after treatment, indicating that these genes are involved in the late reaction of cold treatment. In addition, the expression of a few BpPODs decreased under cold treatment, we speculate that these genes may also have defense and other speci c functions in B. pendula. These results indicated that most POD genes were induced by low temperature and might contribute to the defense against abiotic stresses in B. pendula.

Conclusion
In short, a total of 90 POD genes were identi ed in B. pendula and divided into twelve major subgroups. A total of eight conserved amino acid motifs were identi ed in the BpPOD proteins. Chromosomal location and microsynteny analysis suggested that these BpPODs were unevenly distributed in fourteen chromosomes. Tandem duplication were identi ed as the main patterns contributors to the expansion of POD genes expansion in B. pendula. Finally, expression patterns analysis revealed that some BpPODs might play signi cant roles in root, xylem, leaf and ower. Furthermore, under low temperature conditions, some BpPODs showed different expression patterns at different times. This present study increases our understanding of POD genes in B. pendula and lays the foundation for further clarify of the biological functions of these POD proteins in other plants.

Methods
Identi cation of B. pendula peroxidase genes To identify B. pendula peroxidase genes, the B. pendula genome sequences were downloaded [27]. The protein sequences of POD family members in the genome of Arabidopsis were retrieved from the TAIR database. The 73 Arabidopsis POD members were used as queries to identify the candidate sequences of B. pendula POD genes using BLASTP. To verify the reliability of the results, all the acquired candidate sequences were examined for the presence of the POD domain using Pfam [48] and SMART [49]. Finally, all candidate POD sequences were aligned using ClustalW [50] and checked manually to remove potentially redundant genes, and all of the non-redundant POD genes were used for further analysis. The theoretical molecular weights (MWs) and isoelectric points (pIs) of the BpPOD protein sequences were analyzed by the ExPaSy Compute pI/Mw tool [51].

Phylogenetic analyses of B. pendula peroxidase genes
To investigate the phylogenetic relationships of the peroxidase genes of B. pendula, a phylogenetic tree was constructed. Prior to phylogenetic analysis, multiple sequence alignments were generated using MUSCLE [52]. Subsequently, the RAxML [53] was employed to construct an unrooted phylogenetic tree based on alignments using the Neighbor-Joining (NJ) method with the following parameters: model (pdistance), bootstrap (1000 replicates), and gap/missing data (pairwise deletion).

Gene structure and conserved motif analysis
The genomic and CDS sequences of B. pendula PODs, extracted from B. pendula genome databases, were compared by using the Gene Structure Display Server [54] program to infer the exon/intron organization of POD genes. To determine conserved motifs structures of the BpPOD proteins , the conserved motifs were detected using the online MEME Tool [55]. The conserved motifs were analyzed with the SMART and PFAM programs.

Analysis of chromosomal location
Information about the physical locations of all POD genes on chromosomes was obtained from the Phytozome database. According to BpPODs starting positions on the birch chromosomes, TBtools software was used to determine the chromosome location image of the BpPODs [56]. In addition, to explore the selection pressures among POD duplicated genes, we calculated the nonsynonymous mutation rate (Ka), synonymous mutation rate (Ks), and Ka/Ks values for the duplicated gene pairs with TBtools [56]. The ratio of nonsynonymous to synonymous nucleotide substitutions (Ka/Ks) between paralogs was analyzed to detect the mode of selection. Syntenic maps of birch associated withthree representative species were visualized by MCScanX [57].

Differential expression pro le of BpPOD gene family
To determine the expression patterns of BpPODs in different tissues in Betula platyphylla × Betula pendula, we downloaded the sequencing data from the NCBI SRA database with an accession number of PRJNA535361 [15]. To identify the expression of BpPODs during cold treatment in Betula platyphylla × Betula pendula, we downloaded the sequencing data from the NCBI SRA database with an accession number of PRJNA532995 [15]. The clean reads of each sample were obtained by ltering out reads of low quality. All the clean reads were aligned to the B. pendula reference genome [27] using bowtie2 [58]. The RNA-seq (RNA-sequencing) data were analyzed using the RSEM (RNA-seq by Expectation-Maximization) pipeline [59] and the data were processed using a paired-end sequencing mode. RSEM [59] could compute transcript abundance, estimating the number of RNA-seq fragments corresponding to each gene, and normalized expression values as TMM (trimmed mean of M-values).

Quantitative RT-PCR
The two month old plants of birch were treated with low temperature stress for 1.5 h and 3 h, and three biological replicates were prepared at each time point. Total RNA of young leaves was then isolated for the qRT-PCR experiment. The rst-strand cDNA synthesis was performed with the resulting RNA using a PrimeScript RT Master Mix Perfect Real-Time kit (TAKARA, Japan). Quantitative real-time RT-PCR was performed on an ABI 7500 Real-Time system (Applied Biosystems). Primer pairs for real-time quantitative PCR were designed using A plasmid Editor v1.11. Each reaction contained 10 µL of THUNDERBIRD SYBR qPCR Mix (QPS-201, TOYOBO, Japan), 2 µL of cDNA, 1 µL of forward primer, 1 µL of reverse primer, and 6 µL of double-distilled water (ddH2O). The thermal cycle was used as follows: 95℃ for 10 min, which was followed by 45 cycles of 95℃ for 30 s and 60℃ for 10s [33]. The relative mRNA level for each gene was calculated using the 2 −ΔΔCT method. The real-time PCR experiment was carried out at least three times under identical conditions [15]. Availability of data and materials All data generated or analysed during this study are included in this published article.
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.          Expression patterns of six stress-responsive BpPOD genes under cold stress treatments. Relative expression levels of BpPOD genes in response to cold stress treatments were examined by RT-qPCR and normalized expression values use TMM. The y-axis represents the relative expression level and the x-axis represents the time course of stress treatment. Seedlings were sampled at 0, 1.5 and 3 after cold treatment. Data represent means ± SD in three replicates.

Supplementary Files
This is a list of supplementary les associated with this preprint. Click to download. Table.xlsx