Genome-wide analysis of proline-rich extension-like receptor protein kinase (PERK) in Brassica rapa and its association with the pollen development

Background Proline-rich extension-like receptor protein kinases (PERKs) are an important class of receptor kinases located in the plasma membrane, most of which play a vital role in pollen development. Results Our study identified 25 putative PERK genes from the whole Brassica rapa genome (AA). Phylogenetic analysis of PERK protein sequences from 16 Brassicaceae species divided them into four subfamilies. The biophysical properties of the BrPERKs were investigated. Gene duplication and synteny analyses and the calculation of Ka/Ks values suggested that all 80 orthologous/paralogous gene pairs between B. rapa and A. thaliana, B. nigra and B. oleracea have experienced strong purifying selection. RNA-Seq data and qRT-PCR analyses showed that several BrPERK genes were expressed in different tissues, while some BrPERKs exhibited high expression levels only in buds. Furthermore, comparative transcriptome analyses from six male-sterile lines of B. rapa indicated that 7 BrPERK genes were downregulated in all six male-sterile lines. Meanwhile, the interaction networks of the BrPERK genes were constructed and 13 PERK coexpressed genes were identified, most of which were downregulated in the male sterile buds. Conclusion Combined with interaction networks, coexpression and qRT-PCR analyses, these results demonstrated that two BrPERK genes, Bra001723.1 and Bra037558.1 (the orthologs of AtPERK6 (AT3G18810)), were downregulated beginning in the meiosis II period of male sterile lines and involved in anther development. Overall, this comprehensive analysis of some BrPERK genes elucidated their roles in male sterility.

their extracellular domain motifs [6,7]. For instance, the leucine-rich repeat receptor kinases (LRR-RKs) are the largest of the RK family in Arabidopsis and sense a diverse set of hormone signaling pathways to regulate cell proliferation and expansion, stomatal development and stem cell maintenance [5,[8][9][10].
Proline-rich extension-like receptor-kinases (PERKs) are a small family of RKs containing cytoplasmic kinaselike domains, transmembrane motifs and extracellular proline-rich domains [1,11,12]. The first PERK gene was identified in Brassica napus, which is rapidly induced by wounding, ubiquitously expressed in the stem, petals and pistils and controls root and stem branching [13,14]. Recent work addressing the PERK family of Gossypium hirsutum has revealed the expression patterns of these genes in response to various abiotic stresses and hormonal homeostasis [15]. In addition to these functions, PERKs are thought to act as sensors/receptors to monitor changes in the cell wall during its expansion [1,12].
Evidence from Arabidopsis shows that most PERKs not only contribute to cell growth and cell wall deposition but also play important roles in male-female interactions and precisely control pollen/pollen tube behaviors [1,18]. Male gametogenesis requires several metabolic pathways, including the tapetum development, callose metabolism, anther dehiscence, pollen wall and pollen tube wall formation pathways [24][25][26], and defects in any of these processes can cause abnormal development and ultimately lead to male sterility [26]. However, the molecular mechanisms of PERKs function in pollen development, especially in male sterility, are still unknown.
Thus far, the genome-wide characterization of PERK genes has been reported only in cotton and Arabidopsis [15,16]. Little information on the PERK family and their biological functional roles in B. rapa are available. Chinese cabbage (B. rapa L.) is one of the most important species in the Brassicaceae family and is widely cultivated in most parts of China [26,27]. In this study, we performed a systematic analysis of the PERK family in B. rapa. The biophysical properties of BrPERK genes were determined. Furthermore, RNA-Seq data from six malesterile plants (two male-sterile mutants [26,28], two GMS lines [29,30] and two ogu-CMS lines [27,31]) were analyzed, and tissue-specific expression patterns were examined to identify the specific PERK genes related to pollen development and male sterility in B. rapa. The present study will contribute to a detailed understanding of the molecular and biological functions of BrPERK genes in male-sterile plants of B. rapa.

Identification of PERK proteins
In our study, a total of 418 PERK gene family members in 16 Brassicaceae species were identified based on several confirmations. With the exception of B. juncea (n = 18), B. napus (n = 19) and Camelina sativa (n = 20), the PERK gene numbers in the other 13 species ranged from 17 to 27 (Table S1; Fig. S1), which might be related to their chromosome numbers and indicated that the PERK genes have undergone expansion in Brassicaceae species. A maximum likelihood (ML) tree was generated for these PERK proteins, showing that all PERKs were grouped into four subfamilies (Groups I-IV; Fig. 1, Table  S2). Although Group III contained the fewest PERKs (only 22 members), it included PERK members from all Brassicaceae species, and most species exhibit only one PERK (Table S2).
Among Brassica species, 25, 27 and 22 PERK family members were identified in B. rapa (AA), B. nigra (BB) and B. oleracea (CC), respectively (Table S1; Fig. S1). The phylogenetic tree of these genes along with those of A. thaliana showed that the total 89 PERKs could again be divided into four groups (Fig. 2a, . Group 1 and Group 3 constituted the largest clades, containing a total of 65 PERKs, and accounted for 46.07 and 38.20% of the sequences, respectively. Interestingly, the PERK protein numbers of the three Brassica species in Groups 1-3 were almost the same (Fig. 2b), indicating that these PERK genes from the three Brassica plants may have come from a common ancestor. Furthermore, the sequence logos of the homologous domain sequences of the PERK proteins revealed that the domain sequences were highly conserved in A. thaliana and the three Brassica species (Fig. S2).
We also determined the physical and chemical characteristics of BrPERK proteins. The BrPERK proteins showed wide variation in their length, molecular weight (MW) and isoelectric point (pI) (Table S3). Moreover, the majority of the proteins were localized to the chloroplast thylakoid membrane, followed by the endoplasmic reticulum (membrane), plasma membrane, cytoplasm, nucleus and chloroplast stroma (Table S3).

Conserved motifs, gene structures and cis-elements analyses
The functional motifs and gene exon-intron positions were analyzed based on evolutionary tree relationships (Fig. 3). The ten most conserved motifs were identified in the BrPERK members (Fig. S3), showing that almost all of the BrPERK proteins contained the same motifs in terms of both the type and distribution patterns of the motifs, except for Bra037558 (Fig. 3b). The exon-intron organization suggested that all BrPERK genes showed conserved patterns of gene structure, and all conserved motifs were associated with the Pkinase domain (Pkc_ like superfamily and STKc_IRAK) (Fig. 3c). Overall, the highly similar gene structures and motif distributions of the BrPERK members were consistent with their phylogenetic relationships (Fig. 3). cis-element analysis showed that several promoters contained meristem expression motifs, and a few promoters contained  (Table S4; Fig. S4), indicating the roles of several PERK genes in plant flowering.

Gene duplication, synteny and Ka/Ks analyses
The mapping of the BrPERK genes to chromosomal loci showed that an inconsistent distribution of the genes, with 1 to 5 genes being distributed on chromosomes A1-A9 ( Fig. S5A; Table S3). Synteny and Ka/Ks analyses indicated that 13 paralog pairs were distributed on different chromosomes, not including the A04 and A10 chromosomes (Table S5, Fig. S5B).
The Ks values ranged from 0.29 to 1.23, and the average duplication time of paralog pairs was indicated to be 22.84 million years ago (Mya) ( Table S5, Fig. S6A), possibly because some BrPERK gene duplications occurred before the whole-genome triplication observed in B. rapa.
Further analyses of PERK gene evolution and divergence among A. thaliana and the three Brassica species showed that a total of 67 orthologous gene pairs exhibited a collinear relationship (17 Br-At, 27 Br-Bni, 23 Br-Bo; Fig. 4, Table S5). These results demonstrated that the PERK genes of these Brassica species appeared to be derived from a common ancestor and that the function of these PERK genes of Brassica plants might be the same as those of A. thaliana. In addition, among the orthologous gene pairs, each AtPERK gene presented 1-3 BrPERK orthologous genes ( Fig. 4 and S7, Table S5), suggesting that a few BrPERK genes had been duplicated by genome triplication. The Ka/Ks values of these gene pairs were all less than 0.35 except for one pair (Bra028371.1-Bo1013798, Ka/Ks = 0.56), and the average divergence times were estimated to be 17.12 Mya (Br-At), 11.96 Mya (Br-Bni), and 8.95 Mya (Br-Bo) ( Table S5, Fig. S6B-D). These results demonstrated that the PERK gene pairs shared between B. rapa and B. nigra, B. oleracea, and A. thaliana had undergone strong purifying selection with limited functional divergence after whole-genome duplication (13)(14)(15)(16)(17)).

Tissue-specific expression pattern analysis
The RNA-Seq data from different tissues of B. rapa showed that the BrPERK members exhibited variable expression patterns; several genes were broadly expressed in all different tissues, while some members were exhibited a more tissue-specific pattern and were especially highly expressed in flower tissue (Fig. 5a). The qRT-PCR results from different organs confirmed the RNA-Seq data, and several BrPERK genes were highly expressed in sepals and petals (Fig.  5b). Furthermore, BrPERK gene expression differed in different anther developmental stages, with some of these genes exhibiting high expression in the uninucleate, binucleate and mature pollen periods (Fig. 5c-d and S8). Together, these results suggested that some BrPERK genes may play important roles in flower development, especially in pollen development.
Comparative transcriptome analysis of BrPERK genes in B. rapa male-sterile lines Transcriptome datasets from six different male-sterile lines were analyzed, including two ogura cytoplasmic male-sterile (ogu-CMS) lines (wucai and turnip), two genic male-sterile (GMS) lines (AB01 and AB03) and two male-sterile mutants (msm and ftms). A total of 14 differentially expressed genes (DEGs) of BrPERKs were detected, and seven of them were downregulated in all male-sterile (MS) plants ( Fig. 6 and S9A). qRT-PCR analysis of different developing buds showed that these BrPERK genes presented variable expression patterns, and most of them were downregulated after the tetrad period ( Fig. 6b and c). These results demonstrated that these downregulated BrPERK genes participated in anther development.

Regulatory subnetworks involving BrPERKs and other genes
To gain further insight into the roles of these downregulated BrPERK genes in anther development, interaction networks of the BrPERK genes in Chinese cabbage were constructed. From the STRING database, 19 coexpressed genes were identified (Fig. 7a). RNA-Seq data from 6 male-sterile lines and qRT-PCR results were analyzed to assess the expression patterns of these genes. Ten genes were identified in at least 2 male lines (Fig. 7b), and most of the genes were downregulated in these male sterile buds (Fig. 7b). The interaction network of these genes with BrPERKs and their expression levels during pollen development are shown in Fig. 7c and d, respectively. Gene Ontology (GO) analysis indicated that 8 genes were enriched in pectinesterase activity, nucleic acid binding, cation binding, the membrane, and protein kinase activity; Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis showed that three genes were involved in starch and sucrose metabolism, metabolic pathways, and pentose and glucuronate interconversion (Table S6).

Discussion
In the present study, we successfully identified 25 BrPERK members from the whole-genome sequence of B. rapa and analyzed their biophysical properties (Table S3). However, 6 more BrPERK members were identified in our analyses that have been reported in previous studies [1]. Among the 25 BrPERKs, 6 members (Bra022359, Bra037558, Bra034626, Bra011374, Bra004022 and Bra037291) were shorter, which generally lacked partial sequences in the 5′ flanking region (Fig. 3c); all of these proteins are extracellular proteins (Table S3) and exhibit similar patterns of the main motif distribution (Fig. 3b). These shortened proteins might exist because partial sequences were deleted in the process of gene duplication [33,34]. Nevertheless, these 6 genes still belong to the PERK gene family, and they were retained in the subsequent analyses. Unlike AtPERK members [1,13,16], all of the proline-rich regions of BrPERKs contained less than 2 Ser-Pro (2)(3) motifs in the 5′ flanking region (data not shown).
In addition to B. rapa and A. thaliana, we identified PERKs in 14 other Brassicaceae species (Table S1), and the number of PERKs did not differ greatly in these species except in three polyploid plant species, suggesting that the PERK family is well conserved in these species [17]. The phylogenetic tree of a total of 418 PERKs showed that Group III among the four subgroups is an ancient clade, similar to the findings from a previous phylogenetic tree of 207 PERKs from 15 other plant species [15]. Interestingly, in Group III, most of the Brassicaceae species presented only one PERK member, except for C. rubella (n = 8) and T. salsuginea (n = 7) and three polyploid plant species, B. juncea (n = 18), B. napus (n = 19) and C. sativa (n = 20) (Fig. 1, Table S2), which exhibited far more PERK members than other diploid plant species (Fig. S1, Table S1). These results revealed that the PERK gene family is affected by polyploidy and has experienced segmental and genome duplication in these species [15,35].
The phylogenetic tree of the three Brassica species with A. thaliana showed that Groups 1-3 included almost equal numbers of PERKs in these three Brassica species, while Group 4 lacked an AtPERK gene (Fig. 2); the PERK genes of this group might acquire new functions via divergence following duplication [33,34,36,37]. Furthermore, the sequence logos of the conserved PERK amino acid residues showed that the PERK genes are evolutionarily conserved among the Brassica species and A. thaliana (Fig.  S2). This situation can be attributed to the fact that B. rapa underwent whole-genome duplication twice in the Brassicaceae lineage and experienced an additional wholegenome triplication event [32,34].
Additionally, we estimated the divergence times of the orthologous PERK pairs between B. rapa and A. thaliana, B. nigra, and B. oleracea (Fig. 4 and S7), which showed Ka/Ks ratios less than 0.35 and indicating the occurrence of strong purifying selection (Fig. S6, Table S5), reflecting the strong control exerted over these genes in evolution [38]. Moreover, the average divergence times of B. rapa from A. thaliana, B. nigra, and B. oleracea were estimated to be 17.12, 11.96 and 8.95 Mya, respectively (Table S5, Fig. S6), indicating that Brassica species shared a common ancestor and exhibit different differentiation times [32,35,39,40]. It has been reported that the divergence time of the genomes of B. rapa and A. thaliana was approximately 13 to 17 Mya [41]. These results imply that the PERK gene family might be a good candidate molecular reference for analysis of plant evolution. In addition, we analyzed BrPERK gene duplication events within the B. rapa genome, which are thought to play important roles in the amplification of the genome [42,43]. In B. rapa, 13 paralog pairs of duplicated BrPERK genes have been retained, but none of the gene pairs exhibit tandem duplication (Fig. S5A), suggesting that segmental duplication events have played a leading role in BrPERK family evolution, similar to what is observed in the IQD gene family of B. rapa [44].
It has been reported that AtPERK members are involved in several biochemical pathways related to developmental processes in A. thaliana, such as pathways that are active in root, rosette leaf, stem and pollen  [1,11,16]. In B. rapa, the functions of BrPERKs were explored on the basis of RNA-Seq datasets and qRT-PCR results (Fig. 5). Among different tissues, the gene expression levels of different BrPERK genes clearly differed, and some of these genes were very highly and specifically expressed in reproductive organs. These results were similar to those obtained in A. thaliana [11,16] and Gossypium hirsutum [15], suggesting that some PERK genes play important roles in reproductive developmental processes and might be related to sterility and fertility.
Thus, we assayed 6 RNA-Seq datasets from B. rapa male-sterile mutants [26,28], GMS [29,30] and CMS [27,31] lines, and found that 9 BrPERK genes were downregulated in at least five MS lines (Fig. 6a). Among these genes, Bra024608.1 and Bra016342.1 were paralogous genes and they were all orthologs of IGI1 (AT1G23540); Bra022359.1, Bra001723.1 and Bra037558.1 were also paralogous genes, and they were all orthologs of AtPERK6 (Fig. 4, Table S5). IGI1 is strongly expressed in the anthers, and the igi1 mutant exhibited semisterility and smaller siliques [22]. In addition, a phosphoproteomic study of mature Arabidopsis pollen grains showed that AtPERK6 (AT3G18810) was phosphorylated during pollen grain development [23]. In our study, Bra024608.1 and Bra016342.1 were found to be downregulated during anther development from the tetrad period onward, and Bra001723.1 and Bra037558.1 were downregulated beginning in the meiosis II period (Bra022359.1 could not be detected; Fig.  6b and c). Thus, these results suggested that the four BrPERK genes might be related to male sterility.
Further analysis of the coexpression network revealed 19 genes that were coexpressed with 13 downregulated BrPERKs (except Bra004331.1 in Fig. 6a) screened by STRING (Fig. 7a), and 13 coexpressed genes were identified in the 6 male-sterile RNA-Seq datasets, almost all of which were also downregulated (Fig. 7b). Among these coexpressed genes, no gene coexpressed with Bra024608.1 and Bra016342.1 was screened; nevertheless, 7 genes coexpressed with Bra001723.1 and Bra037558.1 were identified (Fig. 7c, Table S6). In addition, it was reported that Bra001723.1 (Unigen37636) was participated in anther and pollen development [27], and this gene was involved in starch and sucrose metabolism (ko00500) and other metabolic pathways (ko01100) ( Table S6). These results demonstrated that Bra001723.1 and Bra037558.1 interact in the regulation of anther and pollen development and the downregulation of these genes might be related to male sterility in B. rapa. However, further research is required to investigate the detailed regulatory mechanisms of Bra001723.1 and Bra037558.1 related to male sterility in B. rapa.

Conclusions
In this study, we identified 418 PERK gene family members in 16 Brassicaceae species and characterized 25 BrPERK genes in B. rapa. Analyses of phylogenetic relationships, conserved motifs, gene structures, cis-elements, gene duplication and synteny were performed to comprehensively evaluate their biophysical properties in B. rapa. Additionally, expression pattern assays based on RNA-Seq and qRT-PCR data revealed that some PERKs present tissue-specific patterns and are involved in reproductive development, especially pollen development. Combined analyses of RNA-Seq datasets from six malesterile lines of B. rapa demonstrated that 13 BrPERK genes were downregulated in the buds of at least four male-sterile lines, possibly indicating crucial roles in pollen tissue. The expression profiles of these BrPERK genes showed that Bra024608.1, Bra016342.1, Bra001723.1 and Bra037558.1 were downregulated during anther development, suggesting that their abnormal expression might be related to male sterility. Furthermore, the coexpression networks assay exhibited that only Bra001723.1 and Bra037558.1 can be detected the coexpressed genes among these four PERK genes. These results provided basic information for studying the mechanism of Bra001723.1 and Bra037558.1 function in the male sterility of B. rapa.

Identification of PERK genes in 16 Brassicaceae species
The sequences of AtPERK family proteins were downloaded from the TAIR website following Nakhamchik et al. [16] and served as queries in BLAST searches to identify PERK members in the other 15 Brassicaceae species in the Brassica Database (BRAD) [32]. After the removal of redundant sequences, all candidate protein sequences were verified using the SwissProt database [45] and the SMART [46], Pfam [47] and InterProScan tools [48]. The confirmed protein sequences were aligned and used to generate a phylogenetic tree in MEGA 7.0 [49]. The conserved amino acid sequence logos were constructed with the WEBLOG tool [50].

Structural analysis of PERKs
The biophysical properties, protein localization sites, signal peptides, and transmembrane helices of the proteins were predicted using the ExPASy program [51], Prot-Comp (v6), PSORT Prediction, SignalP [52] and TMHMM Servers (v2.0), respectively. Information on the exon/intron structure, conserved motifs and domains of BrPERKs was obtained from BRAD with the MEME [53] and CDD [54] tools. The predicted motifs were annotated by InterProScan [48]. Tbtools (v0.6679) [55] was used for visual analysis. The promoter regions (1.5 kb upstream sequences) of all BrPERK genes were used to predict cis-acting elements in the PlantCARE database [56], and promoter structures were obtained with the GSDS (v2.0) [57] tool.

Genomic distribution and collinearity analyses
The distribution of the BrPERK genes on chromosomes was searched and plotted using Tbtools. Synteny analysis among the three Brassica crops and A. thaliana genomes was conducted via all against all BLASTP comparisons. The entire protein sequences were used as queries to search the corresponding protein sequence data (E < 1e − 5 , top 5 matches). Collinear pairs were extracted using MCScanX [58] to identify syntenic blocks and duplications within the PERKs.

Calculation of K a /K s values
The rates of synonymous (K s ) and nonsynonymous (K a ) substitutions were calculated for duplicated BrPERK genes with the KaKs calculator [59]. K s values > 2.0 were discarded to avoid the risk of substitution saturation, and the divergence time was calculated according to Khan et al. [35] and Yuan et al. [44].

RNA-Seq data analysis
Seven raw RNA-Seq datasets of B. rapa plant were obtained from the NCBI Gene Expression Omnibus (GEO) and Sequence Read Archive (SRA) [26][27][28][29][30][31]60]. One of these datasets consisted of the genome-wide transcriptomes of tissues [60], and the other six datasets consisted of the transcriptomes of male-sterile lines, including two male-sterile mutants [26,28], two GMS lines [29,30] and two ogu-CMS lines [27,31]. These male-sterile lines are no pollen phenotype and their anther abortion occurs during the tetrad period. The raw data were filtered and mapped onto the B. rapa genomes (V1.5 and V3.0). Gene expression levels were estimated using Reads Per Kilobase of exon model per Million mapped reads (FPKM) values [61], and differentially expressed genes (DEGs) were determined with the DESeq or DEGseq software package [62]. The transcriptome expression of PERKs was illustrated via heatmap analysis [27].

Microscopy observations, gene expression and interaction network analyses
The ogu-CMS line 4-2B and its 4-2A maintainer line from wucai (B. rapa ssp. Chinensis var. rosularis Tsen) were used as the plant materials in this study, as described previously [63,64]. Different developmental stages of buds from CMS and fertile lines were used for paraffin section and qRT-PCR analyses (Table S7) according to our previous reports [27,63,65]. The interaction network associated with BrPERK genes was constructed using the STRING database [66] and Cytoscape software [67].