Alternative Polyadenylation in Response to Temperature Stress Contributes to Gene Regulation in Trees

Background Genome-wide change of polyadenylation (polyA) sites (also known as alternative polyadenylation, APA) is emerging as an important strategy of gene regulation in response to stress in plants. But little is known in woody perennials that are persistently dealing with multiple abiotic stresses. Results Here, we performed a genome-wide proling of polyadenylation sites under heat and cold treatments in Populus trichocarpa. Through a comprehensive analysis of polyA tail sequences, we identied 25,919 polyA-site clusters (PACs), and revealed 3429 and 3139 genes shifted polyA sites under heat and cold stresses respectively. We found that a small proportion of genes possessed APA that affected the open reading frames; and some shifts were commonly identied. Functional analysis of genes displaying shifted polyA tails suggested that pathways related to RNA metabolism were linked to regulate the APA events under both heat and cold stresses. Interestingly, we found that the heat stress induced a signicantly more antisense PACs comparing to cold and control conditions. Furthermore, we showed that a unique cis-element (AAAAAA) was predominately enriched downstream of PACs in P. trichocarpa genes; and this sequence signal was only absent in shifted PACs under the heat condition, indicating a distinct APA mechanism responsive to heat tolerance. Conclusions This work provides a comprehensive picture of global polyadenylation patterns in response to temperatures stresses in trees. We show that the frequent change of polyA tail is a potential mechanism of gene regulation responsive to stress, which are associated with distinctive sequence signatures. trichocarpa are lacking. Here, we report the genome-wide identication of APA proles in responsive to heat and cold stresses using the PAS-seq platform in P. trichocarpa. We show that the change of polyA tails is widespread under the temperature stresses, and genes related to RNA metabolism are linked to APA-mediated post-transcriptional regulation. We also provide molecular evidence of unique sequence signatures associated with stress-related shifts of polyA sites in P. trichocarpa.

affected the open reading frames; and some shifts were commonly identi ed. Functional analysis of genes displaying shifted polyA tails suggested that pathways related to RNA metabolism were linked to regulate the APA events under both heat and cold stresses. Interestingly, we found that the heat stress induced a signi cantly more antisense PACs comparing to cold and control conditions. Furthermore, we showed that a unique cis-element (AAAAAA) was predominately enriched downstream of PACs in P. trichocarpa genes; and this sequence signal was only absent in shifted PACs under the heat condition, indicating a distinct APA mechanism responsive to heat tolerance.
Conclusions This work provides a comprehensive picture of global polyadenylation patterns in response to temperatures stresses in trees. We show that the frequent change of polyA tail is a potential mechanism of gene regulation responsive to stress, which are associated with distinctive sequence signatures.

Background
Alternative polyadenylation (APA) is a common post-transcriptional process of eukaryotic mRNA maturation, which generates altered 3' ends of transcripts. The use of different polyadenylation site (polyA) has been found to play essential roles of increasing transcriptome diversity and regulating gene expression pro les [1,2]. With the support of high throughput sequencing technologies, designated RNA sequencing approaches, such as 3′READS+ [3], PolyAdenylation Site Sequencing (PAS-Seq) [4] and Poly(A)-ClickSeq [5], that target the polyA tails of mRNA, have been developed for accurate pro ling of APA.
In plants, APA is widespread and has been extensively investigated. Genome-wide identi cation of APA has been reported in a number of plant species, including Arabidopsis [6,7], rice [8], bamboo [9], and more. A recent constructed database (PlantAPAdb) has recorded the APA information from six plant species for evolutionary and mechanistic analyses [10]. These studies have revealed an abundant occurrence of APA and suggested important regulatory functions at the molecular level in plants. It is clear that APA is an essential layer of post-transcriptional gene regulation involved in plant growth, development and response to endogenous and environmental signals. Generally, APA sites locate in the 3' region of transcripts. APA sites from the 3' untranslated regions (3'UTR) can affect the mRNA stability, translation, cellular localizations to regulate gene functions [1,11]; APA also can be inside an open reading frame (ORF) and thereby regulate the protein function [11]; in addition, studies of the Flowering inhibitor Locus C (FLC) have shown that usage of polyA site in the antisense transcript is required for determining owering transition in Arabidopsis and other plant species [12,13]. Although a great number of plant APA sites is identi ed, the regulatory mechanism of APA associated gene expression still is poorly understood.
In recent years, the APA mediated gene regulation is found to play an important role in stress responses.
Notably, abiotic stresses can induce APA events to generate distinct transcript isoforms in diverse plant species [14][15][16]. For example, in Arabidopsis, hypoxia has been found to induce the usage of noncanonical (e.g. 5'UTR and protein-coding) polyA sites, which suggested a negative role of gene regulation [15]. In addition, genome-wide pro ling of APA in sorghum has shown that some stress-induced isoforms were associated with a unique intronic polyA signal [17]. These results indicate that global shift of polyA site in response of stress is a strategy of survival which affect a wide-range of gene functions.
Populus trichocarpa is a model tree species with comprehensive genome information as well as signi cant ecological and commercial importance [18]; and as a perennial plant, the growth and development of P. trichocarpa face multiple levels of abiotic stresses. Small-scale characterization of APA sites targeting 14 NAC (NAM, ATAF1/2, CUC2) genes has been reported in P. trichocarpa, and three APA transcripts have been found differentially expressed during hormone treatments [19]. Furthermore, functional characterization of APA isoforms of Secondary Wall-Associated NAC Domain 1 (SND1) has shown that truncated SND1 can negatively regulate the function of full SND1 [20]. It is conceivable that APA-mediated gene regulation is a key aspect for development and stress response in P. trichocarpa. However, genome-wide pro ling and analysis of APA in P. trichocarpa are lacking. Here, we report the genome-wide identi cation of APA pro les in responsive to heat and cold stresses using the PAS-seq platform in P. trichocarpa. We show that the change of polyA tails is widespread under the temperature stresses, and genes related to RNA metabolism are linked to APA-mediated post-transcriptional regulation. We also provide molecular evidence of unique sequence signatures associated with stressrelated shifts of polyA sites in P. trichocarpa.

Result
The processing and mapping of PAS-Seq reads for identi cation of polyA-tail associated sequences To study the alternative polyadenylation pattern in P. trichocarpa, we performed a high-throughput transcriptome sequencing targeting the polyadenylation sites by the PAS-Seq approach. The polyA enriched libraries were constructed under different temperature stresses (4 °C and 40 °C), and the normal growth condition (24 °C) was set as the control. We sequenced the libraries using the Illumina HiSeq X Ten platform to generate 150 nt pair-end sequencing reads, and obtained 36-43 million (Supple. Table   S1) raw reads for each library. The clean reads of all samples were mapped to the P. populus genome reference. We found that 2.98% of total reads were located in the 3' gene untranslated region (3'UTR) ( Fig. 1A), and the majority of reads were mapped to the intergenic region (70.67%, Fig. 1A).
To retrieve the sequences upstream of the polyA tail, we rst anchored the region containing at least 8 consecutive As, and then went through upstream for regions with at least 2 non-A and clipped the sequences of polyA tail; and the remaining reads with more than 11 nt in length were used for polyA signal analysis (Supple. Table S2). The PAS-seq reads were further ltered with the polyA sequences which resulted in the " nal PAS-reads" for each sample (Supple . Table S2). Then, the " nal PAS-reads" were re-mapped to the reference genome to reveal the genomic location information. We found that the majority of " nal PAS-reads" were from the 3'UTR region (Fig. 1B-D); and about 10% of reads were located in the intergenic areas in all samples ( Fig. 1B-D); and small proportions of the reads were found in CDS, 5'UTR, antisense, and intronic regions ( Fig. 1B-D). These results were agreeable with our experimental designs, suggesting that the pipeline of reads processing was credible to downstream analyses of polyA tails. Global characterizations of polyA tails based on the PAS-Seq sequences To reveal the abundance of expressed genes, we used the " nal PAS-reads" to evaluate the gene expression levels and found that 25,068 genes out of 41,335 had a higher RPKM (Reads per kilo base of a gene per million reads) value than 1. The cumulative expression levels were plotted for each library, and the average RPKM value of 50 percent was between 2-4 in different samples ( Fig. 2A). To investigate the distribution pattern of reads along a transcript, we assessed the location of reads based on the annotation of gene model. We showed that the peak of reads distribution was evident at the 3' region of transcripts (Fig. 2); when assessing the 1 kb upstream and downstream areas of TTS and stop codon, we found that the peak of reads were located closely upstream of TTS and downstream of stop codon ( Fig. 2B-C). These results implied that the processed reads were e cient to determine the 3'-ends, including 3'UTR sequences and the polyA signals.

Changes of PACs between different temperature stress conditions
The polyA containing reads were subjected to clustering analysis to identify the polyA-site cluster (PAC) based on the location and relative abundance of PAS-reads. The samples of different treatments were independently analyzed for PAC identi cation, and combined reads were also used to reveal the total PAC sites. We showed that the PAC sites were dominantly located in the 3'UTR regions of genes (Supple. Table   S3). The nucleotide composition pattern around PAC sequences were studied. The 50 bp upstream and downstream regions were scanned to display the nucleotide composition using combined reads (Fig. 3A). We revealed that the regions were dominant with A and U, and A and U showed distinct but complementary distribution patterns (Fig. 3A). A spike of A was revealed at the polyA site indicating the cleavage region; and a spike of C was also found just in front of the A spike (Fig. 3A). The pro le of the nucleotides was consistent with polyA sites in other plant species suggesting a conserved mechanism of polyadenylation.
To determine the signal elements around PAC sequences, the 50 bp upstream and downstream regions were examined to identify the patterns of sequences. The 6-nt window of nucleotides was used to reveal the common patterns (Loke et al. 2005). The top 30 signals were displayed, and we found that "AATAAA" and "ATAAAA" were most frequent in all samples except for the all "T" and all "A" hexamers; and the locations of "AATAAA" and "ATAAAA" were consistent between samples around the 16-26 nt upstream area (Fig. 3B), supporting that those were canonical polyA signals. Interestingly, we showed that there was one unique hexamer that was speci c to sample: "ATTTTG" in control, "ATTTTTA" in cold stress, "TTTTAA" in heat stress (Fig. 3B); and both heat and cold samples missed the G-containing hexamer close to the polyA cleavage site (Fig. 3B). These results suggest that the temperature stress may induce the usage of different polyA signals to generate distinctive alternative splicing isoforms. We also identi ed a conspicuous enrichment of "AAAAAA" 5-10 downstream the polyA cleavage site in all samples (Fig. 3B), supporting a new APA sequence signature in P. trichocarpa. Furthermore, we noticed that a small fraction of PACs was derived from antisense transcripts (Supple . Table S3). To further investigate the occurrence of antisense PACs between samples, we quanti ed the PACs from antisense, and showed that the heat condition extensively promoted the generation of antisense PACs (near 2 fold increase), but not in cold condition (Fig. 3C).

The characterization sequence signatures of shifted PACs between temperature stresses
The genes with multiple PACs (proximal and distal PACs) were analyzed to uncover APA patterns in response to temperature stresses. We found that there were 3429 and 3139 genes that possessed the shifts of PACs under heat and cold stresses respectively (Supple. Dataset 2-3; Fig. 4A). To investigate the signals for usage of shifted PACs, the sequences near PACs in cold and heat conditions were analyzed to compare with the PACs under control condition (Fig. 4B-E). We analyzed the frequencies of hexamers in the 50 upstream and downstream regions, and found that "TTTTTT" and "AAAAAA" were most abundant in all samples, which was consistent with the previous result ( Fig. 4B-E, Fig. 3B). Notably, the enrichment of "AAAAAA" at the downstream of polyA tail cleavage site was missing under heat stress condition, while it remained detectable under cold stress (Fig. 4E). It suggested that the heat stress could have promoted the selection of non-canonical polyA sites that are compromised under normal conditions.

Characterizations of genes with shifted PACs under cold and heat stresses
To understand the temperature stress induced polyA sites, genes with shifted distal and proximal PACs were investigated (Fig. 5). We focused on the genes that were signi cantly responsive to treatments (Kolmogorov-Smirnov test, p-Value < 0.01). We found that only a small proportion of genes both changed the polyA tails under heat and cold, indicating a distinctive responsive mechanism of APA under heat and cold conditions (Fig. 5B); and majority of the change of 3' end was within 250 bp, but the heat condition tended to generate more longer or shorter ends (Fig. 5D-E). We further assessed the potential consequences of the gene coding region, and uncovered that there were 32 and 19 genes with altered CDS under heat and cold conditions respectively, due to the APA events (Fig. 5C).
To investigate the function of genes with shifted PACs, GO enrichment analysis was performed to identify the over-representative functional GO terms (Supple. Dataset 6). We found that heat and cold stresses enriched several GO terms involved in the RNA metabolism, such as "RNA splicing", "RNA processing" and "RNA binding" (Supple. Dataset 6). And GO terms related to direct gene expression regulation, including "regulation of transcription", "regulation of transcription factor" and "transcription by RNA polymerase II", were revealed under the heat condition (Supple. Dataset 6). We further analyzed the functions of shifted genes through KEGG enrichment analysis. We showed that 9 and 12 KEGG pathways were signi cantly enriched under the cold and heat stresses respectively (Fig. 6). Some important pathways related to gene transcription and regulation, including "Spliceosome", "mRNA surveillance pathway", "Basal transcription factors", were both enriched under cold and heat (Fig. 6). Taken together, these results suggested that the cold and heat stresses induced a signi cant change of the polyadenylation process, which might contribute to the stress responses.

Discussion
The survival of plants under some extreme stress conditions (e.g. heat, cold, salt, drought, and etc.) is dependent on the coordinated responses of various biochemical and physiological processes that are controlled by changes of gene expression. In recent years, global change of polyA site has been found to play an important role of gene regulation in plants [21]. Particularly, non-canonical transcript isoforms that are induced by APA under stress conditions are emerging as a transcriptomic signature to understand the transcriptional and post-transcriptional regulation of gene expression [15]. P. trichocarpa, as a model tree species, is an ecologically and economically important woody plant all over the world. With the wide-span distribution of environmental conditions, P. trichocarpa withstands multiple stresses to survive under natural environments [22]. In P. alba, a transcriptomic study using single-molecule long-read sequencing technology has revealed 10213 APA sites including 2212 genes with more than one polyadenylation sites [23]. We have shown here that a widespread APA under the cold and heat stresses in P. trichocarpa. A total of 25919 PACs (TPM > 5) have been identi ed covering over 20000 genes (Supple . Table S3). We have revealed that there are 3429 and 3139 genes that shifted the polyA sites upon heat and cold stresses respectively (Fig. 4A). These results indicate that the change of polyA tail is a common responsive phenomenon to stress in P. trichocarpa. In this work, we nd that around 60-70% of PACs locate in the 3'UTR representing the canonical polyA sites (Supple . Table S3).
For those non-canonical PACs (e.g. 5'UTR, CDS, antisense), a complementary characterization focusing on stress-induced isoforms will be valuable to reveal potential roles of APA-mediated gene regulation. One striking difference between cold and heat is the induction of antisense PACs. The heat stress induced about 2 fold antisense PACs comparing to cold and control conditions. The role of APA-mediated antisense transcripts has been extensively investigated at the FLC locus in Arabidopsis [24]. The subset of FLC antisense transcripts are regulated by various developmental and environmental cues to regulate owering transition [25]. Our results of antisense PACs imply that speci c APA events responsive to heat stress could be involved in generating regulatory antisense transcripts to regulate gene functions.
Through sequence analysis, we show that the signatures associated with polyA tails are conserved with other plant species (Fig. 3A, B); the hexamers, such as "TTTTTT", "AAAAAA", "AATAAA", "ATATAT" and etc., are commonly enriched in the near upstream element regions (NUE) [10]. Also, the signatures remain largely identical between control and stress conditions in P. trichocarpa (Fig. 3B). One exception is that: the G-containing hexamer "ATTTTG" in NUE is missing in heat and cold stresses (Fig. 3B). It suggests that the change of polyA tails under stresses may require the alteration of designated sequence contexts.
A comparative study between sorghum and maize, two closely related monocots, has shown that preference of sequence signatures associated with PACs were in uenced by species and tissue types [26]. A recent study from Camellia japonica has revealed that certain hexamers are preferentially selected for the polyadenylation of coding and non-coding transcripts [27]. Taken these together, our results provide insights of extensive APA events in generating transcriptome diversity and gene regulation responsive to temperature stresses. One disadvantage of PAS-seq is that it doesn't allow for identifying complementary transcript isoforms. Therefore, a direct isoform sequencing would be valuable for investigating the APA-mediated transcripts in future.
The APA sequence signals around the cleavage site has been identi ed as cleavage elements (CE) in Arabidopsis [28]. We nd that, in the genes displayed the shifting of polyA tails under heat and cold stress, the sequence signatures are changed considerably (Fig. 4B-E). Particularly, we have shown that the enrichment of "AAAAAA" downstream the polyA cleavage site was missed under the heat condition (Fig. 4E). In Arabidopsis, the "AAAAAA" is not identi ed from the comparable region, and CE is usually rich in Ts [28]. Interestingly, in a recent study of sorghum, the hexamer "AAAAAA" has been revealed prominently in PACs from both 3'UTR and intron at the same region; and heat, salt and drought stresses do not affect the enrichment [17]. These results indicate that the heat stress could induce the usage of speci c sequence signals to confer unique APA-mediated transcripts. But the mechanism of the speci city requires further information from functional analysis of proteins involved in the polyA site selection.

Conclusions
We adopt the genome-wide approach to characterize the APA pro les under cold and heat stresses in Populus trichocarpa. We show that a large number of genes shift polyadenylation sites under temperature stresses; many genes are linked to the functions of RNA metabolism. We nd a unique sequence signal associated with polyadenylation sites that is not reported in other plants. Detailed analysis on the genes with shifted polyadenylation sites under heat and cold conditions leads to the identi cation of a sequence signature speci c to heat response in Populus trichocarpa.

Plant materials and growth conditions
The P. trichocarpa Nisqually-1 clone was obtained from Dr. Chiang group (Northeast Forestry University, China) and maintained in a greenhouse as described [29]. For temperature treatments, the aseptic cuttings of P. trichocharpa (8-10 cm) were kept in the rooting media [30] for about 30 days before the treatments. The growth chamber was under long-day conditions (16-h light/8-h dark) at 24 °C and 40% humidity. To perform the low temperature treatment, a freezer was controlled by a temperature sensor (PURUI G6000, Ningbo, China). To perform the high temperature treatment, an incubator was set to appropriate temperature prior to the experiment to stabilize the internal temperature. Whole seedlings of at least three individuals were collected for sample preparation.
PAS-seq library construction, sequencing and data processing For the sequencing library construction, total RNA was treated with RQ1 DNase (Promega, Madison, USA) to remove DNA. The quality and quantity of the puri ed RNA were determined by measuring the absorbance at 260 nm/280 nm (A260/A280) using smartspec plus (BioRad, Munich, Germany). RNA integrity was further veri ed by 1.5% agarose gel electrophoresis. For each sample, 5 µg of total RNA was used for PAS-seq library preparation using SMART RT system. In brief, polyadenylated mRNAs were puri ed with oligo (dT)-conjugated magnetic beads (Invitrogen, USA). Puri ed RNA was fragmented, and reverse transcription was performed with a modi ed RT primer harboring dT18 and two additional anchor nucleotides at the 3' terminus. Then DNA was synthesized with Terminal-Tagging oligo DNA using ScriptSeq™ v2 RNA-Seq Library Preparation Kit (Illumina, USA). The cDNAs were puri ed and ampli ed, and PCR products corresponding to 300-500 bp were puri ed, quanti ed and stored at -80 °C before sequencing. For high-throughput sequencing, the libraries were prepared following the manufacturer's instructions and applied to Illunima HiSeq X Ten system for 150 nt paired-end sequencing. The reads were ltered for quality checking, and only the end 1 sequences of clean reads were used for downstream analyses. The sequencing and initial reads processing were performed by ABlife Inc. (Wuhan, Hubei Province, China). All sequencing data were deposited under National Center for Biotechnology Information Bioproject accession PRJNA61765.

Sequence mapping and PAC identi cation
The reference genome of Populus trichocarpa (version 3.1) was downloaded from Phytozome [31]; https://phytozome.jgi.doe.gov). The reads mapping was performed by TopHat2 allowing 2 mismatches [32]. To obtain the expression abundance, the RPKM (Reads per kilobase of a gene per million reads) value was calculated [33].
For identi cation of polyA-site cluster (PAC), the 3' mapped polyA reads were initially determined as polyA sites; and then quanti ed based on the Tag Per Million (TPM) method [TPM (PAC) = reads mapped to polyA site (PAC)*1000000/total reads]. The identi cation of PAC was performed using the CAGEr package [34]; in brief, the polyA sites within 20 nt with TPM over 0.5 were clustered. The PAC within 100 nt in different PAS-Seq libraries were further clustered to generate the PAC sequences (Supple. Dataset 1). For APA analysis, PAC sequences with only one polyA site or TPM less than 3 were ltered. The total PAC sites were independently determined through combing the sequencing reads with TPM cutoff of 5. To identify genes with shifted PACs, the difference of PAC locations was calculated by CAGEr to obtain shift score (Supple. . The Kolmogorov-Smirnov test was performed to identify signi cant shifts of PACs with p-value < 0.01 (Supple. Dataset 2-3).

Nucleotide composition and sequence signature analysis
The abovementioned PACs for each sample were used for motif analysis. And the 50 bp upstream and downstream sequences of each PAS were extracted. For the nucleotide distribution analysis, the composition of each PAS at each position were calculated. And the sequence motifs were analyzed using the SignalSleuth2 [28]with the following options: k = 6 (where k is the length of the motif) and top 30 motifs.
For stress-induced analysis, sequences of genes with shifted PACs were extracted according to the position information (Supple.  using an in-house python script. The PACs for each treatment were divided into two groups: one was the PACs under control condition, and another one under the treatment condition. The sequence motif analysis was performed using SignalSleuth2 as mentioned above, and only the PACs shift distance over 50 were used for the analysis.

Functional enrichment analysis
For functional analysis of shifted genes, the annotation information from Gene Ontology and Kyoto Encyclopedia of Genes and Genomes of P. trichocarpa genes were obtained. The enrichment of GO terms and KEGG pathway was identi ed based on the signi cance of the hypergeometric tests, and further corrected by FDR (Hochberg). The corrected p-values less than 0.05 were determined as signi cant enrichment. Signi cantly enriched GO terms were categorized into molecular function, biological process and cellular component as listed in Supple. Dataset 6. For KEGG analysis, the enrichment ratio was calculated as: Enrichment ratio of each KEGG pathway = subset of genes / total number of pathway genes.