Distinct sequence patterns in the active postmortem transcriptome

Our previous study found more than 500 transcripts significantly increased in abundance in the zebrafish and mouse several hours to days postmortem relative to live controls. Because the current literature suggests that most mRNAs are post-transcriptionally regulated in stressful conditions, we rationalized that the postmortem transcripts must contain sequence features (3- to 9-mers) that are distinct from those in the rest of the transcriptome, specifically, binding sites for proteins and/or non-coding RNAs involved in regulation. Our new study identified 5117 and 2245 over-represented sequence features in the mouse and zebrafish, respectively. Some of these features were disproportionately distributed along the transcripts, with high densities in the 3'UTR region of the zebrafish (0.3 mers/nt) and the ORFs of the mouse (0.6 mers/nt). Yet, the highest density (2.3 mers/nt) occurred in the ORFs of 11 mouse transcripts that lacked UTRs. Our results suggest that these transcripts might serve as 'molecular sponges' that sequester RNA binding proteins and/or microRNAs, increasing the stability and gene expression of other transcripts. In addition, some features were identified as binding sites for Rbfox and Hud proteins, which are also involved in increasing transcript stability and gene expression. Hence, our results are consistent with the hypothesis that transcripts involved in responding to extreme stress have sequence features that make them different from the rest of the transcriptome, which presumably has implications for post-transcriptional regulation in disease, starvation, and cancer.

ABBREVIATIONS
UTR: untranslated regions
ORFs: open reading frames
OP: overabundant transcript pool
CP: control transcript pool
FP: false positive
RBP: RNA binding proteins
ncRNA: non-coding RNA
miRNA: microRNA

[...] containing ambiguous nucleotides (i.e., 'N's) and those less than 100 nt in length were removed. The final "clean" data sets were used for bioinformatic analyses.

Reading a sequence into CGR space.
The processing of a transcript sequence involves converting each nucleotide into x- and [...]

Mer analysis
Mer analysis determines the presence/absence/frequency of a mer of length z (where z is 2 to 9) in a gene transcript.

Finding a specific mer in a transcript. Let us assume that a database of the x-, y-coordinates of the target sequence has been assembled and we want to determine the presence/absence of the mer 'AAACAA' in a target sequence. There are three steps. First, we process the mer AAACAA into CGR space to find its x-, y-coordinates, which [...] Second, we determine the resolution of the search, which depends on mer length (i.e., resolution = 2^(mer length)). A 6-mer requires a resolution of 64. The inverse of the resolution (1/resolution) is the CGR space around the coordinates that contain the specific mer. The CGR space around the coordinates is expressed by the following equation: [...]

Analyses were conducted using SAS/JMP (version 7.0.2), R (version 3.4.0) and Microsoft Excel (versions 14.3.0 and 11.6). Hierarchical two-way cluster analysis was [...] and transferred to R to produce the heatmaps with no scaling. Network analysis was conducted using Gephi 0.9.2. RegRNA 2.0 was used to identify functional RNA motifs and sites in the gene transcripts [...]

[...] normalized to the number of possible mer combinations, the maximum difference was 7-mers for the mouse and 5-mers for the zebrafish (Fig 2C). Hence, mers of 5 to 7 nt in length are optimal for distinguishing between the pools. With increasing mer length, the number of 'unique' mers (i.e., over-/under-represented mers in the OP) increased (Fig 2B).
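The mer search described in the Methods (CGR coordinates of the mer, a resolution of 2^(mer length), and a cell of width 1/resolution around those coordinates) can be illustrated with a minimal Python sketch. It assumes the conventional CGR iteration (start at the centre of the unit square and move halfway toward the corner assigned to each nucleotide) and an arbitrary corner layout. The corner assignment and the function names `cgr_points`, `cgr_cell`, and `find_mer` are illustrative assumptions, not taken from the study. Exact fractions are used so that cell boundaries are not blurred by floating-point rounding, which makes the sketch slow for very long transcripts.

```python
import math
from fractions import Fraction

# Assumed corner layout of the unit square (not specified in the text).
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def cgr_points(seq):
    """Return the CGR (x, y) point reached after reading each nucleotide."""
    x = y = Fraction(1, 2)  # start at the centre of the square
    points = []
    for nt in seq:
        cx, cy = CORNERS[nt]
        x = (x + cx) / 2  # move halfway toward the nucleotide's corner
        y = (y + cy) / 2
        points.append((x, y))
    return points


def cgr_cell(point, mer_length):
    """Index of the grid cell of width 1/resolution containing the point."""
    resolution = 2 ** mer_length  # e.g. 64 for a 6-mer
    x, y = point
    return (math.floor(x * resolution), math.floor(y * resolution))


def find_mer(seq, mer):
    """Start positions (0-based) at which `mer` occurs in `seq`."""
    k = len(mer)
    target = cgr_cell(cgr_points(mer)[-1], k)  # cell of the mer itself
    points = cgr_points(seq)
    return [i - k + 1
            for i in range(k - 1, len(seq))
            if cgr_cell(points[i], k) == target]


hits = find_mer("GGAAACAATTAAACAA", "AAACAA")
```

The point reached after reading a nucleotide always falls in the cell determined by the last k nucleotides, so comparing cell indices at resolution 2^k is equivalent to asking whether the mer ends at that position.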

To determine the number of FPs as a function of mer length and test the integrity of the experimental design, we randomly drew three additional sets of transcripts from the CP (without replacement) and retained only transcripts not used in the previous analyses. [...] This mer would be considered 'unique' if its count were more or less than 5 times the [...] critical values for the mer count are -9.5 and 25.5. In the OP, the mer occurred 31 times and is therefore considered 'unique' based on the stated criterion (i.e., the count is greater than 25.5). [...] the criterion (the average ± standard deviation for this mer was 8.1 ± 3.49), there are no FPs for this mer. Of note, this procedure was repeated for all unique mers in the transcript pools of the mouse and zebrafish, respectively.
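The criterion in the worked example above can be sketched in a few lines of Python. The sketch assumes that the critical values are the CP mean count plus or minus 5 standard deviations; the sentence defining the criterion is incomplete in this text, so this reading is an assumption. The mean and standard deviation used below (8.0 and 3.5) are illustrative values chosen only because they reproduce the stated critical values of -9.5 and 25.5. The function names are hypothetical.

```python
def critical_values(cp_mean, cp_sd, k=5.0):
    """Lower and upper count limits: CP mean -/+ k standard deviations."""
    return cp_mean - k * cp_sd, cp_mean + k * cp_sd


def is_unique(op_count, cp_mean, cp_sd, k=5.0):
    """True if the OP count falls outside the critical values."""
    lower, upper = critical_values(cp_mean, cp_sd, k)
    return op_count < lower or op_count > upper


def count_false_positives(draw_counts, cp_mean, cp_sd, k=5.0):
    """Number of random CP draws whose count would also be called unique."""
    lower, upper = critical_values(cp_mean, cp_sd, k)
    return sum(1 for c in draw_counts if c < lower or c > upper)


# Assumed illustrative inputs (not reported values from the study).
limits = critical_values(8.0, 3.5)
example_is_unique = is_unique(31, 8.0, 3.5)
```

A mer counted 31 times in the OP exceeds the upper limit of 25.5 and is therefore flagged, whereas a random CP draw is counted as an FP only if its own count falls outside the same limits.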

The results show that the number of FPs in the OP was close to zero for mer lengths of up [...] the OP are FPs, the number was small (e.g., 8-mers: 1.0% are FPs in the mouse and 8.9% are FPs in the zebrafish). When the length of mers was 9, however, the number of FPs significantly increased to an average (± std) abundance of 1240 ± 167.2 for the mouse (31.3% FP) and 571 ± 158.8 for [...] The results are consistent with the notion that unique mers can be identified in the OP by comparing them to random draws of mers from the CP. However, FPs increased with mer length. Taken together, over- and under-represented mers were identified in the OP and many are 5 to 7 nt in length.
The survey of the OP identified 5,117 unique mers in the mouse and 2,245 mers in the zebrafish (Table 2). Normalized to the total number of combinations of 3- to 9-mers (n=349,504), this represents ~1.5% of the total mers in the mouse and ~0.6% in the zebrafish. Of note, 47 of the unique mers were common to both organisms (Table 3).

Table 3. Unique mers common to transcripts of the OP for the zebrafish and mouse.

In fact, some of these mers are reverse complements of one another, which is of interest [...] (Table 4). In the mouse, 218 of the 5,117 mers (4.3%) were reverse complements of one another. In the zebrafish, 31 of the 2,245 mers (1.4%) were reverse complements.

The distribution of the unique mers was investigated to determine if they were found in [...]

In the zebrafish, the frequencies of the unique mers per transcript varied between pools (Fig 3). These findings indicate that not all transcripts in the OP have the same number [...] OP, the maximum bin was 150 while the maximum bin in the CP was 100. Some [...] and 300+ bins than those of the CP. Therefore, some zebrafish transcripts in the OP have many more unique mers than others.

In terms of multiple occurrences of unique mers in the zebrafish, the distributions also differed by pool, with multiple unique mers occurring within the same transcript when compared to controls (Fig 3B). For example, about 87 of the OP transcripts had more than 300 multiple unique mers compared to about 40 in the CP (Fig 3D). Hence, not only are there many more unique mers in the OP but, in some cases, there are multiple occurrences of the same mer in the same transcript.

In the mouse, the frequency distribution of unique mers per transcript was also different between the pools (Fig 4A). Specifically, there was almost double the number of unique [...]

Taken together, the distribution of unique mers in the OP differs from that in the CP.

Furthermore, there appear to be differences in multiple unique mers of these transcripts in the zebrafish but less so in the mouse.
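The normalization and reverse-complement percentages reported for the unique mers can be checked with a short Python sketch. The total of 349,504 is the sum of 4^k for k = 3 to 9. The helper names are hypothetical, and the choice to exclude mers that are their own reverse complement is an assumption, since the text does not say how such mers were treated.

```python
def total_mers(kmin=3, kmax=9):
    """Number of possible mers of length kmin..kmax over a 4-letter alphabet."""
    return sum(4 ** k for k in range(kmin, kmax + 1))


_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(mer):
    """Reverse complement of a DNA mer."""
    return mer.translate(_COMPLEMENT)[::-1]


def reverse_complement_members(mers):
    """Mers whose reverse complement is a different mer in the same set.

    Assumption: self-complementary (palindromic) mers are excluded.
    """
    mer_set = set(mers)
    return {m for m in mer_set
            if reverse_complement(m) != m and reverse_complement(m) in mer_set}


def percent(part, whole):
    return 100.0 * part / whole


mouse_pct = percent(5117, total_mers())      # share of all 3- to 9-mers
zebrafish_pct = percent(2245, total_mers())
mouse_rc_pct = percent(218, 5117)            # share of mers that are reverse complements
zebrafish_rc_pct = percent(31, 2245)
```

Rounded to one decimal place, these reproduce the ~1.5% and ~0.6% coverage figures and the 4.3% and 1.4% reverse-complement figures given in the text.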
Based on the previous analyses, we rationalized that some transcripts in the OP might [...] zebrafish (Fig 5) and from 40 to 1407 (of a total of 5117) in the mouse (Fig 6). Hence, [...] ones. Similar to the situation with the transcripts, the relationship among the mers was not straightforward; there appears to be a pattern. [...] the zebrafish contained relatively similar counts of mers within the mer sets 5 as well as 19 (Fig 5). Similarly, in the mouse, all transcript groups had similar counts of mers within the mer set 2 (Fig 6). Hence, despite similarities and differences of the collapsed data, there are common sets of mers found within all transcripts. [...] transcript groups (Fig 5) and each group consists of a single transcript. Group J represents the transcript si_ch211-69b7.6, whose function is currently not known, and [...] mouse (and we will show why below).

The averaged (± stdev) density of multiple mers for the zebrafish was 0.14 ± 0.18 mers/nt (n=230) and for the mouse was 0.40 ± 0.67 mers/nt (n=333). That is, there are 14 [...]

The highest and lowest densities of unique mers also differed between organisms. In the zebrafish, the highest density was ~1.0 mers/nt for Pimr gene transcripts, which corresponds to clusters B and H (Fig 5), and the lowest density was ~0.04 mers/nt for [...] transcript length to find that the Pimr gene transcripts are distinctly different (red dots) from those in the rest of the transcripts in the OP (black dots) (Fig 7A). The Pimr genes [...] transcript length (y=0.1x; R²=0.91; where x is transcript length and y is multiple mers).

In the mouse, the highest density was ~2.6 mers/nt for annotated transcripts that do not [...] Gm2007, Gm4631) and were associated with Cluster B (Fig 6) and the lowest density [...] transcripts (red dots) when compared to the rest (black dots) (Fig 7B). The red dots [...] The remaining transcripts have a linear relationship between multiple mers and transcript [...]

[...] mers, no patterns could be found in the AU-rich elements, K-boxes, UNR boxes, untranslated region motifs, long stem loop structures or transcriptional regulatory motifs among the 10 functional genes. Therefore, we concluded that the gene transcripts contain putative ncRNA hybridization regions, but we have no supporting evidence that these [...]

We rationalized that the transcripts with high mer densities might act as molecular sponges to RBPs and ncRNAs and thus alter their availability in the intracellular pools. If so, one would expect the profiles (i.e., transcript abundance by postmortem time) of transcripts with high densities and those transcripts affected by them to be highly [...] rest of the profiles in the OP of the mouse brain. Network analysis was used to find shared mer binding sites.
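The mer densities (mers/nt) discussed above can be illustrated with a small Python sketch. It assumes that the density of a transcript is the total number of occurrences of all unique mers, counting overlapping occurrences, divided by transcript length; the exact definition of "multiple mers" is not given in this passage, so this is one plausible reading. The function names, the toy sequence, and the toy mer set are hypothetical.

```python
def count_overlapping(seq, mer):
    """Count occurrences of `mer` in `seq`, allowing overlaps."""
    count = 0
    start = seq.find(mer)
    while start != -1:
        count += 1
        start = seq.find(mer, start + 1)  # step by one to allow overlaps
    return count


def mer_density(seq, unique_mers):
    """Total occurrences of the unique mers per nucleotide (mers/nt)."""
    if not seq:
        raise ValueError("empty transcript sequence")
    total = sum(count_overlapping(seq, m) for m in unique_mers)
    return total / len(seq)


# Toy example with made-up inputs, not data from the study.
toy_density = mer_density("AAACAAACAA", {"AAACAA", "ACA"})
```

Under this definition a density above 1 mers/nt, as reported for some mouse transcripts, simply means that overlapping mers of different lengths cover the same positions more than once.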

The two axes of the ordination plot accounted for 96% of the variability (Fig 8A). There appear to be three distinct areas in the ordination plot. One location is occupied by [...] To investigate the connections within the networks, we took a subset of the transcripts with high R² values (>0.95) and counted the number of shared mers. A network plot revealed that transcripts with high mer densities are connected to many different transcripts and that some shared similar mers. For example, Gm14305 shared mers with Gm11007, Gm2007, Gm14308 and Hhmt1 as well as many other transcripts (Fig 8B). This finding suggests that the number of possible transcripts (and pathways) that are affected by molecular sponges appears to be quite vast.
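The shared-mer counting behind the network plot can be sketched as a weighted edge list in Python. This is a minimal illustration, not the Gephi workflow used in the study. How R² was computed is not stated in this passage, so the sketch simply takes a precomputed value per transcript as an assumption. The function name, the transcript labels T1 to T3, and the mer sets are hypothetical.

```python
from itertools import combinations


def shared_mer_edges(mers_by_transcript, r2_by_transcript=None, r2_min=0.95):
    """Return {(transcript_a, transcript_b): number of shared mers}.

    If r2_by_transcript is given, only transcripts with R² above r2_min
    are kept (assumed filtering step). Pairs sharing no mers are omitted.
    """
    names = sorted(
        t for t in mers_by_transcript
        if r2_by_transcript is None or r2_by_transcript.get(t, 0.0) > r2_min
    )
    edges = {}
    for a, b in combinations(names, 2):
        shared = len(set(mers_by_transcript[a]) & set(mers_by_transcript[b]))
        if shared:
            edges[(a, b)] = shared
    return edges


# Made-up example data for illustration only.
toy_mers = {"T1": {"AAAC", "ACGT", "TTTG"}, "T2": {"AAAC", "TTTG"}, "T3": {"GGGG"}}
toy_edges = shared_mer_edges(toy_mers)
```

Each key of the returned dictionary is an edge and each value an edge weight, which is the form a network tool typically expects for an undirected weighted graph.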

Taken together, the results suggest that mer density is not the same in all OP transcripts [...] transcript profiles to the transcripts with lower density of mers, some of which they share.

The implication of this finding is that transcripts with high mer densities have the [...] To investigate the density of unique mers by region, up to ten transcripts from each cluster (Fig 5 and Fig 6) were compared to determine if there are significant differences [...] and/or 3'UTR regions and some lacked ORFs (e.g., ncRNA).

In the zebrafish, for the transcripts having all three regions, the 3'UTR region had significantly more mers/nt than the other two regions (Table 5, Paired two-tailed T-tests, [...] mers/nt), indicating regional effects. In contrast, the highest unique mer densities in the mouse were found in the ORFs of transcripts, not the 3'UTR region as in the zebrafish ([...] mer densities than transcripts having all three regions. Moreover, higher mer densities were found in the ORFs than the 5'UTR (Table 5, Paired two-tailed T-tests, P<0.01).

One possible reason for these differences is that the 16 samples having no 3'UTR and the might play a role in these differences.

In summary, the results show distinct differences in mer densities by organism and region. In the zebrafish, the highest mer densities were found in the 3'UTR while the highest densities in the mouse were found in the ORFs. The following motifs are associated with increased mRNA stability or gene expression: [...]

Our initial hypothesis was that among multiple reasons, there must be a signal, i.e., a nucleotide sequence that is responsible for postmortem activation of certain transcripts.

Instead, we find sets of 'unique' mers in different groups of transcripts, with most sets consisting of ten to hundreds of different mers, not just one or two.
The total number of unique mers in the OP was relatively small compared to all possible mers, ~1.5% of the total combinations of 3- to 9-mers in the mouse and ~0.6% in the zebrafish. These small percentages are presumably due to the arbitrary criterion used to [...] of the average count of the mer in the CP was to ensure that the identified mers were not due to random chance (i.e., false positives, FPs). Our results indicate that the chance of a random mer having a count exceeding the criterion was relatively low, but FPs did occur and their occurrence increased with mer length (Fig 2D).
The fact that several mers identified in our study have been previously reported to be involved with increased gene expression and/or mRNA stability (e.g., Hud, Rbfox, ARE [...]

The number of unique mers in each transcript of the OP varied considerably. Some [...] organ/tissues.

One set of unique mers with the sequence YUNNYUY apparently binds Hud proteins (Table 6). Hud proteins stabilize mRNA by binding to AU-rich instability elements (AREs) in the 3'UTR and they target transcripts involved in neuronal differentiation, [...] the 230 of the zebrafish, indicating that miRNAs might be involved in "regulating" the postmortem transcriptome (Table 6).

[...] hypothesis", which is provided at the end of the Discussion. However, without experimental evidence, we caution that these scenarios are speculative at best. One scenario is that transcript stability is increased in the OP because they have more unique mers than the CP and RBPs bind to regulatory sites of transcripts of the OP, blocking the binding of miRNAs, which are linked to degradation pathways. As a consequence, [...] environmental change.

We assumed that the mRNA transcripts downloaded from NCBI represent dominant isoforms one would expect to find in nature. However, a recent study [3] suggests that stress increases the production of different isoforms through alternative splicing. In other words, the composition of the transcripts might change in stressful conditions (i.e., different isoforms are produced). Our analysis did not account for this, however [...] expression, which is the focus of our future research.

According to the 'competing endogenous RNA' hypothesis, all types of RNA transcripts communicate through regulatory-binding sites and it is these interactions that regulate [...]

The authors declare that they have no conflict of interest.

[...] Enzymes and mechanisms. Biochim Biophys Acta 1863:3125-3147.
SUPPLEMENTARY INFORMATION

1. OnlineResource_1. Title: OnlineResource_1.docx. Description: Proof that using the 'Chaos Genome Representation' method to extract mers from the transcript [...] Basically, first column is abundance of mer in OP, second column is average abundance in CP, third column is standard deviation in CP, and remaining 30 columns are abundances of 30 random draws from CP. Rows differ by mer length