For the most part, the results described in this report indicate a broad evolutionary conservation of the polyadenylation complex, with plants possessing identifiable orthologs of all of the core mammalian polyadenylation factor subunits. However, the sets of genes that encode these subunits in plants have several interesting features. Of the sixteen identifiable orthologs of the subunits of the core mammalian polyadenylation complex, seven are encoded by more than one gene in at least one of the plant species studied. (Note that, for the sake of this discussion, the apparent duplication of virtually all genes in G. max is not considered, nor is the Arabidopsis-specific CLPS5 gene.) The subunits encoded by single genes are orthologs of the core subunits of CPSF and CstF. With two exceptions (CPSF73-I and Fip1), subunits encoded by expanded gene families reside in other factors in mammals (e.g., CFIm and CFIIm) or play roles in the last step of the process (poly(A) tail addition and poly(A) length control). The degree and evolutionary timing of expansion vary greatly among the gene families, ranging from events that involved only one lineage (e.g., CPSF73-I, CLPS5) to those that occurred before the divergence of the higher plant lineages, but after the divergence of higher plants from Selaginella.
These considerations lend themselves to a model in which the plant polyadenylation complex consists of a core (the CPSF and CstF subunits) that is rather rigid in terms of evolutionary conservation, and an associated panoply of peripheral subunits. These peripheral subunits likely do not all exist in a single large, monolithic complex, but rather associate in various combinations with the CPSF/CstF core. This is because many of the peripheral subunits are isoforms of other subunits and likely interact with the same site(s) of the CPSF/CstF core, and thus are expected to assemble in a mutually exclusive manner. Therefore, the polyadenylation complex may actually be a collection of somewhat distinct assemblies, each with different representatives of the products of the gene families. Such a complex would be amenable to considerable evolutionary and physiological flexibility. Different combinations of peripheral subunits may play dominant roles at particular times during development, or in response to stresses. While not exactly analogous, this suggestion brings to mind the specialized functioning of the male-specific CstF64 and PAP isoforms in mammals [43–47].
This model may help to explain some of the poorly understood features of the plant polyadenylation signal. This signal consists of three distinct cis-elements, none of which can be defined by a highly conserved sequence [48, 49]. Of the eight protein subunits that are encoded by gene families in plants, at least four (CFIm25, CFIm68, FIPS, and PABN) are RNA-binding proteins. If the different members of these families encode proteins with somewhat different RNA sequence preferences, the sum of these preferences might be a degenerate, poorly defined consensus. The sequence characteristics of the three cis-elements that have been defined by experimental and computational work would then reflect the sum of the preferences of the individual RNA-binding isoforms.
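The reasoning that pooled isoform preferences yield a degenerate consensus can be illustrated with a minimal numerical sketch. It assumes that each isoform's preference can be summarized as a position weight matrix and that the observed signal is a weighted average of those matrices. The two motifs, the 0.85 preference value, the equal weights, and all function names are hypothetical, chosen only for illustration; none of them is measured data from this study.

```python
import math

BASES = "ACGU"

def information_content(pwm):
    """Total information content (bits) of a position weight matrix."""
    total = 0.0
    for col in pwm:
        entropy = -sum(p * math.log2(p) for p in col.values() if p > 0)
        total += 2.0 - entropy  # 2 bits is the maximum for four bases
    return total

def mix(pwms, weights):
    """Weighted average of several equal-length PWMs."""
    length = len(pwms[0])
    return [
        {b: sum(w * pwm[i][b] for pwm, w in zip(pwms, weights)) for b in BASES}
        for i in range(length)
    ]

def column(preferred, p=0.85):
    """Hypothetical column: one preferred base, the rest shared equally."""
    rest = (1.0 - p) / 3
    return {b: (p if b == preferred else rest) for b in BASES}

# Made-up preferences for two isoforms of one RNA-binding subunit.
# They differ at the first two positions only.
isoform_a = [column(b) for b in "UUGUA"]
isoform_b = [column(b) for b in "AAGUA"]

# Assumption: both isoforms contribute equally to the observed signal.
pooled = mix([isoform_a, isoform_b], [0.5, 0.5])

ic_a = information_content(isoform_a)
ic_b = information_content(isoform_b)
ic_pooled = information_content(pooled)

print(f"isoform A: {ic_a:.2f} bits")
print(f"isoform B: {ic_b:.2f} bits")
print(f"pooled:    {ic_pooled:.2f} bits")
```

In this toy case the pooled matrix carries less information than either isoform alone, and the loss is confined to the positions at which the two preferences disagree. This is the behavior the model predicts for a signal read by several related RNA-binding proteins.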
This model also has ramifications for possible mechanisms of alternative poly(A) site choice in plants. For example, in mammals, PABN has been implicated in the differential recognition of weak poly(A) signals that are often associated with promoter-proximal poly(A) sites in genes whose deregulation is associated with oncogenic transformation. There is only a single PABN isoform in mammals; in contrast, plants possess several potential isoforms (Figure 8). This raises the possibility that different sub-complexes may possess different PABN isoforms, and that differential poly(A) site choice would be accomplished by the action of sub-complexes of different PABN composition. The Arabidopsis CPSF30 protein is inhibited in vitro by calmodulin and by sulfhydryl reagents [50, 51]. If similar effects occur inside cells, this protein would be inactivated in response to various stimuli. The possibility that the CPSF complex may be of variable composition, with CPSF30-independent configurations, would explain why polyadenylation could continue under such circumstances, and is consistent with a role for CPSF30 in alternative poly(A) site choice mediated by differential inactivation of the protein. Compositional variability would lend itself to additional modes of regulated poly(A) site choice through the directed activation or inactivation of specific subunit isoforms. While little is known about this possibility in plants, mammalian orthologs of plant subunits encoded by gene families are known to be subject to modification by phosphorylation, SUMOylation, ubiquitination, and arginine methylation.
An additional layer of complexity in the plant polyadenylation complex is provided by the existence of “partial” protein isoforms, which arise either through alternative RNA processing (as with CPSF30, FIPS, CstF77, symplekin, and CFIm25; Additional file 2: Figure S13) or through coding by separate genes (as with ESP1 and two of the PCFS variants). These partial proteins possess some of the functionalities of their respective “complete” proteins, but not others; as such, they may affect the functioning of other subunits and thus redirect a sub-complex towards a subset of pre-mRNA targets.
Finally, it is noteworthy that, although the C. reinhardtii polyadenylation factors are conserved for the most part, their lower number and distinguishable sequence divergence set C. reinhardtii apart from the rest of the plant lineage. Interestingly, C. reinhardtii and other green algae have been shown to use a different set of poly(A) signals, in which the UGUAA motif in the near upstream elements is more prevalent (found in 52% of the genes) than any other signal. It is probable that the differences in polyadenylation factors contribute to the difference in poly(A) signals, but it is difficult to pinpoint a single subunit as being responsible, because so many C. reinhardtii subunits are noticeably different from their higher plant counterparts. Further experiments are needed to test this hypothesis.
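A prevalence figure of this kind can be estimated with a simple motif scan, sketched below. The sketch assumes that each input sequence ends at the cleavage site and that the near upstream element lies 10 to 30 nt upstream of that site. The window bounds, the function name, and the toy sequences are assumptions made for illustration only, not the procedure or data behind the 52% value.

```python
def fraction_with_motif(upstream_seqs, motif="UGUAA", window=(10, 30)):
    """Fraction of sequences carrying `motif` within the given window.

    Each sequence is assumed to end at the cleavage site; `window` gives
    the nearest and farthest distances (nt) upstream of that site.
    """
    if not upstream_seqs:
        return 0.0
    near, far = window
    hits = 0
    for seq in upstream_seqs:
        rna = seq.upper().replace("T", "U")  # accept DNA or RNA input
        start = max(0, len(rna) - far)
        stop = max(0, len(rna) - near)
        if motif in rna[start:stop]:
            hits += 1
    return hits / len(upstream_seqs)

# Toy sequences (made up), each ending at a hypothetical cleavage site.
toy_seqs = [
    "A" * 20 + "UGUAA" + "C" * 15,  # motif inside the window
    "A" * 40,                       # no motif
    "TGTAA" + "C" * 50,             # motif too far upstream
    "C" * 25 + "UGUAA",             # motif too close to the site
]

fraction = fraction_with_motif(toy_seqs)
print(f"{fraction:.0%} of toy sequences carry the motif in the window")
```

Applying the same function to the 3′ ends of different species, with a window suited to each, would give comparable per-species frequencies for any candidate signal.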