In this study, 124 histone modifiers (HMs) were identified in tomato. We systematically classified 32 proteins belonging to HATs, 14 to HDACs, 52 to HMTs, and 26 to HDMs.
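The totals above can be cross-checked against the subfamily counts given in the following subsections. The snippet below is only a bookkeeping sketch: the numbers are those stated in the text, while the dictionary layout and variable names are ours.

```python
# Subfamily counts as reported in the subsections of this Results section.
counts = {
    "HAT": {"HAG": 26, "HAM": 1, "HAC": 4, "HAF": 1},
    "HDAC": {"HDA": 9, "SRT": 2, "HDT": 3},
    "HMT": {"SDG": 43, "PRMT": 9},
    "HDM": {"HDMA": 6, "JMJ": 20},
}

# Sum the subfamilies within each class, then across classes.
class_totals = {cls: sum(sub.values()) for cls, sub in counts.items()}
grand_total = sum(class_totals.values())

print(class_totals)
print(grand_total)
```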
Tomato HATs
HAGs
The tomato genome encodes 26 proteins showing similarity to the HAG group (Figure 1). One protein (SlHAG14) was found to be related to the ELP3 family and one (SlHAG4) to the HAT1 family. SlHAG4, in addition to the AT1 domain, also has a MOZ_SAS motif (PF01853) that is typical of MYST acetyltransferases (HAMs) [35]. To our knowledge, this combination of domains has not been reported before. As regards the GCN5 family, preliminary BLAST interrogation revealed no member in tomato. However, a domain-based search allowed us to identify SlHAG1, carrying a C-terminal BrD domain (PF00439) encoded at the 3′-end of Solyc10g045400, as a member of this family. Two other proteins, Solyc02g092260 (SlNAGS1) and Solyc03g043950 (SlNAGS2), identified by domain analysis as plant HAGs, are unlikely to be histone acetylases. In fact, in addition to AT1 they have an AAK domain (PF00696) that characterizes proteins involved in amino acid synthesis [36]. Interestingly, we found another family corresponding to HPA2-like HAGs, previously thought to be specific to fungi [6]. The tomato HPA2 family includes most HAGs, namely 23 members: SlHAG2, SlHAG3, SlHAG5 to SlHAG13, and SlHAG15 to SlHAG26.
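The domain-based reasoning in this paragraph can be written down as a small set of rules. The following is a minimal sketch, not the pipeline actually used: the function name, the rule order, the output labels and the simplified domain sets are assumptions made for illustration, and families that the text does not define by a diagnostic domain (ELP3, HPA2) are left to phylogenetic assignment.

```python
def classify_hag(domains):
    """Assign a provisional label from a set of domain names (illustrative rules only)."""
    if "AT1" not in domains:
        return "not a HAG"
    if "AAK" in domains:
        # AT1 plus AAK (PF00696): amino acid synthesis, unlikely histone acetylase
        return "unlikely histone acetylase (AAK present)"
    if "MOZ_SAS" in domains:
        # AT1 plus MOZ_SAS (PF01853): the unusual SlHAG4-type combination
        return "HAT1 with MOZ_SAS"
    if "BrD" in domains:
        # AT1 plus bromodomain (PF00439): GCN5 family
        return "GCN5"
    return "other HAG (assign by phylogeny)"


# Simplified domain sets based on the description in the text.
examples = {
    "SlHAG1": {"AT1", "BrD"},
    "SlHAG4": {"AT1", "MOZ_SAS"},
    "SlNAGS1": {"AT1", "AAK"},
}
labels = {name: classify_hag(doms) for name, doms in examples.items()}
print(labels)
```

Checking AAK before the other domains reflects the argument in the text that its presence overrides the HAG assignment suggested by AT1.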
In order to infer the phylogenetic history of tomato HAGs, we compared them with Arabidopsis, maize and rice orthologs. HAGs are distributed in six main clades with high bootstrap values (Figure 1), three of which include monocots and dicots while the other three include only dicots. Each clade contains one tomato HAG family, except HPA2 whose members are split into two clades. Interestingly, a subclade of HPA2 members includes eight genes (SlHAG11, SlHAG19-22, SlHAG24-26) that are all located close together on chromosome 8 in a cluster of about 82 kb. This finding suggests that the ancestral locus experienced a series of tandem duplication events.
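Physical clustering of the kind described for the chromosome 8 subclade can be screened for with a simple positional scan. The sketch below is illustrative only: the gene names and coordinates are hypothetical, and the `max_gap` and `min_size` parameters are assumptions, not values taken from this study.

```python
from itertools import groupby


def tandem_clusters(genes, max_gap=50_000, min_size=3):
    """Group genes lying on the same chromosome with consecutive starts within max_gap.

    genes: list of (name, chromosome, start) tuples.
    Returns a list of clusters, each a list of gene names in positional order.
    """
    clusters = []
    ordered = sorted(genes, key=lambda g: (g[1], g[2]))
    for _, group in groupby(ordered, key=lambda g: g[1]):
        current, last_start = [], None
        for name, _, start in group:
            # A gap larger than max_gap closes the current cluster.
            if last_start is not None and start - last_start > max_gap:
                if len(current) >= min_size:
                    clusters.append(current)
                current = []
            current.append(name)
            last_start = start
        if len(current) >= min_size:
            clusters.append(current)
    return clusters


# Hypothetical coordinates, for illustration only.
genes = [
    ("geneA", 8, 1_000_000),
    ("geneB", 8, 1_010_000),
    ("geneC", 8, 1_025_000),
    ("geneD", 8, 5_000_000),
    ("geneE", 2, 100),
]
clusters = tandem_clusters(genes)
print(clusters)
```

Proximity alone does not prove tandem duplication; in the text the inference rests on the combination of physical clustering and membership of the same phylogenetic subclade.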
The existence of so many HAG members in the tomato proteome compared with Arabidopsis as well as monocots led us to investigate HAGs in Arabidopsis in greater depth. BLAST search using the AT1 domain as a query returned 33 proteins in Arabidopsis, thereby giving a number close to tomato. Based on the domain composition, in Arabidopsis we identified At2g22910 and At4g37670 which in addition to AT1 have the AAK domain (PF00696). Similarly to tomato, it is likely that these two proteins are not histone acetylases. Phylogenetic analysis of tomato and Arabidopsis HAGs indicates that the different subgroups evolved differently in these species (see Additional file 1). For example, gene duplication events giving rise to the subgroup including SlHAG19 to SlHAG26 likely occurred only in tomato while the orthogroup that comprises SlHAG6 appears to have experienced an expansion only in Arabidopsis.
As mentioned above, SlHAG4 is a peculiar HAG, having both the typical HAT1_N domain and a MOZ_SAS domain. In order to understand the origin of this combination of domains, we performed an extensive search with InterPro (http://www.ebi.ac.uk/interpro) across the genomes of fully sequenced organisms (http://www.ncbi.nlm.nih.gov/sites/genome), particularly plants (http://www.phytozome.org). Intriguingly, a domain structure similar to that of SlHAG4 was found mostly in plants and additionally in the brown alga Ectocarpus siliculosus (Chromoalveolata) and in Trichoplax adhaerens (Animalia). The existence of SlHAG4-like proteins in different organisms suggests that histone acetylases with both HAT1_N and MOZ_SAS domains can be categorized as members of a new family, which we name GNAT/MYST-Like (GML).
Additional file 2 shows the proteins with the highest similarity to SlHAG4. Out of 32 species belonging to Plantae, 12 have proteins with both HAT1_N and MOZ_SAS domains. Interestingly, these species are not randomly distributed among the different orders. Indeed, GML proteins seem to be lacking in Brassicales, Poales, Ranunculales and Volvocales, although the scant sequence data suggest caution regarding this finding. The distribution of GML proteins in Plantae, Animalia and Chromoalveolata suggests that the combination of the AT1 and MOZ_SAS domains arose early in evolutionary history. However, most organisms carry the HAT1_N and MOZ_SAS domains in two functionally distinct families, the GNAT and MYST histone acetylases, respectively. Given the lack of information about the biological function of GML proteins, we can only speculate that the separation of the two domains could confer an advantage for nuanced control of the histone acetylation level in the genome.
To address the question of the possible function of tomato HAGs we examined their expression profiles in several organs (Figure 2). Given the wide range of expression values, we categorized the tomato HAGs in three groups of low (Figure 2A), middle (Figure 2B) and high expression (Figure 2C). Among low-expressed members, SlHAG11 and SlHAG17 did not show any preferential expression in the analyzed organs as compared to the other members that might have a different function. SlHAG8 and SlHAG22 could play a role in vegetative development, and by contrast SlHAG18 and SlHAG6 in reproductive development. The middle-expressed group of genes evidenced broad-ranging activities, except SlHAG15 and SlHAG19 which are strongly expressed in leaves and roots, respectively. The expression profiles in the group of high-expressed members suggest a wide functional role for some HAGs (SlHAG2, SlHAG16, SlHAG10, and SlHAG25) in contrast to SlHAG5 and SlHAG21 preferentially expressed in roots and leaves, respectively. The Arabidopsis genome was predicted to encode three HAGs, AtHAG1, AtHAG2 and AtHAG3, which belong to GCN5, HAT1 and ELP3 families, respectively [6]. Interestingly, HPA2-like HAGs also occur in Arabidopsis and one member of this family (AtMCC1) was recently found [37]. In tomato, the closest homologs of AtHAG1/AtGCN5, AtHAG3 and AtMCC1 are SlHAG1, SlHAG14 and SlHAG12, respectively. Such proteins are likely to accomplish specific functions in tomato as they do in Arabidopsis since their genes show comparable expression profiles in similar organs, except for AtHAG1/AtGCN5. In particular, AtHAG1 plays an essential role in many plant development processes, such as meristem function, cell differentiation, leaf and floral organogenesis, and responses to light and cold [38]. AtHAG3 is involved in transcription elongation, cell proliferation, leaf axis development, seedling and root growth [39–41], and AtMCC1 was shown to be involved in flowering time and meiosis [37].
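The grouping of genes into low, middle and high expression, and the identification of a preferentially expressing organ, can be sketched as follows. The thresholds (`low_max`, `high_min`), the fold-change criterion and the example values are hypothetical; the text does not state the cut-offs that were actually applied.

```python
def expression_group(profile, low_max=10.0, high_min=100.0):
    """Bin a gene by its peak expression across organs (thresholds are assumptions)."""
    peak = max(profile.values())
    if peak < low_max:
        return "low"
    if peak >= high_min:
        return "high"
    return "middle"


def preferential_organ(profile, fold=2.0):
    """Return the top organ if it exceeds the runner-up by at least `fold`, else None."""
    ranked = sorted(profile.items(), key=lambda kv: kv[1], reverse=True)
    (top_organ, top), (_, second) = ranked[0], ranked[1]
    if top <= 0:
        return None
    if second <= 0 or top / second >= fold:
        return top_organ
    return None


# Hypothetical expression values, for illustration only.
root_biased = {"root": 120.0, "leaf": 30.0, "flower": 25.0}
flat_low = {"root": 5.0, "leaf": 4.0, "flower": 4.5}

print(expression_group(root_biased), preferential_organ(root_biased))
print(expression_group(flat_low), preferential_organ(flat_low))
```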
HAMs
The tomato proteome has one MYST acetyltransferase, namely SlHAM1, a 477-aa protein characterized by N-terminal Chromo (PF00385), C2H2 (PF00096), and C-terminal MOZ_SAS (PF01853) domains that are typical of class I HAMs [35]. Previous studies have shown that other plant HAMs belong only to class I [35]. Phylogenetic analysis (see Additional file 3) shows that HAMs are distributed in two clades, one of which includes tomato as well as Arabidopsis proteins. The other clade contains two proteins from monocots, maize and rice. This separation indicates that a single ancestral HAM gene gave rise to HAMs in monocots and dicots, with specific duplication events at the origin of the expansion of this family in Arabidopsis and maize. SlHAM1 is expressed in all the examined organs, with the highest expression in flowers and in 3 cm fruit (Figure 2). Latrasse and colleagues [35] found that AtHAM1 and AtHAM2 are strongly expressed in flowers and act redundantly in male and female gametophyte development. This evidence suggests that SlHAM1, in addition to its putative role in seed and/or fruit development, could play a role in gametogenesis like the Arabidopsis ortholog.
HACs
The present survey identified four proteins belonging to the HAC group in tomato (SlHAC1 to SlHAC4). As shown in Figure 3A, the domain composition of HACs is variable but all share the typical domains of this class [6]. Tomato HACs are included together with Arabidopsis HACs in two main clades separated from the clade containing HACs of rice and maize lacking the ZZ-domain (Figure 3A). In the most expanded clade (boxed in Figure 3), the dicots form a distinct group compared with monocots. Overall, the data suggest that different gene duplication events gave rise first to the two groups of HACs both in monocots and dicots, and subsequently to the expansion of this family in both lineages. Interestingly, the expansion was slightly larger in Arabidopsis than in tomato.
In order to gain insight into the possible role of tomato HACs, we examined their expression profiles in different tomato organs (Figure 2). SlHAC4 shows the strongest expression in fruit at different developmental stages. It is interesting that the peak of SlHAC4 expression occurs in mature green berries and is followed by a strong reduction in fruit at breaker stage, thereby suggesting a role in the transition between these two fruit developmental stages. SlHAC1 and SlHAC2, forming a distinct clade, are the most widely expressed tomato HACs, with the latter showing lower expression values. Similarity between SlHAC1 and SlHAC2 in terms of sequence and expression profile in reproductive organs suggests a functional redundancy analogous to that reported for the Arabidopsis homologs AtHAC1/AtHAC5 and AtHAC1/AtHAC12 [42]. The presence of SlHAC1 and SlHAC2 in the same clade as Arabidopsis AtHAC1, AtHAC5 and AtHAC12 further supports a role of these proteins in tomato reproduction. Indeed, knockdown of AtHAC1 caused reduced fertility and late flowering [43] and analysis of hac1/hac5 and hac1/hac12 double mutants highlighted their role in flowering time in Arabidopsis [42]. SlHAC3 is likely a pseudogene since it does not appear to be expressed in the tissues under analysis.
HAFs
The tomato proteome has one TAFII250 protein (SlHAF1) (Figure 3B) that shows the same domain composition as Arabidopsis, rice and maize HAFs [6]. Phylogenetic comparison with these species showed that SlHAF1 forms a distinct clade with AtHAF1 and AtHAF2, separated from OsHAF701 and ZmHAF101 (Figure 3B). Interestingly, SlHAF1, albeit expressed in all the organs considered, has the strongest expression in roots and in fruit, particularly in berries ten days after breaker stage, thereby suggesting an important role in fruit maturation (Figure 2).
Tomato HDACs
HDAs
Investigation of the tomato proteome revealed nine RPD3/HDA1 family members. The phylogeny of tomato HDAs shows that they cluster with HDAs of Arabidopsis, maize and rice (Figure 4) in accordance with the subdivision of this family into three classes as reported in the literature [6, 44]. This family had a higher expansion in monocots, especially in rice, than in dicots where Arabidopsis has the highest number. In addition to the Hist_deacetyl domain (PF00850), new conserved domains were found in tomato HDAs as well as in orthologs of Arabidopsis, rice and maize. Indeed, as shown in Figure 4, Class I SlHDAs have an STYKc domain (SM00221), a Ser/Thr/Tyr kinase catalytic domain, overlapping with the Hist_deacetyl domain. Moreover, a C-terminal COG5224 domain, which is involved in DNA-binding, is found in SlHDA3. As regards Class II, a zf-RanBP domain (PF00641), which binds Ran-GDP involved in nuclear transport, occurred in SlHDA8 and in SlHDA9, and a C-terminal nucleoside phosphorylase domain (NP) together with a POZ domain (PF00651) was found. The presence of the POZ domain, which is a homo/heterodimerizing domain found in histone deacetylase-containing complexes, suggests that SlHDA9 could take part in a multi-protein complex. The occurrence of BP and NP domains as well as a new domain arrangement (AP3) was also observed in Arabidopsis HDAs.
In order to understand the candidate function of tomato HDAs, we looked at their expression profiles (see Additional file 4). Given the wide range of expression values, we categorized the tomato HDAs in three groups having low (see Additional file 4A), middle (see Additional file 4B) and high expression (see Additional file 4C). SlHDA2 expressed mostly in root and bud is the lowest expressed gene among the tomato HDAs. Its expression profile suggests a role in highly dividing tissues such as root and flower meristems. The middle-expressed HDA members show very different expression profiles. Among them, SlHDA9 could exert a possible role in root development as supported by its strong expression in this organ and by its similarity to AtHDA5 and AtHDA18 [45]. A complementary role of SlHDA5, SlHDA6 and SlHDA7 in fruit development from 1 cm to B10 stage is suggested by their peaks of expression in these stages. Finally, the highly expressed SlHDA1 and SlHDA3 show the strongest expression at B10 and B fruit stages, respectively, thereby supporting a possible role in tomato fruit ripening. SlHDA1 and SlHDA3 have respectively a sequence similarity with AtHDA6 and AtHDA19 that in Arabidopsis have been linked to flowering, embryo development and other biological processes [11, 46, 47].
SRTs
In the tomato proteome, we identified two histone deacetylases belonging to the SIR2 family, namely SlSRT1 and SlSRT2 (see Additional file 5). They are characterized by an SIR2 domain (PF02146) and correspond to LeSRT1105 and LeSRT1104, previously described by Pandey and colleagues [6]. The expression profiles of tomato SRT genes evidence expression peaks of SlSRT1 in bud and in 1 cm-sized fruit while SlSRT2 was expressed in flower and in fruit at B10 (see Additional file 4). These findings suggest that SlSRT1 could play a role in the early stages of fruit development as well as in early gamete development whereas SlSRT2 is involved later in both fruit ripening and in gametogenesis. The expression profile of SlSRT2 also supports a role in FLC regulation as suggested for Arabidopsis counterparts by Bond and colleagues [48].
HDTs
In agreement with the results of Pandey and colleagues [6], who described three HDTs in the tomato proteome (HDT1101, HDT1102, HDT1103), we found SlHDT1, SlHDT2 and SlHDT3, corresponding to HDT1102, HDT1103 and HDT1101, respectively (see Additional file 6). SlHDT2 shows a C-terminal zinc finger domain in addition to the predicted HD2 domain (EFWG motif at the N-terminus). The evolutionary history of plant HDTs, including those of tomato, was well illustrated by Pandey and colleagues [6]. As shown in Additional file 4, the preferential expression of tomato HDTs occurs at early stages of fruit development. In particular, SlHDT1 is highly expressed in 1 cm fruit, SlHDT2 in both 1 cm- and 3 cm-sized fruits, and SlHDT3 in 3 cm fruit and in mature green berries. Overall, these expression profiles suggest a role of tomato HDTs in fruit development. Interestingly, tomato HDTs all seem to be closely related to AtHDT3 (see Additional file 6), which was shown to be involved in ABA response and seed germination [45].
Tomato HMTs
SDGs
We identified 43 SET-Domain Group (SDG) proteins in tomato belonging to seven classes like Arabidopsis SDGs according to the classification of Springer and colleagues [49] (Figures 5, 6 and 7). In detail, three proteins, SlSDG21, SlSDG22 and SlSDG23, clustered with class I AtSDGs (AtSDG1, AtSDG5, AtSDG10) that are homologous to E(z) (Figure 5A). Although SlSDG21 and SlSDG22 show similar domain architecture to Arabidopsis class I SDGs, they have an additional SANT domain, while SlSDG23 has lost the two conserved EZDs (enhancer of zeste domains). Ten proteins, SlSDG15 to SlSDG19 and SlSDG33 to SlSDG37, cluster with five Arabidopsis proteins annotated as homologs to ASH1 (class II) (Figure 5B). The expansion of this class in tomato likely arose from gene duplications that also generated pseudogenes (see below). Tomato SlSDG15, SlSDG16, SlSDG19, SlSDG33 to SlSDG35, and SlSDG37 show a domain arrangement similar to Arabidopsis members while SlSDG17, SlSDG18, and SlSDG36 lack conserved domains of this class. Six proteins (SlSDG20, SlSDG24-26, SlSDG29, SlSDG44, previously described as SlTX1 by Sadder et al. [50]) belong to class III of SDGs (Figure 6A), being homologous to TRITHORAX (TRX). They have the same domain architecture (SlSDG44, SlSDG25-26) as their Arabidopsis counterparts or a GYF and F-box (SlSDG29) in addition to the SET and Post-SET domains. Moreover, we found that tomato as well as Arabidopsis has proteins with three PHD domains, contrasting with findings previously reported in Arabidopsis [49]. SlSDG24 has two PHD domains but one seems to be truncated at the N-terminus because it lacks the PWWP domain. Two TRX-related proteins (SlSDG27 and SlSDG28) belong to class IV SDGs (Figure 6B). This class includes proteins only present in yeast and plants [49]. Fourteen tomato SDGs (SlSDG1 to SlSDG14) belong to class V (Figure 7A). These are homologous to SU(VAR)3-9 and are distributed in two main clades containing members of the first or second subgroup of this class [49].
Some members lack the Post-SET domain and others gain AT-hook domains, as do the closest Arabidopsis orthologs. Nine members (SlSDG30-32, SlSDG38-43) fall into classes VI and VII of SDGs, seven clustering within class VI and two within class VII (Figure 7B). These classes include proteins with an interrupted SET domain or SET-related proteins. The domain architecture of tomato and Arabidopsis members belonging to these classes is quite similar, except for SlSDG39, which shows a domain composition typical of Class III SDGs.
In order to gain insights into the biological role of tomato SDGs, we analyzed their expression profiles by grouping SDGs according to their class (Figure 8). SlSDG23 and SlSDG21 (Class I) have similar expression profiles, being mainly expressed in root, bud and fruit up to 3 cm, while SlSDG22 is mostly expressed in 2 cm fruit up to B stages. On the basis of these expression profiles we could argue that the first two genes play redundant roles in root and fruit development and SlSDG22 is likely to be more specific to the later stages of fruit maturation.
As regards class II SDGs, some genes with very specific peaks of expression may be noted. Indeed, SlSDG35 is strongly expressed in leaves, SlSDG34 in fruit at the 3 cm stage, SlSDG17 in flowers, and SlSDG19 in buds. These expression profiles suggest a possible wide subfunctionalization of class II SDGs in tomato with a low degree of redundancy. A role in fruit development could be played by SlSDG33, whose expression profile is similar to that of AtSDG8 [51], which regulates gene expression in the carotenoid pathway [52]. SlSDG16 is mostly expressed in buds and in the early stages of fruit development. Interestingly, this gene could share some functions with its Arabidopsis counterpart, AtSDG4. Indeed, the latter is mainly expressed in pollen and is involved in pollen tube growth and reproduction in Arabidopsis [53]. SlSDG18 and SlSDG36 appear to behave like pseudogenes, not being expressed in any of the organs analyzed.
Among the class III SDGs, SlSDG29, SlSDG44 and SlSDG26 could play redundant roles in root development, as could SlSDG20 and SlSDG29 in fruit maturation. SlSDG29 is closely related to AtSDG2, which was shown to affect vegetative growth and reproduction in Arabidopsis by regulating the expression of hundreds of genes [54, 55]. Moreover, the expression profile of AtSDG2 [51] is similar to that of SlSDG29 in comparable organs, thereby supporting the idea of similar functions. SlSDG44 is related to AtSDG27, which was shown to regulate the expression of a xyloglucanase [56] belonging to a class of enzymes involved in tomato fruit ripening [57]. In a similar fashion, SlSDG44 could act in fruit ripening since it is highly expressed in fruits, particularly at the MG stage. By contrast, some degree of functional divergence seems to have occurred between SlSDG20 and its homolog AtSDG25, which is involved in flowering time in Arabidopsis [58]. Indeed, the latter is more highly expressed in flowers while SlSDG20 is very poorly expressed in this organ.
The two members of tomato Class IV SDGs have different expression profiles: while SlSDG27 is strongly expressed in roots and fruit at the 1 cm stage, SlSDG28 is mostly expressed in buds and in fruit at the B10 stage. These differences suggest that the two genes evolved different functions, with SlSDG28 being mainly involved in reproduction.
Tomato SDGs of class V may show a high degree of redundancy in some functions. Indeed, SlSDG3, SlSDG9, SlSDG6 and SlSDG5 have their highest expression in fruit at 1 cm and 2 cm, SlSDG2, SlSDG14, SlSDG13, SlSDG4 and SlSDG10 have their peak expression in fruit at the 3 cm stage, while SlSDG1 and SlSDG7 are particularly expressed in fruit at MG up to B10 stages. Therefore, these expression profiles suggest that they might play roles in fruit and/or seed at sequential stages of development. Moreover, SlSDG9 is highly expressed in buds as well as SlSDG12 and SlSDG8, suggesting that they could have a function in meiosis or in flower development.
As with the above-reported SDGs, also the members of classes VI and VII may have a possible redundant function. For example, SlSDG30, SlSDG41 and SlSDG42 are highly expressed in 1 cm fruit while SlSDG32, SlSDG39 and SlSDG38 in 3 cm fruit, thereby suggesting sequential functions in embryo/fruit development. A more specific expression profile is shown by SlSDG40 which evidenced peak expression in the bud, indicating a role in gamete and/or flower development. However, the putative involvement of these genes in development has not been investigated in any plant species.
PRMTs
We identified nine PRMTs in tomato (SlPRMT1 to SlPRMT9). SlPRMT8 was already described by Krause and colleagues [59] and was named PAM1.1. Specific patterns in the catalytic AdoMet_Mtase domain (CD02440) [59] allowed us to categorize SlPRMT1, 2, 3, 5, 7, and 9 as class I PRMTs while SlPRMT4 and SlPRMT6 belong to class II (see Additional file 7). For class I some duplication events were highlighted in dicot species.
The expression profiles of tomato PRMTs (see Additional file 8) suggest functional redundancy among these genes since some organs are characterized by two or more PRMTs with high expression levels. This is the case of roots, where SlPRMT9, SlPRMT7 and SlPRMT3 have their strongest relative expression. SlPRMT8, SlPRMT2, and SlPRMT5 were expressed in fruit at the 1 cm stage while SlPRMT4 and SlPRMT1 at the B10 stage. To investigate the biological function of SlPRMTs, we considered the role of orthogroups in Arabidopsis. SlPRMT5 and SlPRMT8 belong to the same clade as AtPRMT11 and AtPRMT12. The latter were suggested to be in the same histone methylation complex on the basis of their physical interaction [60] and spatial expression profiles. By contrast, SlPRMT5 and SlPRMT8 have quite different expression profiles and, when similar organs are compared between the two species, only the first has a profile resembling that of Arabidopsis counterparts. On this basis, we hypothesize that SlPRMT5 and SlPRMT8 evolved independent functions, with SlPRMT5 perhaps retaining the biological role of AtPRMT11 and AtPRMT12. If this is true, SlPRMT5 should be involved in flowering time, flower morphology and fertility as well as in leaf development [61]. SlPRMT7 is the closest homolog to AtPRMT10, which was shown to be a component of the autonomous pathway that controls the floral transition in an FLC-dependent manner [62]. Since the expression profiles of these two genes are comparable, SlPRMT7 might also have a functional role in flowering time. SlPRMT3 and SlPRMT9 grouped with AtPRMT13 and AtPRMT14, which were shown to redundantly control the floral transition [63]. In accordance with their functional redundancy, AtPRMT13 and AtPRMT14 have very similar expression profiles. On the other hand, SlPRMT3 and SlPRMT9 differ greatly in expression profile, also vis-à-vis their Arabidopsis counterparts, when similar organs are compared [51].
SlPRMT3 and SlPRMT9 could therefore play different roles in tomato development and might not be involved in flowering time. SlPRMT6 is the closest homolog to AtPRMT5, which was shown to be involved in vegetative growth and flowering time [64, 65]. The different expression profiles of AtPRMT5 [51] and SlPRMT6 suggest that the latter evolved a different role, possibly in fruit maturation, as indicated by its expression peak in fruit at the MG stage.
Tomato HDMs
HDMAs
In tomato, we identified 34 proteins showing similarity to HDMA histone demethylases. All are characterized by the C-terminal Amino_Oxidase domain (AOD) (PF01593) but only six (SlHDMA1 to SlHDMA6) also have the N-terminal SWIRM (PF04433) domain that is conserved in all HDMAs. As shown in Additional file 9, HDMA proteins are distributed in two main clades, comprising one (SlHDMA6) and five tomato members (SlHDMA1-5). Phylogenetic analysis suggests that four ancestors gave rise to the present number of HDMAs in tomato, and accordingly at least two events of gene duplication increased the number of HDMAs from four to six. In particular, SlHDMA1, 2 and SlHDMA4, 5 could have arisen from tandem duplication events, as suggested by their close position on chromosome 7 (not shown).
Additional file 10 shows the expression profiles of tomato HDMAs, divided into three groups with low (A), mild (B) and high (C) expression. SlHDMA2 is barely detectable in buds and in fruit from 2 cm stage to B, while SlHDMA5 is mostly expressed in buds and flowers, suggesting a major role of the former gene in fruit development and the latter gene in gamete and/or flower development. SlHDMA4 and SlHDMA6 are detectable in all organs, pointing to a possible role for these genes throughout development including reproductive stages. SlHDMA1 is quite uniformly expressed in all plant organs and SlHDMA3 has a clear preferential expression in fruit from 2 cm stage to B10. Collectively, these profiles indicate that tomato HDMAs could play redundant roles in different aspects of fruit development and SlHDMA3 could be the major histone demethylase in tomato. Moreover, SlHDMA3 could play a role both in flowering time and root elongation, as suggested by its expression profile and its sequence similarity to AtHDMA3 [66, 67].
JMJs
The tomato proteome contains 20 proteins belonging to the JMJ family of HDMs. On the basis of their domain composition we classified them into five classes that take their names from their human counterparts: JmjC-only, KDM4, JMJD6, KDM5 and KDM3 [68]. The tomato JmjC-only class includes three proteins (SlJMJ10-11, and SlJMJ18), the KDM4 class five proteins (SlJMJ1-5), and the KDM5 class four proteins (SlJMJ6-8, and SlJMJ16). The classes JMJD6 (SlJMJ9 and SlJMJ12) and KDM3 (SlJMJ13-15, SlJMJ17, and SlJMJ19-20) include two and six members, respectively.
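As a consistency check, the class membership listed above can be expanded and compared with the stated total of 20 proteins. The helper below is a small sketch of our own (the `expand` function and the data layout are not part of the original analysis); the member ranges are copied from the text.

```python
import re


def expand(spec):
    """Expand 'SlJMJ13-15' into ['SlJMJ13', 'SlJMJ14', 'SlJMJ15']; single names pass through."""
    m = re.fullmatch(r"([A-Za-z]+)(\d+)(?:-(\d+))?", spec)
    if m is None:
        raise ValueError(f"unrecognized member specification: {spec!r}")
    prefix, first = m.group(1), int(m.group(2))
    last = int(m.group(3)) if m.group(3) else first
    return [f"{prefix}{i}" for i in range(first, last + 1)]


# Class membership as listed in the text.
jmj_classes = {
    "JmjC-only": ["SlJMJ10-11", "SlJMJ18"],
    "KDM4": ["SlJMJ1-5"],
    "KDM5": ["SlJMJ6-8", "SlJMJ16"],
    "JMJD6": ["SlJMJ9", "SlJMJ12"],
    "KDM3": ["SlJMJ13-15", "SlJMJ17", "SlJMJ19-20"],
}

members = {cls: [g for spec in specs for g in expand(spec)]
           for cls, specs in jmj_classes.items()}
class_sizes = {cls: len(genes) for cls, genes in members.items()}
all_genes = [g for genes in members.values() for g in genes]

print(class_sizes)
print(len(all_genes), len(set(all_genes)))
```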
The evolutionary history of JMJs was inferred by comparing these proteins in tomato, Arabidopsis, maize and rice (see Additional file 11). Interestingly, a domain-based search led us to identify four new JmjC-only (ZmJMJ113, ZmJMJ115-117), two JMJD6 (ZmJMJ114 and ZmJMJ118) and one KDM3 (ZmJMJ112) proteins in the maize proteome that were absent from ChromDB and were included in our analysis. As shown in Additional file 11, HDMs are distributed in five main clades, all of which include tomato proteins. Three clades contain exclusively members of classes KDM3, KDM4 and KDM5; the remaining clades contain members belonging both to classes JmjC-only and JMJD6. In the phylogenetic tree (Figure 9A) two main groups of JmjC-only proteins were found, only one of which includes tomato members. This scenario suggests that one ancestor gave rise to the current number of JmjC-only proteins in tomato. All the tomato members included in this class share the same domain architecture with their orthologs. KDM4 class members (Figure 9B) are split into two main clades with two and three tomato proteins. One clade includes the C2HC2-domain proteins and the other the C5HC2-domain proteins [68]. SlJMJ2 and SlJMJ3 did not show the same domain architecture as the other SlKDM4 proteins since the C-terminal C2HC2 or C5HC2 domain is lacking. Two duplication events in tomato as well as in maize and rice expanded the second group, while only one protein, AtJMJ13, is encoded by the Arabidopsis genome. The JMJD6 proteins (Figure 10A) are split into two main clades, each including one tomato protein. The domain architecture of the first group is characterized by the presence of a C-terminal kinase APH domain (PF01636), which is not observed in the other group.
Phylogenetic analysis of the proteins belonging to this class suggests that they are highly conserved among species. The KDM5 class (Figure 10B) is divided into two main clades, one of which has three tomato proteins, the other only one. The first includes the proteins with the C-terminal FYRN and FYRC domains and the other the BRIGHT/ARID domain proteins [68]. SlJMJ16 lacks the conserved C-terminal domains (FYRN and FYRC) and SlJMJ6 has a duplication of the region encoding the PLU-1 (PF08429)-PHD domains. The PLU-1 domain is involved in DNA binding and has not been described before in JMJ demethylases. The KDM3 phylogenetic tree (Figure 11) has two main clades with a high bootstrap value. Interestingly, five proteins, three of which are found in tomato (SlJMJ13-15), have an N-terminal WRC domain (PF08879), which includes a putative nuclear localization signal and a zinc-finger motif and which was not described previously in this class. A modified RING-finger domain named R1 (PF10497) was also identified in tomato SlJMJ17. A tandem duplication of the SlJMJ19 gene was observed.
The wide expression profile of tomato JMJs (see Additional file 10) in several organs suggests that they could play a global role in plant development. However, some JMJs showed specific expression peaks, thereby suggesting particular roles. This is the case of SlJMJ17 and SlJMJ7 which are preferentially expressed in roots while SlJMJ3, SlJMJ8, SlJMJ4 and SlJMJ20 in buds and/or flowers, suggesting a role in gamete formation or flower development. Interestingly, SlJMJ8 is the closest homolog to AtJMJ14, which is highly expressed in flowers [51] and acts as a repressor of the photoperiodic pathway [69]. SlJMJ12, SlJMJ16, SlJMJ5 and SlJMJ13 are particularly expressed in fruit at B10, thus suggesting a role in later processes of fruit and/or embryo/seed development.
Association of tomato HMs to S. pennellii introgression lines (ILs): a case study
To identify candidate genes involved in epigenetically regulated processes by means of in silico analysis we looked for ILs where HMs were associated, based on their map position on the tomato genome (see Additional file 12). We failed to recover ILs for SlHAG15, which is not assigned to any chromosome (chr), and for SlJMJ4 (chr4), SlPRMT6 (chr8), SlSDG5 (chr2), SlSDG35 (chr12), SlSDG43 (chr1) located terminally on different chromosomes outside the available markers. We then combined the information about the phenotype of ILs with HM expression profiles described in the previous sections.
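Associating a gene with introgression lines by map position amounts to an interval lookup. The sketch below assumes that each IL can be represented as a single chromosome interval delimited by its flanking markers; the IL names, coordinates and function name are hypothetical and serve only to illustrate the logic, including the case of genes lying outside the available markers.

```python
def ils_containing(chrom, position, il_intervals):
    """Return the names of all ILs whose introgressed interval contains the position.

    il_intervals: dict mapping IL name -> (chromosome, start, end).
    An empty list means the gene lies outside every interval.
    """
    return sorted(
        name
        for name, (il_chrom, start, end) in il_intervals.items()
        if il_chrom == chrom and start <= position <= end
    )


# Hypothetical intervals, for illustration only; overlapping ILs are allowed.
il_intervals = {
    "IL_a": (4, 1_000, 5_000),
    "IL_b": (4, 4_000, 9_000),
    "IL_c": (8, 100, 900),
}

print(ils_containing(4, 4_500, il_intervals))   # gene inside two overlapping ILs
print(ils_containing(4, 20_000, il_intervals))  # gene beyond the last marker
```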
As a case study, we report the identification of a candidate HM involved in carotenoid biosynthesis in tomato fruits. It should first be noted that the Arabidopsis histone methyltransferase AtSDG8 is required for the expression of the carotenoid isomerase AtCRTISO [51]. The tomato homolog of AtCRTISO was characterized by Isaacson and colleagues [70] as an essential gene for the production of all-trans-lycopene. As reported above, our analysis highlighted that two homologs of AtSDG8 occur in tomato, i.e. SlSDG33 and SlSDG34. SlSDG33 is a stronger candidate than SlSDG34 for involvement in CRTISO-like regulation and hence in determining the carotenoid composition of the tomato fruit. Indeed, similar to what is observed for tomato CRTISO, SlSDG33 is upregulated during fruit ripening (see Additional file 13) with a peak of expression in fruit at B and B10. Furthermore, it maps to IL4-3-2, which is reported to carry a QTL affecting fruit color, a trait known to depend on carotenoid biosynthesis [71].