Comprehensive analysis of CCCH zinc finger family in poplar (Populus trichocarpa)

Background: CCCH zinc finger proteins contain a typical motif of three cysteine residues and one histidine residue and serve regulatory functions at all stages of mRNA metabolism. In plants, CCCH-type zinc finger proteins comprise a large gene family represented by 68 members in Arabidopsis and 67 in rice. These CCCH proteins have been shown to play diverse roles in plant developmental processes and environmental responses. However, this family has not been studied in the model tree species Populus to date.

Results: In the present study, a comprehensive analysis of the genes encoding the CCCH zinc finger family in Populus was performed. Using a thorough annotation approach, a total of 91 full-length CCCH genes were identified in Populus, most of which contained more than one CCCH motif, and one type of non-conventional motif, C-X11-C-X6-C-X3-H, was unique to Populus. All of the Populus CCCH genes were phylogenetically clustered into 13 distinct subfamilies. Within each subfamily, the gene structure and motif composition were relatively conserved. Chromosomal localization of these genes revealed that most of the CCCHs (81 of 90, 90 %) are physically located on duplicated blocks. Thirty-four paralogous pairs were identified in Populus, of which 22 pairs (64.7 %) may have been created by whole-genome segmental duplication, whereas 4 pairs seem to have resulted from tandem duplications. Among the 91 CCCH proteins, we also identified 63 putative nucleocytoplasmic shuttling proteins and 3 typical RNA-binding proteins. The expression profiles of all Populus CCCH genes were digitally analyzed in six tissues across different developmental stages and under various drought stress conditions. A variety of expression patterns of CCCH genes were observed during Populus development: 34 genes were highly expressed in root and 22 genes showed the highest level of transcript abundance in differentiating xylem. Quantitative real-time RT-PCR (RT-qPCR) was further performed to confirm the tissue-specific expression and the responses to drought stress treatment of 12 selected Populus CCCH genes.

Conclusions: This study provides the first systematic analysis of the Populus CCCH proteins. Comprehensive genomic analyses suggested that segmental duplications contributed significantly to the expansion of the Populus CCCH gene family. Transcriptome profiling provides first insights into the functional divergence among members of the Populus CCCH gene family. In particular, some CCCH genes may be involved in wood development while others may be involved in the regulation of drought tolerance. The results presented here may provide a starting point for the functional dissection of this family of potential RNA-binding proteins.


Background
Zinc-finger transcription factors, one of the largest transcription factor (TF) families in plants, are critical regulators of multiple biological processes, such as morphogenesis, signal transduction and environmental stress responses [1,2]. They are characterized by common zinc finger motifs in which cysteines and/or histidines coordinate a few zinc atoms to form local peptide structures that are essential for their specific functions [3]. Most plant zinc-finger transcription factors (e.g. RING-finger, LIM, WRKY and DOF) regulate gene expression with the aid of DNA-binding or protein-binding proteins [4][5][6][7]. Recently, a new type of Arabidopsis zinc-finger protein, which differs from the previously identified plant zinc-finger TFs in that it regulates gene expression by binding directly to mRNA, was named the CCCH gene family [8].
The CCCH family contains a typical C3H-type motif, and members of this family have been identified in organisms from yeast to human [8][9][10]. The first identified CCCH member is hTTP (human tristetraprolin), which can bind to the class II AU-rich element (ARE) in the 3'-untranslated region (3'-UTR) of tumor necrosis factor α (TNFα) mRNA and, in most cases, mediate TNFα mRNA degradation [11,12]. More recent evidence indicates that several TIS11 proteins, including hTTP, TIS11b and TIS11d, can act in concert to regulate target mRNA degradation during RNA processing by a similar mechanism [13,14]. Other CCCH proteins include the C. elegans proteins PIE-1 and POS-1, both of which control germ cell fate by inhibiting transcription or activating protein expression from maternal RNAs [15,16].
Compared to the largely well-characterized CCCHs in animals, only a small number of CCCH proteins have been functionally characterized in Arabidopsis and rice. These CCCH proteins have been implicated in a wide range of plant developmental and adaptive processes, including seed germination [17], embryo development [18,19], floral morphogenesis [20], plant architecture determination [21], FRIGIDA-mediated winter-annual habit [22], and leaf senescence [23]. In particular, two CCCH genes, AtC3H14 (At1g66810) and AtC3H15 (At1g68200), have recently been shown to act as master regulators of secondary cell wall biosynthesis in Arabidopsis [24,25], which also suggests that their homologues may be involved in Arabidopsis secondary cell wall formation as well. Accumulating evidence also indicates that a number of CCCH genes participate in plant abiotic stress and defense responses [8,24,25,26]. For example, two closely related proteins in Arabidopsis, AtSZF1 (salt-inducible zinc finger 1) and AtSZF2, both act as negative regulators of plant salt tolerance [26]. Arabidopsis ZFAR1 encodes a zinc-finger protein with ankyrin-repeat domains, and its loss-of-function mutants show increased local susceptibility to Botrytis and increased sensitivity of seed germination to abscisic acid (ABA) [27]. GhZFP1, a nuclear protein from cotton, interacts with GZIRD21A and GZIPR5, and enhances tolerance to drought, salt, salicylic acid (SA) stress and fungal disease in transgenic plants [28]. Recently, Wang and coworkers revealed that 11 subfamily IX members of the Arabidopsis CCCH proteins are involved in conferring plant tolerance to different stresses such as drought, salt, cold shock and ABA [8].
Because of its economic importance in pulp and biofuel production, the genus Populus has been a research hotspot for many years [29]. The completion of the Populus trichocarpa genome sequence in 2006 made it a model for other tree species [30]. Although Populus and Arabidopsis are relatively closely related within the eurosid clade of the eudicots, they have strongly contrasting life cycles and adaptations to environmental stresses [31,32]. Because CCCH proteins have the potential to associate with RNA and have critical functions in wood development and stress responses, it was of interest to characterize the CCCH genes in Populus.
In this study, we report the comprehensive genomic identification and phylogenetic analysis of 91 members of the CCCH gene family in Populus trichocarpa, as well as their expression profiles in six different tissues and under drought stress. These Populus CCCH proteins were categorized into 13 subfamilies and exhibited diverse expression patterns, suggesting functional differentiation. It is noteworthy that a subset of CCCH genes showed the highest level of transcript abundance in root and differentiating xylem. Among them, 12 genes were selected for investigation of their expression patterns by RT-qPCR analysis. Our preliminary results may provide a basis for further investigation of the roles of these candidate genes in differentiating xylem development and drought stress responses in Populus.

Identification of CCCH gene family in Populus
The CCCH domain genes, characterized by the presence of 1-6 copies of CCCH-type zinc finger motifs, have already been systematically analyzed in Arabidopsis, rice, human and Trypanosoma [8,10,33]. In the current study, to gain insight into the size of the CCCH gene family in Populus, the CCCH domains were used to screen the Populus genome database (release 2.1, http://www.phytozome.net/poplar.php) (see Methods). The domains used as queries cover both the conventional (C-X7-C-X5-C-X3-H and C-X8-C-X5-C-X3-H) and the recently defined non-conventional (e.g. C-X4-C-X5-C-X3-H and C-X11-C-X5-C-X3-H) CCCH motifs. Initially, a total of 106 non-redundant putative CCCH genes were obtained.
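The motif notation used above (C-Xa-C-Xb-C-Xc-H, where X is any residue and the subscript is the spacer length) maps directly onto a regular expression. The following Python sketch illustrates how the four motif types named in this paragraph could be located in a protein sequence. It is an illustration of the notation only, not the domain-based search used in this study; the function names and the toy sequence are hypothetical.

```python
import re

# Spacer lengths (a, b, c) for motifs of the form C-Xa-C-Xb-C-Xc-H.
# Only the four motif types named in the text are listed here.
MOTIF_SPACINGS = {
    "C-X7-C-X5-C-X3-H": (7, 5, 3),
    "C-X8-C-X5-C-X3-H": (8, 5, 3),
    "C-X4-C-X5-C-X3-H": (4, 5, 3),
    "C-X11-C-X5-C-X3-H": (11, 5, 3),
}


def motif_regex(a, b, c):
    """Build a regex for C-Xa-C-Xb-C-Xc-H.

    ASSUMPTION: X is treated as "any residue". A lookahead is used so that
    overlapping candidate matches are all reported.
    """
    return re.compile(r"(?=(C.{%d}C.{%d}C.{%d}H))" % (a, b, c))


def scan_ccch(protein_seq):
    """Return sorted (start, motif_type, matched_residues) tuples."""
    seq = protein_seq.upper()
    hits = []
    for name, (a, b, c) in MOTIF_SPACINGS.items():
        for match in motif_regex(a, b, c).finditer(seq):
            hits.append((match.start(), name, match.group(1)))
    return sorted(hits)


# Hypothetical toy sequence carrying a single C-X7-C-X5-C-X3-H motif.
TOY_SEQ = "MA" + "C" + "A" * 7 + "C" + "A" * 5 + "C" + "A" * 3 + "H" + "GG"
TOY_HITS = scan_ccch(TOY_SEQ)
```

A pattern match of this kind only flags candidate motifs; the candidates in this study were confirmed with SMART, Pfam and InterProScan as described below.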
SMART and Pfam analyses were performed to remove putative pseudogenes and incorrectly annotated genes, which resulted in 91 members recognized by either SMART (SM00356) or Pfam (PF00642). Subsequently, manual reannotation was performed to correct the putative CCCH sequences using the online web server FGENESH (http://linux1.softberry.com/berry.phtml). In this step, 12 protein sequences were corrected for further analysis. Finally, all 91 Populus CCCH genes were manually verified for the presence of CCCH motifs using the InterProScan program (http://www.ebi.ac.uk/Tools/InterProScan/). PlnTFDB (http://plntfdb.bio.uni-potsdam.de/v3.0/) and DPTF (http://dptf.cbi.pku.edu.cn/) list 99 and 69 members of the CCCH gene family for Populus, respectively, so our result was roughly in agreement with PlnTFDB. All 91 Populus CCCH genes identified in our study were named PtC3H1 to PtC3H91 following the nomenclature proposed in a previous study [34].
The encoded proteins varied from 96 to 2120 amino acids (aa) in length, with an average of 579 aa. Details of other parameters of the nucleic acid and protein sequences are provided in Table 1 and Additional file 1. The number of predicted non-redundant CCCH genes in Populus (91) is greater than that in other representative species: Arabidopsis, rice, mouse, human and Trypanosoma brucei contain 68, 67, 58, 55 and 48 predicted CCCH genes, respectively [8,10,35]. The number of CCCH genes in Populus is roughly 1.34-fold that in Arabidopsis, which is consistent with the ratio of 1.4~1.6 putative Populus homologues to each Arabidopsis gene [30]. As for other transcription factor gene families [34,36], the presence of more CCCH genes in Populus further supports the view that genome expansion was common during Populus evolution. This expansion appears to have arisen from multiple gene duplication events, including a whole-genome duplication event in the Populus lineage followed by multiple segmental and tandem duplication events [30].

Comparative analysis of the CCCH genes in Populus, Arabidopsis, and rice
The CCCH family appears to have undergone a complicated evolutionary process and has become one of the largest gene families in plants [8]. In this study, we compared the members of the CCCH gene family in Populus, Arabidopsis and rice (Figure 1A) and found that 44 gene clusters were present. Each of the clusters included at least one and up to six counterparts from all of the species we examined, implying conservation of CCCH genes among Populus, Arabidopsis and rice. The events that led to the expansion of the 44 CCCH gene clusters in the three species may be very complex, likely involving one or a few round(s) of whole-genome duplication (WGD) followed by a series of tandem duplications and/or rearrangements during the evolution of certain species. For example, one gene cluster has seven Populus CCCH genes (PtC3H35-39, 81 and 82), but only two Arabidopsis CCCH genes (AtC3H30, 56) and two rice CCCH genes (OsC3H24, 50). This discrepancy suggests that the Populus CCCH genes may have undergone two rounds of WGD and one tandem duplication, while the two homologues in either Arabidopsis or rice may have been created by segmental duplication (Table 1). Besides these conserved CCCH genes, two, three and twenty CCCH genes were found to be unique to Populus, Arabidopsis and rice, respectively (Figure 1A). These species-specific CCCH genes may have been acquired or retained differentially between species during evolution, which may lead to different biological functions. Surprisingly, 19 pairs of homologues were identified in both Arabidopsis and rice but not in Populus, suggesting that these CCCH genes might not be necessary for woody plant species and have therefore been lost during evolution.
Previously, it has been suggested that the CCCH gene family contains different numbers and types of CCCH domains in both animals and plants [8,10,33,37]. In this study, we investigated the motif characteristics of the CCCH genes in Populus, Arabidopsis and rice (Figure 1B). As in the other two species, each Populus CCCH protein has at least one CCCH motif, and 69.2 % of Populus CCCHs have at least two CCCH motifs. As shown in Figure 1C and Additional file 2, although the three species had different fractions of CCCH motif types, the two conventional CCCH motifs, C-X7-C-X5-C-X3-H and C-X8-C-X5-C-X3-H, constituted the two largest groups in all three species, suggesting that the C-X7-8-C-X5-C-X3-H motifs may be ancestral to the other CCCH motifs. By comparison, 18 % of Populus CCCH motifs were non-conventional, namely C-X5,7,8-C-X4-C-X3-H, C-X8-C-X6-C-X3-H, C-X9,11-C-X5-C-X3-H and C-X11-C-X6-C-X3-H. It is noteworthy that none of the Populus CCCH proteins contained the C-X10-C-X5-C-X3-H motif, which was previously identified as an abundant non-conventional CCCH motif in Arabidopsis and rice [8]. Additionally, a unique C-X11-C-X6-C-X3-H motif was found in Populus, suggesting that PtC3H27, which contains this motif, may have a different binding activity and biological function.
To evaluate the evolutionary relationships among the CCCH proteins, a phylogenetic analysis was performed based on the full-length amino acid sequences from Populus, Arabidopsis and rice. Unfortunately, overall sequence similarity was low, so the resulting tree could not reflect the real evolutionary relationships between the different subfamilies (data not shown). This observation might be explained by the divergence of the CCCH domains and of other non-homologous motifs (e.g. ANK, RRM and KH), and especially by the diverse CCCH motif types, which have different numbers of spacer amino acids between the conserved Cys and His residues in each protein. The two conventional CCCH motifs, C-X7,8-C-X5-C-X3-H, and one non-conventional motif, C-X4-C-X5-C-X3-H, constituted the three largest groups in the CCCH proteins of Populus, Arabidopsis and rice (Figure 1). In addition, identical CCCH motifs within the same CCCH protein usually have redundant or at least similar functions [35]. Therefore, in this study, all CCCH proteins of the three species were divided, based on the types of CCCH motif in each protein, into five subfamilies named CCCH-a, b, c, d and e (Figure 2 and Additional file 2), following the method previously described by Hu and coworkers [34]. The five subfamilies have different types of CCCH domain: each protein in subfamily CCCH-a has 1-3 C-X7-C-X5-C-X3-H motif(s), CCCH-b has 1-6 C-X8-C-X5-C-X3-H, CCCH-c has 2-3 C-X7-C-X5-C-X3-H and C-X8-C-X5-C-X3-H, CCCH-d has 1 C-X5-C-X4-C-X3-H and 1 C-X7,8,10-C-X5-C-X3-H, whereas CCCH-e has 1-6 other non-conventional CCCH motifs. For each subfamily, a phylogenetic tree was constructed based on the full-length protein sequences using the Neighbor-Joining (NJ), Minimum Evolution (ME) and Maximum Parsimony (MP) algorithms, respectively. The tree topologies produced by these three algorithms were identical except for the interior branches (data not shown). Therefore, only the NJ phylogenetic tree was subjected to further analysis in our study. The NJ phylogenetic trees indicated that the CCCH genes exhibited an alternating distribution of monocots and eudicots in each subfamily, implying that an ancestral set of CCCH genes may already have existed before the monocot-eudicot divergence (Figure 2). Further analysis revealed that the numbers of Populus, Arabidopsis and rice CCCH genes varied in most subfamilies. For example, the numbers of Populus CCCH-b, d and e genes were nearly equal to those of Arabidopsis and rice, while the number of Populus CCCH-c genes was the largest among the three species, almost two-fold that of the other two species. This variation in CCCH-c genes among the three species suggests that subsets of genes with the C-X8-C-X5-C-X3-H motif may have been either lost in Arabidopsis and rice or acquired in the Populus lineage after divergence from their last common ancestor. Gene duplication in Populus has also been observed in analyses of other plant transcription factor families such as NAC [34], bHLH [38], Dof [39], and WRKY [40]. We further examined the subgroups within each CCCH subfamily. Based on bootstrap values >50 %, each CCCH subfamily can be divided into 3-5 clades designated clade α, β, γ, δ, and ε (Figure 2). It is noteworthy that clade α in subfamilies CCCH-c and CCCH-d was mainly composed of a subset of Populus CCCH paralogues. In contrast, clade β in subfamily CCCH-d and clade © in subfamily CCCH-e included more CCCH proteins from Arabidopsis and rice than from Populus.
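The motif-based grouping into CCCH-a to CCCH-e described above can be written as a small decision rule. The sketch below is a minimal Python illustration in which each motif is represented by its spacer lengths (a, b, c) for C-Xa-C-Xb-C-Xc-H. The order in which the rules are tested, and the choice to send every protein that fits none of the first four descriptions to CCCH-e, are assumptions made for the example and are not stated in the text.

```python
# Motifs are represented by spacer lengths (a, b, c) of C-Xa-C-Xb-C-Xc-H.
X7 = (7, 5, 3)       # conventional C-X7-C-X5-C-X3-H
X8 = (8, 5, 3)       # conventional C-X8-C-X5-C-X3-H
X5_X4 = (5, 4, 3)    # C-X5-C-X4-C-X3-H, as given for CCCH-d
D_PARTNERS = {(7, 5, 3), (8, 5, 3), (10, 5, 3)}  # C-X7,8,10-C-X5-C-X3-H


def classify_subfamily(motifs):
    """Assign a protein to CCCH-a..e from the list of its motif types.

    ASSUMPTION: rules are tested in the order a, b, c, d, and anything
    that matches none of them falls through to CCCH-e.
    """
    if not motifs:
        raise ValueError("a CCCH protein must contain at least one motif")
    kinds = set(motifs)
    n = len(motifs)
    if kinds == {X7} and n <= 3:
        return "CCCH-a"          # 1-3 C-X7-C-X5-C-X3-H motifs
    if kinds == {X8} and n <= 6:
        return "CCCH-b"          # 1-6 C-X8-C-X5-C-X3-H motifs
    if kinds == {X7, X8} and 2 <= n <= 3:
        return "CCCH-c"          # 2-3 motifs mixing both conventional types
    if n == 2 and X5_X4 in kinds and len(kinds) == 2 \
            and (kinds - {X5_X4}) <= D_PARTNERS:
        return "CCCH-d"          # one C-X5-C-X4 motif plus one C-X7,8,10 motif
    return "CCCH-e"              # other non-conventional combinations
```

For instance, a hypothetical protein with motif list `[X7, X8]` would be placed in CCCH-c, and one with a single `(11, 6, 3)` motif in CCCH-e.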

Phylogenetic analyses of the CCCH proteins in Populus
To evaluate the evolutionary relationships between Populus CCCH proteins, a phylogenetic analysis of the 91 Populus protein sequences was performed (Figure 3A). As in the Arabidopsis CCCH proteins, the number of CCCH motifs in Populus CCCH proteins and the number of spacer amino acids between adjacent CCCH zinc-finger motifs varied. Therefore, individual phylogenies were constructed from the full-length Populus CCCH protein sequences for each subfamily in Figure 2. For statistical reliability, bootstrap analysis was conducted with 1000 replicates. The Populus CCCH family was further divided into 13 subfamilies (I to XIII) based on bootstrap values >50 % (Figure 3A). Within each subfamily, CCCH domains (e.g. C-X7-C-X5-C-X3-H in subfamily I and C-X8-C-X5-C-X3-H in subfamily V) and other domains (e.g. the RRM domain in subfamily I and the KH domain in subfamily VIII) are highly conserved, suggesting strong evolutionary relationships among subfamily members. Compared to the eight Arabidopsis CCCH subfamilies, the number of Populus subfamilies is much larger, implying an expansion of the Populus CCCH counterparts. It is well known that there are nearly 8000 pairs of paralogous genes in the Populus genome [28]. Based on the phylogenetic analysis, we identified 34 paralogous pairs among the 91 Populus CCCH genes (Table 2), a percentage (74.7 %) similar to that of the Populus NAC (60.1 %) [34] and Populus GST (69.1 %) gene families [36].

Gene structure and conserved motifs of Populus CCCH genes
To gain further insight into the structural diversity of CCCH genes, we compared the exon/intron organization in the coding sequences of individual CCCH genes in Populus (Figure 3B). Most closely related members of the same subfamily share similar exon/intron structures in terms of either intron number or exon length, which is consistent with the relationships defined in the phylogenetic analysis above. For instance, the CCCH genes in subfamilies VII and VIII contained one to three introns, while those in subfamily X possessed no introns with the exception of PtC3H33. In contrast, although the intron phase is remarkably conserved within Populus CCCH subfamily V (Additional file 3), the gene structures of subfamily V appeared to be more variable in terms of intron number, which may be indicative of exon shuffling during evolution [41].
Figure 2 Phylogenetic trees of full-length CCCH domain proteins from Populus, Arabidopsis and rice. All CCCH proteins of Populus (91), Arabidopsis (68) and rice (67) were divided into five distinct subfamilies (CCCH-a to CCCH-e) based on the types of CCCH motif. Each protein in subfamily CCCH-a has 1-3 C-X7-C-X5-C-X3-H motif(s), CCCH-b has 1-6 C-X8-C-X5-C-X3-H, CCCH-c has 2-3 C-X7-C-X5-C-X3-H and C-X8-C-X5-C-X3-H, CCCH-d has 1 C-X5-C-X4-C-X3-H and 1 C-X7,8,10-C-X5-C-X3-H, whereas CCCH-e has 1-6 other non-conventional CCCH motifs. The unrooted tree was constructed based on the full-length protein sequences using MEGA 4.0. Numbers at nodes indicate the percentage bootstrap scores, and only bootstrap values higher than 50 % from 1,000 replicates are shown. The percentages in brackets represent the protein sequence similarity range for each subfamily, obtained using the Smith-Waterman algorithm. Populus CCCH proteins are marked with red dots. The scale bar corresponds to 0.05 or 0.1 estimated amino acid substitutions per site.

To discover conserved motifs shared among related proteins within the family, we used both MEME (Multiple Expectation Maximization for Motif Elicitation) [42] and the SMART online server (http://smart.embl-heidelberg.de/) to predict putative motifs. Surprisingly, only five motifs could be detected using the MEME program with the previously reported parameters [8,34]. In contrast, 15 distinct motifs were identified in Populus CCCH proteins by SMART (Figure 3C and Additional file 4), which is similar to the result for Arabidopsis CCCH proteins [8]. As expected, most of the closely related members had common motif compositions, suggesting functional similarities among the CCCH proteins within the same subfamily. It is noteworthy that subfamily X, the largest subfamily with 16 members, could be divided into two subgroups. In addition to two CCCH motifs (C-X7-C-X5-C-X3-H and C-X5-C-X4-C-X3-H), each protein of subgroup I contains two ankyrin (ANK) repeat motifs, which have been shown to play a variety of roles in diverse molecular processes such as transcriptional initiation, ion transport and signal transduction [43,44]. The proteins in subfamily VIII mostly contained the well-defined RNA-binding domain KH, suggesting a potential role in RNA binding [45]. These specific motifs of the subfamily members may, to some extent, contribute to the functional divergence of CCCH genes [8].
The gene structure and conserved motifs of the 34 CCCH paralogous pairs in Populus were further investigated (Figure 3B, C and Table 2). The pairs were classified into three categories based on the gene structure and motif composition of the two counterparts in each pair. Among them, 20 gene pairs possessed identical exon/intron structure and motif composition, 9 pairs exhibited identical motifs but variable gene structure in terms of intron number and length, and 5 pairs shared relatively less conserved exon/intron structure and motif composition (Table 2). The differences in gene organization and motif composition between paralogous pairs suggest that they may be functionally divergent.

Chromosomal location and gene duplication
Ninety of the 91 Populus CCCH genes were physically located on the 19 Linkage Groups (LG) of Populus, while only one gene (PtC3H29) remained on as-yet unattributed scaffold fragments (Figure 4). The distribution of Populus CCCH genes among the chromosomes appeared to be uneven: LG XI, XIV and XIX harbour one or two CCCH genes, while relatively high densities of CCCHs were found at some locations on LG I, IV, V, VI, and IX. In particular, the CCCHs located on the duplicated fragments of LG I and IX are arranged in clusters.
Previous analysis of the Populus genome indicated that paralogues within gene families were mainly derived from the whole-genome duplication event in the Salicaceae (salicoid duplication) that occurred 60 to 65 million years ago, with occasional tandem duplication and transposition events such as retroposition and replicative transposition [46]. To determine the evolutionary relationships between Populus CCCH genes, the distribution of CCCHs was further investigated within the 163 recently identified duplicated blocks [30]. Of the 90 mapped CCCHs, only nine were located outside the duplicated blocks, while 90 % (81 of 90) were located in duplicated regions. Furthermore, 16 block pairs covered 24 CCCH paralogous pairs created by whole-genome duplication, and 23 block pairs harboured CCCHs on only one of the blocks and lacked the corresponding duplicates, suggesting that dynamic changes following segmental duplication may have resulted in the loss of some genes.
Four adjacent CCCH gene pairs were found within a distance of less than 9 kb on the duplicated blocks, which may have resulted from tandem duplication in either the inverse or the same orientation (Figure 4). Similar results have been reported in analyses of other Populus gene families [34,36,47]. Alignment of the protein sequences using the Smith-Waterman algorithm (http://www.ebi.ac.uk/Tools/psa/) showed that four pairs (PtC3H5/6, PtC3H36/37, PtC3H41/42 and PtC3H48/77) had high sequence similarity (≥80 %) between the two counterparts of each pair and therefore met the standard for tandem duplicates. Analysis of the CCCH paralogous pairs showed that 22 of the 34 gene pairs remained in conserved positions on segmentally duplicated blocks, suggesting that these genes may have resulted from genome duplication (Figure 4 and Table 2). Our study further indicated that the retention rate of duplicated genes was relatively high (44/91, 48.4 %), which is consistent with recent reports on other gene families in Populus [34,47]. Among the gene pairs not created by genome duplication, three genes were located on duplicated segments while their counterparts were not on any duplicated block, the two counterparts of three paralogous pairs were located separately on divergent rather than homologous duplicated blocks, one gene pair (PtC3H49/50) was not on any duplicated block, and one gene (PtC3H26) was located on a segmentally duplicated block while its counterpart (PtC3H29) has not yet been mapped to an LG (Figure 4 and Table 2). Together, these diverse duplication events contributed to the complexity of the CCCH gene family in the Populus genome.
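The two criteria used above for tandem duplicates (neighbouring genes less than 9 kb apart, with protein similarity of at least 80 %) can be expressed as a simple filter. The Python sketch below is illustrative only: the gene names, coordinates and similarity values are hypothetical, and measuring the distance as the gap between the end of one gene and the start of the next is an assumption, since the text does not define how the distance was measured.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    lg: str      # linkage group
    start: int   # bp
    end: int     # bp


def tandem_candidates(genes, similarity, max_gap=9_000, min_sim=80.0):
    """Return neighbouring gene pairs meeting both tandem-duplicate criteria.

    `similarity` maps frozenset({name1, name2}) to percent protein similarity.
    ASSUMPTION: distance = start of the right gene minus end of the left gene.
    """
    ordered = sorted(genes, key=lambda g: (g.lg, g.start))
    pairs = []
    for left, right in zip(ordered, ordered[1:]):
        if left.lg != right.lg:
            continue  # neighbours must lie on the same linkage group
        gap = right.start - left.end
        sim = similarity.get(frozenset((left.name, right.name)), 0.0)
        if gap < max_gap and sim >= min_sim:
            pairs.append((left.name, right.name))
    return pairs


# Hypothetical toy data, not taken from Table 2.
TOY_GENES = [
    Gene("geneA", "LG_I", 1_000, 3_000),
    Gene("geneB", "LG_I", 8_000, 10_000),
    Gene("geneC", "LG_I", 50_000, 52_000),
    Gene("geneD", "LG_II", 1_000, 2_000),
]
TOY_SIMILARITY = {
    frozenset(("geneA", "geneB")): 85.0,
    frozenset(("geneB", "geneC")): 90.0,
}
TOY_PAIRS = tandem_candidates(TOY_GENES, TOY_SIMILARITY)
```

In the toy data only geneA/geneB passes both filters: geneB/geneC is similar enough but too far apart, and geneC/geneD lie on different linkage groups.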
Figure 3 Phylogenetic relationships, gene structure and motif compositions of Populus CCCH genes. A. Multiple alignments of 91 full-length CCCH proteins from Populus were conducted by Clustal X 1.83 and the phylogenetic tree was constructed using MEGA 4.0 by the Neighbor-Joining (NJ) method with 1,000 bootstrap replicates. The percentage bootstrap scores higher than 50 % are indicated on the nodes. The tree shows 13 major phylogenetic subfamilies (subfamily I to XIII marked with different color backgrounds) with high predictive value. B. Exon/intron organization of Populus CCCH genes. Green box represents exon and black line represents intron. The sizes of exons and introns can be estimated using the scale at bottom. C. Schematic representation of the conserved motifs in Populus CCCH proteins elucidated by SMART online. Each colored box represents a motif in the protein with motif name indicated in box on the right side. The length of the protein and motif can be estimated using the scale at bottom. Refer to Additional file 4 for details of individual motif.

The ratio of nonsynonymous versus synonymous substitutions (Ka/Ks) is an indicator of the history of selection acting on a gene or gene region [48]. Ratios significantly <0.5 suggest purifying selection for both duplicates [49]. A summary of Ka/Ks for the 34 CCCH paralogous pairs is shown in Table 2. The results suggest that all gene pairs have evolved mainly under the influence of purifying selection, except for three pairs (PtC3H26/29, PtC3H57/58 and PtC3H79/80).
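The Ka/Ks criterion described above amounts to a ratio and a threshold. The minimal Python sketch below shows only that thresholding step. The Ka and Ks values are assumed to come from an external estimation tool, the example numbers are hypothetical, and the sketch performs no statistical test of whether a ratio is significantly below 0.5.

```python
def ka_ks_ratio(ka, ks):
    """Ratio of nonsynonymous (Ka) to synonymous (Ks) substitution rates."""
    if ks <= 0:
        raise ValueError("Ks must be positive to form a ratio")
    return ka / ks


def selection_label(ka, ks, threshold=0.5):
    """Label a paralogous pair using the 0.5 cut-off quoted in the text.

    ASSUMPTION: a plain comparison stands in for a significance test.
    """
    if ka_ks_ratio(ka, ks) < threshold:
        return "consistent with purifying selection"
    return "not clearly under purifying selection"


# Hypothetical values for two gene pairs, not taken from Table 2.
LABEL_LOW = selection_label(ka=0.05, ks=0.25)
LABEL_HIGH = selection_label(ka=0.30, ks=0.40)
```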
Based on the genomic organization of the CCCH genes, we conclude that segmental duplications contributed significantly to the evolution of the CCCH gene family and that redundancy resulting from duplication is common in the Populus genome, as has also been observed in other Populus gene families [36,39,50,51]. It has been reported that, on a genome-wide scale, approximately 33.4 % of predicted genes in Populus originated from the salicoid genome-wide duplication and 15.6 % from tandem duplication [30]. Our study indicates that the Populus CCCH gene family has a higher segmental duplication ratio (62.9 %) and a lower tandem duplication ratio (11.8 %), which differ markedly from the genome average. This high retention rate of segmental duplicates and low retention rate of tandem duplicates is also consistent with previous studies of other gene families [34,36,47,51].

Note to Table 2: Motif/Gene structure characteristics of gene pairs were divided into three groups: 1, identical exon/intron structure and motif composition; 2, identical motif and variable gene structure; 3, less conserved exon/intron structure and motif composition. Gene expression patterns based on microarray data (GSE13990) are categorized into four classes: AA, both of duplicates were expressed in non-overlapping tissues; AB, both duplicates had the same expression patterns; AC, expression tissues of one duplicate completely covered the other; AD, expression tissues of both duplicates were overlapping but different; No, no data for one duplicate is present in the microarray.

Nucleocytoplasmic shuttling and RNA-binding proteins
All Arabidopsis CCCH proteins have previously been predicted to localize to the nucleus by the SubLoc v1.0 software, which was supported by subsequent experimental verification of several CCCH genes such as AtHUA and AtSZF1 [8,20]. However, recent progress suggests that 79.4 % of Arabidopsis CCCH genes may encode nucleocytoplasmic shuttle proteins, owing to the presence of a leucine-rich Nuclear Export Signal (NES) that seems to be essential for the trafficking of CCCH proteins from the nucleus to the cytoplasm [8]. Furthermore, Pomeranz et al. experimentally confirmed that members of the Arabidopsis Tandem Zinc Finger (TZF) family, which comprises 11 CCCH genes, can indeed shuttle between the nucleus and cytoplasm, and all of them contain NES sequences [52,53]. To predict the subcellular localization of Populus CCCH genes, the 91 full-length protein sequences were used separately as input sequences for the program WoLF PSORT (http://wolfpsort.org/). Not surprisingly, all Populus CCCH members, like their Arabidopsis orthologues [8], were predicted to localize to the nucleus (data not shown). To further examine whether the 91 Populus CCCH proteins have NES sequences, a program using the widely accepted NES consensus was written according to a previous study [8]. Of the 91 proteins, 62 (68.1 %) have putative NES sequences (Additional file 5), suggesting that most Populus CCCH proteins might be nucleocytoplasmic shuttle proteins involved in signal transduction events [54]. Among these nucleocytoplasmic shuttle proteins, PtC3H17, PtC3H18 and PtC3H20 all contain two identical C-X8-C-X5-C-X3-H motifs separated by 18 amino acids (Figure 5A), and were therefore regarded as typical TZF family proteins [52,55]. It is well known that TZF proteins can bind to the class II ARE in the 3'-UTR of target mRNAs to promote their deadenylation and degradation [53,56]. Therefore, we speculated that the three Populus TZF proteins might also have RNA-binding abilities. Further comparison revealed that, besides the TZF motifs, PtC3H17, PtC3H18 and PtC3H20 also share a conserved lead-in sequence at the N-terminus (MW/F/M/TKTEL or R/KYKTE/A/QV/A) that may provide critical parts of the RNA-binding surface (Figure 5A). Phylogenetic analysis indicated that PtC3H17, PtC3H18 and PtC3H20 were the closest homologs of their Arabidopsis counterparts AtC3H14 and AtC3H15, suggesting that this type of protein is more evolutionarily conserved within eudicots than others (Figure 5B). It has recently been shown that AtTZF1 (AtC3H23, At2g25900) is induced by wounding and MeJA stress [52]. We therefore investigated the digital expression of the three Populus TZF genes based on microarray data (GSE16786) and found that both wounding and MeJA can significantly stimulate the expression of PtC3H17 and PtC3H18 (data not shown). However, no microarray data were available for PtC3H20.
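Both sequence screens described above (a search for a leucine-rich NES consensus, and the TZF arrangement of two C-X8-C-X5-C-X3-H motifs separated by 18 amino acids) can be sketched with regular expressions. The text does not spell out the NES consensus that was used, so the pattern below, L-x(2,3)-[LIVFM]-x(2,3)-L-x-[LI], is an assumption: it is one commonly cited form of the leucine-rich NES consensus and may differ from the program written for this study. The function names are hypothetical.

```python
import re

# ASSUMPTION: a commonly cited leucine-rich NES consensus,
# L-x(2,3)-[LIVFM]-x(2,3)-L-x-[LI]; the source does not state its pattern.
NES_PATTERN = re.compile(r"L.{2,3}[LIVFM].{2,3}L.[LI]")

# From the text: two C-X8-C-X5-C-X3-H motifs separated by 18 residues.
TZF_PATTERN = re.compile(r"C.{8}C.{5}C.{3}H.{18}C.{8}C.{5}C.{3}H")


def has_putative_nes(protein_seq):
    """True if the sequence contains at least one match to the assumed NES consensus."""
    return NES_PATTERN.search(protein_seq.upper()) is not None


def is_typical_tzf(protein_seq):
    """True if two C-X8-C-X5-C-X3-H motifs occur exactly 18 residues apart."""
    return TZF_PATTERN.search(protein_seq.upper()) is not None


def nes_fraction(sequences):
    """Fraction of sequences with a putative NES (e.g. 62 of 91 in the text)."""
    if not sequences:
        raise ValueError("no sequences supplied")
    return sum(has_putative_nes(s) for s in sequences) / len(sequences)
```

A consensus match is only a prediction: shuttling between nucleus and cytoplasm would still need experimental confirmation of the kind cited above for the Arabidopsis TZF family.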

Expression patterns of Populus CCCH genes in various tissues
Whole-genome microarrays have proven to be a useful means of studying gene expression profiles in Populus [34,51]. To gain insight into the expression patterns of Populus CCCH genes in various tissues, a comprehensive analysis was conducted based on the Populus microarray data generated by Wilkins and Dharmawardhana [50,57]. Because 19 CCCH genes have no corresponding probe sets in the microarray dataset, we analysed the expression profiles of only the remaining 72 CCCH genes (Figure 6 and Additional file 5). Most Populus CCCH genes show distinct tissue-specific expression patterns, except in mature leaves, where all have low transcript levels (Figure 6A). Of the 72 Populus CCCH genes examined, 34 showed the highest transcript accumulation in roots, 24 in young leaves, 12 in female catkins, 21 in male catkins and 22 in differentiating xylem. These expression patterns differ clearly from those of Arabidopsis or rice CCCH genes, the majority of which are expressed in all tissues (roots, inflorescences, leaves and seeds) as illustrated by MPSS and EST data [8]. Although it is generally thought that orthologous genes from different species retain similar temporal and spatial expression patterns [58,59], the discrepancy in gene expression between Populus and Arabidopsis might arise either from the source of the microarray data or from an evolutionary requirement for more CCCH homologs in Populus development.
We further examined the expression patterns of the Populus CCCH paralogous genes. Of the 34 CCCH gene pairs, 13 genes (PtC3H9, 13, 25, 29, 34, 36, 42, 52, 55, 64, 69, 79 and 80) have no corresponding probe sets on the Affymetrix microarray. Therefore, only the remaining 21 paralogous pairs were analyzed. As illustrated in Table 2 and Figure 6, these gene pairs displayed four distinct expression patterns. In the first category, which covered four gene pairs, the two duplicates were expressed in non-overlapping tissues, suggesting different functions. In the second category, the duplicates of all eight gene pairs shared almost identical expression patterns across the tissues examined. The third category covered seven gene pairs in which the tissues where one duplicate was highly expressed formed a subset of those where its paralog was highly expressed. The fourth category contained only two gene pairs (PtC3H84/86 and PtC3H85/87), all of which are homologues of Arabidopsis AtC3H60 (At5g42820); the expression patterns of the two members of each pair were partially overlapping but different. It is noteworthy that most gene pairs created by the whole-genome duplication event fell within the second and third categories, with both duplicates showing similar expression patterns. In contrast, one gene pair (PtC3H5/6) created by tandem duplication belongs to the first category and shows divergent expression patterns. The four categories of paralog expression patterns indicate that CCCH gene pairs have diverged quickly after segmental or tandem duplication. It is generally thought that duplicated genes may undergo divergent fates during subsequent evolution, such as nonfunctionalization (loss of original functions), neofunctionalization (acquisition of novel functions), or subfunctionalization (partition of original functions), which may be indicated by divergence in their expression patterns [60,61].
We speculate that the Populus CCCH gene pairs in the first category, which have distinct expression patterns, might have undergone neofunctionalization, whereas the overlapping expression patterns of gene pairs in the third or fourth category suggest subfunctionalization during subsequent evolution.
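The four categories of paralog expression patterns can be expressed as simple set relations. The following Python sketch is illustrative only. It assumes that each duplicate has already been reduced to the set of tissues in which it is highly expressed; how "highly expressed" is decided (for example, by a threshold on the microarray signal) is not specified in the text and is left to the caller. The function name classify_pair is hypothetical.

```python
def classify_pair(tissues_a, tissues_b):
    """Assign a paralogous pair to one of the four expression categories.

    tissues_a, tissues_b: iterables of tissue names in which each duplicate
    is highly expressed (ASSUMPTION: the caller has already applied some
    expression cutoff; the source does not give one).
    """
    a, b = set(tissues_a), set(tissues_b)
    if not a or not b:
        raise ValueError("each duplicate needs at least one tissue")
    if not (a & b):
        return 1  # non-overlapping tissues
    if a == b:
        return 2  # (almost) identical patterns
    if a < b or b < a:
        return 3  # one pattern is a subset of the other
    return 4      # partially overlapping but different


# Toy usage with hypothetical tissue sets, not data from the study.
print(classify_pair({"root"}, {"xylem"}))                    # category 1
print(classify_pair({"root", "xylem"}, {"root", "catkin"}))  # category 4
```

In practice "almost identical" would need a tolerance rather than strict set equality; the strict version is used here only to keep the logic visible.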
Identification of genes predominantly expressed in xylem provides an important clue to their functions during the development of secondary cell walls in Populus [62,63]. To identify such CCCH genes, a second heatmap was generated from the microarray data (GSE13043) [57], in addition to the above results. As shown in Figure 6B, most of the CCCH genes exhibited different expression levels across the Populus stem segments (IN2, IN3, IN4, IN5, and IN9). IN2 and IN3 represent the vascular tissue of primary growth, mainly comprising primary xylem and primary phloem. IN5 and IN9 have well-developed secondary phloem and secondary xylem vessels, as well as fibres with well-lignified secondary walls [57]. The expression of these CCCH genes suggests that they may play specific roles during each phase of cell wall biosynthesis. The expression patterns of most Populus CCCH genes in IN9 (Figure 6B) were essentially identical to those in xylem (Figure 6A). We selected six genes (PtC3H7, 8, 10, 14, 17, and 18) that are highly expressed in both xylem and IN9 to validate the microarray data by RT-qPCR. All six genes showed the highest expression level in xylem compared with the other tissues examined, in good agreement with the microarray profiles (Figure 7). Of these six genes, two (PtC3H17 and PtC3H18) exhibited particularly high transcript accumulation in xylem. AtC3H14, the Arabidopsis orthologue of PtC3H17 and PtC3H18, was previously shown to play a key role in the regulatory network of secondary cell wall biosynthesis [24,25]. Taken together, these results may provide a basis for selecting xylem-specific genes for functional validation.

Expression profiling of Populus CCCH genes under drought stress
A subset of Arabidopsis CCCH genes has previously been shown to play crucial roles in the drought stress response. To better understand the roles of Populus CCCH genes in drought tolerance, we reanalysed the expression profiles of all Populus CCCH genes in response to drought stress using publicly available microarray data. As illustrated in Figure 8, expression data for drought-treated trees were obtained from different organs (root apices and mature leaves) and genotypes (Populus 'Soligo' and 'Carpaccio') [64]. Consistent with the transcriptional changes of most drought-driven transcription factors in Populus roots and leaves [64], most Populus CCCH genes, especially those of subfamilies IV, VI, X and XI, responded more strongly in root apices than in mature leaves when subjected to drought stress (Figure 8). A possible explanation is that roots, rather than leaves, sense edaphic water deficits and send chemical signals to shoots, and that maintaining root growth despite reduced water availability can contribute to drought tolerance through water foraging [65]. The Populus CCCH genes also appear to be differentially regulated in response to the various drought stresses in the two Populus genotypes. Under prolonged drought stress (LMI, long-term response to mild stress; and LMO, long-term response to moderate stress), the expression of drought-driven Populus CCCH genes in root apices changed less in the water deficit-sensitive genotype 'Soligo' than in the less sensitive genotype 'Carpaccio'. Under the early drought response (EAR), in contrast to the responses to prolonged drought stress, most of the CCCH genes showed stronger drought-driven regulation in 'Soligo' roots than in 'Carpaccio' roots. Interestingly, a subset of CCCH genes mainly belonging to subfamilies V and XVII was up-regulated in leaves under all drought conditions.
The diverse drought-mediated responses suggest that the up- or down-regulated Populus CCCH genes might fall into different physiologically relevant patterns in the root or leaf system according to iterative group analysis (iGA) [64,66].
To screen for Populus CCCH genes regulated by drought stress, RT-qPCR was used to validate six candidate genes (PtC3H32, 33, 35, 38, 51 and 72) that are highly induced by drought stress in roots in the microarray data. Consistent with the microarray data, the six genes not only exhibited root-specific expression patterns but were also regulated by drought stress in roots (Figure 7 and Figure 8). Further analysis revealed that the six selected genes displayed different expression patterns in the two genotypes (Figure 8). This result is partially similar to that for the Arabidopsis orthologues, in which RT-PCR analysis showed that drought stress had a significant effect on the expression of most genes [8]. A plausible explanation for the diverse expression patterns of CCCH genes under drought stress is that poplar is sensitive to water deprivation and that drought tolerance varies considerably between genotypes [67][68][69][70]. To examine the expression changes of Populus CCCH genes under drought stress in detail, RT-qPCR analysis was performed on the six Populus CCCH genes using 4-month-old P. deltoides seedlings (see the materials). The drought-driven expression patterns of the six genes can be divided into two groups according to the time point at which transcript abundance reached its maximum (Figure 9). In one group (PtC3H32 and PtC3H72), transcripts peaked at 24 h after the onset of drought treatment, whereas in the other group (PtC3H33, 35, 38 and 51) transcript levels showed two peaks, at 12 h and 36 h after the onset of drought treatment. Moreover, the expression patterns were not identical among the members within each group. Further analysis found that the Arabidopsis homologs (AtC3H29/30/38/49) of the six genes tested have also been reported to be involved in the drought response [8].
We speculate that the diverse expression patterns of these CCCH genes reflect their involvement in different drought signalling networks. It would therefore be interesting to undertake further functional studies of these CCCH genes at the level of mRNA metabolism to establish the interactions of the biochemical pathways that are activated during the drought stress response.
Recently, accumulating evidence has shown that the Populus water-deficit transcriptome is influenced not only by genotype but also by the time of day [69,70]. In the hybrid poplar genotype DN34, a large number of drought-induced genes were significantly induced at midday, compared with dawn and late in the day [69]. By contrast, the time point of treatment had a less significant effect on the transcripts of drought-driven genes in the pure P. balsamifera genotypes than in the hybrid poplar genotypes [70]. In the current study, we attempted to collect samples in the afternoon and at dawn to reduce the impact of diurnal rhythm on drought-induced gene transcripts, even though a pure P. deltoides genotype was used as the material.
It is noteworthy that the real-time PCR results were in good agreement with the microarray data sets in this study, although the species used for qPCR (P. deltoides) was different from those (P. balsamifera, P. trichocarpa and P. x canadensis) from which the microarray data were produced (see the materials). The similar expression patterns may reflect high conservation of the tested genes among the four species. Inspection of their proteins showed that they possess identical motif compositions, with 1-2 CCCH and/or ANK domains. Furthermore, their similar expression patterns between GSE13990 and GSE13043 also suggest conserved functions among the four species. These inferences still need to be confirmed experimentally.
Figure legend (fragment): Genotype 'Carpaccio' productivity is less hampered by drought than that of 'Soligo'. EAR, short-term water deficit imposed by withholding irrigation for 36 hours; LMI, 10-day-long response to mild stress; LMO, 10-day-long response to moderate stress.

Conclusions
The characteristics of the CCCH gene family have been preliminarily documented in the model plants Arabidopsis and rice. However, this family has not been studied in the model tree Populus to date. In the present study, a comprehensive analysis of the CCCH gene family in Populus was performed, covering phylogeny, chromosomal location, gene structure, conserved motifs, and expression profiling. A total of 91 full-length CCCH genes were identified in the Populus genome, most of which contain more than one CCCH motif, and a non-conventional C-X11-C-X6-C-X3-H motif unique to Populus was found. The Populus CCCH genes were clustered into 13 distinct subfamilies based on phylogenetic analysis. Within each subfamily, exon/intron structure and motif composition were relatively conserved. A high proportion of CCCH genes were found to be preferentially located in duplicated blocks, suggesting that segmental duplications contributed significantly to the expansion of the Populus CCCH gene family. Comparative analysis showed that 34 gene pairs were created by different duplication types; these displayed four categories of digital expression pattern in six tissues across different developmental stages, suggesting that some categories have undergone subfunctionalization during evolution. Furthermore, a subset of Populus CCCH genes was identified as possibly involved in wood formation and drought response. In addition, 62 CCCH proteins were found to contain NES sequences and might be nucleocytoplasmic shuttle proteins. Among them, three had the typical characteristics of TZF proteins. The new information obtained could help in the selection of appropriate candidate genes for further functional characterization.

Phylogenetic analysis
Multiple alignments of amino acid sequences were performed with the ClustalX (version 1.83) program and were manually corrected. The phylogenetic trees were generated with MEGA 4.0 [77] using the Neighbor-Joining (NJ), Minimum Evolution (ME) and Maximum Parsimony (MP) methods [78]. Bootstrap analysis with 1,000 replicates was used to evaluate the significance of the nodes. The pairwise gap deletion mode was used to ensure that the divergent domains could contribute to the topology of the NJ tree. Gene clusters, referring to the homologs within the three species (Populus, Arabidopsis and rice), were identified using the NCBI website (http://www.ncbi.nlm.nih.gov/).
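To illustrate what pairwise gap deletion means for a distance-based tree, the sketch below computes a simple proportion-of-differences distance (p-distance) between two aligned sequences, skipping only the columns in which one of the two sequences has a gap. The choice of p-distance is an assumption made for illustration; the text does not state which substitution model was used in MEGA 4.0, and the function name is hypothetical.

```python
def p_distance_pairwise_deletion(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap are ignored for this pair only
    (pairwise deletion). Under complete deletion, a column would instead be
    dropped for every pair as soon as any sequence in the alignment has a
    gap there, so divergent, gap-rich domains would contribute nothing.
    ASSUMPTION: p-distance stands in for whatever model was actually used.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    differences = 0
    for res_a, res_b in zip(seq_a, seq_b):
        if res_a == gap or res_b == gap:
            continue  # site is uninformative for this pair
        compared += 1
        if res_a != res_b:
            differences += 1
    if compared == 0:
        raise ValueError("no ungapped sites shared by the two sequences")
    return differences / compared


# Toy alignment rows (hypothetical): 4 comparable sites, 1 difference.
print(p_distance_pairwise_deletion("MK-LC", "MKALH"))
```

A matrix of such pairwise distances is the input a Neighbor-Joining procedure works from.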

Sequence properties and chromosomal location
The amino acid sequences of the CCCH proteins were analyzed for physicochemical parameters by DNAman software (Lynnon Biosoft Co., Canada), and subcellular localization was predicted by the WoLF PSORT program (http://wolfpsort.org/) [79]. The exon/intron organization of CCCH genes was generated online with the Gene Structure Display Server (GSDS) (http://gsds.cbi.pku.edu.cn/) [80]. Structural motif annotation was performed using the SMART program mentioned above. Identification of homologous chromosome segments resulting from whole-genome duplication events was accomplished as described previously [30]. Blocks with the same color represent homologous chromosome segments. Tandem gene duplications were identified according to criteria described elsewhere [81]. Genes separated by five or fewer gene loci within a distance of 100 kb were considered to be tandem duplicates. Synonymous (Ks) and nonsynonymous (Ka) substitution rates were calculated according to a previous study [82].
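The tandem duplication criterion stated above can be sketched in Python as follows. Several details are assumptions made for illustration: the input is taken to contain only members of the gene family (already judged homologous), "separated by five or fewer gene loci" is read as at most five intervening annotated loci, and the 100 kb distance is measured between gene start coordinates. The Gene record and the function name tandem_pairs are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Gene(NamedTuple):
    name: str
    chrom: str
    order: int  # rank among ALL annotated loci on the chromosome
    start: int  # start coordinate in bp


def tandem_pairs(
    family: List[Gene],
    max_intervening: int = 5,
    max_distance: int = 100_000,
) -> List[Tuple[str, str]]:
    """Return family-member pairs that satisfy the tandem criterion.

    ASSUMPTIONS: distance is measured start-to-start, and the locus count
    is the number of annotated genes lying between the two members.
    """
    ordered = sorted(family, key=lambda g: (g.chrom, g.order))
    pairs = []
    for i, first in enumerate(ordered):
        for second in ordered[i + 1:]:
            if second.chrom != first.chrom:
                break  # sorted by chromosome, so no later match is possible
            if second.order - first.order - 1 > max_intervening:
                break  # later genes on this chromosome are even farther away
            if abs(second.start - first.start) <= max_distance:
                pairs.append((first.name, second.name))
    return pairs


# Toy coordinates (hypothetical, not taken from the Populus genome).
genes = [
    Gene("geneA", "chr1", 10, 1_000),
    Gene("geneB", "chr1", 12, 50_000),
    Gene("geneC", "chr1", 30, 90_000),
    Gene("geneD", "chr2", 11, 2_000),
]
print(tandem_pairs(genes))
```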

Microarray analysis
The genome-wide microarray data were obtained from the Gene Expression Omnibus database at the National Center for Biotechnology Information under the series accession numbers GSE13990 (from P. balsamifera), GSE13043 (from P. trichocarpa), GSE17223 (from P. x canadensis), and GSE17230 (from P. x canadensis). Probe sets corresponding to the putative Populus CCCH genes were identified using the online Probe Match tool available at the NetAffx Analysis Center (http://www.affymetrix.com/). For genes with more than one probe set, the median of the expression values was used. When several genes shared the same probe set, they were considered to have the same transcriptional profile. The expression data were gene-wise normalized and hierarchically clustered based on Pearson coefficients with average linkage in the Genesis (version 1.75) program [83].
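The preprocessing steps described above can be sketched with the Python standard library. This is an illustration, not a reproduction of what Genesis does internally. In particular, "gene-wise normalized" is interpreted here as a per-gene z-score, which is an assumption, and all function names are hypothetical. The Pearson-based distance (1 minus the correlation coefficient) is the quantity an average-linkage clustering would then operate on.

```python
from statistics import mean, median, pstdev
from typing import List, Sequence


def collapse_probe_sets(profiles: Sequence[Sequence[float]]) -> List[float]:
    """Combine several probe-set profiles of one gene by the per-sample median."""
    return [median(column) for column in zip(*profiles)]


def normalize_gene(values: Sequence[float]) -> List[float]:
    """Gene-wise normalization, ASSUMED here to be a z-score across samples."""
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]  # flat profile carries no pattern
    return [(v - mu) / sd for v in values]


def pearson_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Distance 1 - r, where r is the Pearson correlation of two profiles."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return 1 - cov / (var_x * var_y) ** 0.5


# Toy values (hypothetical): two probe sets for one gene over two samples.
print(collapse_probe_sets([[1.0, 4.0], [3.0, 8.0]]))
```

Genes with co-varying profiles get a distance near 0 and anti-correlated profiles a distance near 2, regardless of absolute signal intensity.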

Plant material collection
Young leaf (internodes 1~3 from top), mature leaf (from internodes 4~6), developing xylem (from the basal internodes) and root tissues of one-year-old P. deltoides plants grown in the greenhouse (16 h light/8 h dark, 25°C~28°C) were harvested separately. Drought stress treatment was conducted following a previous method with minor modifications [84]. Briefly, 4-month-old P. deltoides seedlings were removed from the pots and exposed on filter paper to air at 70 % RH and 25°C under dim light. Roots were collected at different time points (0 h, 1 h, 6 h, 12 h, 24 h, 36 h, and 48 h) after the start of treatment. To reduce the impact of diurnal rhythm on drought-induced gene transcripts, sample collection started at 17:00 (0 h). Three replicates from three independent plants were collected per harvest, immediately frozen in liquid nitrogen and stored at −80°C until required.

Real-time RT-PCR verification
Total RNA was isolated with the RNeasy mini kit (Qiagen, USA) according to the manufacturer's instructions. The RNA preparation was then treated with DNase I, and first-strand cDNA synthesis was performed using an oligo(dT) primer and M-MLV RT (Promega). Primers were designed using Beacon Designer v7.0 (Premier Biosoft International, USA) with melting temperatures of 58~60°C, primer lengths of 20~24 bp and amplicon lengths of 90~150 bp. Each primer was checked for specificity to its respective gene using the BLAST tool of the NCBI database with the filter off, and specificity was further confirmed by melting curve analysis of the real-time PCR reaction. Details of the primers are given in Additional file 6.
Real-time RT-PCR was conducted on a LightCycler® 480 Detection System (Roche, Germany) using SYBR Premix Ex Taq (TaKaRa, Japan) according to the manufacturer's instructions. To normalize the variance among samples, UBQ10 was used as the internal reference gene. Baseline and threshold cycles (Ct) were determined with the 2nd maximum derivative method using the LightCycler® 480 Software release 1.5.0. Relative gene expression