- Research article
- Open Access
The developing xylem transcriptome and genome-wide analysis of alternative splicing in Populus trichocarpa(black cottonwood) populations
© Bao et al.; licensee BioMed Central Ltd. 2013
- Received: 27 October 2012
- Accepted: 23 May 2013
- Published: 29 May 2013
Alternative splicing (AS) of genes is an efficient means of generating variation in protein structure and function. AS variation has been observed between tissues, cell types, and different treatments in non-woody plants such as Arabidopsis thaliana (Arabidopsis) and rice. However, little is known about AS patterns in wood-forming tissues and how much AS variation exists within plant populations.
Here we used high-throughput RNA sequencing to analyze the Populus trichocarpa (P. trichocarpa) xylem transcriptome in 20 individuals from different populations across much of its range in western North America. Deep transcriptome sequencing and mapping of reads to the P. trichocarpa reference genome identified a suite of xylem-expressed genes common to all accessions. Our analysis suggests that at least 36% of the xylem-expressed genes in P. trichocarpa are alternatively spliced. Extensive AS was observed in cell-wall biosynthesis related genes such as glycosyl transferases and C2H2 transcription factors. 27902 AS events were documented and most of these events were not conserved across individuals. Differences in isoform-specific read densities indicated that 7% and 13% of AS events showed significant differences between individuals within geographically separated southern and northern populations, a level that is in general agreement with AS variation in human populations.
This genome-wide analysis of alternative splicing reveals high levels of AS in P. trichocarpa and extensive inter-individual AS variation. We provide the most comprehensive analysis of AS in P. trichocarpa to date, which will serve as a valuable resource for the plant community to study transcriptome complexity and AS regulation during wood formation.
- Alternative Splice
- Splice Junction
- Secondary Xylem
- Southern Population
- Northern Population
Alternative splicing (AS) is considered to be a key factor underlying increased cellular and functional complexity in higher eukaryotes and has been studied extensively on the genome-wide scale in humans, other animals, and plants [1–5]. In humans, RNA splice variants with alternative exon configurations often accumulate differentially across different tissues and individuals [1, 6–8] and such tissue-specific gene isoforms can have important functions in development and in the functioning of different cell types.
Relatively few studies have investigated genome-wide patterns of AS in plant species , but recent results from Arabidopsis and rice  have revealed high levels of AS that can vary in different organs and under different stress conditions. Next generation high throughput transcriptome sequencing (RNA-Seq) analysis suggests that 42% of the intron-containing genes in Arabidopsis undergo AS . Using a normalized cDNA library derived from flower and seedling tissue, Marquez et al.  used deep RNA-Seq transcriptome analysis to show that over 60% of Arabidopsis intron-containing genes are alternatively spliced, with intron retention (IR) being the most common form of AS. The analysis revealed, however, that over 50% of genes surveyed display non-IR AS, and within IR variants, a large number of cryptic introns were spliced out in-frame. Thus, as in humans, large scale AS in plants is likely to contribute to proteome and phenotypic diversity.
Few studies have addressed AS variation among individuals within a species, although it has been noted that extensive AS variation in humans may underlie phenotypic diversity and disease susceptibility , and mutations affecting alternative splicing can play a role in disease . Recently, RNA-Seq was used for genome-wide analysis of variation in AS splicing and expression levels across human individuals [13, 14]. High depth transcriptome sequencing of cell lines derived from two unrelated individuals revealed a large number of genes differentially spliced between the two cell lines, and many of these differences were associated with single nucleotide polymorphisms (SNPs) found in close proximity . This and other studies  analyzing expression level variation and AS variation in individuals from the human HapMap project suggested that genetic variation (e.g., SNPs) underlies extensive AS and expression level variation among unrelated individuals, which appears to contribute to phenotypic diversity. However, variation of alternative splicing within plant population has not yet been subject to genome-wide analysis.
The sequencing of the first tree, Populus trichocarpa (black cottonwood; referred to as “poplar” throughout) , creates opportunities for genomic studies in this species, and in particular for investigation of biological processes important in woody plants, such as perenniality, secondary growth, and secondary xylem (wood) development –. P. trichocarpa is the largest deciduous tree native to western North America with a range that extends from California to Alaska, USA, a latitudinal range of over 30°. One advantage of woody plants such as P. trichocarpa and other trees is the ability to isolate pure tissue types, i.e. developing xylem and phloem, from woody stems during the active growth period, which can be used for studies of tissue-specific transcriptomes and proteomes. Published studies on the poplar xylem transcriptome, and the xylem transcriptomes of other woody plants, have been largely limited to use of EST and full-length cDNA resources from traditional sequencing platforms [19–23]. Furthermore, the annotation of AS forms in the 40,668 loci containing protein-coding transcripts a recent annotation of the P. trichocarpa genome (version 2.2; http://www.phytozome.net/poplar) is limited. Given its large population size across a large geographical range of varied environments, and extensive genetic polymorphism [24, 25], study of P. trichocarpa transcriptomes offers the opportunity for the systematic identification and characterization of splice variants at the population level, which will be of critical importance for understanding phenotypic variation in wood and other traits in the these populations.
The advent of next-generation sequencing technologies has now enabled AS analyses at unprecedented levels of sensitivity and precision. Analysis of Illumina-based RNA sequencing (RNA-Seq) data has been particularly useful in the study of AS and AS variation in model species [1, 3, 5, 10]. We recently used Illumina paired-end sequencing of xylem expressed transcripts in 20 individuals from populations (12 from southern and 8 from northern populations) across much of the P. trichocarpa species range to gain initial insights into the poplar xylem transcriptome . This analysis generated an average depth of sequence coverage of >48 X across over 18,000 xylem-expressed genes, and revealed extensive single nucleotide polymorphism (SNP) variation, with over 500,000 putative SNPs identified . This work provided a rich data set for further investigation of the poplar xylem transcriptome, including expression levels and AS.
Here, we report the use of the P. trichocarpa developing xylem RNA-Seq data set  to investigate xylem gene expression, to query the extent and complexity of AS in P. trichocarpa, and to investigate potential AS variation among different individuals from different populations. Our data provide an unprecedented and unbiased evaluation of alternative splicing in this species. The RNA-Seq data confirmed most annotated splice junctions and identified tens of thousands of novel AS events. We found that 36% of the P. trichocarpa xylem-expressed genes are alternatively spliced. This estimate is significantly higher than previous estimates based on cDNA/EST sequencing . Additionally, we quantified splice variant expression levels (isoform frequency) across all 20 individuals. A large number of AS events were identified that had markedly different isoform ratio in individuals within the studied two geographically separated sets of populations.
Mapping of the P. trichocarpadeveloping xylem transcriptome
To explore potential AS events, we used SOAPsplice  to discover splice junctions (see Methods). 8.7% of all reads were aligned on splice junctions (Additional file 2). Overall, using the total xylem transcriptome data from 20 individuals, we detected 62% (118,751) of known junctions in the P. trichocarpa genome. In addition, we identified 80,345 new splice junctions not in the P. trichocarpa genome annotation (see Additional file 3 for the number of known and new splice junctions detected in each individual). This indicates a relatively high number of AS variants for which no previous experimental data exists.
It is now clear that measurements of gene expression levels from RNA-Seq are correlated with measures of absolute expression level (as assayed by microarray) across a wide dynamic range . We quantified the expression levels of all genes in our RNA-Seq data set by determining the number of fragments per kilobase of exon in a gene per million fragments mapped (FPKM; ). We investigated the expression levels (FPKM value) of all 40,668 annotated genes (P. trichocarpa v. 2.2) across 20 individuals and found that 24% of genes per individual had no coverage (FPKM=0, or no reads mapped) (Figure 1B). We observed high (FPKM >40) expression for an average 8% of the genes per individual. Medium (FPKM ≥5 to ≤ 40) and low (FPKM >0 to ≤ 5) expression levels were observed for 29 and 39% of the genes, respectively. The depth of read matches to intergenic regions and annotated gene features is illustrated in Figure 1C. As expected, the depth of coverage of annotated intergenic regions and introns was much lower than that for exonic features.
Several highly expressed genes (see Additional file 4 for the the complete data set of FPKM values for 41,377 gene models in all 20 individuals, and for the 100 most highly expressed genes in 20 individuals) encode proteins involved in secondary cell wall biosynthesis, including lignin biosynthesis (caffeic acid O-methyltransferase, COMT; caffeoyl CoA 3-O-methyltransferase, CCoAOMT; cinnamyl alcohol dehydrogenase, CAD; cinnamate 4-hydroxylase C4H; ferulate, F5H; coumarate 3-hydroxylase C3′H; hydroxycinnamoyl CoA:shikimate/quinate hydroxycinnamoyltransferase; HCT; Hamberger et al., 2007), cellulose biosynthesis (Chitinase-like 2, CTL2; KORRIGAN), and glucuronoxylan biosynthesis (glucuronosyltransferase, IRX10). Additionally, genes encoding enzymes in methionine metabolism (homocysteine S-methyltransferases and methionine adenosyltransferases), cytoskeleton components (TUB, TUA, ACT), sucrose synthase (PtrSuSY1, PtrSuSY4), and encoding several AGP/FLA arabinogalactan proteins that have been implicated in secondary wall biosynthesis are among the most highly expressed genes. This is consistent with the strong metabolic commitment of this tissue to secondary wall biosynthesis. Interesting, among the 100 most highly expressed genes, over 10% (12) encode proteins of unknown function, and 8 of these have no obvious Arabidopsis homologs, and are poorly conserved in other plants (http://www.phytozome.net). The most highly expressed gene encoding a protein of unknown function is POPTR_0002s22040 (FPKM 1178), which has no homologs in other plants except cassava, which is related to P. trichocarpa. This suggests that many unknown functions potentially contributing to xylem development and secondary cell wall biosynthesis remain to be discovered.
Identification of alternative splicing in the xylem transcriptome
Number of AS genes and events in 20 individuals
Summary of alternative splice events detected by RNA-Seq
Alternative splice (AS) type1, 2
Number AS events detected
% of AS events
Number in 1 individual
Number in ≥ 15 individuals
Number in 20 individuals
% in 1 individual only
% in ≥15 individuals
% in 20 individuals
We next analyzed to what extent AS events observed in one individual were also observed the other 19 individuals. Table 2 shows that the bulk of the AS variants were observed in one or a few individuals. To exclude low abundance transcripts that could represent “transcriptional noise” of biological or technical origin, the significance of which is under debate , we filtered the data to include only splice variants of genes with FPKM values of ≥ 5 in all individuals. As shown in Figure 2B, around 46% of AS events in this data set were individual-specific, and only 2% were conserved in all 20 individuals. Since the developing xylem tissue from individuals of the southern (PT02-PT13) populations and northern (PT14-PT21) populations were harvested in two different years (from the same common garden, same week, and same time of day), the distribution of AS events for these two experiments were analyzed separately, but the distributions were essentially the same as the combined analysis shown in Figure 2. This indicates a high degree of alternative splicing polymorphism within this species that would not be detected if only a single or few individuals had been sampled.
P. trichocarpa is considered a model system for the study of wood (secondary xylem) development , which involves differentiation of vessel element and fiber cells from derivatives of the vascular cambium stem cell population. These cells undergo cell expansion followed massive secondary wall deposition and lignification, and programmed cell death . Many of the genes involved in secondary wall formation, and transcriptional regulators of this process  are well known. To investigate the pattern of AS in these genes, we generated a dataset of 389 cell-wall biosynthesis related and 598 xylem-expressed transcription factor (TF) genes (Additional file 6). We found that 38.9% (149) of the cell-wall biosynthesis related genes have evidence for alternatively splicing, for a total of 424 AS events. Of the transcription factor genes, 38.7% (232) showed evidence of alternative splicing, for a total of 584 AS events.
An important class of secondary cell wall related enzymes are glycosyl transferases, and we found evidence for extensive alternative splicing of glycosyl transferase genes (with 130 events in 29 genes). For example, POPTR_0005s06280, a glycosyl transferase highly expressed in developing xylem (FPKM=182) and homologous to the Arabidopsis GLUCURONIC ACID SUBSTITUTION OF XYLAN4 (GUX4) gene involved in xylan biosynthesis , exhibited 16 AS events across the 20 individuals, with 2 events common to ≥ 15 individuals and 3 unique to a single individual (Additional file 6). POPTR_0007s04030, similar to Arabidopsis GUX1, a second GUX gene highly expressed in developing xylem (FPKM =141), exhibited three AS slice forms common to all 20 individuals: two alternative 3′ splice acceptor sites in the second intron, and an alternative 5′ splice acceptor site in the fourth intron (Additional file 6). These data suggest that the diversity of glycosyl transferase proteins involved in secondary cell wall polysaccharide biosynthesis could be greatly expanded by AS in developing secondary xylem. It is interesting to note that a particular feature of human glycosyl transferases is alternative splicing, which leads to variability in the N-terminal regions of the proteins .
Among the TF genes, we found evidence for AS in members of the all TF classes examined (Additional file 6). Among these events, 42 were common to ≥ 15 individuals, while 242 (41%) were unique to one individual. We found 15 AS events in 12 TF genes that were common to all individuals, including events in homologs of Arabidopsis ANAC078/NAC2 (POPTR_0010s23650), KNAT6 (POPTR_0012s08910), and MYB103 (POPTR_0003s13190), a gene implicated in regulation of cell wall biosynthesis . Interestingly, two duplicated genes encoding C2H2 zinc transcription factors (POPTR_0003s07180 and POPTR_0001s16080, both homologs of the Arabidopsis SUPPRESSOR OF FRIGIDA4 (SUF4) C2H2 gene, had intron retention AS forms in all 20 individuals. The SUF4 gene is alternatively spliced in Arabidopsis, and a similar AS pattern, retention of the last intron, is found in both the poplar homologs and in the Arabidopsis SUF4.3 splice variant. All cell wall formation and TF related genes with AS variants are listed in Additional file 6.
Alternative splicing isoform variation between individuals
Although most documented AS isoform variation occurs between tissues or environmental conditions, differences exist among individuals [12–14]. In addition to qualitatively detecting AS events, Illumina-based RNA-Seq can generate reliable quantitative measurements of AS levels. For each of AS event type, reads derived from specific regions incorporated into an mRNA indicate quantitative expression of one alternative RNA isoforms. We used an inclusion and exclusion of junction reads method to quantify isoform expression in the RNA-Seq data set (see Methods).
Next, we investigated how many AS events showed significant difference between individuals within and among the population. To control the different sequencing coverage of different individuals, we filtered the data to include only splice variants of genes with FPKM values of ≥ 5 in all individuals. We found 7.0% and 13.1% (8.6%, excluding more divergent PT18 and PT20 samples) of AS events respectively that are predicted to have significant differences (FDR< 0.05) in the isoform ratio between at least two individual pairs within southern and northern populations (Figure 5B). This was also consistent with the lower correlation of isoform ratios in north populations relative to southern populations (Figure 5A). We are not aware of similar analyses in plants, but they are in general agreement with an analysis of EST and cDNA data, which estimate that 21% of human AS genes are affected by polymorphisms that alter the relative abundances of alternative isoforms , and with RNA-Seq analysis of 6 human individuals that estimate that between 10% and 30% AS events show individual-specific variation . However, these frequencies are still below the level of ~60% variation in AS events among different human tissues . This observation is not surprising, because the same tissues of different individuals are functionally more similar than different tissues of the same individuals.
Traditionally, genome-wide studies of AS were carried by sequencing of ESTs and using exon arrays, but such approaches were limited by cost and ability to only efficiently identify common AS events . It is now feasible to conduct deep and comprehensive sequencing of transcriptomes in a high throughput and cost effective manner using next generation sequencing such as the Illumina platform employed in this study, making it possible to comprehensively survey both gene expression levels and splicing variation, and to detect rare AS events. In this paper we present a global analysis of gene expression and alternative splicing in the model tree species P. trichocarpa, focusing on over 30,000 genes expressed in the developing xylem tissue taken from twenty unrelated individuals. We compiled a catalog of xylem-expressed genes, including nearly 13,000 genes expressed at moderate to high levels (average FPKM ≥5; and 100 genes very highly expressed genes (FPKM > 460; Additional file 4). Using the RNA-Seq approach, we mapped and annotated the intron-exon junctions in these genes, which allowed us to assess splicing patterns on a global level in the twenty individuals analyzed.
A striking finding in our analysis, and one that demonstrates the power of next generation RNA-Seq to efficiently reveal diverse patterns of AS, is the observation that there are significantly more new (non-annotated in the P. trichocarpa reference genome) intron-exon junctions in the set of 20 unrelated individuals than were detected in any one single individual. Many of these may represent genotype-dependent AS variants and suggest a remarkable diversity in AS patterns. The bulk of the AS and AS variation we observed is likely to be biologically relevant. Similar levels of AS variation and isoform abundance variation were observed in two separately harvested sets of developing xylem (those from the 12 southern and 8 northern populations). Furthermore, selected examples of AS were verified by RT-PCR, many of the events are in moderately to highly expressed genes, many transcript variants were present at high levels, and we found examples of AS in P. trichocarpa genes known to have splice variants in Arabidopsis (see below). As RNA-Seq is quantitative, it can be used to accurately determine isoform expression levels that result from AS. In principle, it is possible to determine the absolute quantity of every RNA variant in a cell population, and directly compare results between experiments. Thus, in addition to revealing important new insights into alternative splicing complexity in the P. trichocarpa transcriptome, our results show that RNA-Seq data can be used to reliably measure AS levels in large sample sizes.
While AS in humans is common and up to 86% of genes have evidence of AS , AS in plants has not been as extensively studied. Furthermore, relatively few plant AS events have been functionally characterized, but evidence suggests that AS participates in important plant functions, including stress responses, and may impact domestication and trait selection. Recent evidence based on next-generation sequencing indicates a high incidence of AS Arabidopsis (35% - 60%) and rice (33%) genomes [1, 5, 10, 11]. Consistent with those levels, our findings suggest that 36% of xylem-expressed transcripts in P. trichocarpa are alternatively spliced. This estimate of AS levels in P. trichocarpa is much higher than previous estimates based on cDNA/EST sequencing . As well, only 3,063 out of 40,668 genes (7%) in the P. trichocarpa genome (v2.2) are annotated as having at least two transcripts. It is likely that the number of alternatively spliced genes identified in P. trichocarpa will increase with larger and more comprehensively sampled tissue-specific transcriptome sequence collections. In addition, plants respond to the environment in diverse and complex ways, and only a small proportion of these conditions have been addressed in next-generation sequencing projects that would reveal AS.
Until now, most studies of splicing variation have been in mammals and humans. Our study adds to the plant literature that has so far focused primarily on Arabidopsis and rice, which has suggested that plants and animals differ in the predominant AS types. As in Arabidopsis and rice , we found that intron retention is the most prevalent form of alternative splicing accounting for 40% of the AS events (Tables 1 and 2). In contrast to humans where it is the predominant form, exon skipping only constituted 8% of the P. trichocarpa developing xylem AS events, consistent with results in Arabidopsis and rice. While recent work in Arabidopsis indicates that over 50% of genes display the non-IR AS , the relative prevalence of intron retention AS and paucity of exon skipping in all three plant taxa studied in depth (Arabidopsis, rice, P. trichocarpa) suggests that the mechanisms of splice site recognition and splice site selection differ between plants and animals.
We found many genes involved in cell wall biosynthesis and transcriptional regulation, which play key roles in secondary xylem development, are alternatively spliced. Future studies are needed to characterize changes in AS over the course of secondary xylem development, and the potential functional consequences of the AS observed. One way of assessing functional significance is conservation of AS across different taxa. While there are too few documented examples to make global comparisons, it is interesting to note that our initial analysis showed that two duplicated P. trichocarpa xylem expressed genes encoding C2H2 zinc transcription factors (POPTR_0003s07180 and POPTR_0001s16080) share a similar AS pattern with the Arabidopsis SUPPRESSOR OF FRIGIDA4 (SUF4) C2H2 orthologous gene  in all 20 P. trichocarpa individuals, retention of the last intron as found in the Arabidopsis SUF4.3 splice variant. Alternative splicing of INDERMINATE DOMAIN 14 (IDD 14), which encodes another C2H2 transcription factor in Arabidopsis and rice, has been shown to be functionally important in a generating functionally distinct TF heterodimer that acts as a competitive inhibitor in regulating starch metabolism . Conservation of alternatively spliced of genes of this class in P. trichocarpa and Arabidopsis and such functional information suggests that these splice variants may play biologically significant roles. Another example of the functional consequences of AS in Arabidopsis involves the modulation of the relative amounts of the peroxisomal versus cytosolic transthyretin-like (TTL) protein by AS, which affects ureide biosynthesis . Given the metabolic and developmental complexity of secondary xylem and secondary wall formation, it would not be surprising if a number of the numerous AS events we observed in the mRNA population of this tissue were involved in developmental and metabolic regulation. Future studies in P. trichocarpa and other woody plants must focus on the functional consequence of AS on proteome diversity related to wood formation.
We found that 7% and 13% of AS events in developing xylem showed significant differences in isoform usage between individuals in separately sampled sets of individuals from southern (12 samples) or northern (8 samples) populations, respectively. This is in general agreement with an analysis of EST and cDNA data that estimated that 21% of human AS genes are affected by polymorphisms that alter the relative abundances of alternative isoforms , and an RNA-Seq analysis of different tissues in 6 human individuals, which estimated that between 10% and 30% AS events show individual-specific variation . Most previous AS studies have concentrated on variation in splicing pattern across tissues, which is probably controlled by the availability of trans-acting factors in different cell types. On the other hand, where it has been studied, genotype-specific variation in splicing is caused by cis-acting SNPs affecting the splice-site region or splicing regulatory sequences [45, 46]. Thus, the high degree of SNP polymorphism in wild populations of P. trichocarpa[24, 25] may contribute to the diversity of AS variants among different P. trichocarpa genotypes.
Interestingly, samples taken from individuals in northern populations had more variation between individuals (13%) than samples taken from southern population individuals (7%). We have observed that P. trichocarpa individuals from northern populations exhibit higher nucleotide diversity than those from southern populations (A. Geraldes, C. Douglas, Q. Cronk, unpublished). Thus, higher SNP diversity could contribute higher AS variation in the xylem collected from north population individuals. However, it is possible that some of the difference in levels of AS variation between the separately sampled southern and northern populations is the result of environmental rather than genotypic effects on AS variation, and further work on greater numbers of RNA-Seq samples isolated at similar times from trees grown in common gardens, will be necessary to compare the AS differences between southern and northern populations.
High-throughput transcriptome studies are producing a fast-growing catalog of splicing variation in human populations, but so far information on the functional impact of such splicing variation is limited, and few if any analogous studies are available for plants. Thus, our results provide a new glimpse of the evolutionary and phenotypic consequences of AS variation in plant populations. If the AS events conserved in 20 individuals were functionally relevant, they would be expected to be under strong purifying selection and would thus be expected to have significant impacts on phenotype, which will need to be tested by functional studies of selected examples (e.g. ). However, we also found a large number of putative genotype-specific AS events at apparent low frequency levels in the population. Some variants may be rare because they have recently arisen, or because they are deleterious and are being selected against, but it is also possible that some variants play a role in adaptation and may be present in at a certain frequency in different populations of the species. The latter possibility can be tested by larger scale population-wide transcriptome studies. Nevertheless, they suggest that, like SNP variants in a relatively small number of genes important for adaptation in P. trichocarpa, some of the AS variation in wild populations of P. trichocarpa could contribute to phenotypic variation important for local adaptation. However, many of these low frequency variants may represent neutral variation. Thus, AS variation may provide a reservoir of novel exons into a minority (minor-isoform) of a gene’s transcripts as suggested by Modrek and Lee . Because a gene’s ancestral function would be maintained by the major-isoform, this may free the minor-isoform from functional constraint, thus reducing purifying selection. Thus, new exons/introns appearing initially as minor splicing isoforms, may gradually gain functions over time, and became constitutive exons correlated with mutations that creating stronger splice sites.
In summary, this genome-wide analysis of alternative splicing reveals high levels of AS in P. trichocarpa, and shows that splicing differences between individuals, including quantitative differences in isoform ratios, are frequent in P. trichocarpa. This suggests the hypothesis that individual-specific alternative splicing is a mechanism that accounts for part of individual phenotypic variation in the plant populations, and several avenues are open to testing of this hypothesis in the future. Future studies are needed to identify and elucidate the detailed molecular mechanisms underlying potential splicing regulatory SNPs and trans-factors, as well as to assess the potential functional consequences of genotype-specific alternative splicing events. Furthermore, while the genotypes we sampled from across much of the range of P. trichocarpa were grown in a common environment, the extent of subtle environmental variation on expression and splicing patterns within a single genotype remains to be explored, and will be a focus of future studies. Our RNA-Seq data also allowed us to map the transcriptional landscape of the P. trichocarpa genome dedicated to the important process of wood formation, and the expression profiles of wood-formation related genes offer opportunities for functional studies of novel wood-related genes in the future.
Sample collection and transcriptome sequencing
The trees sampled in this study with provenances ranging from 59.6°N to 44.0°N (Additional file 1) have been previously described  and were maintained in a common garden at the University of British Columbia. Developing xylem from twenty individuals selected for xylem transcriptome analysis  was harvested from current year vigorously growing coppice stems the first week of July, 2008 (samples PT02-PT13) or the first week of July, 2009 (samples PT14-PT21) at mid-day, bark removed, and developing secondary xylem tissue scraped using razor blades. The tissue was immediately frozen in liquid nitrogen and stored at −80°C until used for RNA extraction. Library preparation and Illumina GAII sequencing were carried out as described .
Analysis of alternative splicing by real-time PCR
Total RNAs were isolated from developing xylem tissue, using PureLink RNA reagent (Invitrogen, USA), and cleanup and purified with Qiagen RNeasy plant mini kit (QIAGEN, USA) according to the manufacturer’s instructions as previously described . RNA was the same set of samples used for Illumina GAII sequencing analysis. 2 μg of total RNA was used for reverse transcriptase synthesis using the Omniscript RT Kit (Qiagen, USA). Gene-specific PCR primers were designed spanning the location containing alternative splicing events. Primer pairs for POPTR_0001s00260 are 5′ ATGTCACTCTCCTTGCAATC and 3′ TCAGGCACGATCTCACTCA. Primer pairs for POPTR_0002s21650 are 5′ CCAGAGGACATGACCACATC and 3′ CTATGAAGATTTCCAAAACCA. cDNA amplification using locus-specific primer pairs was done with Quick-Load Taq 2X Master Mix (New England Biolabs) using PCR condition at initial denaturation at 95°C for 30 seconds, 40 cycles of 95°C for 30 sec, 55°C for 30 sec, 68°C for 1.5 minute, and extension at 68°C for 10 minutes.
Mapping short reads to the P. trichocarpagenome
Illumina GAII RNA-Seq reads  were aligned to the P. trichocarpa genome v. 2.2 (http://jgi-psf.org/pub/compgen/phytozome/v6.0/Ptrichocarpa/assembly/) using SOAP (v2.19) software . The mapping criteria were as follows: matches should be collinear to the genome allowing up to three mismatches, but no indels. SOAPsplice (v1.1) was used for genome-wide ab initio detection of splice junction sites from RNA-Seq . SOAPsplice is an effective tool for detecting not only known splice junctions but also novel junctions. All reads that could not be matched intact on the genomic sequence were searched to find a junction alignment. We required a junction site to be supported by at least two reads with non-repetitive match position and also to have a minimum of four bases on both sides of the junction.
Identification and quantification of alternative splicing using RNA-Seq
A gene was determined as alternatively spliced if there was evidence of a 5′splice site (SS) spliced to multiple 3′SS, or of a 3′SS spliced to multiple 5′SS, or of at least 1X coverage across full length of one intron (supported by junction reads). We empirically classified the AS events into four different types according to the structures of exons. These four types include intron retention (IR), alternative 5′ splice site (A5SS), alternative 3′ splice site (A3SS) and exon skipping (ES), as described by Wang  and Zhang . For each of AS event, reads deriving from specific regions can support the expression of one alternative isoform or the other. For example of an A3SS event, if 12 splice junctions reads mapped to a splice junction joining a 5′SS to a 3′SS, and 4 splice junction reads mapped to a splice junction joining the same 5′SS to a different 3′SS, the first 3′SS would be considered Long (L)-isoform (inclusion), with a “Isoform(L) frequency” of 12 / (12+4) = 75%, as described by Wang .
Normalization of gene expression levels based on RNA-Seq
Gene expression levels based on RNA-Seq data were measured as numbers of fragments per kilobase of exon in a gene per million fragments mapped (FPKM) . Mean read density values for exons, introns, and intergenic regions were computed in units of fragments per kilobase of exon [intron/intergenic] model per million mapped reads, as described by Wang .
Alternative splicing variation analysis
To assess the degree of similarity between different individuals, we performed pairwise comparisons between every sample pair, correlating the isoform ratios of AS events. To exclude low abundance transcripts that could represent “transcriptional noise” of biological or technical origin, the significance of which is under debate , we filtered the data to include only splice variants of genes with FPKM values of ≥ 5 in all individuals. Spearman correlation coefficients were computed for each pairwise comparison, and the resulting correlation matrix was clustered using average linkage hierarchical clustering to generate a tree (Figure 5A). To assess possible individual-specific expression for each event, a Fisher’s exact test was performed to evaluate the significance of the 2x2 table in which reads were divided by: 1) individual of origin (i.e. PT02 versus PT03); and 2) read type (i.e. long isoform versus short isoform). Variation in AS events at an adjusted Fisher’s exact P-value cutoff corresponding to a false discovery rate of 5% (in the Benjamini-Hochberg sense) between at least one pair of individuals within the population set were considered significant individual-specific expression.
This work was supported by the Genome British Columbia Applied Genomics Innovation Program (Project 103BIO). We thank Dr. Juergen Ehlting (University of Victoria) for helpful discussions and Dr. Michael Friedmann (University of BC) for project management.
- Wang ET, Sandberg R, Luo S, Khrebtukova I, Zhang L, Mayr C, Kingsmore SF, Schroth GP, Burge CB: Alternative isoform regulation in human tissue transcriptomes. Nature. 2008, 456: 470-476. 10.1038/nature07509.PubMed CentralView ArticlePubMedGoogle Scholar
- Pan Q, Shai O, Lee LJ, Frey BJ, Blencowe BJ: Deep surveying of alternative splicing complexity in the human transcriptome by high-throughput sequencing. Nat Genet. 2008, 40: 1413-1415. 10.1038/ng.259.View ArticlePubMedGoogle Scholar
- Ramani AK, Calarco JA, Pan Q, Mavandadi S, Wang Y, Nelson AC, Lee LJ, Morris Q, Blencowe BJ, Zhen M: Genome-wide analysis of alternative splicing in Caenorhabditis elegans. Genome Res. 2011, 21: 342-348. 10.1101/gr.114645.110.PubMed CentralView ArticlePubMedGoogle Scholar
- Sultan M, Schulz MH, Richard H, Magen A, Klingenhoff A, Scherf M, Seifert M, Borodina T, Soldatov A, Parkhomchuk D: A global view of gene activity and alternative splicing by deep sequencing of the human transcriptome. Science. 2008, 321: 956-960. 10.1126/science.1160342.View ArticlePubMedGoogle Scholar
- Marquez Y, Brown JW, Simpson C, Barta A, Kalyna M: Transcriptome survey reveals increased complexity of the alternative splicing landscape in Arabidopsis. Genome Res. 2012, 22: 1184-1195. 10.1101/gr.134106.111.PubMed CentralView ArticlePubMedGoogle Scholar
- Graveley BR: The haplo-spliceo-transcriptome: common variations in alternative splicing in the human population. Trends Genet. 2008, 24: 5-7. 10.1016/j.tig.2007.10.004.PubMed CentralView ArticlePubMedGoogle Scholar
- Kwan T, Benovoy D, Dias C, Gurd S, Provencher C, Beaulieu P, Hudson TJ, Sladek R, Majewski J: Genome-wide analysis of transcript isoform variation in humans. Nat Genet. 2008, 40: 225-231. 10.1038/ng.2007.57.View ArticlePubMedGoogle Scholar
- Yeo G, Holste D, Kreiman G, Burge CB: Variation in alternative splicing across human tissues. Genome Biol. 2004, 5: R74-10.1186/gb-2004-5-10-r74.PubMed CentralView ArticlePubMedGoogle Scholar
- Barbazuk WB, Fu Y, McGinnis KM: Genome-wide analyses of alternative splicing in plants: opportunities and challenges. Genome Res. 2008, 18: 1381-1392. 10.1101/gr.053678.106.View ArticlePubMedGoogle Scholar
- Filichkin SA, Priest HD, Givan SA, Shen R, Bryant DW, Fox SE, Wong WK, Mockler TC: Genome-wide mapping of alternative splicing in Arabidopsis thaliana. Genome Res. 2010, 20: 45-58. 10.1101/gr.093302.109.PubMed CentralView ArticlePubMedGoogle Scholar
- Zhang G, Guo G, Hu X, Zhang Y, Li Q, Li R, Zhuang R, Lu Z, He Z, Fang X: Deep RNA sequencing at single base-pair resolution reveals high complexity of the rice transcriptome. Genome Res. 2010, 20: 646-654. 10.1101/gr.100677.109.PubMed CentralView ArticlePubMedGoogle Scholar
- Douglas AG, Wood MJ: RNA splicing: disease and therapy. Brief Funct Genomics. 2011, 10: 151-164. 10.1093/bfgp/elr020.View ArticlePubMedGoogle Scholar
- Lalonde E, Ha KC, Wang Z, Bemmo A, Kleinman CL, Kwan T, Pastinen T, Majewski J: RNA sequencing reveals the role of splicing polymorphisms in regulating human gene expression. Genome Res. 2011, 21: 545-554. 10.1101/gr.111211.110.PubMed CentralView ArticlePubMedGoogle Scholar
- Pickrell JK, Marioni JC, Pai AA, Degner JF, Engelhardt BE, Nkadori E, Veyrieras JB, Stephens M, Gilad Y, Pritchard JK: Understanding mechanisms underlying human gene expression variation with RNA sequencing. Nature. 2010, 464: 768-772. 10.1038/nature08872.PubMed CentralView ArticlePubMedGoogle Scholar
- Tuskan GA, Difazio S, Jansson S, Bohlmann J, Grigoriev I, Hellsten U, Putnam N, Ralph S, Rombauts S, Salamov A: The genome of black cottonwood, Populus trichocarpa (Torr. & Gray). Science. 2006, 313: 1596-1604. 10.1126/science.1128691.View ArticlePubMedGoogle Scholar
- Jansson S, Douglas CJ: Populus: a model system for plant biology. Annu Rev Plant Biol. 2007, 58: 435-458. 10.1146/annurev.arplant.58.032806.103956.View ArticlePubMedGoogle Scholar
- Brunner AM, Busov VB, Strauss SH: Poplar genome sequence: functional genomics in an ecologically dominant plant species. Trends Plant Sci. 2004, 9: 49-56.View ArticlePubMedGoogle Scholar
- Cronk QCB: Plant eco-devo: the potential of poplar as a model organism. New Phytol. 2005, 166: 39-48. 10.1111/j.1469-8137.2005.01369.x.View ArticlePubMedGoogle Scholar
- Sterky F, Regan S, Karlsson J, Hertzberg M, Rohde A, Holmberg A, Amini B, Bhalerao R, Larsson M, Villarroel R: Gene discovery in the wood-forming tissues of poplar: analysis of 5, 692 expressed sequence tags. Proc Natl Acad Sci U S A. 1998, 95: 13330-13335. 10.1073/pnas.95.22.13330.PubMed CentralView ArticlePubMedGoogle Scholar
- Li X, Wu HX, Southerton SG: Comparative genomics reveals conservative evolution of the xylem transcriptome in vascular plants. BMC Evol Biol. 2010, 10: 190-10.1186/1471-2148-10-190.PubMed CentralView ArticlePubMedGoogle Scholar
- Sterky F, Bhalerao RR, Unneberg P, Segerman B, Nilsson P, Brunner AM, Charbonnel-Campaa L, Lindvall JJ, Tandre K, Strauss SH: A Populus EST resource for plant functional genomics. Proc Natl Acad Sci U S A. 2004, 101: 13951-13956. 10.1073/pnas.0401641101.PubMed CentralView ArticlePubMedGoogle Scholar
- Ralph S, Oddy C, Cooper D, Yueh H, Jancsik S, Kolosova N, Philippe RN, Aeschliman D, White R, Huber D: Genomics of hybrid poplar (Populus trichocarpa x deltoides) interacting with forest tent caterpillars (Malacosoma disstria): normalized and full-length cDNA libraries, expressed sequence tags, and a cDNA microarray for the study of insect-induced defences in poplar. Mol Ecol. 2006, 15: 1275-1297. 10.1111/j.1365-294X.2006.02824.x.View ArticlePubMedGoogle Scholar
- Ralph SG, Chun HJ, Cooper D, Kirkpatrick R, Kolosova N, Gunter L, Tuskan GA, Douglas CJ, Holt RA, Jones SJ: Analysis of 4,664 high-quality sequence-finished poplar full-length cDNA clones and their utility for the discovery of genes responding to insect feeding. BMC Genomics. 2008, 9: 57-10.1186/1471-2164-9-57.PubMed CentralView ArticlePubMedGoogle Scholar
- Geraldes A, Pang J, Thiessen N, Cezard T, Moore R, Zhao Y, Tam A, Wang S, Friedmann M, Birol I: SNP discovery in black cottonwood (Populus trichocarpa) by population transcriptome resequencing. Mol Ecol Resour. 2011, 11 (Suppl 1): 81-92.View ArticlePubMedGoogle Scholar
- Slavov GT, DiFazio SP, Martin J, Schackwitz W, Muchero W, Rodgers-Melnick E, Lipphardt MF, Pennacchio CP, Hellsten U, Pennacchio L: Genome resequencing reveals multiscale geographic structure and extensive linkage disequilibrium in the forest tree Populus trichocarpa. New Phytol. 2012, 196: 713-725. 10.1111/j.1469-8137.2012.04258.x.View ArticlePubMedGoogle Scholar
- Baek JM, Han P, Iandolino A, Cook DR: Characterization and comparison of intron structure and alternative splicing between Medicago truncatula, Populus trichocarpa, Arabidopsis and rice. Plant Mol Biol. 2008, 67: 499-510. 10.1007/s11103-008-9334-4.View ArticlePubMedGoogle Scholar
- Gilchrist EJ, Haughn GW, Ying CC, Otto SP, Zhuang J, Cheung D, Hamberger B, Aboutorabi F, Kalynyak T, Johnson L: Use of Ecotilling as an efficient SNP discovery tool to survey genetic variation in wild populations of Populus trichocarpa. Mol Ecol. 2006, 15: 1367-1378. 10.1111/j.1365-294X.2006.02885.x.View ArticlePubMedGoogle Scholar
- Xie CY, Carlson MR, Ying CC: Ecotypic mode of regional differentiation of black cottonwood (Populus trichocarpa) due to restricted gene migration: further evidence from a field test on the northern coast of British Columbia. Can J For Res-Rev Can Rech For. 2012, 42: 400-405. 10.1139/x11-187.View ArticleGoogle Scholar
- Li R, Yu C, Li Y, Lam TW, Yiu SM, Kristiansen K, Wang J: SOAP2: an improved ultrafast tool for short read alignment. Bioinformatics. 2009, 25 (15): 1966-1967. 10.1093/bioinformatics/btp336.View ArticlePubMedGoogle Scholar
- Huang S, Zhang J, Li R, Zhang W, He Z, Lam T-W, Peng Z, Yiu S-M: SOAPsplice: genome-wide ab initio detection of splice junctions from RNA-Seq data. Front Gene. 2011, 2: 46-View ArticleGoogle Scholar
- Marioni JC, Mason CE, Mane SM, Stephens M, Gilad Y: RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays. Genome Res. 2008, 18: 1509-1517. 10.1101/gr.079558.108.PubMed CentralView ArticlePubMedGoogle Scholar
- Trapnell C, Williams BA, Pertea G, Mortazavi A, Kwan G, van Baren MJ, Salzberg SL, Wold BJ, Pachter L: Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat Biotechnol. 2010, 28: 511-515. 10.1038/nbt.1621.PubMed CentralView ArticlePubMedGoogle Scholar
- Mercer TR, Gerhardt DJ, Dinger ME, Crawford J, Trapnell C, Jeddeloh JA, Mattick JS, Rinn JL: Targeted RNA sequencing reveals the deep complexity of the human transcriptome. Nat Biotechnol. 2012, 30: 99-104.View ArticleGoogle Scholar
- Mellerowicz EJ, Sundberg B: Wood cell walls: biosynthesis, developmental dynamics and their implications for wood properties. Curr Opin Plant Biol. 2008, 11: 293-300. 10.1016/j.pbi.2008.03.003.View ArticlePubMedGoogle Scholar
- Groover A, Robischon M: Developmental mechanisms regulating secondary growth in woody plants. Curr Opin Plant Biol. 2006, 9: 55-58. 10.1016/j.pbi.2005.11.013.View ArticlePubMedGoogle Scholar
- Zhong R, Lee C, Ye ZH: Evolutionary conservation of the transcriptional network regulating secondary cell wall biosynthesis. Trends Plant Sci. 2010, 15 (11): 625-632. 10.1016/j.tplants.2010.08.007.View ArticlePubMedGoogle Scholar
- Lee C, Teng Q, Zhong R, Ye ZH: Arabidopsis GUX proteins are glucuronyltransferases responsible for the addition of glucuronic acid side chains onto Xylan. Plant Cell Physiol. 2012, 53: 1204-1216. 10.1093/pcp/pcs064.View ArticlePubMedGoogle Scholar
- Gong QH, Cho JW, Huang T, Potter C, Gholami N, Basu NK, Kubota S, Carvalho S, Pennington MW, Owens IS: Thirteen UDPglucuronosyltransferase genes are encoded at the human UGT1 gene complex locus. Pharmacogenetics. 2001, 11: 357-368. 10.1097/00008571-200106000-00011.View ArticlePubMedGoogle Scholar
- Zhong R, Lee C, Zhou J, McCarthy RL, Ye ZH: A battery of transcription factors involved in the regulation of secondary cell wall biosynthesis in Arabidopsis. Plant Cell. 2008, 20: 2763-2782. 10.1105/tpc.108.061325.PubMed CentralView ArticlePubMedGoogle Scholar
- Kim SY, Michaels SD: SUPPRESSOR OF FRI 4 encodes a nuclear-localized protein that is required for delayed flowering in winter-annual Arabidopsis. Development. 2006, 133: 4699-4707. 10.1242/dev.02684.View ArticlePubMedGoogle Scholar
- Nembaware V, Wolfe KH, Bettoni F, Kelso J, Seoighe C: Allele-specific transcript isoforms in human. FEBS Lett. 2004, 577: 233-238. 10.1016/j.febslet.2004.10.018.View ArticlePubMedGoogle Scholar
- Nembaware V, Lupindo B, Schouest K, Spillane C, Scheffler K, Seoighe C: Genome-wide survey of allele-specific splicing in humans. BMC Genomics. 2008, 9: 265-10.1186/1471-2164-9-265.PubMed CentralView ArticlePubMedGoogle Scholar
- Seo PJ, Kim MJ, Ryu JY, Jeong EY, Park CM: Two splice variants of the IDD14 transcription factor competitively form nonfunctional heterodimers which may regulate starch metabolism. Nat Commun. 2011, 2: 303-View ArticlePubMedGoogle Scholar
- Lamberto I, Percudani R, Gatti R, Folli C, Petrucco S: Conserved alternative splicing of Arabidopsis transthyretin-like determines protein localization and S-allantoin synthesis in peroxisomes. Plant Cell. 2010, 22: 1564-1574. 10.1105/tpc.109.070102.PubMed CentralView ArticlePubMedGoogle Scholar
- Hull J, Campino S, Rowlands K, Chan MS, Copley RR, Taylor MS, Rockett K, Elvidge G, Keating B, Knight J: Identification of common genetic variation that modulates alternative splicing. PLoS Genet. 2007, 3: e99-10.1371/journal.pgen.0030099.PubMed CentralView ArticlePubMedGoogle Scholar
- Coulombe-Huntington J, Lam KC, Dias C, Majewski J: Fine-scale variation and genetic determinants of alternative splicing across individuals. PLoS Genet. 2009, 5: e1000766-10.1371/journal.pgen.1000766.PubMed CentralView ArticlePubMedGoogle Scholar
- Ma XF, Hall D, St Onge KR, Jansson S, Ingvarsson PK: Genetic differentiation, clinal variation and phenotypic associations with growth cessation across the Populus tremula photoperiodic pathway. Genetics. 2010, 186: 1033-1044. 10.1534/genetics.110.120873.PubMed CentralView ArticlePubMedGoogle Scholar
- Modrek B, Lee CJ: Alternative splicing in the human, mouse and rat genomes is associated with an increased frequency of exon creation and/or loss. Nat Genet. 2003, 34: 177-180. 10.1038/ng1159.View ArticlePubMedGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.